Introduction to NLP Library: TextBlob


TextBlob is a Python library for processing textual data. It has a number of useful features that help in analyzing and understanding text data in Python.

TextBlob is a good library for any NLP enthusiast to start with. It provides a simple API for common natural language processing (NLP) tasks such as:

  • Part-of-speech tagging
  • Noun phrase extraction
  • Sentiment analysis
  • Classification
  • Translation, and more

Installation

pip install textblob

Import the TextBlob library

from textblob import TextBlob

Next, we create a TextBlob object and pass it the text we want to analyze.

test_sent = TextBlob("Data science projects have several steps to follow and for continuity, these steps should flow together. Without a systematic work flow, it is very easy to get lost in one of these steps. In industry, when people think that they finished a project, they often struggle to bring the project to full operational support as they have failed to consider these life cycle steps. This is a common, but serious one resulting from people who do not know or appreciate the meaning of the term, production ready.")

print(type(test_sent)) #print type of the sentence

Print all sentences in the TextBlob object

print(test_sent.sentences)

Now, let's print all the words in each sentence

for sentence in test_sent.sentences:  #get all sentences
    print(sentence)
    for word in sentence.words: # printing words of each sentence
        print (word)    
    print()

Now we will print the part-of-speech tag for every word

for word, tag in test_sent.tags:
    print(word, tag)

Print Nouns

nouns = list()
for word, tag in test_sent.tags:
    if tag == 'NN':  
        #print(word)
        if (word not in nouns):
            nouns.append(word)
print(nouns)
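
The loop above keeps only words tagged 'NN', which in the Penn Treebank tag set means singular common nouns. Plural and proper nouns carry the tags NNS, NNP and NNPS. Here is a minimal sketch of a helper that keeps all four. The name unique_nouns and the hand-written tag list are illustrative assumptions; the list stands in for test_sent.tags so the snippet runs on its own.

```python
# Penn Treebank noun tags: singular, plural, proper singular, proper plural
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def unique_nouns(tagged_words):
    """Return the nouns in first-seen order, ignoring case when de-duplicating."""
    seen = set()
    nouns = []
    for word, tag in tagged_words:
        key = word.lower()
        if tag in NOUN_TAGS and key not in seen:
            seen.add(key)
            nouns.append(word)
    return nouns

# Hand-written (word, tag) pairs in the same shape as test_sent.tags
sample_tags = [
    ("Data", "NNP"), ("science", "NN"), ("projects", "NNS"),
    ("have", "VBP"), ("several", "JJ"), ("steps", "NNS"),
    ("these", "DT"), ("steps", "NNS"),
]
print(unique_nouns(sample_tags))  # ['Data', 'science', 'projects', 'steps']
```

With a real blob, the same helper could be called as unique_nouns(test_sent.tags).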

Using the nouns, we can also get a rough idea of what our text is about

# we are using random library to pick random words from noun list 
import random

print ("This text is about...")
for item in random.sample(nouns, 2):
    print (item)
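
Two things are worth knowing about random.sample: it raises ValueError when asked for more items than the list holds, and it returns a different selection on every run. Below is a small sketch that caps the sample size and accepts an optional seed for repeatable output. The helper pick_topics and the hand-written noun list are assumptions made for illustration.

```python
import random

def pick_topics(nouns, k=2, seed=None):
    """Pick up to k distinct nouns; pass a seed to get repeatable results."""
    rng = random.Random(seed)            # private generator, global state untouched
    unique = list(dict.fromkeys(nouns))  # drop duplicates, keep first-seen order
    return rng.sample(unique, min(k, len(unique)))

# Hand-written list standing in for the nouns collected from the text
nouns = ["science", "continuity", "work", "flow", "industry", "project"]
print(pick_topics(nouns, k=2, seed=42))  # same two words on every run
print(pick_topics(["project"], k=2))     # only one noun available, no error
```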

Sentence Correction

sample = TextBlob('I lke the movie but it was a bit lngthy.')
# correct() returns a new blob with each misspelled word replaced by its most likely correction
print(sample.correct())
# spellcheck() on a single word returns a list of (candidate, confidence) tuples
print(sample.words[9].spellcheck())  # words[9] is 'lngthy'
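
Because spellcheck() returns a list of (candidate, confidence) tuples rather than a single word, you usually need a rule for choosing among them. The sketch below shows one option. The helper best_suggestion, the 0.5 cut-off and the sample candidates are made up for illustration and are not actual TextBlob output.

```python
def best_suggestion(word, candidates, min_confidence=0.5):
    """Return the top candidate if it is confident enough, else the original word."""
    if not candidates:
        return word
    suggestion, confidence = max(candidates, key=lambda pair: pair[1])
    return suggestion if confidence >= min_confidence else word

# Made-up candidates in the shape returned by Word.spellcheck()
candidates = [("lengthy", 0.9), ("length", 0.1)]
print(best_suggestion("lngthy", candidates))                  # lengthy
print(best_suggestion("lngthy", [("length", 0.3), ("lengthy", 0.3)]))  # lngthy
```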

Analyzing News Article

In the following code, I have taken a news article about Covid-19. We will use its text for the analysis.

article = '''
The Trump administration has selected five companies, including Moderna, AstraZeneca Plc and Pfizer, as the most likely candidates to produce a vaccine for the novel coronavirus, the New York Times reported on Wednesday, citing senior officials.
The other two companies are Johnson & Johnson and Merck & Co, according to the paper.
The selected companies will get access to additional government funds, help in running clinical trials, and financial and logistical support, the paper reported.
There is no approved vaccine for COVID-19 caused by the new coronavirus.
The report did not mention potential vaccines from French drugmaker Sanofi, Novavax Inc and Inovio Pharmaceuticals Inc – among the more than 100 in development globally.
It was not immediately clear if Wednesday’s move had any impact on those programs.
The announcement of the decision will be made at the White House in the next few weeks, according to the report.
The White House did not immediately respond to a request for comment.
“We cannot comment on information that is market-moving,” a U.S. Department of Health and Human Services official said.
The companies on the list are the farthest along in developing a vaccine and have significant manufacturing capacity.
The United States is planning massive clinical trials involving 100,000 to 150,000 volunteers in total, with the goal of delivering an effective vaccine by the end of this year.
To make that deadline, the government aims to start mid-stage testing in July.
The first two vaccines to start that trial would likely be from Moderna and the AstraZeneca/Oxford University combination, the National Institutes of Health Director Dr. Francis Collins told Reuters in an interview last month. He also said he expected vaccine candidates from J&J and Merck to eventually join the trial effort.
None of the companies were immediately available for comment.
'''
news_text = TextBlob(article)

Gather all the nouns in the news article and save them in a list

nouns = list()
for word, tag in news_text.tags:
    if tag == 'NN':
        # keep duplicates on purpose so that occurrences can be counted later
        nouns.append(word.lemmatize())
print(nouns)

Let's get a rough idea of what the article is about using our noun list. The output of this code will differ on every run because the words are picked at random.

# we are using the random library (imported earlier) to pick random words from the noun list
print("This text is about...")
for item in random.sample(nouns, 5):
    print(item)

To understand what the text is about, we can also use the collections library: counting how often each noun occurs tells us which topics dominate the text.

from collections import Counter

counts = Counter(nouns)
# print the count of each word that are in our noun list
print(counts)
sorted_nouns = sorted(counts, key=counts.get, reverse=True) #sorting
top_nouns = sorted_nouns[0:5] #picking top 5 words
for item in top_nouns:    
    print (item)

# output
vaccine
coronavirus
paper
government
report 

Based on the output, we can easily see what the article is about: the coronavirus, its vaccine, and a report.

# Another way of printing
print(counts.most_common(5))
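
Counter treats 'Vaccine' and 'vaccine' as different keys, so it can help to normalize case before counting. Here is a minimal sketch, assuming a hypothetical helper top_terms and a hand-written noun list in place of the one extracted from the article:

```python
from collections import Counter

def top_terms(nouns, n=5):
    """Count nouns case-insensitively and return the n most frequent as (term, count)."""
    counts = Counter(word.lower() for word in nouns)
    return counts.most_common(n)

# Hand-written list standing in for the nouns extracted from the article
nouns = ["vaccine", "Vaccine", "paper", "report", "vaccine", "paper"]
for term, count in top_terms(nouns, n=2):
    print(term, count)
# vaccine 3
# paper 2
```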

Translation

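We can use the translate() method to translate text into another language. Here I'll convert test_sent into Urdu. Note that translate() and detect_language() rely on the Google Translate web service, were deprecated in TextBlob 0.16.0, and have been removed from recent releases, so the code in this section needs an older version of the library and an internet connection.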

print(test_sent)
print(test_sent.translate(to='ur'))

Converting text from Urdu back to English

# translate to Urdu and keep the result in its own variable
urdu_blob = test_sent.translate(to='ur')
print(urdu_blob)
# translate the Urdu text back to English
english_blob = urdu_blob.translate(from_lang="ur", to='en')
print(english_blob)

# detect language

print(urdu_blob.detect_language())

Sentiment Analysis

text = '''
Movie plot was not bad but actors performance was worst.
But I love the song
'''

blob = TextBlob(text)
print(blob.tags)           # list of (word, tag) tuples

print(blob.noun_phrases)   # WordList of the noun phrases found in the text

for sentence in blob.sentences:
    print(sentence)
    pol = sentence.sentiment.polarity  # float between -1.0 (negative) and 1.0 (positive)
    print(pol)
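
The polarity score is a float between -1.0 (most negative) and 1.0 (most positive), which is often easier to read as a label. Here is one possible sketch of that mapping. The function polarity_label, the 0.1 threshold and the example scores are illustrative assumptions, not values computed by TextBlob.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

# Example scores chosen by hand to show each branch
for score in (0.5, -0.65, 0.0):
    print(score, polarity_label(score))
```

With a real blob, this could be called as polarity_label(sentence.sentiment.polarity) inside the loop over blob.sentences.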

TextBlob is a great tool that makes NLP concepts easier and faster to learn.

This brings us to the end of this article. To explore more of this library's functionality, visit this link.

The full code can be downloaded from my GitHub:
https://github.com/uzairaj/TextBlob

Check out more blogs on my website and YouTube channel
http://uzairadamjee.com/blog
https://www.youtube.com/channel/UCCxSpt0KMn17sMn8bQxWZXA

Thank you for reading 🙂

