5 NLP tasks using Hugging Face pipeline

#machinelearning #nlp #tutorial #python

Hugging Face is a company that has given many Transformer based Natural Language Processing (NLP) language model implementation. It has changed the way of NLP research in the recent times by providing easy to understand and execute language model architecture. There Github repository named Transformers has the implementation of all these models. This repository has 30k+ stars on Github. You can learn more about them from their official website, which also has great documentation now about the different models, languages, and datasets it provides.

Hugging Face pipeline is an easy method to perform different NLP tasks and is quite easy to use. It can be used to solve different NLP tasks some of them are:-

Sentiment Analysis
Question Answering
Named Entity Recognition
Text Generation
Mask Language Modeling(Mask filling)
Summarization
Machine Translation

Here I have tried to show how to use the Hugging Face pipeline and solve the 5 most popular tasks associated with NLP.

I have executed the codes on a Kaggle notebook the link to which is here. You can also execute the code on Google Colaboratory.

First of all, we will import the pipeline from the transformers library.

from transformers import pipeline

1) Sentiment Analysis

Sentiment analysis here means classifying the given text in POSITIVE or NEGATIVE labels based on their sentiment with a given probability score.

Here we will be giving two sentences and extracting their labels with a score based on probability rounded to 4 digits.

nlp = pipeline("sentiment-analysis")

#First Sentence
result = nlp("I love trekking and yoga.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

#Second sentence
result = nlp("Racial discrimination should be outright boycotted.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

The output for first sentence is

The output for the second sentence is

2) Question Answering

Question Answering means giving an answer to a question based on the information given to the model in the form of a paragraph, the information provided is known as context. The answer is a small portion from the same context.

Here a paragraph about Prime Numbers is given as context and then 2 questions are asked based on the context. The model gives the answers from the context. This context paragraph is taken from the SQuAD database.

nlp = pipeline("question-answering")

context = r"""
The property of being prime (or not) is called primality.
A simple but slow method of verifying the primality of a given number n is known as trial division.
It consists of testing whether n is a multiple of any integer between 2 and itself.
Algorithms much more efficient than trial division have been devised to test the primality of large numbers.
These include the Miller–Rabin primality test, which is fast but has a small probability of error, and the AKS primality test, which always produces the correct answer in polynomial time but is too slow to be practical.
Particularly fast methods are available for numbers of special forms, such as Mersenne numbers.
As of January 2016, the largest known prime number has 22,338,618 decimal digits.
"""

#Question 1
result = nlp(question="What is a simple method to verify primality?", context=context)

print(f"Answer: '{result['answer']}'")

#Question 2
result = nlp(question="As of January 2016 how many digits does the largest known prime consist of?", context=context)

print(f"Answer: '{result['answer']}'")

The answer to the first question is

and the answer to second question is

3) Text Generation

Text generation is one of the most popular tasks of NLP. GPT-3 is a type of text generation model which generates text based on a certain prompt given to it.

In this portion we will try to generate some text based on the prompt A person must always work hard and and then the model produces a short para, the output is dramatically correct but not very coherent because the model has fewer parameters.

text_generator = pipeline("text-generation")

text= text_generator("A person must always work hard and", max_length=50, do_sample=False)[0]

print(text['generated_text'])

The output for the above code is

4) Summarization

Text summarization means to comprehend a large chunk of textual data and then write a brief summary of that data.

Here we are trying to get the summary of a large paragraph about the Apollo Mission.

summarizer = pipeline("summarization")

ARTICLE = """The Apollo program, also known as Project Apollo, was the third United States human spaceflight program carried out by the National Aeronautics and Space Administration (NASA), which accomplished landing the first humans on the Moon from 1969 to 1972.
First conceived during Dwight D. Eisenhower's administration as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space,
Apollo was later dedicated to President John F. Kennedy's national goal of "landing a man on the Moon and returning him safely to the Earth" by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress. 
Project Mercury was followed by the two-man Project Gemini (1962–66). 
The first manned flight of Apollo was in 1968.
Apollo ran from 1961 to 1972, and was supported by the two-man Gemini program which ran concurrently with it from 1962 to 1966. 
Gemini missions developed some of the space travel techniques that were necessary for the success of the Apollo missions.
Apollo used Saturn family rockets as launch vehicles. 
Apollo/Saturn vehicles were also used for an Apollo Applications Program, which consisted of Skylab, a space station that supported three manned missions in 1973–74, and the Apollo–Soyuz Test Project, a joint Earth orbit mission with the Soviet Union in 1975.
 """

summary=summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]

print(summary['summary_text'])

The summary generated for the above paragraph is

5) Translation

All of must have used Google translate, this is the same task of translating one language to another.

In this portion of code we will translate a proverbial sentence from English to German.

translator = pipeline("translation_en_to_de")

print(translator("A great obstacle to happiness is to expect too much happiness.", max_length=40)[0]['translation_text'])