2.5 Hugging Face Pipeline for Quick Prototyping#
We are now ready to dive into writing code leveraging the Hugging Face libraries! In this lesson, we learn about the highest-level abstraction of the transformers library: the Pipeline class.
Pipeline#
Quoting from the pipeline documentation on the Hugging Face website:
Pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks.
Pipelines make it easy to use models from the Hub and perform inference on NLP, Computer Vision, Audio, and other tasks. When constructing a pipeline object, the constructor takes a task argument as input, whose value can be one of the following:
“audio-classification”
“automatic-speech-recognition”
“conversational”
“feature-extraction”
“fill-mask”
“image-classification”
“question-answering”
“table-question-answering”
“text2text-generation”
“text-classification” (alias “sentiment-analysis” available)
“text-generation”
“token-classification” (alias “ner” available)
“translation”
“translation_xx_to_yy”
“summarization”
“zero-shot-classification”
Each task then has a corresponding task-specific pipeline (e.g. “audio-classification” has an AudioClassificationPipeline, “question-answering” has a QuestionAnsweringPipeline, etc.), whose details are hidden by the general-purpose and more abstract Pipeline class.
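To make this mapping concrete, here is a minimal sketch (using the “fill-mask” task as an arbitrary example, and assuming the library is already installed as shown in the next section) that verifies that the pipeline function returns a task-specific pipeline object under the hood.
from transformers import pipeline

# The generic pipeline() factory instantiates a task-specific pipeline
fill_mask = pipeline(task="fill-mask")  # downloads the default model for the task
print(type(fill_mask).__name__)  # FillMaskPipeline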
Let’s see some code examples of pipelines.
Sentiment Analysis Pipeline#
First, let’s install the transformers library.
pip install transformers
We then import the pipeline function from transformers.
from transformers import pipeline
We are now ready to experiment with pipelines! Let’s create one for sentiment analysis.
model = pipeline(task="sentiment-analysis")
The above cell will download a model for sentiment analysis and then print the following output.
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
As you can see from the printed output, since we didn’t specify a model, the library downloaded the current default model for the sentiment-analysis task, which is distilbert-base-uncased-finetuned-sst-2-english. The output also warns against relying on the default model, because the default model for a task may change over time.
So, we can initialize a pipeline with a specific model by using the model argument.
model = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
Since the model card of distilbert-base-uncased-finetuned-sst-2-english specifies that the model can be used for text classification only, it’s not necessary to specify a task with the task argument.
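For fully reproducible results, we can also pin the revision (a commit hash on the Hub) mentioned in the earlier warning. A small sketch, using the revision hash printed above (yours may differ):
model = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",  # commit hash from the warning above
)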
Let’s now use the pipeline to compute the sentiment of the sentence “The weather is great today”.
answer = model("The weather is great today")
print(answer)
[{'label': 'POSITIVE', 'score': 0.9998749494552612}]
The model classified the input sentence as “POSITIVE”, with a score of ~0.99. We can also pass several input sentences to the model as a list.
answers = model([
    "The weather is great today",
    "I hate running"
])
print(answers)
[{'label': 'POSITIVE', 'score': 0.9998749494552612}, {'label': 'NEGATIVE', 'score': 0.9976701140403748}]
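By default, the text-classification pipeline returns only the highest-scoring label. In recent versions of transformers, you can pass a top_k argument to also get the scores of the other labels (a version-dependent detail, so check the documentation of your installed version):
# top_k=None returns the score of every label instead of just the best one
answers = model("The weather is great today", top_k=None)
print(answers)  # e.g. both the 'POSITIVE' and 'NEGATIVE' scores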
Let’s try a different pipeline now.
Question Answering Pipeline#
First, we download the default model for the question-answering task and initialize a pipeline object.
model = pipeline(task="question-answering")
The Hugging Face documentation shows a sample usage of the question-answering pipeline: we can pass the question and context arguments to the pipeline, and the model will then look in the context for an answer to the question and extract it.
answer = model(question="What is the capital of Italy?", context="The capital of Italy is Rome.")
print(answer)
{'score': 0.9870143532752991, 'start': 24, 'end': 28, 'answer': 'Rome'}
The returned answer is a dictionary, where the start and end items represent respectively the indices where the answer begins and ends in the context. The text between these indices is the answer to the question, which is “Rome” in this example.
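We can verify the meaning of the start and end indices by slicing the context string with them. A quick sketch reusing the example above:
context = "The capital of Italy is Rome."
answer = model(question="What is the capital of Italy?", context=context)
# Slicing the context with the returned indices recovers the answer span
print(context[answer["start"]:answer["end"]])  # Rome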
Translation Pipeline#
Let’s make an example with a translation pipeline from English to French. First, we create a pipeline for the translation_en_to_fr task.
model = pipeline(task="translation_en_to_fr")
To use a translation pipeline, we just need to pass the text to be translated to the pipeline object.
answer = model("My name is Fabio")
print(answer)
[{'translation_text': 'Mon nom est Fabio'}]
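As with the sentiment analysis example, we could silence the default-model warning by naming a model explicitly. A sketch, assuming the Helsinki-NLP/opus-mt-en-fr checkpoint from the Hub (any English-to-French translation model would work):
# The task is inferred from the model card, so no task argument is needed
model = pipeline(model="Helsinki-NLP/opus-mt-en-fr")
answer = model("My name is Fabio")
print(answer)  # a list with a 'translation_text' item, as before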
Text Generation Pipeline#
Next, let’s try using the famous GPT-2 model for text generation.
model = pipeline(model="gpt2")
To generate text, we just need to pass the beginning of the text (the incipit) to the model.
incipit = "Unicorns are rare animals that live hidden in the mountains. "
answer = model(incipit)
print(answer)
[{'generated_text': 'Unicorns are rare animals that live hidden in the mountains. Most of them are never seen in the land like they are in the woods. When you look at these things and see some similarities, we can then deduce that'}]
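Text generation involves sampling, so each run can produce different text. Here is a sketch of some commonly used generation arguments (max_new_tokens, num_return_sequences, and do_sample are forwarded to the model’s generate method; set_seed makes the sampling reproducible):
from transformers import set_seed

set_seed(42)  # make sampling reproducible
answers = model(
    incipit,
    max_new_tokens=30,       # limit the length of the continuation
    num_return_sequences=2,  # generate two alternative continuations
    do_sample=True,          # sample instead of using greedy decoding
)
for a in answers:
    print(a["generated_text"])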
Conversation Pipeline#
As the last example, let’s try something different with a conversational pipeline, i.e. a model that can hold a conversation like a chatbot. First, we download the DialoGPT model.
model = pipeline("conversational")
Then, we need to import the Conversation class from transformers. A Conversation object is just a container for messages from the user and the model. Let’s create a Conversation object with a first message from the user and then pass it to the model to obtain a response.
from transformers import Conversation
conversation = Conversation("I want to visit a nice city in Europe - any suggestions?")
conversation = model(conversation)
print(conversation.generated_responses[-1])
I'd say Prague is a good place to visit.
In this example, the model suggests visiting “Prague”. Let’s see what the model answers if we ask “Where is it? In what country?”. To answer correctly, the model needs to look at the previous messages in the conversation to understand that the pronoun “it” refers to the noun “Prague”.
conversation.add_user_input("Where is it? In what country?")
conversation = model(conversation)
print(conversation.generated_responses[-1])
It's in the Czech Republic.
The model is able to correctly answer with “Czech Republic” by leveraging the whole conversation. In the conversation object we can find the whole message history.
print(conversation)
Conversation id: 1e0d1539-ee4c-493c-83e5-02257a267505
user >> I want to visit a nice city in Europe - any suggestions?
bot >> I'd say Prague is a good place to visit.
user >> Where is it? In what country?
bot >> It's in the Czech Republic.
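Putting these pieces together, a minimal interactive chat loop could look like the following sketch (purely illustrative, and note that the Conversation API shown here may be deprecated in newer transformers versions):
from transformers import pipeline, Conversation

model = pipeline("conversational")
conversation = Conversation()  # start with an empty conversation

while True:
    user_input = input("user >> ")
    if user_input == "quit":
        break
    conversation.add_user_input(user_input)  # append the user message
    conversation = model(conversation)       # generate the bot reply
    print("bot >>", conversation.generated_responses[-1])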
Quiz#
Choose the best option. In the transformers library, the Pipeline class is a _____ class.
1. High-level.
2. Low-level.
Answer
The correct answer is 1.
True or False. In the transformers library, the Pipeline class executes the same code for each task.
Answer
The correct answer is False. Each task has a corresponding task-specific pipeline (e.g. “audio-classification” has an AudioClassificationPipeline, “question-answering” has a QuestionAnsweringPipeline, etc.).
Questions and Feedback#
Have questions about this lesson? Would you like to exchange ideas? Or would you like to point out something that needs to be corrected? Join the NLPlanet Discord server and interact with the community! There’s a specific channel for this course called practical-nlp-nlplanet.