2.5 Hugging Face Pipeline for Quick Prototyping#

We are now ready to dive into writing code with the Hugging Face libraries! In this lesson, we’ll learn about the highest-level abstraction of the transformers library: the Pipeline class.

Pipeline#

Quoting from the pipeline documentation on the Hugging Face website:

Pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks.

Pipelines make it easy to use models from the Hub and perform inference on tasks across NLP, Computer Vision, Audio, and other domains. When constructing a pipeline object, the constructor takes a task argument as input, whose value can be one of the following:

  • “audio-classification”

  • “automatic-speech-recognition”

  • “conversational”

  • “feature-extraction”

  • “fill-mask”

  • “image-classification”

  • “question-answering”

  • “table-question-answering”

  • “text2text-generation”

  • “text-classification” (alias “sentiment-analysis” available)

  • “text-generation”

  • “token-classification” (alias “ner” available)

  • “translation”

  • “translation_xx_to_yy”

  • “summarization”

  • “zero-shot-classification”

Each task then has a corresponding task-specific pipeline class (e.g. “audio-classification” has an AudioClassificationPipeline, “question-answering” has a QuestionAnsweringPipeline, etc.), whose details are hidden behind the general-purpose and more abstract Pipeline class.
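For example (assuming the transformers library is installed, as shown in the next section), we can inspect the concrete class of a pipeline object. A minimal sketch, using the fill-mask task as an arbitrary choice:

from transformers import pipeline

# The pipeline function returns a task-specific subclass of Pipeline.
mask_filler = pipeline(task="fill-mask")
# Prints something like: <class 'transformers.pipelines.fill_mask.FillMaskPipeline'>
print(type(mask_filler))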

Let’s see some code examples of pipelines.

Sentiment Analysis Pipeline#

First, let’s install the transformers library.

pip install transformers
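Note that pipelines also need a deep learning backend such as PyTorch or TensorFlow to run the models; assuming we use PyTorch, we can install it as well.

pip install torch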

We then import the pipeline function from transformers.

from transformers import pipeline

We are now ready to experiment with pipelines! Let’s create one for sentiment analysis.

model = pipeline(task="sentiment-analysis")

The above cell will download a model for sentiment analysis and then print the following output.

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

As you can see from the printed output, since we didn’t specify a model, the library downloads the current default model for the sentiment-analysis task, which is distilbert-base-uncased-finetuned-sst-2-english. The output also warns that relying on the default model is not recommended, because the default model for a task may change over time.

So, we can initialize a pipeline specifying a model by using the model argument.

model = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")

Since the model card of distilbert-base-uncased-finetuned-sst-2-english specifies that the model is meant for text classification, it’s not necessary to specify a task with the task argument.
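To fully address the warning above, we can also pin a specific model revision with the revision argument, using the commit hash printed in the warning. A minimal sketch:

# Pin both the model and a specific revision for reproducible results.
model = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",
)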

Let’s now use the pipeline to compute the sentiment of the sentence “The weather is great today”.

answer = model("The weather is great today")
print(answer)
[{'label': 'POSITIVE', 'score': 0.9998749494552612}]

The model classified the input sentence as “POSITIVE”, with a score of ~0.99. We can also pass several input sentences to the model as a list.

answers = model([
    "The weather is great today",
    "I hate running"
])
print(answers)
[{'label': 'POSITIVE', 'score': 0.9998749494552612}, {'label': 'NEGATIVE', 'score': 0.9976701140403748}]

Let’s try a different pipeline now.

Question Answering Pipeline#

First, we download the default model for the question-answering task and initialize a pipeline object.

model = pipeline(task="question-answering")

The Hugging Face documentation shows a sample usage of the question-answering pipeline: we pass question and context arguments to the pipeline, and the model looks in the context for an answer to the question and extracts it.

answer = model(question="What is the capital of Italy?", context="The capital of Italy is Rome.")
print(answer)
{'score': 0.9870143532752991, 'start': 24, 'end': 28, 'answer': 'Rome'}

The returned answer is a dictionary, where the start and end items represent, respectively, the character indices where the answer begins and ends in the context. The text between these indices is the answer to the question, which is “Rome” in this example.
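We can verify this by slicing the context string with the returned indices. This is just a sanity check, not something needed in practice:

context = "The capital of Italy is Rome."
answer = model(question="What is the capital of Italy?", context=context)
# Recover the answer span by slicing the context with the start/end indices.
print(context[answer["start"]:answer["end"]])  # prints: Rome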

Translation Pipeline#

Let’s now try a translation pipeline from English to French. First, we create a pipeline for the translation_en_to_fr task.

model = pipeline(task="translation_en_to_fr")

To use a translation pipeline, we just need to pass the text to be translated to the pipeline object.

answer = model("My name is Fabio")
print(answer)
[{'translation_text': 'Mon nom est Fabio'}]
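As with sentiment analysis, it’s better to pin an explicit model rather than rely on the task default. For instance, we could use one of the Helsinki-NLP translation models from the Hub (shown here as one possible choice, not necessarily the task’s default):

# Explicitly choose an English-to-French translation model from the Hub.
model = pipeline(task="translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
print(model("My name is Fabio"))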

Text Generation Pipeline#

Next, let’s try the well-known GPT-2 model for text generation.

model = pipeline(model="gpt2")

To generate text, we just need to pass the opening text (the incipit) to the model.

incipit = "Unicorns are rare animals that live hidden in the mountains. "
answer = model(incipit)
print(answer)
[{'generated_text': 'Unicorns are rare animals that live hidden in the mountains. Most of them are never seen in the land like they are in the woods. When you look at these things and see some similarities, we can then deduce that'}]
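Text generation pipelines also accept generation parameters, which are forwarded to the underlying generate method of the model. A minimal sketch with arbitrary parameter values:

# Sample two alternative continuations of at most 50 tokens each.
answers = model(
    incipit,
    max_length=50,           # total length of prompt plus generated tokens
    num_return_sequences=2,  # number of continuations to generate
    do_sample=True,          # sample tokens instead of greedy decoding
)
for a in answers:
    print(a["generated_text"])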

Conversation Pipeline#

As a last example, let’s try something different: a conversational pipeline, i.e. a model that can hold a conversation like a chatbot. First, we download the default model for the task, which is DialoGPT.

model = pipeline("conversational")

Then, we need to import the Conversation class from transformers. A Conversation object is just a container for messages from the user and the model.

Let’s create a Conversation object with a first message from the user and then pass it to the model to obtain a response.

from transformers import Conversation

conversation = Conversation("I want to visit a nice city in Europe - any suggestions?")
conversation = model(conversation)
print(conversation.generated_responses[-1])
I'd say Prague is a good place to visit.

In this example, the model suggests visiting “Prague”. Let’s see what the model answers if we ask “Where is it? In what country?”. To answer correctly, the model needs to look at the previous messages in the conversation and understand that the pronoun “it” refers to “Prague”.

conversation.add_user_input("Where is it? In what country?")
conversation = model(conversation)
print(conversation.generated_responses[-1])
It's in the Czech Republic.

The model correctly answers with “Czech Republic”, leveraging the whole conversation. The conversation object keeps track of the full message history.

print(conversation)
Conversation id: 1e0d1539-ee4c-493c-83e5-02257a267505
user >> I want to visit a nice city in Europe - any suggestions?
bot >> I'd say Prague is a good place to visit.
user >> Where is it? In what country?
bot >> It's in the Czech Republic.
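The same history is also accessible programmatically through the past_user_inputs and generated_responses attributes of the Conversation object:

# Reconstruct the dialogue from the Conversation object's attributes.
for user_msg, bot_msg in zip(conversation.past_user_inputs, conversation.generated_responses):
    print("user >>", user_msg)
    print("bot >>", bot_msg)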

Quiz#

Choose the best option. In the transformers library, the Pipeline class is a _____ class.

  1. High-level.

  2. Low-level.

True or False. In the transformers library, the Pipeline class executes the same code for each task.

Questions and Feedback#

Have questions about this lesson? Would you like to exchange ideas? Or would you like to point out something that needs to be corrected? Join the NLPlanet Discord server and interact with the community! There’s a specific channel for this course called practical-nlp-nlplanet.