2.8 Project: Detecting Emotions from Text#

Let’s do another sample project connected to text classification. In this lesson, we’ll test an emotion classification model from the Hub.

Emotion Detection#

Emotion Detection is a technique that allows classifying texts with human emotions like “joy”, “surprise”, “anger”, “sadness”, “fear”, and “love”. It can be used to monitor how users react on social media to specific news or to posts of your company.

../_images/emotion_1.png
../_images/emotion_2.png
../_images/emotion_3.png
../_images/emotion_4.png

Emotion Detection on Tweets#

We use the same tweets dataset from the previous lessons, looking for tweets with specific keywords and detecting their sentiment.

Look for Suitable Pre-trained Models#

On the Hub we can find the j-hartmann/emotion-english-distilroberta-base model, which is a checkpoint of DistilRoBERTa-base fine-tuned on six diverse datasets to classify emotions from text. As reported on the model card, the accuracy of the model on the evaluation set is of 66%, way better than the random-chance baseline of 1/7 = 14%.

The j-hartmann/emotion-english-distilroberta-base classifies texts into seven classes:

  • anger 🤬

  • disgust 🤢

  • fear 😨

  • joy 😀

  • neutral 😐

  • sadness 😭

  • surprise 😲

Install and Import Libraries#

Let’s install and import the necessary libraries.

pip install transformers datasets
from datasets import load_dataset
from transformers import pipeline
import pandas as pd
import plotly.express as px

Get Tweets about Microsoft#

In a real-world scenario, we would use the Twitter API to download the latest tweets posted containing specific keywords. In this sample project, we download the test split of the Twitter Sentiment Analysis Training Corpus.

# download the tweets dataset
dataset = load_dataset("carblacac/twitter-sentiment-analysis", split="train")

# convert dataset to pandas dataframe
df = pd.DataFrame(dataset).drop("feeling", axis=1)

Let’s print some tweets from the dataset.

# show sample tweets
for tweet in df["text"][:5]:
  print(f"- {tweet}")
- @fa6ami86 so happy that salman won.  btw the 14sec clip is truely a teaser
- @phantompoptart .......oops.... I guess I'm kinda out of it.... Blonde moment -blushes- epic fail
- @bradleyjp decidedly undecided. Depends on the situation. When I'm out with the people I'll be in Chicago with? Maybe.
- @Mountgrace lol i know! its so frustrating isnt it?!
- @kathystover Didn't go much of any where - Life took over for a while

Then, we keep only the tweets containing the keyword “microsoft” and print some examples of them.

# keep only the tweets with a specific filter word. Let's try with "microsoft"
filter_word = "microsoft"
df_subset = df[df["text"].str.lower().str.contains(filter_word)].reset_index(drop=True)
# show sample tweets
for tweet in df_subset["text"][:5]:
  print(f"- {tweet}")
- @_everaldo Entourage *has* a really ugly interface but the app is quite useful. Microsoft did great work here. (Not *a* great work)
- Searching for Susan Boyle on Microsoft's Bing during the Twilight of a New Moon listening to Eminem on the radio hey Google Wave
- Microsoft Windows Guru program came to an end for a lot of east coast guru's
- So Microsoft released Bing, huh? Too bad they didn't filter this out  http://tr.im/n32Z
- Very sad  Microsoft money to be discontinue http://dhf0i.tk

Using the Emotion Detection Model#

Next, we load the j-hartmann/emotion-english-distilroberta-base emotion detection model into a text-classification pipeline and use it to classify each tweet with an emotion.

# download pre-trained emotion classification model
model = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base")

# compute the emotion of each tweet using the model
all_texts = df_subset["text"].values.tolist()
all_emotions = model(all_texts)
df_subset["emotion_label"] = [d["label"] for d in all_emotions]
df_subset["emotion_score"] = [d["score"] for d in all_emotions]

Let’s see how many tweets fall into each emotion category.

# plot emotions found in tweets
plot_title = f"Emotions found in tweets about '{filter_word}'"
fig = px.histogram(df_subset, x="emotion_label", template="plotly_dark",
                   title=plot_title, color="emotion_label")
fig.update_layout(showlegend=False)
fig.show()