Chapter Quiz#
Here you can find all the quizzes from the lessons in this chapter. Try to solve them to understand how well you have learned the concepts.
2.1 First Steps with Hugging Face#
What’s the Hugging Face library that provides APIs and easy-to-use functions to utilize transformers or download pre-trained models?
transformers.
tokenizers.
accelerate.
Answer
The correct answer is 1.
Complete the sentence with the correct option. Hugging Face provides a _____ interface for working with NLP models than PyTorch and TensorFlow.
lower-level.
higher-level.
Answer
The correct answer is 2.
True or False. All the models available on the Hugging Face Hub have been trained by Hugging Face.
Answer
The correct answer is False. Anyone can create an account and upload a model to the Hugging Face Hub.
True or False. On the Hugging Face Hub you can find models for NLP tasks only.
Answer
The correct answer is False. There are models for Computer Vision and Reinforcement Learning as well. However, the majority of models today are for NLP.
True or False. Hugging Face Spaces lets you deploy and host your ML demos on Hugging Face, but you need a premium account.
Answer
The correct answer is False. Most demos on Hugging Face Spaces run on the free tier, whose limited CPUs are enough for most demos (they are demos, after all). It’s possible to pay for better hardware (e.g. GPUs).
2.2 Hugging Face Hub: Models and Model Cards#
Select the option that describes something not typically found on a model card.
The evaluation results.
The model intended uses and potential limitations, including biases and ethical considerations.
Training datasets.
The model price.
The model architecture (e.g. BERT, RoBERTa, etc).
The training configuration and experimental info.
Answer
The correct answer is 4.
True or False. All the models in the Hugging Face Hub can be used commercially.
Answer
The correct answer is False. Whether a model can be used commercially or not is specified in the model license.
True or False. If you have a question about a model on the Hugging Face Hub, you should send a message to its author.
Answer
The correct answer is False. While you may be able to send a message to the author, the best way to proceed is to ask the question in the Community section.
True or False. On the Hugging Face Hub you can often test models directly from the browser.
Answer
The correct answer is True, thanks to the Hosted Inference API. Keep in mind that the number of free calls you can make per day is limited. Moreover, some models can’t be tested directly from the browser if they require complex input (e.g. Reinforcement Learning models).
2.3 Hugging Face Hub: Datasets#
Choose the incorrect option. On the Hugging Face Hub, datasets can be filtered by:
License.
Languages of the samples contained.
Number of models that have been trained on the dataset.
Multilinguality, i.e. whether the dataset can be used for multilingual models or not.
Size of the dataset (i.e. the number of samples contained).
Tasks for which the dataset has been created.
Answer
The correct answer is 3.
True or False. It is possible to see some samples directly on the dataset page.
Answer
The correct answer is True. There is a “Dataset Preview” section on the dataset page.
What is a dataset card and what does it typically contain?
A document that lists useful information about a dataset, like its creation process or how to responsibly use the data.
A document that lists useful information about a dataset, like the best models trained with it and their scores.
A document that lists useful information about a dataset, like its authors and other similar datasets.
Answer
The correct answer is 1.
2.4 Hugging Face Spaces#
What are the libraries that can be used to publish apps with Hugging Face Spaces?
Streamlit and Gradio.
Bokeh and Plotly.
Matplotlib and Seaborn.
Answer
The correct answer is 1.
2.5 Hugging Face Pipeline for Quick Prototyping#
Choose the best option. In the transformers library, the Pipeline class is a _____ class.
High-level.
Low-level.
Answer
The correct answer is 1.
True or False. In the transformers library, the Pipeline class executes the same code for each task.
Answer
The correct answer is False. Each task has a corresponding task-specific pipeline (e.g. “audio-classification” has an AudioClassificationPipeline, “question-answering” has a QuestionAnsweringPipeline, etc).
2.7 Evaluating a Sentiment Analysis Model#
Select the option that is not related to a sentiment analysis dataset.
IMDb
SST-2
SQuAD.
Answer
The correct answer is 3. SQuAD is a popular dataset for question answering.
What’s the name of the function from the datasets library that allows downloading and computing metrics, such as accuracy?
accuracy_score
load_metric
evaluate
compute_metric
Answer
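The correct answer is 2. Note that load_metric has since been deprecated in the datasets library in favor of the separate evaluate library.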
What type of data is contained in the IMDb dataset?
Tweets
General info about each movie
Movie ratings
Movie reviews
Answer
The correct answer is 4.
When should someone fine-tune a model on new data instead of using a pre-trained model directly, even if trained on similar data?
When the improvements that fine-tuning brings to your model have more benefits than the costs of building a dataset and fine-tuning the model.
When the data you have is specialized to a particular domain or task.
When the pre-trained model does not have enough capacity to capture the complexity of your data.
When the pre-trained model has not been trained on a sufficiently large dataset.
Answer
The correct answer is 1.
What’s the meaning of the device parameter of a Hugging Face pipeline?
It specifies the size of the output data to be produced by the model.
It specifies which type of algorithm to use for natural language processing tasks.
It specifies the size of the input data to be processed by the model.
It specifies the device (e.g. CPU or GPU) to use for computations by the model.
Answer
The correct answer is 4.
2.8 Project: Detecting Emotions from Text#
What is Emotion Detection in NLP?
A technique that allows understanding the context of a text.
A way to predict the sentiment (positive or negative) of a text.
A technique that allows classifying texts with human emotions.
A tool to detect the level of understanding of a text.
Answer
The correct answer is 3.
2.9 Project: Language Detection#
What is Language Detection commonly used for?
To develop a better understanding of the structure of a language.
To detect text that has been plagiarized from another source.
Automatically translating a text from one language to another.
As a preprocessing step before passing texts to mono-lingual models.
Answer
The correct answer is 4.
How many languages are present in the Language Identification dataset?
10
20
50
200
Answer
The correct answer is 2.
What’s the gap in accuracy between deep learning models and other machine learning/statistical models for language detection?
~1%
~5%
~10%
Answer
The correct answer is 1.
2.10 Semantic Search on Big Data#
What should I typically do if the embedding model that I want to use is too slow?
Use a different smaller and faster embedding model, even if it may produce lower quality embeddings.
Increase the speed of the model by optimizing the architecture and hyperparameters.
Implement caching to enable faster embedding retrieval.
Answer
The correct answer is 1.
Why are operations like dot or cosine similarity fast on CPU?
Because of vectorization.
Because the operations are simple and easy to calculate.
Because the calculations are done without involving the memory.
Because of cache locality.
Answer
The correct answer is 1.
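To make the operations in this question concrete, here is a minimal sketch of dot product and cosine similarity in plain Python. The vectors and the names `query`, `docs`, and `doc_a`/`doc_b` are made up for illustration. Plain Python loops like these are not vectorized; numerical libraries such as NumPy run the same arithmetic over whole arrays at once, which is where the speed comes from.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)

# Toy embeddings (hypothetical values, not produced by a real model).
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [2.0, 0.0, 2.0], "doc_b": [0.0, 3.0, 0.0]}

# Score every document against the query and pick the most similar one.
scores = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)
```

Here `doc_a` points in the same direction as the query, so it gets the highest score, while `doc_b` is orthogonal to it.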
What are data structures that split spaces into cells to optimize computations called?
Grid-based data structures.
Spatial indexing data structures.
Space-partitioning data structures.
Answer
The correct answer is 3.
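As a rough illustration of the idea of splitting space into cells, the sketch below buckets 2D points into a uniform grid and only inspects the cells around a query. The function names `build_grid` and `candidates_near`, the cell size, and the points are assumptions made for this example; libraries for semantic search use more sophisticated structures.

```python
from collections import defaultdict

def cell_of(point, cell_size):
    """Map a 2D point to the integer coordinates of the grid cell containing it."""
    return (int(point[0] // cell_size), int(point[1] // cell_size))

def build_grid(points, cell_size):
    """Bucket points by grid cell so that nearby points share or neighbor a cell."""
    grid = defaultdict(list)
    for point in points:
        grid[cell_of(point, cell_size)].append(point)
    return grid

def candidates_near(grid, query, cell_size):
    """Return points in the query's cell and its 8 neighboring cells."""
    cx, cy = cell_of(query, cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.extend(grid.get((cx + dx, cy + dy), []))
    return found

points = [(0.5, 0.5), (1.2, 0.8), (5.0, 5.0), (9.1, 9.3)]
grid = build_grid(points, cell_size=1.0)

# Only 9 cells are inspected instead of comparing against every point.
nearby = candidates_near(grid, (0.9, 0.9), cell_size=1.0)
```

The far-away points are never compared against the query, which is the optimization that space partitioning buys.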
What are two popular Python libraries used to perform fast semantic search?
Pattern and Numpy.
Spacy and NLTK.
Gensim and Scikit-Learn.
Faiss and Annoy.
NLTK and TextBlob.
Answer
The correct answer is 4.
2.11 Project: Multilingual Search and Recsys over Wikipedia#
What are models that can work with text in different languages called?
Polyglot models
Machine translation
Cross-lingual models
Multilingual models
Universal language models
Answer
The correct answer is 4.
What is Apache Beam?
A cloud computing platform for big data analytics
A distributed processing system for data-parallel applications
A machine learning algorithm for text classification
A natural language processing library
An open-source SDK for defining and executing data processing workflows
Answer
The correct answer is 5.
True or False. With multilingual models, the distance between the embeddings of a sentence and its translation in another language is close to zero.
Answer
The correct answer is True.
2.12 Project: Clustering Newspaper Articles#
Which text best describes the UMAP algorithm?
A non-linear dimensionality reduction algorithm based on manifold learning
An unsupervised learning algorithm for clustering data
A supervised machine learning algorithm for classification
A deep learning algorithm for image recognition
An algorithm for identifying objects in videos
Answer
The correct answer is 1.
Which of the following is not the name of a clustering algorithm?
HDBSCAN
OPTICS
KMeans
Isolation Forest
Gaussian Mixture
Answer
The correct answer is 4.
What is a quick way of assigning meaningful names to clusters?
Using keywords extracted from their texts.
Tagging the clusters with random numbers.
Identifying the most frequent words in the clusters.
Answer
The correct answer is 1. Option 3 is wrong because stopwords are very likely to be the most frequent words in each cluster.
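The point about stopwords can be checked with a small sketch. It counts word frequencies in a toy cluster with and without a stopword filter. The `STOPWORDS` set, the `top_words` helper, and the sample texts are assumptions for illustration, and frequency counting after stopword removal is only the simplest stand-in for a real keyword extractor.

```python
from collections import Counter

# A tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "of", "in", "to", "and", "is", "on", "for", "after"}

def top_words(texts, n=3, remove_stopwords=False):
    """Return the n most frequent words across texts, optionally skipping stopwords."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            word = word.strip(".,!?")
            if not word or (remove_stopwords and word in STOPWORDS):
                continue
            counts[word] += 1
    return [word for word, _ in counts.most_common(n)]

# A toy cluster of sports articles (made-up sentences).
cluster = [
    "The team won the match in the final minute.",
    "The coach praised the team after the match.",
    "A late goal decided the match for the team.",
]

raw_top = top_words(cluster, n=1)                              # dominated by a stopword
filtered_top = top_words(cluster, n=2, remove_stopwords=True)  # content words
```

Without filtering, the most frequent word is "the", which says nothing about the cluster. After filtering, "team" and "match" surface as candidate names.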
2.13 Project: Semantic Image Search over Unsplash#
What does the acronym CLIP stand for?
Contrastive Language–Image Pre-training
Computational Linguistics Inference Pipeline
Cross-Lingual Information Processing
Contrastive Learning of Image-Text Representations
Corpus-Level Interpretation Parsing
Answer
The correct answer is 1.
What’s the goal of the training of the CLIP model?
To generate an image from a given description.
To classify images according to a given set of classes.
To generate natural language descriptions from images.
To learn the relationships between words in a sentence.
To learn the correspondences between images and their textual descriptions.
Answer
The correct answer is 5. All the other options are tasks that CLIP can be used for, but they are not the goal of its training.
True or False. Using CLIP, it’s possible to perform image search using an image as query.
Answer
The correct answer is True.
2.14 Project: Named Entity Recognition#
What is the task of Named Entity Recognition (NER) in NLP?
Generating a summary of a text.
Identifying topics in a text.
Identifying key information (entities) in text.
Extracting relationships between entities.
Answer
The correct answer is 3.
What is not a popular real-world application of Named Entity Recognition?
Extracting entities from user utterances in chatbots.
Building knowledge graphs.
Sentiment analysis.
Answer
The correct answer is 3.
Which of the following is not a typical entity extracted by a NER model?
Person entity.
Location entity.
Organization entity.
Noun entity.
Answer
The correct answer is 4.
What is the name of a popular scheme for classifying tokens in NER?
OIB scheme.
BIO scheme.
SIO scheme.
ISO scheme.
Answer
The correct answer is 2.
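In the BIO scheme, each token is tagged as the Beginning of an entity (B-), Inside an entity (I-), or Outside any entity (O). The sketch below groups tagged tokens back into entities. The helper name `bio_to_entities`, the PER/LOC labels, and the example sentence are assumptions for illustration; an I- tag that does not continue an open entity of the same type is simply ignored here, which is one of several possible conventions.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: close the previous one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or a stray I- tag: close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Close an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Ada", "Lovelace", "worked", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
entities = bio_to_entities(tokens, tags)
```

For this sentence, the two-token person name is merged into one entity and the location is extracted as a second one.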
2.15 Knowledge Graphs#
What do nodes and edges typically represent in knowledge graphs?
Numerical Values and Connectors.
Objects and Connections.
Entities and Relations.
Concepts and Links.
Terms and Associations.
Answer
The correct answer is 3.
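A small knowledge graph can be sketched as a list of (head entity, relation, tail entity) triples. The relation names (`born_in`, `field`, `capital_of`) and the `neighbors` helper below are assumptions chosen for this example, not a standard vocabulary or API.

```python
from collections import defaultdict

# Each triple is (head entity, relation, tail entity).
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

# Adjacency list: each node (entity) maps to its outgoing edges (relations).
graph = defaultdict(list)
for head, relation, tail in triples:
    graph[head].append((relation, tail))

def neighbors(entity):
    """Return the (relation, tail) pairs leaving an entity, or an empty list."""
    return graph.get(entity, [])

curie_facts = neighbors("Marie Curie")
```

Nodes are the entities and each edge carries the name of the relation that connects two of them.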
True or False. Knowledge graphs unfortunately can’t help in the issue of understandability in machine learning.
Answer
The correct answer is False.
True or False. The Google search engine currently leverages knowledge graphs.
Answer
The correct answer is True. Google leverages its own Google Knowledge Graph.
What does research in knowledge graphs currently focus on?
Knowledge representation learning, knowledge acquisition, and temporal knowledge graphs
Developing algorithms to automatically create and populate a knowledge graph
Improving the accuracy of knowledge graphs and temporal knowledge graphs
Answer
The correct answer is 1.
What is the scope of Knowledge Representation Learning?
It encompasses symbolic and statistical methods for learning from large amounts of data
It consists of the creation of new entities and relations in the knowledge graph
It consists of mapping entities and relations into low-dimensional vectors while capturing their semantic meanings
Answer
The correct answer is 3.
Which of the following tasks is not about Knowledge Acquisition?
Knowledge Graph Completion
Entity Recognition
Relation Extraction
Temporal Knowledge Graphs
Answer
The correct answer is 4.
True or False. Knowledge graphs can be applied to question-answering systems, recommender systems and information retrieval.
Answer
The correct answer is True.
2.16 Project: Building a Knowledge Base from Texts#
To build a knowledge graph from text, we typically need to perform two steps:
Extract entities and extract relations between these entities.
Identify topics and classify them into categories.
Tokenize the text and perform lemmatization.
Generate a list of keywords and map them to relevant concepts.
Answer
The correct answer is 1.
True or False. Relation Extraction involves doing both Named Entity Recognition and Relation Classification in an end-to-end approach.
Answer
The correct answer is True.
What data has the REBEL relation extraction model been trained on?
Unstructured text from webpages and books.
A corpus of tagged dialogue transcripts.
Entities and relations found in Wikipedia abstracts and Wikidata.
Answer
The correct answer is 3.
What’s the job of the Entity Linking task?
To link entities in a text with an existing knowledge base.
To classify entities in a text.
To assign a probability to each candidate entity.
Answer
The correct answer is 1.
What’s the name of a Python library for plotting graphs?
seaborn
matplotlib
pyvis
ggplot
Answer
The correct answer is 3.
2.17 Question Answering#
What’s the job of question answering models?
Extracting information from large unstructured datasets.
Automatically generating natural language questions related to a text.
Automatically finding answers to questions posed in natural language.
Answer
The correct answer is 3.
True or False. Many search systems augment their search results with instant answers.
Answer
The correct answer is True.
How are question answering systems usually categorized?
Extractive vs Generative, and Open vs Closed.
Rule-based vs Statistical, and Syntactic vs Semantic.
Knowledge-based vs Corpus-based, and Extractive vs Generative.
Answer
The correct answer is 1.
What’s the difference between SQuAD v1 and SQuAD v2?
SQuAD v2 includes a context paragraph for each question.
SQuAD v2 also contains unanswerable questions.
SQuAD v2 includes a confidence score for each answer.
Answer
The correct answer is 2.
2.18 Text Summarization#
How are text summarization models typically categorized?
Extraction-based and Abstraction-based.
Rule-based and Statistical.
Heuristic and Neural Network.
Frequency-based and Semantic-based.
Answer
The correct answer is 1.
What’s the commonly used set of metrics for evaluating text summarization models?
Perplexity
BLEU
F-score
Word Mover’s Distance
ROUGE
Answer
The correct answer is 5.
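ROUGE scores compare a generated summary with a reference summary by counting overlapping units. The sketch below is a simplified ROUGE-1 (unigram overlap) with whitespace tokenization, written for illustration only. The function name `rouge_1` and the example sentences are assumptions, and official implementations add steps such as stemming and also report other variants like ROUGE-2 and ROUGE-L.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Simplified ROUGE-1: unigram overlap precision, recall, and F1."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

reference = "the cat sat on the mat"
short_summary = rouge_1("the cat sat", reference)
```

The short candidate contains only words from the reference, so its precision is perfect, but it covers just half of the reference words, so its recall is 0.5.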
Questions and Feedback#
Have questions about this lesson? Would you like to exchange ideas? Or would you like to point out something that needs to be corrected? Join the NLPlanet Discord server and interact with the community! There’s a specific channel for this course called practical-nlp-nlplanet.