Chapter Quiz

Here you can find all the quizzes from the lessons in this chapter. Try to solve them to understand how well you have learned the concepts.

2.1 First Steps with Hugging Face

What’s the Hugging Face library that provides APIs and easy-to-use functions to utilize transformers or download pre-trained models?

  1. transformers.

  2. tokenizers.

  3. accelerate.

Complete the sentence with the correct option. Hugging Face provides a _____ interface for working with NLP models than PyTorch and TensorFlow.

  1. lower-level.

  2. higher-level.

True or False. All the models available on the Hugging Face Hub have been trained by Hugging Face.

True or False. On the Hugging Face Hub you can find models for NLP tasks only.

True or False. Hugging Face Spaces lets you deploy and host your ML demos on Hugging Face, but you need a premium account.

2.2 Hugging Face Hub: Models and Model Cards

Select the option that describes something not typically found on a model card.

  1. The evaluation results.

  2. The model’s intended uses and potential limitations, including biases and ethical considerations.

  3. Training datasets.

  4. The model price.

  5. The model architecture (e.g. BERT, RoBERTa, etc).

  6. The training configuration and experimental info.

True or False. All the models on the Hugging Face Hub can be used commercially.

True or False. If you have a question about a model on the Hugging Face Hub, you should send a message to its author.

True or False. On the Hugging Face Hub you can often test models directly from the browser.

2.3 Hugging Face Hub: Datasets

Choose the incorrect option. On the Hugging Face Hub, datasets can be filtered by:

  1. License.

  2. Languages of the samples contained.

  3. Number of models that have been trained on the dataset.

  4. Multilinguality, i.e. whether the dataset can be used for multilingual models or not.

  5. Size of the dataset (i.e. the number of samples contained).

  6. Tasks for which the dataset has been created.

True or False. It is possible to see some samples directly from the dataset page.

What is a dataset card and what does it typically contain?

  1. A document that lists useful information about a dataset, like its creation process or how to responsibly use the data.

  2. A document that lists useful information about a dataset, like the best models trained with it and their scores.

  3. A document that lists useful information about a dataset, like its authors and other similar datasets.

2.4 Hugging Face Spaces

What are the libraries that can be used to publish apps with Hugging Face Spaces?

  1. Streamlit and Gradio.

  2. Bokeh and Plotly.

  3. Matplotlib and Seaborn.

2.5 Hugging Face Pipeline for Quick Prototyping

Choose the best option. In the transformers library, the Pipeline class is a _____ class.

  1. High-level.

  2. Low-level.

True or False. In the transformers library, the Pipeline class executes the same code for each task.

2.7 Evaluating a Sentiment Analysis Model

Select the option that is not related to a sentiment analysis dataset.

  1. IMDb

  2. SST-2

  3. SQuAD

What’s the name of the function from the datasets library that allows downloading and computing metrics, such as accuracy?

  1. accuracy_score

  2. load_metric

  3. evaluate

  4. compute_metric

What type of data is contained in the IMDb dataset?

  1. Tweets

  2. General info about each movie

  3. Movie ratings

  4. Movie reviews

When should someone fine-tune a model on new data instead of using a pre-trained model directly, even if trained on similar data?

  1. When the improvements that fine-tuning brings to your model have more benefits than the costs of building a dataset and fine-tuning the model.

  2. When the data you have is specialized to a particular domain or task.

  3. When the pre-trained model does not have enough capacity to capture the complexity of your data.

  4. When the pre-trained model has not been trained on a sufficiently large dataset.

What’s the meaning of the device parameter of a Hugging Face pipeline?

  1. It specifies the size of the output data to be produced by the model.

  2. It specifies which type of algorithm to use for natural language processing tasks.

  3. It specifies the size of the input data to be processed by the model.

  4. It specifies the device (e.g. CPU or GPU) to use for computations by the model.

2.8 Project: Detecting Emotions from Text

What is Emotion Detection in NLP?

  1. A technique that allows understanding the context of a text.

  2. A way to predict the sentiment (positive or negative) of a text.

  3. A technique that allows classifying texts with human emotions.

  4. A tool to detect the level of understanding of a text.

2.9 Project: Language Detection

What is Language Detection commonly used for?

  1. To develop a better understanding of the structure of a language.

  2. To detect text that has been plagiarized from another source.

  3. To automatically translate a text from one language to another.

  4. As a preprocessing step before passing texts to mono-lingual models.

How many languages are present in the Language Identification dataset?

  1. 10

  2. 20

  3. 50

  4. 200

What’s the gap in accuracy between deep learning models and other machine learning/statistical models for language detection?

  1. ~1%

  2. ~5%

  3. ~10%

2.10 Semantic Search on Big Data

What should I typically do if the embedding model that I want to use is too slow?

  1. Use a different, smaller and faster embedding model, even if it may produce lower-quality embeddings.

  2. Increase the speed of the model by optimizing the architecture and hyperparameters.

  3. Implement caching to enable faster embedding retrieval.

Why are operations like dot or cosine similarity fast on CPU?

  1. Because of vectorization.

  2. Because the operations are simple and easy to calculate.

  3. Because the calculations are done without involving the memory.

  4. Because of cache locality.
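
As a reminder of what the two operations in the question above compute, here is a minimal pure-Python sketch. The function names and sample vectors are illustrative assumptions, not taken from the lesson; in practice, numerical libraries such as NumPy are normally used for these computations.

```python
import math


def dot(u, v):
    """Dot product: sum of the element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# Hypothetical toy "embeddings", chosen only for illustration.
query = [1.0, 0.0, 1.0]
doc_a = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_b = [0.0, 3.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, doc_a))  # close to 1: same direction
print(cosine_similarity(query, doc_b))  # 0.0: no overlap
```

Cosine similarity ignores vector length, so `doc_a` scores as a near-perfect match even though it is twice as long as `query`.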

What are data structures that split spaces into cells to optimize computations called?

  1. Grid-based data structures.

  2. Spatial indexing data structures.

  3. Space-partitioning data structures.

What are two popular Python libraries used to perform fast semantic search?

  1. Pattern and Numpy.

  2. Spacy and NLTK.

  3. Gensim and Scikit-Learn.

  4. Faiss and Annoy.

  5. NLTK and TextBlob.

2.11 Project: Multilingual Search and Recsys over Wikipedia

What are models that can work with text in different languages called?

  1. Polyglot models

  2. Machine translation

  3. Cross-lingual models

  4. Multilingual models

  5. Universal language models

What is Apache Beam?

  1. A cloud computing platform for big data analytics

  2. A distributed processing system for data-parallel applications

  3. A machine learning algorithm for text classification

  4. A natural language processing library

  5. An open-source SDK for defining and executing data processing workflows

True or False. With multilingual models, the distance between the embeddings of a sentence and its translation in another language is close to zero.

2.12 Project: Clustering Newspaper Articles

Which text best describes the UMAP algorithm?

  1. A non-linear dimensionality reduction algorithm based on manifold learning

  2. An unsupervised learning algorithm for clustering data

  3. A supervised machine learning algorithm for classification

  4. A deep learning algorithm for image recognition

  5. An algorithm for identifying objects in videos

Which of the following is not the name of a clustering algorithm?

  1. HDBSCAN

  2. OPTICS

  3. KMeans

  4. Isolation Forest

  5. Gaussian Mixture

What is a quick way of assigning meaningful names to clusters?

  1. Using keywords extracted from their texts.

  2. Tagging the clusters with random numbers.

  3. Identifying the most frequent words in the clusters.

2.13 Project: Semantic Image Search over Unsplash

What does the acronym CLIP stand for?

  1. Contrastive Language–Image Pre-training

  2. Computational Linguistics Inference Pipeline

  3. Cross-Lingual Information Processing

  4. Contrastive Learning of Image-Text Representations

  5. Corpus-Level Interpretation Parsing

What’s the goal of the training of the CLIP model?

  1. To generate an image from a given description.

  2. To classify images according to a given set of classes.

  3. To generate natural language descriptions from images.

  4. To learn the relationships between words in a sentence.

  5. To learn the correspondences between images and their textual descriptions.

True or False. Using CLIP, it’s possible to perform image search using an image as query.

2.14 Project: Named Entity Recognition

What is the task of Named Entity Recognition (NER) in NLP?

  1. Generating a summary of a text.

  2. Identifying topics in a text.

  3. Identifying key information (entities) in text.

  4. Extracting relationships between entities.

What is not a popular real-world application of Named Entity Recognition?

  1. Extracting entities from user utterances in chatbots.

  2. Building knowledge graphs.

  3. Sentiment analysis.

Which of the following is not a typical entity extracted by a NER model?

  1. Person entity.

  2. Location entity.

  3. Organization entity.

  4. Noun entity.

What is the name of a popular scheme for classifying tokens in NER?

  1. OIB scheme.

  2. BIO scheme.

  3. SIO scheme.

  4. ISO scheme.

2.15 Knowledge Graphs

What do nodes and edges typically represent in knowledge graphs?

  1. Numerical Values and Connectors.

  2. Objects and Connections.

  3. Entities and Relations.

  4. Concepts and Links.

  5. Terms and Associations.

True or False. Knowledge graphs unfortunately can’t help with the issue of understandability in machine learning.

True or False. The Google search engine currently leverages knowledge graphs.

What does research in knowledge graphs currently focus on?

  1. Knowledge representation learning, knowledge acquisition, and temporal knowledge graphs

  2. Developing algorithms to automatically create and populate a knowledge graph

  3. Improving the accuracy of knowledge graphs and temporal knowledge graphs

What is the scope of Knowledge Representation Learning?

  1. It encompasses symbolic and statistical methods for learning from large amounts of data

  2. It consists of the creation of new entities and relations in the knowledge graph

  3. It consists of mapping entities and relations into low-dimensional vectors while capturing their semantic meanings

Which of the following tasks is not about Knowledge Acquisition?

  1. Knowledge Graph Completion

  2. Entity Recognition

  3. Relation Extraction

  4. Temporal Knowledge Graphs

True or False. Knowledge graphs can be applied to question-answering systems, recommender systems and information retrieval.

2.16 Project: Building a Knowledge Base from Texts

To build a knowledge graph from text, we typically need to perform two steps:

  1. Extract entities and extract relations between these entities.

  2. Identify topics and classify them into categories.

  3. Tokenize the text and perform lemmatization.

  4. Generate a list of keywords and map them to relevant concepts.

True or False. Relation Extraction involves doing both Named Entity Recognition and Relation Classification in an end-to-end approach.

What data has the REBEL relation extraction model been trained on?

  1. Unstructured text from webpages and books.

  2. A corpus of tagged dialogue transcripts.

  3. Entities and relations found in Wikipedia abstracts and Wikidata.

What’s the job of the Entity Linking task?

  1. To link entities in a text with an existing knowledge base.

  2. To classify entities in a text.

  3. To assign a probability to each candidate entity.

What’s the name of a Python library for plotting graphs?

  1. seaborn

  2. matplotlib

  3. pyvis

  4. ggplot

2.17 Question Answering

What’s the job of question answering models?

  1. Extracting information from large unstructured datasets.

  2. Automatically generating natural language questions related to a text.

  3. Automatically finding answers to questions posed in natural language.

True or False. Many search systems augment their search results with instant answers.

How are question answering systems usually categorized?

  1. Extractive vs Generative, and Open vs Closed.

  2. Rule-based vs Statistical, and Syntactic vs Semantic.

  3. Knowledge-based vs Corpus-based, and Extractive vs Generative.

What’s the difference between SQuAD v1 and SQuAD v2?

  1. SQuAD v2 includes a context paragraph for each question.

  2. SQuAD v2 also contains unanswerable questions.

  3. SQuAD v2 includes a confidence score for each answer.

2.18 Text Summarization

How are text summarization models typically categorized?

  1. Extraction-based and Abstraction-based.

  2. Rule-based and Statistical.

  3. Heuristic and Neural Network.

  4. Frequency-based and Semantic-based.

What’s the commonly used set of metrics for evaluating text summarization models?

  1. Perplexity

  2. BLEU

  3. F-score

  4. Word Mover’s Distance

  5. ROUGE

Questions and Feedback

Have questions about this lesson? Would you like to exchange ideas? Or would you like to point out something that needs to be corrected? Join the NLPlanet Discord server and interact with the community! There’s a specific channel for this course called practical-nlp-nlplanet.