Hugging Face and Pre-trained Models

First Steps with Hugging Face

Which Hugging Face library provides APIs and easy-to-use functions for working with transformer models and downloading pre-trained models?

  1. transformers.

  2. tokenizers.

  3. accelerate.

Complete the sentence with the correct option. Hugging Face provides a _____ interface for working with NLP models than PyTorch and TensorFlow.

  1. lower-level.

  2. higher-level.

True or False. All the models available on the Hugging Face Hub have been trained by Hugging Face.

True or False. On the Hugging Face Hub you can find models for NLP tasks only.

True or False. Hugging Face Spaces lets you deploy and host your ML demos on Hugging Face, but you need a premium account.

Hugging Face Hub: Models and Model Cards

Select the option that describes something not typically found on a model card.

  1. The evaluation results.

  2. The model intended uses and potential limitations, including biases and ethical considerations.

  3. Training datasets.

  4. The model price.

  5. The model architecture (e.g. BERT, RoBERTa, etc).

  6. The training configuration and experimental info.

True or False. All the models in the Hugging Face Hub can be used commercially.

True or False. If you have a question about a model on the Hugging Face Hub, you should send a message to its author.

True or False. On the Hugging Face Hub you can often test models directly from the browser.

Hugging Face Hub: Datasets

Choose the incorrect option. On the Hugging Face Hub, datasets can be filtered by:

  1. License.

  2. Languages of the samples contained.

  3. Number of models that have been trained on the dataset.

  4. Multilinguality, i.e. whether the dataset can be used for multilingual models or not.

  5. Size of the dataset (i.e. the number of samples contained).

  6. Tasks for which the dataset has been created.

True or False. It is possible to see some samples directly from the dataset page.

What is a dataset card and what does it typically contain?

  1. A document that lists useful information about a dataset, like its creation process or how to responsibly use the data.

  2. A document that lists useful information about a dataset, like the best models trained with it and their scores.

  3. A document that lists useful information about a dataset, like its authors and other similar datasets.

Hugging Face Spaces

Which libraries can be used to publish apps with Hugging Face Spaces?

  1. Streamlit and Gradio.

  2. Bokeh and Plotly.

  3. Matplotlib and Seaborn.

Hugging Face Pipeline for Quick Prototyping

Choose the best option. In the transformers library, the Pipeline class is a _____ class.

  1. High-level.

  2. Low-level.

True or False. In the transformers library, the Pipeline class executes the same code for each task.

Evaluating a Sentiment Analysis Model

Select the option that is not related to a sentiment analysis dataset.

  1. IMDb

  2. SST-2

  3. SQuAD

What’s the name of the function from the datasets library that allows downloading and computing metrics, such as accuracy?

  1. accuracy_score

  2. load_metric

  3. evaluate

  4. compute_metric
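
As a reminder of what a metric such as accuracy measures, here is a minimal pure-Python sketch. The function name `accuracy` and the sample labels are illustrative assumptions; this is not the API of any Hugging Face library.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical labels for four reviews (1 = positive, 0 = negative).
predictions = [1, 0, 1, 1]
references = [1, 0, 0, 1]
score = accuracy(predictions, references)  # 3 of the 4 predictions match
```

Library metric functions typically add conveniences on top of this idea, such as batching and standard input formats.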

What type of data is contained in the IMDb dataset?

  1. Tweets

  2. General info about each movie

  3. Movie ratings

  4. Movie reviews

When should someone fine-tune a model on new data instead of using a pre-trained model directly, even if the pre-trained model was trained on similar data?

  1. When the improvements that fine-tuning brings to your model have more benefits than the costs of building a dataset and fine-tuning the model.

  2. When the data you have is specialized to a particular domain or task.

  3. When the pre-trained model does not have enough capacity to capture the complexity of your data.

  4. When the pre-trained model has not been trained on a sufficiently large dataset.

What’s the meaning of the device parameter of a Hugging Face pipeline?

  1. It specifies the size of the output data to be produced by the model.

  2. It specifies which type of algorithm to use for natural language processing tasks.

  3. It specifies the size of the input data to be processed by the model.

  4. It specifies the device (e.g. CPU or GPU) to use for computations by the model.

Project: Detecting Emotions from Text

What is Emotion Detection in NLP?

  1. A technique that allows understanding the context of a text.

  2. A way to predict the sentiment (positive or negative) of a text.

  3. A technique that allows classifying texts with human emotions.

  4. A tool to detect the level of understanding of a text.

Project: Language Detection

What is Language Detection commonly used for?

  1. To develop a better understanding of the structure of a language.

  2. To detect text that has been plagiarized from another source.

  3. Automatically translating a text from one language to another.

  4. As a preprocessing step before passing texts to mono-lingual models.

How many languages are present in the Language Identification dataset?

  1. 10

  2. 20

  3. 50

  4. 200

Semantic Search on Big Data

What should I typically do if the embedding model that I want to use is too slow?

  1. Use a different, smaller and faster embedding model, even if it may produce lower-quality embeddings.

  2. Increase the speed of the model by optimizing the architecture and hyperparameters.

  3. Implement caching to enable faster embedding retrieval.

Why are operations like dot product or cosine similarity fast on CPU?

  1. Because of vectorization.

  2. Because the operations are simple and easy to calculate.

  3. Because the calculations are done without involving the memory.

  4. Because of cache locality.
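
For reference, the two operations named in this question can be written out explicitly. The sketch below uses plain Python loops for readability only and says nothing about performance; the vectors and the names `query` and `documents` are made-up illustrations.

```python
import math


def dot(a, b):
    """Dot product: sum of the element-wise products of two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors divided by the product of their norms."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / (norm_a * norm_b)


# Hypothetical three-dimensional "embeddings".
query = [1.0, 0.0, 1.0]
documents = {"doc_a": [2.0, 0.0, 2.0], "doc_b": [0.0, 3.0, 0.0]}

# Score every document against the query and keep the most similar one.
scores = {name: cosine_similarity(query, vec) for name, vec in documents.items()}
best = max(scores, key=scores.get)
```

Here `doc_a` points in the same direction as the query, while `doc_b` is orthogonal to it, so `doc_a` is ranked first.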

What do we call data structures that split a space into cells to optimize computations?

  1. Grid-based data structures.

  2. Spatial indexing data structures.

  3. Space-partitioning data structures.
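
The idea described in this question can be illustrated with a toy two-dimensional sketch: points are bucketed into square cells, and a radius query only inspects the cells near the query point. The names `build_cells` and `neighbors_within`, the cell size, and the sample points are assumptions made for illustration; libraries that handle high-dimensional embeddings use more elaborate schemes.

```python
import math
from collections import defaultdict


def build_cells(points, cell_size):
    """Bucket 2D points by the square cell that contains them."""
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    return cells


def neighbors_within(cells, query, radius, cell_size):
    """Return points within `radius` of `query`, scanning only nearby cells."""
    reach = math.ceil(radius / cell_size)  # how many cells the radius can span
    cx = math.floor(query[0] / cell_size)
    cy = math.floor(query[1] / cell_size)
    found = []
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for point in cells.get((cx + dx, cy + dy), []):
                if math.dist(point, query) <= radius:
                    found.append(point)
    return found


# Hypothetical points; the far-away one is never compared against the query.
points = [(0.5, 0.5), (1.2, 0.8), (5.0, 5.0), (0.9, 1.9)]
cells = build_cells(points, cell_size=1.0)
result = neighbors_within(cells, query=(1.0, 1.0), radius=1.0, cell_size=1.0)
```

The saving comes from skipping whole cells: distances are computed only for points whose cell lies close enough to the query to possibly contain a match.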

What are two popular Python libraries used to perform fast semantic search?

  1. Pattern and Numpy.

  2. Spacy and NLTK.

  3. Gensim and Scikit-Learn.

  4. Faiss and Annoy.

  5. NLTK and TextBlob.