Intro to HuggingFace Transformers

Welcome to “A Total Noob’s Introduction to Hugging Face Transformers,” a guide crafted specifically for beginners eager to grasp the fundamentals of using open-source machine learning. Our aim is to demystify Hugging Face Transformers and its functionalities, enabling you to better understand and collaborate with machine learning practitioners. We won’t turn you into an ML expert, but we’ll provide you with a foundational understanding through a simple, practical example of running Microsoft’s Phi-2 LLM in a notebook on a Hugging Face space.

You might wonder why we need another Hugging Face tutorial when there are plenty available. The reason is accessibility: most tutorials assume some technical background, such as proficiency in Python, which can be a barrier for non-technical individuals. Coming from the business side of AI, I recognize this challenge and want to offer a more approachable path for similar learners.

This guide is tailored for a non-technical audience who want to understand open-source machine learning without having to learn Python from scratch. We assume no prior knowledge and will explain concepts from the ground up to ensure clarity. Engineers might find this guide basic, but for beginners, it’s an ideal starting point.

Let’s dive in… but first, some context.

What is Hugging Face Transformers?

Hugging Face Transformers is an open-source Python library that provides access to thousands of pre-trained Transformer models for natural language processing (NLP), computer vision, audio tasks, and more. It simplifies the implementation of Transformer models by abstracting the complexity of training or deploying models in lower-level ML frameworks like PyTorch, TensorFlow, and JAX.

What is a Library?

A library is a collection of reusable code that can be integrated into projects to implement functionality more efficiently without writing code from scratch. The Transformers library provides reusable code for implementing models in common frameworks like PyTorch, TensorFlow, and JAX. This reusable code is accessed by calling the library’s functions, or its methods, which are functions attached to a particular class or object.
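To make the idea concrete, here is a minimal sketch that uses Python’s built-in statistics library rather than Transformers. The list of numbers is made up for illustration. It computes the same average twice: once by hand, and once by calling a ready-made library function.

```python
import statistics

scores = [2, 4, 6, 8]  # made-up example data

# Without a library: write the averaging logic yourself.
manual_average = sum(scores) / len(scores)

# With a library: call a ready-made function instead.
library_average = statistics.mean(scores)

print(manual_average, library_average)
```

Both approaches give the same answer; the library version simply saves you from writing and checking the logic yourself.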

What is the Hugging Face Hub?

The Hugging Face Hub is a collaboration platform that hosts a vast collection of open-source models and datasets for machine learning. Think of it as the GitHub for ML. The hub facilitates sharing and collaboration by making it easy to discover, learn, and interact with valuable ML assets from the open-source community. It integrates with the Transformers library, as models deployed using the library are downloaded from the hub.

What are Hugging Face Spaces?

Hugging Face Spaces is a service available on the Hugging Face Hub that provides an easy-to-use GUI for building and deploying web-hosted ML demos and apps. The service allows you to quickly build ML demos, upload your own apps for hosting, or select pre-configured ML applications to deploy instantly.

In this tutorial, we’ll deploy one of the pre-configured ML applications, a JupyterLab notebook, by selecting the corresponding Docker container.

What is a Notebook?

Notebooks are interactive applications that allow you to write and share live executable code interwoven with narrative text. They are especially useful for Data Scientists and Machine Learning Engineers as they allow for real-time experimentation with code and make it easy to review and share results.

Create a Hugging Face Account

Go to hf.co and click “Sign Up” to create an account if you don’t already have one.

Add Your Billing Information

Within your HF account, go to Settings > Billing and add your credit card to the payment information section.

Why Do We Need Your Credit Card?

Running most LLMs requires a GPU, which isn’t free. However, you can rent GPUs from Hugging Face. The GPU needed for this tutorial, an NVIDIA A10G, only costs a couple of dollars per hour.

Create a Space to Host Your Notebook

On hf.co, go to Spaces > Create New.

Configure Your Space

Set your preferred space name.
Select Docker > JupyterLab to choose the pre-configured notebook app.
Select Space Hardware as “Nvidia A10G Small.”
Leave everything else as default and select “Create Space.”

What is a Docker Template?

A Docker template is a predefined blueprint for a software environment that includes the necessary software and configurations, making it easy for developers to deploy applications consistently and in isolation.

Why Do I Need to Select GPU Space Hardware?

While a complimentary CPU is available by default, the many computations required by LLMs benefit significantly from parallel processing, which GPUs excel at. Additionally, the A10G Small GPU with 24GB memory is sufficient for running the Phi-2 model.
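A rough back-of-the-envelope estimate shows why 24GB is enough. The sketch below assumes Phi-2’s published size of about 2.7 billion parameters and counts only the memory needed for the weights themselves, ignoring the extra working memory used during generation. The function name is made up for illustration.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Approximate memory needed just to hold the model weights, in GB."""
    return num_parameters * bytes_per_parameter / 1e9

PHI2_PARAMETERS = 2.7e9  # assumption: roughly 2.7 billion parameters

# 32-bit floats use 4 bytes each; 16-bit floats use 2 bytes each.
full_precision_gb = weight_memory_gb(PHI2_PARAMETERS, 4)
half_precision_gb = weight_memory_gb(PHI2_PARAMETERS, 2)

print(f"32-bit weights: ~{full_precision_gb:.1f} GB")
print(f"16-bit weights: ~{half_precision_gb:.1f} GB")
```

Under these assumptions, either figure fits within the A10G’s 24GB with room to spare.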

Log in to JupyterLab

After the Space finishes building, you will see a login screen. If you left the token as default in the template, you can log in with “huggingface.” Otherwise, use the token you set.

Create a New Notebook

Within the “Launcher” tab, select the top “Python 3” square under the “Notebook” heading to create a new notebook environment with Python pre-installed.

Install Required Packages

In your new notebook, install the PyTorch and Transformers libraries by entering the following commands and executing them:

!pip install torch
!pip install transformers

What is !pip install?

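pip is Python’s package installer: it downloads and installs packages from the Python Package Index (PyPI), allowing you to extend Python applications with a wide range of third-party add-ons. The leading ! tells the notebook to run the line as a shell command rather than as Python code.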
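If you want to confirm that the installation worked, one possible check is sketched below. It uses Python’s standard importlib module; the helper name is_installed is made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find an importable package with this name."""
    return importlib.util.find_spec(package_name) is not None

# Report on the two packages this tutorial relies on.
for name in ["torch", "transformers"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either package is reported as missing, re-run the corresponding install command.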

If We Are Using Transformers, Why Do We Need PyTorch Too?

The Transformers library is built on top of other frameworks like PyTorch, TensorFlow, and JAX. In this case, we are using Transformers with PyTorch, so we need to install it to access its functionality.

Import the AutoTokenizer and AutoModelForCausalLM Classes from Transformers

Enter the following code on a new line and run it:

from transformers import AutoTokenizer, AutoModelForCausalLM

What is a Class?

Classes are code recipes for creating objects. They are useful because they allow us to save objects with a combination of properties and functions, simplifying coding by making all the information and operations needed for specific tasks accessible in one place. We’ll use these classes to create a model and a tokenizer object.
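As an illustration, here is a small, made-up class that has nothing to do with Transformers. It shows the two ingredients described above: a property (stored data) and a function that uses it.

```python
class Greeter:
    """A recipe for objects that remember a name and can greet with it."""

    def __init__(self, name):
        # A property: data stored on each object.
        self.name = name

    def greet(self):
        # A function attached to the object, using its stored data.
        return f"Hello, {self.name}!"

# Create two separate objects from the same recipe.
alice = Greeter("Alice")
bob = Greeter("Bob")

print(alice.greet())
print(bob.greet())
```

Each object carries its own data, but both share the behaviour defined once in the class.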

Why Do I Need to Import the Class Again After Installing Transformers?

Although Transformers is installed, the specific classes within Transformers are not automatically available for use. Python requires us to explicitly import individual classes to avoid naming conflicts and ensure only necessary parts of a library are loaded.
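The same rule applies to every Python library, including the ones that ship with Python. The sketch below uses the built-in fractions library as a stand-in: in a fresh session, the name Fraction is unknown until it is imported.

```python
# Before the import, Python does not recognise the name Fraction.
try:
    Fraction(1, 2)
    name_known_before_import = True
except NameError:
    name_known_before_import = False

# Import just the one class we need from the fractions library.
from fractions import Fraction

total = Fraction(1, 3) + Fraction(1, 6)

print(name_known_before_import)
print(total)
```

Importing only Fraction leaves the rest of the library’s names out of your notebook, which is the same reason we import only AutoTokenizer and AutoModelForCausalLM.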

Define Which Model You Want to Run

To specify the model to download and run from the Hugging Face Hub, set a variable equal to the model name:

model_id = "microsoft/phi-2"

What is an Instruction-Tuned Model?

An instruction-tuned language model is trained to understand and respond to user commands or prompts, improving its ability to follow instructions. Base models can autocomplete text but often don’t respond to commands effectively. Phi-2, the model used here, is a base model, so expect it to behave more like an autocomplete than an assistant.

Create a Model Object and Load the Model

To load the model from the Hugging Face Hub into your local environment, instantiate the model object by passing the model_id to the .from_pretrained method of the AutoModelForCausalLM class. Downloading the model may take a few minutes.

model = AutoModelForCausalLM.from_pretrained(model_id)

What is an Argument?

An argument is input information passed to a function to compute an output. We pass an argument into a function by placing it between the parentheses that follow the function name.
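A short, made-up example shows the two common ways of passing arguments: by position, and by name (a keyword argument such as padding_side='left').

```python
def repeat_text(text, times=2, separator=" "):
    """Return `text` repeated `times` times, joined by `separator`."""
    return separator.join([text] * times)

# One positional argument; the other two fall back to their defaults.
print(repeat_text("hi"))

# Keyword arguments name the parameter they are meant for.
print(repeat_text("hi", times=3, separator="-"))
```

Parameters with default values are optional, which is why a function can often be called with just one argument.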

What is a Method?

A method is a function that uses information from a particular object or class. In this case, the .from_pretrained method uses the class information and the model_id to create a new model object.
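The sketch below imitates this pattern with a toy class. ToyModel and its CATALOGUE are hypothetical stand-ins invented for this example; they are not part of Transformers and do not show how the real .from_pretrained works internally.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not part of Transformers."""

    # Made-up catalogue playing the role of the Hub.
    CATALOGUE = {"toy/tiny": 3, "toy/large": 12}

    def __init__(self, model_id, num_layers):
        self.model_id = model_id
        self.num_layers = num_layers

    @classmethod
    def from_pretrained(cls, model_id):
        # A class method: called on the class, returns a new object.
        return cls(model_id, cls.CATALOGUE[model_id])

    def describe(self):
        # An instance method: uses the data stored on one object.
        return f"{self.model_id} with {self.num_layers} layers"

toy = ToyModel.from_pretrained("toy/tiny")
print(toy.describe())
```

Note the difference: from_pretrained is called on the class to build an object, while describe is called on the object that was built.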

Create a Tokenizer Object and Load the Tokenizer

To load the tokenizer, create a tokenizer object by passing the model_id as an argument to the .from_pretrained method of the AutoTokenizer class.

tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True, padding_side='left')

What is a Tokenizer?

A tokenizer splits sentences into smaller pieces of text (tokens) and assigns each token a numeric value called an input ID. This is necessary because the model only understands numbers, so the text must be encoded into a format the model can understand. Each model has its own tokenizer vocabulary, so it’s essential to use the tokenizer the model was trained on.
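To show the idea of mapping text to numbers and back, here is a deliberately simplified sketch. ToyTokenizer and its five-entry vocabulary are hypothetical; real tokenizers split text into subword pieces and have vocabularies of tens of thousands of tokens.

```python
class ToyTokenizer:
    """Hypothetical word-level tokenizer; real ones split text into subwords."""

    def __init__(self, vocabulary):
        self.token_to_id = {token: i for i, token in enumerate(vocabulary)}
        self.id_to_token = {i: token for token, i in self.token_to_id.items()}

    def encode(self, text):
        # Words missing from the vocabulary map to the special "<unk>" token.
        unknown_id = self.token_to_id["<unk>"]
        words = text.lower().split()
        return [self.token_to_id.get(word, unknown_id) for word in words]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)

toy_tokenizer = ToyTokenizer(["<unk>", "who", "are", "you", "we"])
ids = toy_tokenizer.encode("Who are you")

print(ids)
print(toy_tokenizer.decode(ids))
```

Encoding the same sentence with a different vocabulary would give different numbers, which is why a model must be paired with the tokenizer it was trained with.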

Create the Inputs for the Model to Process

Define a new variable input_text with the prompt you want to give the model. Pass this variable to the tokenizer object to create the input_ids, and add a second argument, return_tensors="pt", so that the token IDs are returned as PyTorch tensors, the format the model expects.

input_text = "Who are you?"
input_ids = tokenizer(input_text, return_tensors="pt")

Run Generation and Decode the Output

Pass the tokenized input to the model by calling its .generate method, then use the tokenizer to decode the generated token IDs back into text.
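The text above does not include code for this step, so the following is only a sketch of one way it might look. The helper name generate_reply and the limit of 50 new tokens are assumptions chosen for illustration; the calls it makes (.generate with max_new_tokens, and .batch_decode with skip_special_tokens) are standard Transformers methods.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=50):
    """Tokenize a prompt, generate a continuation, and decode it to text."""
    # Encode the prompt as PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate new token IDs that continue the prompt.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Convert the token IDs back into readable text.
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # One prompt in means one piece of text out.
    return decoded[0]
```

With the model and tokenizer objects created earlier, this could be called as print(generate_reply(model, tokenizer, "Who are you?")). The returned text normally begins with the prompt itself, followed by the model’s continuation.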

What Next?

Now that you’ve successfully run inference on your first LLM, here are a few additional steps and concepts to further enhance your understanding and capabilities with Hugging Face Transformers.

Exploring Different Models

The Hugging Face Hub hosts a wide variety of models for different tasks. You can explore models tailored for tasks such as text classification, translation, summarization, and more. To use a different text-generation model, simply change the model_id variable to the ID of the desired model on the Hugging Face Hub.

model_id = "distilgpt2"  # Example for a different model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

Fine-Tuning a Model

Fine-tuning a pre-trained model involves training it further on a specific dataset to specialize it for a particular task. This requires a bit more setup, including preparing a dataset and configuring training parameters. Hugging Face provides extensive documentation and tutorials on fine-tuning models.

Using Pipelines

Hugging Face provides an easy-to-use interface called pipeline which abstracts away many complexities. You can use pipelines for common tasks like text generation, sentiment analysis, and more.

from transformers import pipeline

generator = pipeline('text-generation', model=model_id)
result = generator("Once upon a time,")
print(result)

Saving and Loading Models

To save a model and tokenizer locally, you can use the .save_pretrained method, and to load them again, use .from_pretrained.

# Save the model and tokenizer
model.save_pretrained("path_to_save")
tokenizer.save_pretrained("path_to_save")

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("path_to_save")
tokenizer = AutoTokenizer.from_pretrained("path_to_save")

Understanding the Output

The output from a language model can sometimes be unpredictable. Here are a few techniques to manage and interpret the results better:

Adjusting Generation Parameters: Parameters like max_new_tokens, temperature, top_k, and top_p can control the diversity and coherence of the generated text.
Post-Processing: Implement additional processing steps to clean or structure the output as needed for your application.
Error Handling: Always include error handling to manage cases where the model produces unexpected or undesired results.
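To build some intuition for temperature, top_k, and top_p, the toy sketch below applies them to three made-up token scores. It is a simplified illustration written for this guide, not the library’s implementation, and the function names are hypothetical. (max_new_tokens is simpler: it just caps how many tokens are generated.)

```python
import math

def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {token: weight / total for token, weight in weights.items()}

def next_token_probabilities(scores, temperature=1.0, top_k=None, top_p=None):
    """Toy illustration: turn raw token scores into sampling probabilities."""
    # Temperature rescales the scores: below 1 sharpens, above 1 flattens.
    highest = max(scores.values())
    weights = {t: math.exp((s - highest) / temperature) for t, s in scores.items()}
    # Order tokens from most to least likely.
    ranked = sorted(normalise(weights).items(), key=lambda item: item[1], reverse=True)
    probs = dict(ranked)

    if top_k is not None:
        # Keep only the k most likely tokens.
        probs = normalise(dict(list(probs.items())[:top_k]))

    if top_p is not None:
        # Keep the smallest set of top tokens whose probabilities reach p.
        kept, cumulative = {}, 0.0
        for token, prob in probs.items():
            kept[token] = prob
            cumulative += prob
            if cumulative >= top_p:
                break
        probs = normalise(kept)

    return probs

toy_scores = {"cat": 2.0, "dog": 1.0, "fish": 0.0}  # made-up scores

print(next_token_probabilities(toy_scores))
print(next_token_probabilities(toy_scores, temperature=0.5))
print(next_token_probabilities(toy_scores, top_k=2))
```

Lower temperatures and smaller top_k or top_p values concentrate probability on the likeliest tokens, giving more predictable text; higher values leave more room for variety.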

Community and Resources

Engage with the Hugging Face community for support and further learning. Here are a few resources:

Hugging Face Forums: A great place to ask questions and share knowledge.
Hugging Face Documentation: Comprehensive guides and API references.
Courses and Tutorials: Hugging Face offers various tutorials and courses to help you advance your skills.

Conclusion

You’ve taken your first steps into the world of Hugging Face Transformers and machine learning. By understanding the basics, experimenting with different models, and utilizing the extensive resources available, you can continue to grow your knowledge and skills. Whether you’re a beginner looking to get a handle on ML concepts or aiming to become more proficient, the journey has just begun.

Feel free to revisit this guide, explore the Hugging Face Hub, and keep experimenting with different models and tasks. Happy coding!

This guide should provide you with a solid foundation to start exploring Hugging Face Transformers and machine learning. If you have more questions or need further help, the Hugging Face Forums are a good place to ask.
