Train LLaMA 2 on Hugging Face

The Hugging Face team has published a guide demonstrating how anyone can train LLaMA 2 on Hugging Face. In effect, it gives everyone the ability to create their own open-source version of ChatGPT without writing any code. The tutorial takes the LLaMA 2 base model, fine-tunes it for chat using an open-source instruction dataset, and deploys it in a chat application that can be shared with friends, all through a few clicks.

The relevance of this tutorial stems from the rapid ascendancy of machine learning and Large Language Models (LLMs) as integral components in both our personal and professional spheres. However, the complexity of training and deploying these models often seems insurmountable for those not versed in machine learning engineering. With the expectation that machine learning will continue to grow and that personalized models will become commonplace, the pressing question becomes: How can individuals without technical expertise take advantage of this technology on their own?

To address this, Hugging Face has been diligently developing tools that open up the world of machine learning to a broader audience. Their offerings, including Spaces, AutoTrain, and Inference Endpoints, are tailored to simplify machine learning for all users.

The tutorial encapsulates this mission by outlining a three-step process to build a chat app using Spaces, AutoTrain, and ChatUI, without writing any code. The author of this guide is not an ML engineer but part of Hugging Face’s GTM team, illustrating that if they can accomplish this, so can anyone else. Let’s explore how.

Discovering Hugging Face Spaces

Hugging Face introduces Spaces, a user-friendly platform for creating and hosting machine learning models and applications on the web. Spaces streamlines the process, offering tools such as Gradio and Streamlit for crafting ML demonstrations, the ability to upload custom applications within a docker container, and a selection of ready-to-use ML applications for swift deployment.

In this walkthrough, we use two Docker application templates from Spaces, AutoTrain and ChatUI, to show how easily ML applications can be deployed.

Learn more about the capabilities and offerings of Spaces by Hugging Face here.

Exploring AutoTrain: A No-Code Solution

AutoTrain, designed by Hugging Face, is a revolutionary no-code solution that empowers individuals without machine learning or development experience to train cutting-edge ML models. This tool supports various domains like NLP, computer vision, speech, tabular data, and even the fine-tuning of Large Language Models—a feature we will delve into today.

Discover the full potential of AutoTrain and how it can transform your ML projects here.

Unveiling ChatUI

ChatUI is the intuitive, open-source user interface created by Hugging Face, facilitating interactions with open-source LLMs. This is the same interface that powers HuggingChat, the entirely open-source counterpart to ChatGPT, exemplifying Hugging Face’s commitment to open-source solutions.

To understand more about ChatUI and its functionalities, read further here.

Step 1: Create a new AutoTrain Space

Navigate to huggingface.co/spaces and choose the “Create new Space” option.

Name your Space and pick an appropriate usage license, particularly if you wish to share your model or Space with the public. To launch the AutoTrain application using the Docker Template in your Space, go to Docker and then select AutoTrain.

Then do the following:

  1. Choose the appropriate “Space hardware” to run your application. (For the AutoTrain app, the complimentary CPU basic option is adequate, as model training will occur on separate compute resources which you can select afterward.)
  2. Insert your “HF_TOKEN” into the “Space secrets” section to link this Space to your Hub account. Without this token, the Space cannot train or store a new model on your account. (Your HF_TOKEN is located in your Hugging Face Profile under Settings > Access Tokens; ensure the token has “Write” permissions.)
  3. Decide if you want your Space to be “Private” or “Public”. It’s advisable to keep the AutoTrain Space private initially, although you have the option to share your model or Chat App publicly later.
  4. Click on “Create Space” and there you have it! Your new Space will be ready after a short build time, then you can access the Space and begin utilizing AutoTrain.

Step 2: Launch a Model Training in AutoTrain

2.1 After your AutoTrain space is ready, you’ll be greeted with the interface shown. AutoTrain supports various training types, including fine-tuning LLMs, text classification, working with tabular data, and training diffusion models. Today, we’re concentrating on LLMs, so go ahead and click on the “LLM” tab.

2.2 In the “Model Choice” section, pick an LLM to train. You can select from the provided list or enter a model name as it appears on its Hugging Face model card. In this example we use Meta’s Llama 2 7b foundation model; more details are available on the Llama 2 model card. (Keep in mind: Llama 2 is a controlled-access model, so you need permission from Meta to use it. Unrestricted models such as Falcon are also available for selection.)

2.3 For the “Backend,” choose the computational resources—CPU or GPU—needed for training. The “A10G Large” will suffice for a 7b model. If training a larger model, ensure it fits within the GPU’s memory. (If you require an A100 GPU for a larger model, please contact api-enterprise@huggingface.co.)

2.4 To fine-tune a model, you must upload “Training Data” in a correctly formatted CSV file. You can find the format requirements here. If your data includes multiple columns, select the “Text Column” that contains the training text. In our case, we’re using the Alpaca instruction tuning dataset, with further details available here, and it’s downloadable directly as a CSV from this link.
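As a rough illustration of the data preparation described in 2.4, the sketch below turns Alpaca-style records (with instruction, input, and output fields) into a CSV with a single text column. The column name "text", the prompt template, and the helper names format_example and write_training_csv are assumptions made for this example; the format requirements linked above remain the authority on what AutoTrain expects.

```python
import csv
import io


def format_example(instruction, response, context=""):
    """Join one record into a single training string.

    The section markers below are a hypothetical template, loosely
    modeled on the Alpaca prompt style; adjust to the required format.
    """
    parts = ["### Instruction:\n" + instruction.strip()]
    if context.strip():
        parts.append("### Input:\n" + context.strip())
    parts.append("### Response:\n" + response.strip())
    return "\n\n".join(parts)


def write_training_csv(records, handle, text_column="text"):
    """Write one row per record, with all text in a single column."""
    writer = csv.DictWriter(handle, fieldnames=[text_column])
    writer.writeheader()
    for rec in records:
        text = format_example(
            rec["instruction"], rec["output"], rec.get("input", "")
        )
        writer.writerow({text_column: text})


# Two made-up records, for illustration only.
records = [
    {"instruction": "Name a primary color.", "input": "", "output": "Red."},
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    },
]

# Written to an in-memory buffer here; pass an open file to save to disk.
buffer = io.StringIO()
write_training_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk instead, open the file with newline="" (as the csv module documentation recommends) and pass the file handle in place of the buffer. The resulting column name is what you would then pick as the “Text Column”.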

2.5 It’s not mandatory, but you have the option to upload “Validation Data” to evaluate your model’s performance after training.

2.6 AutoTrain offers several advanced configurations to manage your model’s memory usage. These include precision adjustments (like “FP16”), quantization options (“Int4/8”), and the use of PEFT (Parameter Efficient Fine Tuning). The default settings are usually optimal, balancing training efficiency with minimal performance impact.
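To see why precision and quantization matter for memory, a back-of-the-envelope estimate helps. The sketch below computes only the memory needed to hold the model weights at different precisions. The function name, the fits_on_gpu helper, and the 24 GB figure used for an A10G are assumptions made for this example; real training also needs memory for activations, gradients, and optimizer state, so treat these numbers as a lower bound.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_on_gpu(num_params, precision, gpu_memory_gb):
    """Lower-bound check: do the weights alone fit in GPU memory?"""
    return weight_memory_gb(num_params, precision) <= gpu_memory_gb


SEVEN_BILLION = 7e9
ASSUMED_GPU_GB = 24.0  # assumption: memory of a single A10G

for precision in ("fp32", "fp16", "int8", "int4"):
    size = weight_memory_gb(SEVEN_BILLION, precision)
    ok = fits_on_gpu(SEVEN_BILLION, precision, ASSUMED_GPU_GB)
    print(f"{precision}: {size:.1f} GB of weights, fits in 24 GB: {ok}")
```

Under these assumptions, a 7b model in full 32-bit precision would not fit on a 24 GB card even before training overhead, which is why lower precision, quantization, and PEFT (which trains only a small set of extra parameters) are so useful here.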

2.7 You can also tweak the training parameters in the “Parameter Choice” section, but for this tutorial, we’ll proceed with the default settings.

2.8 With your configurations complete, click on “Add Job” to queue your model for training, and then hit “Start Training.” Note that if you wish to experiment with various hyperparameters, you can queue multiple jobs to run at the same time.

2.9 As the training commences, a new “Space” associated with the training process appears in your Hugging Face Hub account. Upon completion of training, the newly trained model will be listed under the “Models” section of your account. To monitor the progress of the training, you can access live logs within the Space.

2.10 Now it’s time for a break. Depending on the size of your model and the volume of training data, training may take anywhere from a few hours to several days.

Step 3: Create a new Hugging Face ChatUI Space using your model

3.1 Create another Space by repeating the process from Step 1, but this time choose the ChatUI Docker template instead of AutoTrain.

3.2 For the Space hardware, select the A10G Small option, which is adequate for our 7b model. The appropriate hardware will vary with the size of your model.

3.3 If you have a MongoDB instance, enter its details in the “MONGODB_URL” field to store chat logs. If you don’t, leave this field empty and a local database will be created by default.

3.4 Provide the name of your trained model in the “MODEL_NAME” section under “Space variables”. Locate your model’s name in the “Models” section of your Hugging Face profile; it will correspond to the “Project name” designated during the AutoTrain process. For example, in our case, it’s “2legit2overfit/wrdt-pco6-31a7-0”.

3.5 You also have the option to adjust model inference settings such as temperature, top-p, and the maximum number of tokens to generate in the “Space variables”. These settings affect the character of the generated text. We will proceed with the default values for now.

3.6 Once you’ve completed the setup, click “Create” to launch your own open-source, ChatGPT-style chat application. Congratulations on your accomplishment! If all steps have been followed correctly, the final outcome should resemble the provided example.
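For readers curious about what the temperature and top-p inference settings in Step 3 actually do, here is a minimal sketch on a toy four-token distribution. It is not ChatUI's code: the function names and the toy logits are made up for illustration. It shows the general technique of temperature scaling followed by top-p (nucleus) filtering.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize. Returns {token_index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Toy scores for four candidate tokens (illustrative values only).
toy_logits = [2.0, 1.0, 0.5, -1.0]

sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=1.5)
nucleus = top_p_filter(softmax_with_temperature(toy_logits), top_p=0.8)

print("temperature 0.5:", [round(p, 3) for p in sharp])
print("temperature 1.5:", [round(p, 3) for p in flat])
print("top-p 0.8 keeps token indices:", sorted(nucleus))
```

A low temperature concentrates probability on the most likely token, giving more predictable text, while a high temperature spreads it out. Top-p then discards the unlikely tail before a token is sampled.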

If you’re feeling inspired, but still need technical support to get started, feel free to reach out and apply for support here. Hugging Face offers a paid Expert Advice service that might be able to help.
