HuggingFace Agents

Large Language Models (LLMs) trained for causal language modeling are versatile and can handle a broad spectrum of tasks. However, they often falter with simpler tasks such as logic, calculation, and search. When these models are used in areas where they are less effective, the results may not meet expectations.

To mitigate these limitations, the concept of an “agent” has been developed.

An agent utilizes an LLM as its core processing unit and is equipped with specialized functions known as tools. These tools enable the agent to perform specific tasks effectively, providing all necessary information for the agent to operate them accurately.

An agent can be programmed in various ways:

It can execute a sequence of actions or tools simultaneously, similar to the CodeAgent.
Alternatively, it can execute actions or tools sequentially, assessing the results of each before proceeding to the next, akin to the ReactJsonAgent.

Types of HuggingFace Agents

There are different types of agents:

Code Agent

This agent plans its actions and then executes all Python code at once. It is adept at managing various input and output types for its tools, making it ideal for multimodal tasks.

React Agents

These agents are optimal for reasoning tasks. Utilizing the ReAct framework (Yao et al., 2022), they are highly effective in processing based on prior observations.

There are two variants of ReactJsonAgent:

ReactJsonAgent, which outputs tool calls in JSON format.
ReactCodeAgent, a newer version that generates its tool calls as code segments, enhancing performance for LLMs with robust coding capabilities.

Building an agent involves several key components and a step-by-step setup process. Here’s a simplified guide to help you get started:

Initialize Your Agent:

LLM as an Engine: The agent utilizes a Large Language Model (LLM) as its core engine. This LLM is not the agent itself but powers the agent.
System Prompt: Define what you will prompt the LLM with to generate outputs.
Toolbox: Equip your agent with a set of tools it can use to execute tasks.
Parser: This component will interpret the LLM’s output to decide which tools to call and with which arguments.
Tool Descriptions: Initialize the agent with descriptions of each tool, which are then integrated into the system prompt.

Install Required Packages:

Install necessary libraries and dependencies using the command:
bash pip install transformers[agents]

Build the LLM Engine:

Set up a method, such as llm_engine, that takes a list of messages and returns text. This function should stop generating output based on specific sequences. from huggingface_hub import login, InferenceClient login("<YOUR_HUGGINGFACEHUB_API_TOKEN>") client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct") def llm_engine(messages, stop_sequences=["Task"]) -> str: response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000) return response.choices[0].message.content

Set Up Tools:

You can start with an empty list of tools or use a default toolbox. If you opt for the default, include the add_base_tools=True argument to automatically include basic tools.

Create and Run an Agent:

Instantiate an agent, such as CodeAgent, and run it with specific tasks. Here’s how you might set it up: from transformers import CodeAgent, HfEngine llm_engine = HfEngine(model="meta-llama/Meta-Llama-3-70B-Instruct") agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True) agent.run( "Could you translate this sentence from French, say it out loud and return the audio.", sentence="Où est la boulangerie la plus proche?" )

Code Execution and Safety:

The Python interpreter executes the code with restrictions to prevent unauthorized operations. Only tools you provide and basic functions like print are executable.

Customize System Prompts:

Customize the system prompt for specific tasks to guide the LLM in generating relevant outputs. For complex or recurring tasks, you might need to fine-tune this prompt to get optimal results.

By following these steps, you can effectively build and utilize an agent powered by an LLM for a variety of tasks. Adjust and extend the functionalities as needed based on the specific requirements of your tasks.

Tools are specific functions that an agent can utilize to perform tasks. Let’s explore how you can manage and extend the functionality of an agent with tools, including creating and loading custom tools.

Understanding Tools

A tool is an atomic function that an agent can execute. Each tool has:

Name: Identifies the tool.
Description: Explains what the tool does.
Input Descriptions: Details the types and descriptions of inputs the tool accepts.
Output Type: Specifies the type of output the tool produces.
__call__ Method: Executes the specific function of the tool.

When initializing an agent, tool attributes are incorporated into the agent’s system prompt, informing the agent about available tools and their functions.

Default Toolbox

Transformers library provides a default toolbox, which can be added to your agent using add_base_tools=True. This includes tools like:

Document Question Answering: Answers questions from a document image.
Image Question Answering: Answers questions based on an image.
Speech to Text: Transcribes spoken audio.
Text to Speech: Converts text into speech.
Translation: Translates text from one language to another.
Python Code Interpreter: Executes Python code in a secure environment.

Using a Tool

To manually use a tool, you can load it with the load_tool() function:

from transformers import load_tool

tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")

Creating a New Tool

You can create custom tools for specific use cases. Here’s an example of creating a tool that fetches the most downloaded model for a given task from the Hugging Face Hub:

from transformers import Tool
from huggingface_hub import list_models

class HFModelDownloadsTool(Tool):
    name = "model_download_counter"
    description = "Returns the most downloaded model of a given task on the Hugging Face Hub."
    inputs = {
        "task": {
            "type": "text",
            "description": "The task category (e.g., text-classification)."
        }
    }
    output_type = "text"

    def forward(self, task: str):
        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
        return model.id

Integrating Custom Tools

Once your custom tool is ready, you can integrate it into an agent. Here’s how to do it:

from transformers import load_tool, CodeAgent

# Assuming the tool is published on the Hugging Face Hub under 'm-ric/hf-model-downloads'
model_download_tool = load_tool("m-ric/hf-model-downloads")
agent = CodeAgent(tools=[model_download_tool], llm_engine=llm_engine)
agent.run(
    "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
)

Managing an Agent’s Toolbox

To add a new tool to an existing agent:

from transformers import CodeAgent

agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)
agent.toolbox.add_tool(model_download_tool)

# Now you can use both the new tool and any existing ones
agent.run(
    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub and return the audio?"
)

This setup allows you to extend the capabilities of your agent dynamically, adding tools as needed without reinitializing the entire agent system.

To leverage the LangChain tools within a Transformers agent setup, specifically using tools like a web search tool from LangChain, you can follow these steps. This setup involves importing the tool using LangChain’s from_langchain() method and integrating it into a Transformers-based agent.

Importing and Using LangChain Tools

Import Required Libraries: Ensure you have the necessary libraries installed and then import them into your script.

   from langchain.agents import load_tools
   from transformers import Tool, ReactCodeAgent

Load LangChain Tool: Load the specific LangChain tool you intend to use. In this example, we’re using a tool named serpapi for web searching.

   search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])

Create an Agent: Integrate the loaded LangChain tool into a ReactCodeAgent. This agent can now utilize the advanced search capabilities provided by the LangChain tool.

   agent = ReactCodeAgent(tools=[search_tool])

Execute a Query: Use the agent to run a query. Here, you’re asking about the difference in the number of layers between BERT base encoder and the encoder from the seminal paper “Attention is All You Need.”

   query = "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?"
   result = agent.run(query)
   print(result)

How It Works

Tool Integration: The from_langchain() method facilitates the integration of LangChain tools into the Transformers ecosystem, allowing the use of specialized tools like serpapi for specific tasks such as web searching.
Agent Operation: The ReactCodeAgent uses the tool to execute queries and interpret responses based on the tool’s capabilities. This setup is particularly useful for tasks that require external data or knowledge not contained within the model itself.

By following these steps, you can effectively use LangChain tools within a Transformers agent, enhancing the agent’s ability to handle complex queries with external tools. This approach is highly flexible, allowing the integration of multiple tools for diverse tasks.

Read other articles: