Ilya Fastovets for Datalynx


Using OpenAI Functions with Langchain Agents

About the author

My name is Ilya Fastovets, and I am a data scientist. My primary area of expertise is machine learning tools for agriculture. I find this field particularly exciting because it helps optimize food production, which directly affects nature and people's lives. What makes it even more interesting is that it draws on several other sciences, such as biology, chemistry, physics, and soil science. After the GPT-3 and GPT-4 models were released, I became particularly interested in their decision-making capabilities and how they can be applied to real-world problems. We noticed that sales leaders are especially interested in understanding their company's data using natural language. This is how Datalynx started. There, I explore the decision-making capabilities of LLMs to build a solution to this problem.

Introduction

With the release of the GPT-3 Large Language Model (LLM), OpenAI revolutionized the machine learning space. Many people have found it useful for text-related tasks such as composing emails, writing reports, and generating code. However, LLMs have less obvious capabilities that many people overlook: in particular, reasoning and decision-making. A chat can 'talk to itself' to break a complex problem into simple ones, trigger actions, and analyze their output until the desired result is achieved. These tasks may not even be related to text. Consider a smart home: first, the voice recognition system accepts your voice command (e.g., 'make it more cosy'), and a speech-to-text model converts the command into text. An LLM then reads the temperature and lighting conditions from the sensors, analyzes them, and takes action to make the atmosphere in the house more cosy. Here the task is not about text, yet an LLM does the decision-making in the backend. But how can this be achieved? This is where Langchain comes into play, with its powerful Agent capabilities. In this short article, we will build a simple working example to demonstrate how it works.

What is Langchain?

According to the official documentation, 'Langchain is a framework for developing applications powered by language models'. It is a higher-level framework that is often used instead of, or in addition to, the official OpenAI API for Python, to utilize the full capabilities of LLMs such as GPT-4. To achieve this, several new high-level concepts were introduced, such as Chains, Agents, and Memory. In this article, I will focus on Agents, and specifically on how they can use OpenAI function calls for decision-making.

What is a function call in OpenAI?

Apart from raw text generation, newer OpenAI chat completion models can produce output in a structured format. Given a detailed description of a function in the input prompt, the model automatically decides whether to generate 'free' text or to 'call a function'. If it decides to call a function, it returns a JSON object containing the name of the function and the parameters to call it with. Multiple functions can be provided, and the model decides which function to use and when. Langchain's Agents build on this capability.
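To make this concrete, here is roughly what such a function-call message looks like in the (legacy) chat completions format. The function name and arguments below are made up for illustration; they mirror the example we build later in this article:

```python
import json

# A made-up example of a chat completion message in which the model
# decided to call a function instead of answering in free text.
response_message = {
    "role": "assistant",
    "content": None,  # no free-text answer when a function is called
    "function_call": {
        "name": "get_customer_email",
        "arguments": '{"full_name": "John_Smith"}',  # arrives as a JSON string
    },
}

# The arguments field is a JSON string and must be parsed before use
call = response_message["function_call"]
args = json.loads(call["arguments"])
print(call["name"], args["full_name"])  # get_customer_email John_Smith
```

Note that the model only *describes* the call; your code is responsible for actually executing the function with the parsed arguments.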

What is a Langchain Agent?

A Langchain Agent is a reasoning engine that can choose a sequence of actions to take. It is a flexible concept and can be used to solve various problems. It is easiest to think of it as a model chatting with itself to solve a particular problem. The concept is general and suits various LLMs, but in the case of OpenAI models it works by utilizing OpenAI function calls. To achieve this, a special parser is used. The parser analyzes the agent's output and decides whether the agent takes another action (an AgentAction object) or is finished (an AgentFinish object). The agent's output is returned to the chat when the parser produces an AgentFinish object. Another concept worth mentioning here is the Tool. In our case, a Langchain Tool is just another representation of an OpenAI function call that an Agent can utilize. Now, let us use a simple example to demonstrate how to define Tools and an Agent, and how to execute the Agent to solve a particular problem with these tools using the parser for OpenAI function calls.
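The parser's decision can be sketched without any Langchain types. In this simplified, hypothetical version, plain dicts stand in for the AgentAction and AgentFinish objects: a function call in the model output means another action, while a plain-text answer ends the run:

```python
def parse_agent_output(message: dict) -> dict:
    """Plain-Python sketch of what an OpenAI-functions output parser does:
    a function_call in the model output means the agent takes another
    action; otherwise the run is finished and the text is the answer."""
    if message.get("function_call"):
        call = message["function_call"]
        return {"type": "AgentAction",
                "tool": call["name"],
                "tool_input": call["arguments"]}
    return {"type": "AgentFinish", "output": message["content"]}

# A message with a function call becomes an action...
action = parse_agent_output(
    {"function_call": {"name": "get_customer_email",
                       "arguments": '{"full_name": "John_Smith"}'}}
)
# ...while a plain-text message finishes the run.
finish = parse_agent_output({"content": "Done.", "function_call": None})
print(action["type"], finish["type"])  # AgentAction AgentFinish
```

The real OpenAIFunctionsAgentOutputParser used below does essentially this, returning proper AgentAction and AgentFinish objects.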

Example

In the example below, I demonstrate how an agent can be created in Langchain. I will use two dummy methods. The first retrieves the full name of an imaginary customer from their first name by simply appending the last name 'Smith'. The second uses the full name returned by the first to get the customer's email, by simply appending '@gmail.com'. I will then ask the question 'What is the full name and email of our customer John?'. The agent should call the first method with the first name to retrieve the full name, then call the second method with that output to get the email, and finally combine this information into an answer and stop itself at this final step. Let's proceed with the example.

Example: retrieving customer data using a Langchain Agent

We start by importing the necessary modules and adding our OpenAI API key

from pydantic.v1 import BaseModel, Field
from langchain.chat_models import ChatOpenAI
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.agents import AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.tools import StructuredTool
# Put your OpenAI API key here
OPENAI_API_KEY = "..."

Define the methods used in the agent

First, we define the functions that we need.

Note that we use type annotations and docstrings.
These help Langchain properly convert the Python functions to Langchain Tools and represent them as OpenAI functions in the OpenAI API.

def get_customer_full_name(first_name: str) -> str:
    """
    Retrieve customer's full name given the customer first name.

    Args:
        first_name (str): The first name of the customer. 

    Returns: 
        str: The full name of the customer.
    """
    full_name = first_name + "_Smith"
    return full_name

def get_customer_email(full_name: str) -> str:
    """
    Retrieve customer email given the full name of the customer. 

    Args: 
        full_name (str): The full name of the customer.

    Returns:
        str: The email of the customer.
    """
    email = full_name.lower() + "@gmail.com"
    return email

Define Pydantic arguments schema for these methods

To better convert the Python functions to Langchain Tools, I found it helpful to also describe their inputs with Pydantic classes.

These are passed, together with the functions, as arguments to the Langchain method that creates Tools from Python functions.

Note that Langchain does not yet support Pydantic v2 for these schemas, so Pydantic v1 is used here.

class GetCustomerFullNameInput(BaseModel):
    """
    Pydantic arguments schema for get_customer_full_name method
    """
    first_name: str = Field(..., description="The first name of the customer")

class GetCustomerEmailInput(BaseModel):
    """
    Pydantic arguments schema for get_customer_email method
    """
    full_name: str = Field(..., description="The full name of the customer")
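For intuition, here is a hand-written sketch of the JSON Schema that Pydantic v1 generates for GetCustomerFullNameInput (via its .schema() method). A schema like this is what ends up in the "parameters" field of the OpenAI function definition, which is how the Field descriptions reach the model:

```python
# Hand-written sketch of the JSON Schema produced by Pydantic v1 for
# GetCustomerFullNameInput; the exact output may differ slightly by version.
schema = {
    "title": "GetCustomerFullNameInput",
    "type": "object",
    "properties": {
        "first_name": {
            "title": "First Name",
            "description": "The first name of the customer",
            "type": "string",
        }
    },
    "required": ["first_name"],  # Field(...) marks the argument as required
}
print(schema["required"])  # ['first_name']
```

This is why the Field descriptions matter: they are the only documentation of each argument that the model sees.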

Define prompts

We will use two input prompts: a system prompt and a user input prompt.

In this case, the system prompt describes what needs to be done, and the user initialization prompt contains the question in it.

system_init_prompt = """
You are a shop manager capable of retrieving full names and emails of the customers. 
Given the question, answer it to the best of your abilities.
"""
user_init_prompt = """
The question is: {}. 
Go!
"""

Define parts of the agent using LCEL

Here, we define the parts used in the agent and create the agent and the agent executor.

First, we create the LLM object from the ChatOpenAI class for the OpenAI API, passing the OpenAI API key as a parameter.

Then, we create Tools from the Python functions using StructuredTool.from_function(). In our case, each function has only one input, so a StructuredTool is not strictly required; however, it is the right way to go when functions take multiple inputs. The Tools are combined into a list, and the bind() method adds them to the LLM object we created above.

In the next step, we initialize the prompt object from the prompt messages defined above. It contains the system prompt and the formatted user init prompt. Note that it also has a placeholder for 'agent_scratchpad': this variable stores the agent's history (the intermediate steps) while it executes.
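Conceptually, format_to_openai_function_messages turns each (action, observation) pair from the intermediate steps into an assistant message carrying the function call, followed by a function-result message. This plain-dict sketch (not the real Langchain implementation, which uses message objects) shows the idea:

```python
def format_scratchpad(intermediate_steps: list) -> list:
    """Plain-dict sketch of what format_to_openai_function_messages does:
    each (action, observation) pair becomes an assistant message with the
    function call, followed by a message carrying the function's result."""
    messages = []
    for action, observation in intermediate_steps:
        messages.append({
            "role": "assistant",
            "content": None,
            "function_call": {"name": action["tool"],
                              "arguments": action["tool_input"]},
        })
        messages.append({
            "role": "function",
            "name": action["tool"],
            "content": str(observation),
        })
    return messages

# One completed step: the agent called the first tool and got "John_Smith"
steps = [({"tool": "get_customer_full_name",
           "tool_input": '{"first_name": "John"}'}, "John_Smith")]
msgs = format_scratchpad(steps)
print(len(msgs), msgs[1]["content"])  # 2 John_Smith
```

On each iteration, these messages are appended to the prompt, so the model sees what it has already called and what the results were.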

The agent is defined using LCEL, the recommended way to define chains and agents in Langchain (this article explains why: https://python.langchain.com/docs/expression_language/why ). The agent pipes together input formatting, the prompt, the LLM with tools, and a parser. In the case of OpenAI functions, it is convenient to use OpenAIFunctionsAgentOutputParser right out of the box, as we do here.

Finally, we initialize the agent executor and set verbose to True to display intermediate steps. This will help us to understand how reasoning works in Langchain Agents.

# Initialize the LLM
llm = ChatOpenAI(
    temperature=0.5,
    model_name="gpt-4",
    openai_api_key=OPENAI_API_KEY,
)

# Initialize the tools
tools = [
    StructuredTool.from_function(
        func=get_customer_full_name,
        args_schema=GetCustomerFullNameInput,
        description="Function to get customer full name.",
    ), 
    StructuredTool.from_function(
        func=get_customer_email,
        args_schema=GetCustomerEmailInput,
        description="Function to get customer email",
    )
]
llm_with_tools = llm.bind(
    functions=[format_tool_to_openai_function(t) for t in tools]
)

# Initialize the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_init_prompt),
        ("user", user_init_prompt.format("{input}")),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ],
)

# Initialize agent
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

# Initialize the agent executor
agent_executor = AgentExecutor(agent=agent, 
                               tools=tools, 
                               verbose=True)

Run the chat with the agent executor

The final step is to invoke the agent with the user input message.

This step can also be done in a loop, in which case it works like a chat.
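Such a loop can be sketched as follows. EchoExecutor here is a hypothetical stand-in exposing the same invoke() interface as the AgentExecutor defined above, so the loop structure can be shown without an API key:

```python
# Hypothetical stand-in with the same invoke() interface as AgentExecutor,
# used only to illustrate the chat-loop structure.
class EchoExecutor:
    def invoke(self, inputs: dict) -> dict:
        return {"output": f"(agent answer for: {inputs['input']})"}

executor = EchoExecutor()

# In a real chat, the list would be replaced by user input,
# e.g. iter(lambda: input("You: "), "quit")
for user_message in ["What is the full name and email of our customer John?"]:
    response = executor.invoke({"input": user_message})
    print(f"Response: {response.get('output')}")
```

With the real AgentExecutor in place of EchoExecutor, each invoke() call runs the full reasoning chain for that message.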

user_message = "What is the full name and email of our customer John?"
response = agent_executor.invoke({"input": user_message})
response = response.get("output")
print(f"Response: {response}")
> Entering new AgentExecutor chain...
Invoking: get_customer_full_name with {'first_name': 'John'}
Invoking: get_customer_email with {'full_name': 'John_Smith'}
The full name of our customer John is John Smith and his email is john_smith@gmail.com.
> Finished chain.
Response: The full name of our customer John is John Smith and his email is john_smith@gmail.com.




Summary and improvements

Langchain Agents are a powerful reasoning and decision-making tool that can be used in many situations, even for tasks that are not related to text. In this simple example, I explained how to set up an Agent and run it to solve a dummy task. A next step could be adding memory to the chat so it remembers what you discussed; that could be the topic of a whole new discussion. Another possible improvement is the set of methods (tools) you use. In many cases those tools can themselves be created with LLMs, and designing them carefully can be crucial for solving your problems.
