Jaydeep Biswas

LangChain's 3rd Module: Agents🦜🕴️

Hey there! Throughout our latest blog series, we've delved into a wide array of subjects. Here's an overview of the topics we've explored thus far:

  1. Installation and Setup of LangChain
  2. LangChain's 1st Module: Model I/O
  3. LangChain's 2nd Module: Retrieval

Exploring LangChain's Agents 🔍🤖

Today, I want to dive into this exciting concept called "Agents" in LangChain. It's pretty mind-blowing!

LangChain introduces an innovative idea called "Agents" that takes the concept of chains to a whole new level. Agents use language models to dynamically figure out sequences of actions to perform, making them highly versatile and adaptable. Unlike regular chains, where actions are hardcoded, agents use language models as reasoning engines to decide which actions to take and in what order.

The Agent is the main part responsible for decision-making. It harnesses the power of a language model and a prompt to figure out the next steps to achieve a specific objective. The inputs to an agent usually include:

  • Tools: Descriptions of available tools (more on this later).
  • User Input: The high-level objective or query from the user.
  • Intermediate Steps: A history of (action, tool output) pairs executed to reach the current user input.

The result of an agent step is either the next action to take (AgentAction) or the final reply to give to the user (AgentFinish). An action includes the tool to use and the input for that tool.
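Conceptually, these two outputs can be pictured as simple record types. This is a framework-free sketch; LangChain's real classes carry additional fields such as logs:

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    # The tool to invoke and the input to pass to it.
    tool: str
    tool_input: str

@dataclass
class AgentFinish:
    # The final answer to return to the user.
    output: str

# An agent run is a sequence of actions ending in a finish.
step = AgentAction(tool="search", tool_input="weather in Paris")
done = AgentFinish(output="It is sunny in Paris.")
```

The executor loops on AgentAction values and stops as soon as it sees an AgentFinish.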

Tools 🛠️

Tools are interfaces that an agent can use to interact with the world. They allow agents to perform various tasks like searching the web, running shell commands, or accessing external APIs. In LangChain, tools are crucial for expanding the capabilities of agents and helping them achieve diverse tasks.
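At its core, a tool is just a name, a description (which the language model reads when deciding what to use), and a callable. A minimal framework-free sketch of that interface:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str  # read by the LLM when choosing which tool to use
    func: Callable[[str], str]

    def run(self, tool_input: str) -> str:
        return self.func(tool_input)

# A toy tool an agent could select purely by reading its description.
echo = Tool(name="echo", description="Repeats the input back.", func=lambda s: s)
print(echo.run("hello"))  # hello
```

LangChain's Tool class follows the same shape, plus argument schemas and async variants.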

To use tools in LangChain, you can load them using the following code:

```
from langchain.agents import load_tools

tool_names = [...]
tools = load_tools(tool_names)
```

Some tools require a base LLM for initialization. In such cases, you can pass an LLM like this:

```
from langchain.agents import load_tools

tool_names = [...]
llm = ...
tools = load_tools(tool_names, llm=llm)
```

This setup allows you to access a variety of tools and integrate them into your agent's workflows. The complete list of tools, with usage documentation, is available in the LangChain documentation.

Examples of Tools 📚🔧

DuckDuckGo

The DuckDuckGo tool lets you perform web searches using its search engine. Here's an example:

```
from langchain.tools import DuckDuckGoSearchRun

search = DuckDuckGoSearchRun()
search.run("Manchester United vs Luton Town match summary")
```


DataForSeo

The DataForSeo toolkit allows you to get search engine results using the DataForSeo API. To use it, you need to set up your API credentials:

```
import os

os.environ["DATAFORSEO_LOGIN"] = "<your_api_access_username>"
os.environ["DATAFORSEO_PASSWORD"] = "<your_api_access_password>"
```

Once credentials are set, you can create a DataForSeoAPIWrapper tool to access the API:

```
from langchain.utilities.dataforseo_api_search import DataForSeoAPIWrapper

wrapper = DataForSeoAPIWrapper()

result = wrapper.run("Weather in Los Angeles")
```

The DataForSeoAPIWrapper tool fetches search engine results from various sources.

You can customize the type of results and fields returned in the JSON response:

```
json_wrapper = DataForSeoAPIWrapper(
    json_result_types=["organic", "knowledge_graph", "answer_box"],
    json_result_fields=["type", "title", "description", "text"],
    top_count=3,
)

json_result = json_wrapper.results("Bill Gates")
```

Specify the location and language for your search results:

```
customized_wrapper = DataForSeoAPIWrapper(
    top_count=10,
    json_result_types=["organic", "local_pack"],
    json_result_fields=["title", "description", "type"],
    params={"location_name": "Germany", "language_code": "en"},
)

customized_result = customized_wrapper.results("coffee near me")
```

Choose the search engine:

```
customized_wrapper = DataForSeoAPIWrapper(
    top_count=10,
    json_result_types=["organic", "local_pack"],
    json_result_fields=["title", "description", "type"],
    params={"location_name": "Germany", "language_code": "en", "se_name": "bing"},
)

customized_result = customized_wrapper.results("coffee near me")
```

The search is customized to use Bing as the search engine.

Specify the type of search:

```
maps_search = DataForSeoAPIWrapper(
    top_count=10,
    json_result_fields=["title", "value", "address", "rating", "type"],
    params={
        "location_coordinate": "52.512,13.36,12z",
        "language_code": "en",
        "se_type": "maps",
    },
)

maps_search_result = maps_search.results("coffee near me")
```

These examples showcase how you can customize searches based on result types, fields, location, language, search engine, and search type.

Shell (bash)

The Shell toolkit gives agents the ability to interact with the shell environment, allowing them to run shell commands. This feature is powerful but should be used carefully, especially in sandboxed environments. Here's how to use the Shell tool:

```
from langchain.tools import ShellTool

shell_tool = ShellTool()

result = shell_tool.run({"commands": ["echo 'Hello World!'", "time"]})
```

In this example, the Shell tool runs two shell commands: echoing "Hello World!" and displaying the current time.
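Conceptually, a shell tool is a thin wrapper around the standard library's subprocess module. A minimal sketch of the same idea (illustrative only; the real ShellTool adds validation and platform handling):

```python
import subprocess

def run_commands(commands: list[str]) -> str:
    """Run shell commands sequentially and return their combined output."""
    outputs = []
    for cmd in commands:
        # shell=True mirrors how a shell tool executes raw command strings;
        # never pass untrusted input here.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        outputs.append(result.stdout)
    return "".join(outputs)

print(run_commands(["echo 'Hello World!'"]))
```

This is exactly why the toolkit warns about sandboxing: whatever string the agent produces gets executed.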

You can provide the Shell tool to an agent for more complex tasks. Here's an example of an agent using the Shell tool to fetch links from a web page:

```
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import ShellTool

llm = ChatOpenAI(temperature=0.1)
shell_tool = ShellTool()

# Escape curly braces so the args schema survives prompt templating.
shell_tool.description = shell_tool.description + f"args {shell_tool.args}".replace(
    "{", "{{"
).replace("}", "}}")
self_ask_with_search = initialize_agent(
    [shell_tool], llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
self_ask_with_search.run(
    "Download the langchain.com webpage and grep for all urls. Return only a sorted list of them. Be sure to use double quotes."
)
```

In this scenario, the agent uses the Shell tool to execute a series of commands to fetch, filter, and sort URLs from a web page.
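The pipeline the agent assembles (fetch, grep, sort) can be sketched in plain Python. Here a regex stands in for grep, applied to a hard-coded HTML snippet rather than a live download:

```python
import re

html = '<a href="https://docs.langchain.com">Docs</a> <a href="https://blog.langchain.dev">Blog</a>'

# Equivalent of `grep -o` for URLs, followed by `sort -u`.
urls = sorted(set(re.findall(r'https?://[^\s"<>]+', html)))
print(urls)
```

The agent arrives at the same shape of solution, except it writes the curl/grep/sort commands itself.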

The examples provided showcase some of the tools available in LangChain. These tools ultimately expand the capabilities of agents (explored in the next subsection) and empower them to efficiently perform various tasks. Depending on your project's needs, you can choose the tools and toolkits that best suit your requirements and integrate them into your agent's workflows.

Return to Agents ↩️🤖

Let's talk about agents now.

The AgentExecutor is the engine that runs an agent. It calls the agent, executes the actions the agent chooses, passes the observations back, and repeats in a loop until the agent finishes its task. In simpler terms, it might look something like this:

```
next_action = agent.get_action(...)
while next_action != AgentFinish:
    observation = run(next_action)
    next_action = agent.get_action(..., next_action, observation)
return next_action
```

The AgentExecutor deals with various complexities, like what happens when the agent picks a tool that doesn't exist, handling tool errors, managing what the agent produces, and providing logs at different levels.
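One of those complexities, handling a tool name the model hallucinated, can be sketched like this (framework-free; the tool map and error message are illustrative, not LangChain's exact wording):

```python
def execute_action(tool_name: str, tool_input: str, tool_map: dict) -> str:
    """Look up and run a tool, returning an error observation instead of crashing."""
    tool = tool_map.get(tool_name)
    if tool is None:
        # Feed the mistake back to the agent as an observation so it can recover.
        return f"Error: '{tool_name}' is not a valid tool. Try one of: {sorted(tool_map)}"
    return tool(tool_input)

tools = {"word_length": lambda w: str(len(w))}
print(execute_action("word_length", "educa", tools))  # 5
print(execute_action("calculator", "2+2", tools))     # error observation, no crash
```

Returning the error as an observation, rather than raising, is what lets the agent self-correct on the next loop iteration.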

Although the AgentExecutor class is the main runtime for agents in LangChain, there are other experimental runtimes like:

  • Plan-and-execute Agent
  • Baby AGI
  • Auto GPT

To understand the agent framework better, let's build a basic agent from scratch and then explore pre-built agents.

Before we dive into building the agent, let's review some key terms and schema:

  • AgentAction: A set of instructions for the agent. It includes tool, the tool to use, and tool_input, the input for that tool.
  • AgentFinish: Indicates the agent has finished its task and is ready to give a response to the user.
  • Intermediate Steps: Records of what the agent did before; they give the agent context for future actions.

Now, let's create a simple agent using OpenAI Function Calling. We'll start by making a tool that calculates word length. This is useful because language models sometimes make mistakes when counting word lengths due to tokenization.

First, load the language model:

```
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
```

Test the model with a word length question:

```
llm.invoke("how many letters in the word educa?")
```

Define a simple function to calculate word length:

```
from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
    """Returns the length of a word."""
    return len(word)
```

We've created a tool named get_word_length that takes a word as input and returns its length.

Now, create a prompt for the agent. The prompt guides the agent on how to reason and format its output:

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

```
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a very powerful assistant but not great at calculating word lengths.",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
```

To provide tools to the agent, format them as OpenAI function calls:

```
from langchain.tools.render import format_tool_to_openai_function

tools = [get_word_length]  # the tools defined above
llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])
```
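For reference, the OpenAI function-calling format describes each tool as a JSON-schema-style dictionary. For get_word_length it looks roughly like this (a hand-written sketch of the format, not the exact library output):

```python
get_word_length_function = {
    "name": "get_word_length",
    "description": "Returns the length of a word.",
    "parameters": {
        "type": "object",
        "properties": {
            "word": {"type": "string"},
        },
        "required": ["word"],
    },
}

# The model sees this schema and can reply with a structured call such as:
# {"name": "get_word_length", "arguments": '{"word": "educa"}'}
print(get_word_length_function["name"])
```

The model never executes anything itself; it only emits a name plus JSON arguments, which the runtime dispatches to the real function.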

Create the agent by defining input mappings and connecting components:

from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

```
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)
```

We've created our agent, which understands user input, uses available tools, and formats output.

Interact with the agent:

```
agent.invoke({"input": "how many letters in the word educa?", "intermediate_steps": []})
```

Now, let's write a runtime for the agent. The simplest runtime calls the agent, executes actions, and repeats until the agent finishes:

```
from langchain.schema.agent import AgentFinish

user_input = "how many letters in the word educa?"
intermediate_steps = []

while True:
    output = agent.invoke(
        {
            "input": user_input,
            "intermediate_steps": intermediate_steps,
        }
    )
    if isinstance(output, AgentFinish):
        final_result = output.return_values["output"]
        break
    else:
        print(f"TOOL NAME: {output.tool}")
        print(f"TOOL INPUT: {output.tool_input}")
        tool = {"get_word_length": get_word_length}[output.tool]
        observation = tool.run(output.tool_input)
        intermediate_steps.append((output, observation))

print(final_result)
```

To simplify this, use the AgentExecutor class. It encapsulates agent execution and offers error handling, early stopping, tracing, and other improvements:

```
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke({"input": "how many letters in the word educa?"})
```

The AgentExecutor makes it easier to interact with the agent and simplifies the execution process.

Memory in Agents 🧠🤖

The agent we've made so far doesn't remember past conversations, making it stateless. To enable follow-up questions and continuous conversations, we need to add memory to the agent. Here are the two steps involved:

  1. Add a memory variable in the prompt to store chat history.
  2. Keep track of the chat history during interactions.

Let's start by adding a memory placeholder in the prompt:

```
from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a very powerful assistant but not great at calculating word lengths.",
        ),
        MessagesPlaceholder(variable_name=MEMORY_KEY),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
```

Now, create a list to track the chat history:

```
from langchain.schema.messages import HumanMessage, AIMessage

chat_history = []
```

In the agent creation step, include the memory as well:

```
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

# Rebuild the executor so it wraps the memory-aware agent.
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```

When running the agent, make sure to update the chat history:

```
input1 = "how many letters in the word educa?"
result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend([
    HumanMessage(content=input1),
    AIMessage(content=result["output"]),
])
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})
```

This lets the agent maintain a conversation history and answer follow-up questions based on past interactions.

Congratulations! You've successfully created and executed your first end-to-end agent in LangChain. To explore LangChain's capabilities further, you can delve into:

  • Different agent types supported.
  • Pre-built Agents
  • How to work with tools and tool integrations.

Agent Types 🤖📝

LangChain offers various agent types, each suited for specific use cases. Here are some available agents:

  • Zero-shot ReAct: Chooses tools based on their descriptions using the ReAct framework. Versatile and requires tool descriptions.
  • Structured input ReAct: Handles multi-input tools, suitable for tasks like web browsing. Uses a tool's argument schema for structured input.
  • OpenAI Functions: Designed for models fine-tuned for function calling, compatible with models like gpt-3.5-turbo-0613 and gpt-4-0613.
  • Conversational: Tailored for conversational settings, uses ReAct for tool selection, and employs memory to remember previous interactions.
  • Self-ask with search: Relying on a single tool, "Intermediate Answer," it looks up factual answers to questions.
  • ReAct document store: Interacts with a document store using the ReAct framework, requiring "Search" and "Lookup" tools.

Explore these agent types to find the one that best suits your needs in LangChain. These agents allow you to bind a set of tools within them to handle actions and generate responses.

Prebuilt Agents 🤖🛠️

Let's continue our exploration of agents, focusing on prebuilt agents available in LangChain.

LangChain Gmail Toolkit 📧🔧

LangChain provides a convenient toolkit for Gmail, allowing you to connect LangChain to the Gmail API. To get started, follow these steps:

  1. Set up credentials:

    • Download the credentials.json file as explained in the Gmail API documentation.
    • Install the required libraries:

```
pip install --upgrade google-api-python-client
pip install --upgrade google-auth-oauthlib
pip install --upgrade google-auth-httplib2
pip install beautifulsoup4  # Optional, for parsing HTML messages
```

  2. Create the Gmail toolkit:

    • Initialize the toolkit with default settings:

```
from langchain.agents.agent_toolkits import GmailToolkit

toolkit = GmailToolkit()
```

    • Customize authentication as needed. Behind the scenes, a googleapi resource is created using the following methods:

```
from langchain.tools.gmail.utils import build_resource_service, get_gmail_credentials

credentials = get_gmail_credentials(
    token_file="token.json",
    scopes=["https://mail.google.com/"],
    client_secrets_file="credentials.json",
)
api_resource = build_resource_service(credentials=credentials)
toolkit = GmailToolkit(api_resource=api_resource)
```

  3. Use the toolkit tools:

    • GmailCreateDraft: Create a draft email with specified message fields.
    • GmailSendMessage: Send email messages.
    • GmailSearch: Search for email messages or threads.
    • GmailGetMessage: Fetch an email by message ID.
    • GmailGetThread: Fetch an entire email thread.

  4. Initialize the agent with the toolkit and other settings:

```
from langchain.llms import OpenAI
from langchain.agents import initialize_agent, AgentType

llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools=toolkit.get_tools(),
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
)
```

  5. Examples:

    • Create a Gmail draft for editing:

```
agent.run("Create a Gmail draft for me to edit...")
```

    • Search for the latest email in your drafts:

```
agent.run("Could you search in my drafts for the latest email?")
```

These examples demonstrate LangChain's Gmail toolkit capabilities, enabling programmatic interactions with Gmail.

SQL Database Agent 📊🤖

This agent interacts with SQL databases; the examples below use the Chinook sample database. Be cautious, as this agent is still in development. To use it:

  1. Initialize the agent:

```
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents.agent_types import AgentType

db = SQLDatabase.from_uri("sqlite:///../../../../../notebooks/Chinook.db")
toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))

agent_executor = create_sql_agent(
    llm=OpenAI(temperature=0),
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
```
_Disclaimer_

  • The query chain may generate insert/update/delete queries. Be cautious, and use a custom prompt or create a SQL user without write permissions if needed.
  • Be aware that running certain queries, such as "run the biggest query possible," could overload your SQL database, especially if it contains millions of rows.
  • Data warehouse-oriented databases often support user-level quotas to limit resource usage.
  2. Examples:

    • Describe a table:

```
agent_executor.run("Describe the playlisttrack table")
```

    • Run a query:

```
agent_executor.run("List the total sales per country. Which country's customers spent the most?")
```

The agent will execute the query and provide the result, such as the country with the highest total sales.
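Under the hood, the agent translates the question into SQL. On the Chinook schema, the generated query would resemble the one below, illustrated here against a tiny in-memory stand-in for the Customer and Invoice tables (the real tables have many more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL);
INSERT INTO Customer VALUES (1, 'USA'), (2, 'Germany');
INSERT INTO Invoice VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 8.0);
""")

# The kind of query a SQL agent generates for "total sales per country".
rows = conn.execute("""
SELECT c.Country, SUM(i.Total) AS TotalSales
FROM Customer c JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.Country ORDER BY TotalSales DESC
""").fetchall()
print(rows)  # [('USA', 15.0), ('Germany', 8.0)]
```

Seeing the generated SQL (via verbose=True) is the best way to sanity-check what the agent is actually asking your database.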

To get the total number of tracks in each playlist, you can use the following query:

```
agent_executor.run("Show the total number of tracks in each playlist. The Playlist name should be included in the result.")
```

The agent will return the playlist names along with the corresponding total track counts.

  3. Caution:

    • Be cautious about running queries that could overload your database.

Pandas DataFrame Agent 🐼📊🤖

This agent interacts with Pandas DataFrames for question-answering purposes. Use with caution to prevent potential harm from generated Python code:

  1. Initialize the agent:

```
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.llms import OpenAI
import pandas as pd

df = pd.read_csv("titanic.csv")

agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)
```
  2. Examples:

    • Count rows in the DataFrame:

```
agent.run("how many rows are there?")
```

    • Filter rows based on criteria:

```
agent.run("how many people have more than 3 siblings?")
```

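Behind the scenes, the agent writes and executes pandas expressions like the ones below, sketched here against a tiny stand-in DataFrame (the real Titanic CSV conventionally names its sibling-count column SibSp):

```python
import pandas as pd

# Tiny stand-in for titanic.csv.
df = pd.DataFrame({"Name": ["A", "B", "C"], "SibSp": [0, 4, 5]})

# "how many rows are there?"
print(len(df))  # 3

# "how many people have more than 3 siblings?"
print((df["SibSp"] > 3).sum())  # 2
```

This is also why the caution above matters: the agent executes whatever Python it generates against your data.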
Jira Toolkit 📅🔧

The Jira toolkit allows agents to interact with a Jira instance. Follow these steps:

  1. Install the library and set environment variables:

```
pip install atlassian-python-api
```
```
import os
from langchain.agents import AgentType, initialize_agent
from langchain.agents.agent_toolkits.jira.toolkit import JiraToolkit
from langchain.llms import OpenAI
from langchain.utilities.jira import JiraAPIWrapper

os.environ["JIRA_API_TOKEN"] = "abc"
os.environ["JIRA_USERNAME"] = "123"
os.environ["JIRA_INSTANCE_URL"] = "https://jira.atlassian.com"
os.environ["OPENAI_API_KEY"] = "xyz"

llm = OpenAI(temperature=0)
jira = JiraAPIWrapper()
toolkit = JiraToolkit.from_jira_api_wrapper(jira)
agent = initialize_agent(
    toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
```
  2. Examples:

    • Create a new issue in a project:

```
agent.run("make a new issue in project PW to remind me to make more fried rice")
```

Now, you can interact with your Jira instance using natural language instructions and the Jira toolkit.
