Your single AI agent handles 5 tools. Then 10. Then 20. Somewhere around tool number 15, it starts picking the wrong one half the time.
This is the single-agent ceiling. Every production team hits it. The fix is not a better prompt — it is splitting work across multiple specialized agents that coordinate.
Multi-agent systems sound complex. They are not. If you have built one agent, you already know 80% of what you need. The remaining 20% is coordination — and LangGraph gives you 3 patterns to handle it.
This tutorial covers each pattern with working code. By the end, you will have a multi-agent system where a supervisor routes questions to a research agent and a math agent, each doing what it does best.
Why One Agent Is Not Enough
A single agent with 20 tools faces two problems:
Tool selection degrades. LLMs pick the right tool reliably from 5 options. At 20 options, accuracy drops — the model spends tokens reasoning about which tool to use instead of using it. Fewer tools per agent means higher accuracy per call.
Context gets crowded. Each tool needs a description. Twenty descriptions consume 2,000-4,000 tokens before the user's question even arrives. That is context window budget spent on tool definitions instead of reasoning.
Multi-agent systems fix both problems. Each agent gets 2-5 tools. Tool selection stays accurate. Context stays focused. And you can swap out one specialist without rewriting the whole system.
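The context arithmetic above can be sketched in a few lines. The 150-tokens-per-tool figure is a rough assumption (name, description, plus JSON parameter schema), not output from a real tokenizer:

```python
# Back-of-the-envelope context cost of tool definitions.
# TOKENS_PER_TOOL is an assumed average, not a measured value.
TOKENS_PER_TOOL = 150

for n_tools in (5, 10, 20):
    overhead = n_tools * TOKENS_PER_TOOL
    print(f"{n_tools} tools ~ {overhead} tokens spent before the user's question arrives")
```

At 20 tools, the assumed overhead lands around 3,000 tokens, which is why splitting tools across specialists pays off.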
Prerequisites
You need Python 3.10+ and these packages:
```bash
pip install langgraph langchain-openai langgraph-supervisor
```
Set your OpenAI API key:
```bash
export OPENAI_API_KEY="your-key-here"
```
Note: These examples use OpenAI, but LangGraph works with any LangChain-compatible model (Anthropic, Google, local models via Ollama). Swap `ChatOpenAI` for `ChatAnthropic` or `ChatOllama` and the patterns stay the same.
Pattern 1: Subagents as Tools
This is the pattern LangGraph recommends for most use cases as of March 2026. You wrap each specialist agent as a tool that your main agent calls like any other function.
The idea: each specialist is a full agent with its own tools and prompt. The main agent sees each specialist as a single tool — "ask the math expert" or "ask the research expert."
```python
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

model = ChatOpenAI(model="gpt-4o-mini")

# --- Define specialist tools ---
@tool
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

@tool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

@tool
def web_search(query: str) -> str:
    """Search the web for information."""
    # In production, connect to a real search API
    return f"Search results for: {query}"

# --- Create specialist agents ---
math_agent = create_react_agent(
    model=model,
    tools=[add, multiply],
    name="math_expert",
    prompt="You are a math expert. Use your tools to solve math problems. Always show your work.",
)

research_agent = create_react_agent(
    model=model,
    tools=[web_search],
    name="research_expert",
    prompt="You are a research expert. Use web search to find accurate information.",
)

# --- Wrap specialists as tools for the main agent ---
@tool
def ask_math_expert(question: str) -> str:
    """Ask the math expert. Use for ALL math questions."""
    response = math_agent.invoke(
        {"messages": [{"role": "user", "content": question}]}
    )
    return response["messages"][-1].content

@tool
def ask_research_expert(question: str) -> str:
    """Ask the research expert. Use for ALL factual questions."""
    response = research_agent.invoke(
        {"messages": [{"role": "user", "content": question}]}
    )
    return response["messages"][-1].content

# --- Create the main coordinating agent ---
coordinator = create_react_agent(
    model=model,
    tools=[ask_math_expert, ask_research_expert],
    prompt=(
        "You coordinate two experts. "
        "For math questions, use ask_math_expert. "
        "For factual questions, use ask_research_expert. "
        "Combine their answers when needed."
    ),
)

# --- Run it ---
result = coordinator.invoke({
    "messages": [{"role": "user", "content": "What is 42 * 17?"}]
})
print(result["messages"][-1].content)
```
Why this pattern works: The coordinator agent sees 2 tools, not 4. Each specialist sees 1-2 tools. Tool selection stays accurate at every level. And you can add a third specialist (say, a code expert) by writing one more wrapper function — no changes to existing agents.
Trade-off: Each specialist invocation is a separate LLM call. A question that hits two specialists costs 3 LLM calls total (coordinator + 2 specialists). For simple queries, this is overkill.
Pattern 2: Supervisor With Automatic Routing
The langgraph-supervisor package gives you a supervisor agent that automatically creates handoff tools for each specialist. You define the agents; the supervisor figures out who handles what.
```python
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

model = ChatOpenAI(model="gpt-4o-mini")

# --- Specialist tools ---
@tool
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

@tool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

@tool
def web_search(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

# --- Create specialist agents ---
math_agent = create_react_agent(
    model=model,
    tools=[add, multiply],
    name="math_expert",
    prompt="You are a math expert. Always use one tool at a time.",
)

research_agent = create_react_agent(
    model=model,
    tools=[web_search],
    name="research_expert",
    prompt="You are a researcher with web search access. Do not do any math.",
)

# --- Create supervisor ---
workflow = create_supervisor(
    agents=[research_agent, math_agent],
    model=model,
    prompt=(
        "You are a team supervisor managing a research expert and a math expert. "
        "For current events, use research_expert. "
        "For math problems, use math_expert."
    ),
)
app = workflow.compile()

# --- Run it ---
result = app.invoke({
    "messages": [{"role": "user", "content": "What is 42 * 17?"}]
})
for message in result["messages"]:
    if hasattr(message, "content") and message.content:
        print(f"{message.type}: {message.content}")
```
Key difference from Pattern 1: The supervisor package automatically generates handoff tools. You do not write wrapper functions. The supervisor creates transfer_to_math_expert and transfer_to_research_expert tools behind the scenes.
When to use this: When you want quick setup with minimal boilerplate. The supervisor pattern shines when agents have clear, non-overlapping responsibilities.
Version note (March 2026): `langgraph-supervisor` v0.0.31 works, but the LangGraph team now recommends the subagents-as-tools pattern (Pattern 1) for most new projects. The supervisor library is being maintained for backward compatibility, but Pattern 1 gives you more control over context engineering.
Pattern 3: Handoffs (State-Driven Agent Switching)
Handoffs let one agent transfer control to another based on what it discovers during execution. A triage agent identifies the problem, then hands off to the right specialist — billing, support, or escalation.
In LangGraph, handoffs work through state updates. A tool returns a Command object that updates the state and routes to the next node. The conversation history travels with the transfer.
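Before the LangGraph version, the control-flow idea is worth seeing without any framework. This plain-Python sketch uses hypothetical agent logic and canned replies; each "agent" returns either a final answer or the name of the agent to hand off to, and a step cap plays the role of `recursion_limit`:

```python
# Plain-Python sketch of state-driven handoffs (no LangGraph; agent names
# and replies are hypothetical). State travels with the transfer.

def billing_agent(state):
    if "log in" in state["question"]:
        return {"handoff": "support_agent"}  # not a billing issue: transfer
    return {"answer": "Your invoice is $99.00."}

def support_agent(state):
    return {"answer": "Auth service is degraded; try resetting your password."}

AGENTS = {"billing_agent": billing_agent, "support_agent": support_agent}

def run(question, start="billing_agent", recursion_limit=15):
    state = {"question": question}
    current = start
    for _ in range(recursion_limit):  # caps infinite handoff loops
        result = AGENTS[current](state)
        if "answer" in result:
            return result["answer"]
        current = result["handoff"]  # control transfers, state comes along
    raise RuntimeError("recursion_limit exceeded")

print(run("My payment failed and now I can't log in"))
```

LangGraph's `Command`-based handoffs implement the same loop for you, with the full message history as the traveling state.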
Here is a simplified handoff pattern using langgraph_supervisor's built-in create_handoff_tool:
```python
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor, create_handoff_tool
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

model = ChatOpenAI(model="gpt-4o-mini")

# --- Domain tools ---
@tool
def lookup_invoice(customer_id: str) -> str:
    """Look up a customer's invoice."""
    return f"Invoice for {customer_id}: $99.00 due 2026-04-01"

@tool
def check_system_status(service: str) -> str:
    """Check the status of a service."""
    return f"{service} status: operational, 99.9% uptime last 30 days"

# --- Create specialist agents with handoff tools ---
billing_agent = create_react_agent(
    model=model,
    tools=[
        lookup_invoice,
        create_handoff_tool(
            agent_name="support_agent",
            description="Transfer to technical support for non-billing issues",
        ),
    ],
    name="billing_agent",
    prompt=(
        "You handle billing questions only. "
        "If the question is about technical issues, transfer to support_agent."
    ),
)

support_agent = create_react_agent(
    model=model,
    tools=[
        check_system_status,
        create_handoff_tool(
            agent_name="billing_agent",
            description="Transfer to billing for payment-related issues",
        ),
    ],
    name="support_agent",
    prompt=(
        "You handle technical support only. "
        "If the question is about billing, transfer to billing_agent."
    ),
)

# --- Wire them with a supervisor for routing ---
workflow = create_supervisor(
    agents=[billing_agent, support_agent],
    model=model,
    prompt=(
        "You are a customer service supervisor. "
        "Route billing questions to billing_agent. "
        "Route technical questions to support_agent."
    ),
)
app = workflow.compile()

# --- Run with recursion limit to prevent infinite handoff loops ---
result = app.invoke(
    {"messages": [{"role": "user", "content": "My payment failed and now I can't log in"}]},
    config={"recursion_limit": 15},
)
for message in result["messages"]:
    if hasattr(message, "content") and message.content:
        print(f"{message.type}: {message.content}")
```
How it works: create_handoff_tool generates a tool that triggers agent-to-agent transfer. When billing_agent calls transfer_to_support_agent, the supervisor routes execution to support_agent with the full conversation history. The receiving agent picks up where the previous one left off.
When to use this: When tasks cross domain boundaries during execution. Customer support (billing → technical), multi-step workflows (draft → review → publish), or any pipeline where one agent's output becomes another agent's input.
Trade-off: Without a recursion_limit, a handoff loop between two agents runs until your API bill makes you cry. Always set it:
```python
config = {"recursion_limit": 15}
result = app.invoke({"messages": [...]}, config=config)
```
Which Pattern Should You Pick?
Here is the decision matrix:
| Situation | Pattern | Why |
|---|---|---|
| 2-5 specialists, clear routing | Subagents as tools | Most control, recommended default |
| Quick prototype, non-overlapping agents | Supervisor | Least boilerplate |
| Sequential workflow (A → B → C) | Handoffs | Natural flow, no central bottleneck |
| Complex routing + parallel execution | Custom LangGraph workflow | Full control (advanced) |
Start with Pattern 1 (subagents as tools). Move to Pattern 3 (handoffs) when your agents need to transfer context-heavy conversations. Use Pattern 2 (supervisor) for rapid prototyping when you want the fastest path to a working multi-agent demo.
3 Mistakes Beginners Make With Multi-Agent Systems
1. Too many agents, too soon. Start with 2 agents. Get the coordination right. Add a third only when you have a real use case — not "just in case."
2. Overlapping tool responsibilities. If both your research agent and your general agent can search the web, the coordinator will pick the wrong one. Each agent needs exclusive ownership of its tools.
3. No recursion limit. Without recursion_limit, a handoff loop between two agents runs until your token budget is gone. Always set it — 10-15 is a reasonable starting point for most multi-agent workflows.
What to Build Next
You now have 3 working patterns. Here are concrete next steps:
1. Add persistence. Import `MemorySaver` from `langgraph.checkpoint.memory` and pass `checkpointer=MemorySaver()` to your agent. Now it remembers past conversations.
2. Add streaming. Replace `app.invoke()` with `app.stream()` to get token-by-token output. Useful for chat interfaces.
3. Add a real tool. Replace the mock `web_search` with a real API call — Tavily, SerpAPI, or your own database query. That is where multi-agent systems go from demo to production.
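As a sketch of that last step, here is one way to upgrade the mock `web_search`. The Tavily endpoint and payload shape below are assumptions based on its public REST API, so verify them against Tavily's docs; without an API key the function falls back to the tutorial's mock, so everything keeps working offline:

```python
# Hypothetical real-search upgrade for the mock web_search tool.
# The Tavily URL and JSON payload are assumptions: check Tavily's docs.
import os

def web_search(query: str) -> str:
    """Search the web for information."""
    api_key = os.environ.get("TAVILY_API_KEY")
    if not api_key:
        # Offline fallback: identical to the tutorial's mock tool
        return f"Search results for: {query}"
    import requests  # only needed when a key is configured
    resp = requests.post(
        "https://api.tavily.com/search",
        json={"api_key": api_key, "query": query, "max_results": 3},
        timeout=10,
    )
    resp.raise_for_status()
    return "\n".join(r["content"] for r in resp.json()["results"])
```

Decorate it with `@tool` (as in the earlier examples) before handing it to an agent.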
The single-agent ceiling is real. Multi-agent systems break through it — not by adding complexity, but by dividing it.
Follow @klement_gunndu for more AI engineering content. We're building in public.