Bato
A Beginner's Guide to Multi-Agent Systems: How AI Agents Work Together

You've probably heard the term "AI agents" thrown around a lot lately. But recently, a new idea has been taking over engineering discussions: multi-agent systems. Not one AI doing everything, but a team of AIs, each with a specific job, collaborating to tackle complex problems.

Here's a surprise: if you've ever used Claude Code to refactor a large codebase or fix a tricky bug, you've already seen a multi-agent system at work. You just might not have known it.

If that sounds complicated, don't worry. By the end of this guide, you'll understand what multi-agent systems are, why they matter, and how to build a simple one yourself (no PhD required).


First: What Even Is an "Agent"?

Before we go multi, let's make sure we're clear on what a single agent is.

A traditional LLM (like GPT or Claude) takes input and produces output — one shot, done. An agent goes further: it can reason, use tools, and take actions in a loop until a goal is completed.

Think of it this way:

  • LLM: "Here's a summary of that article."
  • Agent: "I'll search the web for that article, read it, cross-check it with two other sources, and then give you a summary with citations."

Agents typically follow a loop:

Observe → Think → Act → Observe again → ...

A common implementation looks roughly like this:

def run_agent(goal: str, tools: list, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]

    for _ in range(max_steps):  # cap iterations so a stuck agent can't loop forever
        response = llm.chat(messages, tools=tools)

        if response.is_final_answer:
            return response.content

        # The LLM decided to use a tool
        tool_result = execute_tool(response.tool_call)
        messages.append({"role": "tool", "content": tool_result})

    raise RuntimeError("Agent exceeded max_steps without a final answer")

Simple enough. So why do we need multiple agents?


The Problem With One Agent Doing Everything

Imagine you ask a single agent to:

"Research our top 3 competitors, write a market analysis report, and then draft 5 LinkedIn posts based on it."

That's three very different jobs: researcher, analyst, copywriter. Cramming all of that into one agent creates real problems:

  • Context window overload — Long tasks fill up the LLM's memory fast, causing it to "forget" earlier steps.
  • Lack of specialization — An agent trying to do everything tends to do nothing particularly well.
  • Hard to debug — When something goes wrong, you don't know which "part" failed.
  • No parallelism — One agent does things one at a time. What if subtasks could run simultaneously?

This is exactly the problem multi-agent systems solve.


You're Already Using Multi-Agent AI

Before we get to theory, let's look at a tool many developers already have in their terminal: Claude Code.

When you ask Claude Code something simple like "fix the bug on line 42", it handles it in a single pass. But ask it something more complex, like "refactor this entire module, write tests, and check for regressions", and something more interesting happens under the hood.

Claude Code acts as an orchestrator. Instead of trying to hold the entire task in one context window, it breaks the work down and can spin up subagents: separate Claude instances with specific, scoped roles. One subagent might be tasked with exploring the codebase structure, another with writing the actual refactored code, and another with running the test suite and reporting results. Each subagent operates independently, does its job, and reports back.

You
 └─▶ Claude Code (Orchestrator)
        ├─▶ Subagent A: "Explore the repo and map dependencies"
        ├─▶ Subagent B: "Rewrite the module based on the map"
        └─▶ Subagent C: "Run tests and report failures"

The orchestrator then assembles the results and gives you a single coherent answer — as if one very capable developer had done it all.

This is the multi-agent pattern in action. And the same design is behind tools like Devin, OpenAI's Operator, and many of the AI-powered developer tools launching in 2025–2026. Now let's understand how it works so you can build your own.


What Is a Multi-Agent System?

A multi-agent system (MAS) is a setup where multiple AI agents work together — each with a defined role — to complete a larger task. Think of it like a software engineering team: you have a project manager, a frontend dev, a backend dev, and a QA engineer. Each is an expert in their lane, and a coordinator ties their work together.

The key building blocks are:

1. The Orchestrator (a.k.a. the "Manager Agent")

This is the brain that receives the high-level goal, breaks it into subtasks, assigns those subtasks to specialized agents, and assembles the final result. The orchestrator doesn't necessarily do the actual work — it delegates.

2. Subagents (a.k.a. "Worker Agents")

These agents handle specific, well-scoped tasks. A ResearchAgent searches the web. A WriterAgent drafts content. A CodeAgent writes and runs code. Each has its own set of tools appropriate to its role.

3. Tools

Tools are functions that agents can call — web search, code execution, API calls, database queries, file I/O. Tools are what make agents actually useful in the real world.
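In code, a tool is usually just a plain function the agent can dispatch to by name. Here's a minimal sketch of that dispatch pattern — the `get_weather` tool and its JSON result shape are hypothetical stand-ins, not any specific provider's API:

```python
import json

# A "tool" is just a function; the LLM refers to it by name.
def get_weather(city: str) -> str:
    # Hypothetical stub; a real tool would call a weather API.
    return json.dumps({"city": city, "temp_c": 21})

TOOLS = {"get_weather": get_weather}

def execute_tool(tool_call: dict) -> str:
    """Dispatch a call of the form {"name": ..., "arguments": {...}}."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

result = execute_tool({"name": "get_weather", "arguments": {"city": "Oslo"}})
print(result)
```

The agent loop from earlier would call something like `execute_tool` whenever the LLM asks for a tool instead of giving a final answer.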

4. Memory

Agents need context. Memory can be:

  • Short-term (conversation history within a session)
  • Long-term (a vector database or knowledge store that persists between runs)
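The two layers can be sketched in a few lines — note the file-backed dictionary here is only a stand-in for a real vector database, and the class and file names are illustrative:

```python
import json
import pathlib

class AgentMemory:
    """Illustrative sketch: short-term memory is the in-session message
    list; long-term memory is a tiny key-value store persisted to disk
    (a real system would use a vector database or knowledge store)."""

    def __init__(self, path: str = "memory.json"):
        self.short_term = []  # conversation history; discarded per session
        self.path = pathlib.Path(path)
        self.long_term = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def remember(self, key: str, value: str) -> None:
        # Persist so the fact survives between runs.
        self.long_term[key] = value
        self.path.write_text(json.dumps(self.long_term))

mem = AgentMemory()
mem.short_term.append({"role": "user", "content": "Hi"})
mem.remember("favorite_topic", "multi-agent systems")
```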

5. Communication

Agents pass messages to each other — typically as structured text or JSON. The orchestrator sends a task; the subagent returns a result.
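A minimal sketch of such a message envelope — the field names here are an assumption for illustration, not any standard protocol:

```python
import json

# Hypothetical envelope format between orchestrator and subagents.
def make_task(agent: str, task: str) -> str:
    return json.dumps({"to": agent, "type": "task", "payload": task})

def make_result(agent: str, result: str) -> str:
    return json.dumps({"from": agent, "type": "result", "payload": result})

msg = make_task("ResearchAgent", "Summarize the topic")
print(json.loads(msg)["type"])
```

Structured messages like this are easier to validate and log than free-form text handoffs.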


Building a Simple Multi-Agent System

Let's put this into code. We'll build a small, framework-agnostic example: a two-agent system where one agent researches a topic and another writes a blog intro based on the research.

We'll use Python and the OpenAI API (you can swap in any LLM provider; the pattern stays the same).

Setup

pip install openai
import openai

# The client reads the OPENAI_API_KEY environment variable;
# avoid hardcoding keys in source.
client = openai.OpenAI()

def call_llm(system_prompt: str, user_message: str) -> str:
    """A simple wrapper to call an LLM with a system + user prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ]
    )
    return response.choices[0].message.content

The Research Agent

def research_agent(topic: str) -> str:
    """
    A specialized agent whose only job is to gather key facts about a topic.
    In a real system, this agent would have web search tools.
    For simplicity, we're having the LLM draw on its training knowledge.
    """
    system_prompt = """
    You are a research assistant. Your job is to provide a concise,
    factual summary of a given topic — 5 key bullet points, nothing more.
    Focus on accuracy and relevance. Do not editorialize.
    """
    result = call_llm(system_prompt, f"Research this topic: {topic}")
    print(f"[ResearchAgent] Done. Key facts gathered.\n")
    return result

The Writer Agent

def writer_agent(topic: str, research: str) -> str:
    """
    A specialized agent whose only job is to write engaging content
    based on provided research. It doesn't search — it just writes.
    """
    system_prompt = """
    You are a skilled technical writer for a developer blog.
    Given a topic and research notes, write a compelling, friendly
    introduction paragraph (3-4 sentences) that hooks the reader.
    Write for developers, not academics.
    """
    user_message = f"""
    Topic: {topic}

    Research notes:
    {research}

    Write the intro paragraph now.
    """
    result = call_llm(system_prompt, user_message)
    print(f"[WriterAgent] Done. Intro written.\n")
    return result

The Orchestrator

def orchestrator(goal: str) -> str:
    """
    The orchestrator receives a high-level goal, breaks it into subtasks,
    delegates to specialized agents, and assembles the final output.
    """
    print(f"[Orchestrator] Goal received: '{goal}'")
    print(f"[Orchestrator] Delegating research task...\n")

    # Step 1: Extract the topic from the goal (in a real system,
    # the orchestrator would use an LLM to parse the goal)
    topic = goal  # Simplified for this example

    # Step 2: Delegate to ResearchAgent
    research_output = research_agent(topic)

    # Step 3: Delegate to WriterAgent, passing the research output
    print(f"[Orchestrator] Delegating writing task...\n")
    final_output = writer_agent(topic, research_output)

    # Step 4: Return assembled result
    print(f"[Orchestrator] All tasks complete. Returning final output.\n")
    return final_output


# Run it
result = orchestrator("The rise of multi-agent AI systems in 2025")
print("=== FINAL OUTPUT ===")
print(result)

Sample Output

[Orchestrator] Goal received: 'The rise of multi-agent AI systems in 2025'
[Orchestrator] Delegating research task...

[ResearchAgent] Done. Key facts gathered.

[Orchestrator] Delegating writing task...

[WriterAgent] Done. Intro written.

[Orchestrator] All tasks complete. Returning final output.

=== FINAL OUTPUT ===
In 2025, AI stopped being a solo act. Multi-agent systems — where
teams of specialized AI models collaborate on complex tasks — emerged
from research labs into production engineering stacks at companies like
Google, OpenAI, and Anthropic. Rather than asking one model to do
everything, developers are now designing pipelines where a "manager"
agent delegates research, writing, coding, and verification to expert
subagents. If you've been wondering what all the buzz is about, you're
in exactly the right place.

This is the core pattern. In a production system, you'd add real web search tools, error handling, retry logic, agent memory, and parallel execution — but the orchestrator → delegate → assemble structure stays the same.
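As a taste of the parallel-execution piece: independent subtasks can be fanned out with a thread pool, since each subagent call is mostly waiting on network I/O. The stub `research_agent` below stands in for a real LLM-backed agent:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub agent standing in for a real LLM-backed ResearchAgent.
def research_agent(topic: str) -> str:
    return f"facts about {topic}"

def run_parallel(topics: list[str]) -> list[str]:
    # Fan out independent research subtasks; results come back in order.
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(research_agent, topics))

print(run_parallel(["agents", "tools", "memory"]))
```

Parallelism only pays off when subtasks are truly independent; if one agent needs another's output, they must stay sequential.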


Real-World Use Cases

Multi-agent systems shine whenever a task is too large, complex, or varied for a single agent. Here are three common patterns you'll see in the wild:

1. Automated research pipelines
One agent searches and gathers sources, another reads and extracts key points, a third synthesizes findings into a report. No single agent's context window gets overwhelmed.

2. AI coding assistants (like Claude Code)
This is the most accessible real-world example. Claude Code uses an orchestrator-subagent model: when given a complex task, the main agent breaks it into subtasks and delegates — one subagent explores the codebase, one writes or modifies code, one runs shell commands and tests. Each subagent has a narrow, well-defined job. This same pattern powers tools like Devin and SWE-agent.

3. Customer support automation
An IntentAgent classifies the user's issue, a KnowledgeAgent retrieves the relevant documentation, and a ResponseAgent drafts the reply. Each agent is small, fast, and easy to tune independently.
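That support pipeline can be sketched as a simple chain of stub agents — the keyword classifier and canned docs below are purely illustrative; real agents would call an LLM and a knowledge base:

```python
def intent_agent(message: str) -> str:
    # Illustrative keyword classifier; a real IntentAgent would use an LLM.
    return "billing" if "invoice" in message.lower() else "general"

def knowledge_agent(intent: str) -> str:
    # Canned docs standing in for real retrieval.
    docs = {"billing": "See the billing FAQ.", "general": "See the help center."}
    return docs[intent]

def response_agent(message: str, doc: str) -> str:
    return f"Thanks for reaching out! {doc}"

def support_pipeline(message: str) -> str:
    intent = intent_agent(message)
    doc = knowledge_agent(intent)
    return response_agent(message, doc)

print(support_pipeline("Where is my invoice?"))
```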


Common Pitfalls to Avoid

Giving agents too much responsibility. The whole point of multi-agent systems is specialization. If your ResearchAgent is also writing and formatting the output, it's not really specialized.

Forgetting error handling between agents. What happens if the research agent returns nothing? Your writer agent will hallucinate. Always validate the output of one agent before passing it to the next.
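A minimal guard at the handoff might look like this — the length threshold is an arbitrary illustration; in practice you'd tune it or use a structured schema check:

```python
def validate_research(research: str) -> str:
    """Guard the orchestrator -> writer handoff: refuse to pass empty or
    suspiciously short research downstream (threshold is illustrative)."""
    if not research or len(research.strip()) < 40:
        raise ValueError("ResearchAgent returned too little content; retry or abort.")
    return research
```

The orchestrator would call this between steps 2 and 3, retrying the research agent (or failing loudly) instead of letting the writer invent facts.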

Ignoring cost and latency. Each agent call costs money and time. More agents ≠ better results. Start with the minimum number of agents needed and add more only when you hit a real bottleneck.

No logging or tracing. In a chain of agents, debugging is hard without visibility. Add logs at every handoff (like the print statements in our example), and consider tools like LangSmith or Langfuse for production tracing.


Where to Go From Here

You now understand the fundamentals. Here are some good next steps depending on where you want to go:

  • Try LangGraph if you want a production-grade framework for building stateful, graph-based agent workflows with built-in support for cycles and conditional edges.
  • Try Google's Agent Development Kit (ADK) if you want Google's official framework; it's a newer option with strong tooling for building hierarchical agent systems.
  • Try OpenAI's Agents SDK if you're already in the OpenAI ecosystem and want handoffs and tool-calling built in out of the box.
  • Read "Patterns for Building LLM-based Systems" by Eugene Yan — one of the best practical overviews of agent design patterns available.

Wrapping Up

Multi-agent systems aren't magic, and they're not just hype either. They're a practical engineering pattern for solving problems that are genuinely hard for a single AI to handle — tasks that are too long, too complex, or too diverse.

The pattern is simple: break down the goal → assign specialized agents → orchestrate the results. Start small, keep your agents focused, and add complexity only when you need it.

The era of AI teamwork is just getting started, and now you know how to build your own team.


Did this help? Drop a comment with what you're building — I'd love to hear what multi-agent use cases you're exploring.
