
Serhii Kalyna

Posted on • Originally published at kalyna.pro

Multi-Agent Systems with LLMs: A Developer's Guide (2026)

A multi-agent system is a group of AI agents that each handle a specific task and pass results to one another. Instead of prompting one model to do everything, you split the work — a researcher finds facts, a writer drafts content, an editor reviews it.


Why Multi-Agent?

Single-agent LLM calls hit limits quickly:

  • Context window overflow — one agent can't hold a 200-page report and write a summary simultaneously
  • Quality degradation — researching AND writing AND fact-checking in one prompt produces mediocre results
  • No parallelism — a single sequential prompt runs independent sub-tasks one after another; separate agents can run them concurrently
  • Hard to debug — when one big prompt fails, you don't know which step broke

Core Architecture Patterns

1. Orchestrator → Workers

Orchestrator
├── Research Agent    → gathers data
├── Analysis Agent    → interprets data
└── Writer Agent      → produces output

2. Pipeline (Sequential)

Input → Researcher → Drafter → Editor → Output

3. Parallel Fan-Out

         ┌─ Agent A ─┐
Input ───┤─ Agent B ─├─── Merger ── Output
         └─ Agent C ─┘

Simple Two-Agent Pipeline

import anthropic

client = anthropic.Anthropic()


def researcher(topic: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a technical researcher. Return 5-7 concise bullet points.",
        messages=[{"role": "user", "content": f"Research topic: {topic}"}],
    )
    return response.content[0].text


def writer(topic: str, research: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a technical writer for developers. Write clear, practical prose.",
        messages=[{
            "role": "user",
            "content": f"Topic: {topic}\n\nResearch:\n{research}\n\nWrite a 3-paragraph explanation.",
        }],
    )
    return response.content[0].text


topic = "how transformer attention mechanisms work"
facts = researcher(topic)
article = writer(topic, facts)
print(article)

Orchestrator Pattern

import anthropic
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    output: str

client = anthropic.Anthropic()


def run_agent(name: str, system: str, user: str, max_tokens: int = 1024) -> AgentResult:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return AgentResult(agent=name, output=response.content[0].text)


def orchestrator(task: str) -> str:
    research = run_agent("Researcher", "Return bullet-point facts only.", f"Research: {task}")
    draft = run_agent(
        "Writer", "Write clearly for developers.",
        f"Task: {task}\n\nResearch:\n{research.output}\n\nWrite a draft.",
        max_tokens=2048,
    )
    final = run_agent(
        "Editor", "Fix clarity, remove redundancy. Return only the improved text.",
        f"Edit:\n\n{draft.output}",
        max_tokens=2048,
    )
    return final.output
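In production, any of these calls can fail transiently (rate limits, timeouts). One option is to wrap each run_agent call in a small retry helper. This is a sketch — with_retries is an illustrative name, and the bare except Exception is deliberately broad; real code should catch the SDK's specific errors such as anthropic.RateLimitError:

```python
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    # Call fn(); on failure, sleep with exponential backoff and try again.
    # Re-raises the last exception once all attempts are exhausted.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Then each step becomes, for example, with_retries(lambda: run_agent("Editor", system, user)), so one flaky API call doesn't kill the whole pipeline.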

Parallel Fan-Out

from concurrent.futures import ThreadPoolExecutor, as_completed

def research_source(source_name: str, topic: str) -> tuple[str, str]:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": f"You are a {source_name} expert. Give 3 bullet points about: {topic}"}],
    )
    return source_name, response.content[0].text


def parallel_research(topic: str, sources: list[str]) -> dict[str, str]:
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(research_source, s, topic): s for s in sources}
        for future in as_completed(futures):
            name, output = future.result()
            results[name] = output
    return results


research = parallel_research("LLMs in production", ["DevOps", "ML Engineering", "Security"])
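The Merger from the fan-out diagram can be as simple as a prompt builder that folds every agent's findings into one synthesis request. A minimal sketch — build_synthesis_prompt is an illustrative helper, not part of the SDK:

```python
def build_synthesis_prompt(topic: str, results: dict[str, str]) -> str:
    # Concatenate each expert's findings under a labelled heading,
    # sorted by name so the prompt is deterministic across runs.
    sections = "\n\n".join(
        f"## {name}\n{output}" for name, output in sorted(results.items())
    )
    return (
        f"Topic: {topic}\n\n"
        f"Expert findings:\n\n{sections}\n\n"
        "Merge these perspectives into one coherent summary."
    )
```

Feed the result to a single synthesis call (e.g. a claude-sonnet-4-6 request) to produce the merged output.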

Agent Handoffs with State

from dataclasses import dataclass, field

@dataclass
class PipelineState:
    topic: str
    research: str = ""
    outline: str = ""
    draft: str = ""
    errors: list[str] = field(default_factory=list)


def research_agent(state: PipelineState) -> PipelineState:
    try:
        resp = client.messages.create(
            model="claude-haiku-4-5", max_tokens=512,
            messages=[{"role": "user", "content": f"Research: {state.topic}"}],
        )
        state.research = resp.content[0].text
    except Exception as e:
        state.errors.append(f"research: {e}")
    return state
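Handoff agents like research_agent compose naturally with a small runner that walks the state through each step and stops at the first recorded error. A sketch, with the state trimmed to a few fields for brevity (run_pipeline is an illustrative name):

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    topic: str
    research: str = ""
    draft: str = ""
    errors: list[str] = field(default_factory=list)


def run_pipeline(state: PipelineState, agents) -> PipelineState:
    # Each agent takes the state, mutates its slice of it, and returns it.
    for agent in agents:
        state = agent(state)
        if state.errors:
            break  # fail fast: later agents depend on earlier output
    return state
```

Calling run_pipeline(PipelineState(topic="..."), [research_agent, draft_agent]) keeps the control flow in one place, so adding, removing, or reordering agents is a one-line change.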

When to Use Multi-Agent vs Single Agent

Scenario                   Approach
Task fits in one prompt    Single agent
Clear sequential steps     Pipeline
Complex, dynamic steps     Orchestrator
Independent sub-tasks      Parallel fan-out

Cost Tips

  • Use claude-haiku-4-5 for worker agents, claude-sonnet-4-6 for synthesis
  • Parallelize independent agents to cut wall-clock time
  • Enable prompt caching for shared system prompts
  • Cap per-agent max_tokens — researchers don't need 4096
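The caching tip has a concrete shape in the Anthropic API: mark the shared system prompt block with cache_control so repeated worker calls reuse the cached prefix instead of re-billing full input tokens. A sketch of the request kwargs — cached_system_kwargs is an illustrative helper, and you should check the current prompt-caching docs for minimum cacheable lengths and TTLs:

```python
def cached_system_kwargs(model: str, system_prompt: str, max_tokens: int = 1024) -> dict:
    # Build kwargs for client.messages.create with the shared system
    # prompt marked as cacheable via Anthropic's prompt caching.
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # cache_control flags this block for the prompt cache
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
```

Usage: client.messages.create(**cached_system_kwargs("claude-sonnet-4-6", SHARED_PROMPT), messages=[...]) — every worker sharing that prompt then hits the cache.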

Summary

Multi-agent systems let you build AI pipelines that are smarter, faster, and easier to debug:

  • Split work by skill: researcher, writer, editor
  • Run independent agents in parallel with ThreadPoolExecutor
  • Pass structured state between agents
  • Use cheap models for workers, powerful models for synthesis
  • Fail fast and log errors per agent

Related: How to Build an AI Agent with Python and LangChain for Beginners.
