A multi-agent system is a group of AI agents that each handle a specific task and pass results to one another. Instead of prompting one model to do everything, you split the work — a researcher finds facts, a writer drafts content, an editor reviews it.
Originally published at kalyna.pro
Why Multi-Agent?
Single-agent LLM calls hit limits quickly:
- Context window overflow — one agent can't hold a 200-page report and write a summary simultaneously
- Quality degradation — researching AND writing AND fact-checking in one prompt produces mediocre results
- No parallelism — separate agents can run independent sub-tasks at the same time; one chain of sequential prompts cannot
- Hard to debug — when one big prompt fails, you don't know which step broke
Core Architecture Patterns
1. Orchestrator → Workers
```
Orchestrator
├── Research Agent → gathers data
├── Analysis Agent → interprets data
└── Writer Agent   → produces output
```
2. Pipeline (Sequential)
```
Input → Researcher → Drafter → Editor → Output
```
3. Parallel Fan-Out
```
          ┌─ Agent A ─┐
Input ────┼─ Agent B ─┼──── Merger ──── Output
          └─ Agent C ─┘
```
Simple Two-Agent Pipeline
```python
import anthropic

client = anthropic.Anthropic()

def researcher(topic: str) -> str:
    """Agent 1: gather concise facts about the topic."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a technical researcher. Return 5-7 concise bullet points.",
        messages=[{"role": "user", "content": f"Research topic: {topic}"}],
    )
    return response.content[0].text

def writer(topic: str, research: str) -> str:
    """Agent 2: turn the research notes into prose for developers."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a technical writer for developers. Write clear, practical prose.",
        messages=[{
            "role": "user",
            "content": f"Topic: {topic}\n\nResearch:\n{research}\n\nWrite a 3-paragraph explanation.",
        }],
    )
    return response.content[0].text

topic = "how transformer attention mechanisms work"
facts = researcher(topic)       # step 1: research
article = writer(topic, facts)  # step 2: write from the research
print(article)
```
Orchestrator Pattern
```python
import anthropic
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    output: str

client = anthropic.Anthropic()

def run_agent(name: str, system: str, user: str, max_tokens: int = 1024) -> AgentResult:
    """Run one worker agent with its own system prompt."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return AgentResult(agent=name, output=response.content[0].text)

def orchestrator(task: str) -> str:
    """Coordinate researcher → writer → editor and return the edited text."""
    research = run_agent("Researcher", "Return bullet-point facts only.", f"Research: {task}")
    draft = run_agent(
        "Writer", "Write clearly for developers.",
        f"Task: {task}\n\nResearch:\n{research.output}\n\nWrite a draft.",
        max_tokens=2048,
    )
    final = run_agent(
        "Editor", "Fix clarity, remove redundancy. Return only the improved text.",
        f"Edit:\n\n{draft.output}",
        max_tokens=2048,
    )
    return final.output
```
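Calling it is a one-liner. A quick way to try it, assuming the client above has a valid API key (the task string is just an example):

```python
article = orchestrator("explain vector databases to backend developers")
print(article)
```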
Parallel Fan-Out
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def research_source(source_name: str, topic: str) -> tuple[str, str]:
    """One worker: research the topic from a single expert perspective."""
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": f"You are a {source_name} expert. Give 3 bullet points about: {topic}"}],
    )
    return source_name, response.content[0].text

def parallel_research(topic: str, sources: list[str]) -> dict[str, str]:
    """Fan out one worker per source and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(research_source, s, topic): s for s in sources}
        for future in as_completed(futures):
            name, output = future.result()
            results[name] = output
    return results

research = parallel_research("LLMs in production", ["DevOps", "ML Engineering", "Security"])
```
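The fan-out diagram ends with a merger step. Here is a minimal sketch of that synthesis stage, reusing the `client` from earlier; the `merge_research` function, its system prompt, and the model choice are illustrative, not prescribed:

```python
def merge_research(topic: str, research: dict[str, str]) -> str:
    """Merger: combine the parallel findings into one deduplicated brief."""
    combined = "\n\n".join(f"## {name}\n{notes}" for name, notes in research.items())
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Merge these expert notes into one deduplicated brief.",
        messages=[{"role": "user", "content": f"Topic: {topic}\n\n{combined}"}],
    )
    return response.content[0].text

summary = merge_research("LLMs in production", research)
```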
Agent Handoffs with State
```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Shared state object passed from agent to agent."""
    topic: str
    research: str = ""
    outline: str = ""
    draft: str = ""
    errors: list[str] = field(default_factory=list)

def research_agent(state: PipelineState) -> PipelineState:
    """First stage: fill in state.research, recording failures instead of raising."""
    try:
        resp = client.messages.create(
            model="claude-haiku-4-5", max_tokens=512,
            messages=[{"role": "user", "content": f"Research: {state.topic}"}],
        )
        state.research = resp.content[0].text
    except Exception as e:
        state.errors.append(f"research: {e}")
    return state
```
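Later stages (outline, draft) follow the same signature: take the state, fill in their own field, append to `errors` on failure. A minimal sketch of how the handoff could be driven; `run_pipeline` and its fail-fast check are illustrative, not part of the original code:

```python
def run_pipeline(topic: str, stages) -> PipelineState:
    """Thread one PipelineState through each stage in order."""
    state = PipelineState(topic=topic)
    for stage in stages:
        state = stage(state)
        if state.errors:  # fail fast: stop at the first stage that errored
            break
    return state

state = run_pipeline("retrieval-augmented generation", [research_agent])
print(state.errors or state.research)
```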
When to Use Multi-Agent vs Single Agent
| Scenario | Approach |
|---|---|
| Task fits in one prompt | Single agent |
| Clear sequential steps | Pipeline |
| Complex, dynamic steps | Orchestrator |
| Independent sub-tasks | Parallel fan-out |
Cost Tips
- Use `claude-haiku-4-5` for worker agents, `claude-sonnet-4-6` for synthesis
- Parallelize independent agents to cut wall-clock time
- Enable prompt caching for shared system prompts (see the sketch after this list)
- Cap per-agent `max_tokens` — researchers don't need 4096
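Prompt caching lets repeated worker calls reuse a long shared system prompt instead of re-processing it every time. A sketch of what that might look like with the Anthropic SDK's `cache_control` block; check the current docs for model support and the minimum cacheable prompt length:

```python
# Placeholder for a long system prompt shared by every worker call.
SHARED_SYSTEM = "You are a research worker. Return concise, sourced bullet points. <...long style guide...>"

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SHARED_SYSTEM,
        "cache_control": {"type": "ephemeral"},  # cache this prefix across worker calls
    }],
    messages=[{"role": "user", "content": "Research: LLMs in production"}],
)
```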
Summary
Multi-agent systems let you build AI pipelines that are smarter, faster, and easier to debug:
- Split work by skill: researcher, writer, editor
- Run independent agents in parallel with `ThreadPoolExecutor`
- Pass structured state between agents
- Use cheap models for workers, powerful models for synthesis
- Fail fast and log errors per agent
Related: How to Build an AI Agent with Python and LangChain for Beginners.