A multi-agent system is a group of AI agents that each handle a specific task and pass results to one another. Instead of prompting one model to do everything, you split the work — a researcher finds facts, a writer drafts content, an editor reviews it.
Originally published at kalyna.pro
Why Multi-Agent?
Single-agent LLM calls hit limits quickly:
- Context window overflow — one agent can't hold a 200-page report and write a summary simultaneously
- Quality degradation — researching AND writing AND fact-checking in one prompt produces mediocre results
- No parallelism — separate agents can run independent sub-tasks at the same time; one chain of sequential prompts cannot
- Hard to debug — when one big prompt fails, you don't know which step broke
Core Architecture Patterns
1. Orchestrator → Workers
```
Orchestrator
├── Research Agent → gathers data
├── Analysis Agent → interprets data
└── Writer Agent   → produces output
```
2. Pipeline (Sequential)
```
Input → Researcher → Drafter → Editor → Output
```
3. Parallel Fan-Out
```
          ┌─ Agent A ─┐
Input ────┼─ Agent B ─┼──── Merger ──── Output
          └─ Agent C ─┘
```
Simple Two-Agent Pipeline
```python
import anthropic

client = anthropic.Anthropic()

def researcher(topic: str) -> str:
    """Agent 1: gather concise facts about the topic."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a technical researcher. Return 5-7 concise bullet points.",
        messages=[{"role": "user", "content": f"Research topic: {topic}"}],
    )
    return response.content[0].text

def writer(topic: str, research: str) -> str:
    """Agent 2: turn the research notes into prose for developers."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a technical writer for developers. Write clear, practical prose.",
        messages=[{
            "role": "user",
            "content": f"Topic: {topic}\n\nResearch:\n{research}\n\nWrite a 3-paragraph explanation.",
        }],
    )
    return response.content[0].text

topic = "how transformer attention mechanisms work"
facts = researcher(topic)       # step 1: research
article = writer(topic, facts)  # step 2: write from the research
print(article)
```
Orchestrator Pattern
```python
import anthropic
from dataclasses import dataclass

@dataclass
class AgentResult:
    agent: str
    output: str

client = anthropic.Anthropic()

def run_agent(name: str, system: str, user: str, max_tokens: int = 1024) -> AgentResult:
    """Run one worker agent with its own system prompt."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    )
    return AgentResult(agent=name, output=response.content[0].text)

def orchestrator(task: str) -> str:
    """Coordinate researcher → writer → editor and return the edited text."""
    research = run_agent("Researcher", "Return bullet-point facts only.", f"Research: {task}")
    draft = run_agent(
        "Writer", "Write clearly for developers.",
        f"Task: {task}\n\nResearch:\n{research.output}\n\nWrite a draft.",
        max_tokens=2048,
    )
    final = run_agent(
        "Editor", "Fix clarity, remove redundancy. Return only the improved text.",
        f"Edit:\n\n{draft.output}",
        max_tokens=2048,
    )
    return final.output
```
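Calling it is a one-liner. A quick way to try it, assuming the client above has a valid API key (the task string is just an example):

```python
article = orchestrator("explain vector databases to backend developers")
print(article)
```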
Parallel Fan-Out
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def research_source(source_name: str, topic: str) -> tuple[str, str]:
    """One worker: research the topic from a single expert perspective."""
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=512,
        messages=[{"role": "user", "content": f"You are a {source_name} expert. Give 3 bullet points about: {topic}"}],
    )
    return source_name, response.content[0].text

def parallel_research(topic: str, sources: list[str]) -> dict[str, str]:
    """Fan out one worker per source and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(research_source, s, topic): s for s in sources}
        for future in as_completed(futures):
            name, output = future.result()
            results[name] = output
    return results

research = parallel_research("LLMs in production", ["DevOps", "ML Engineering", "Security"])
```
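The fan-out diagram ends with a merger step. Here is a minimal sketch of that synthesis stage, reusing the `client` from earlier; the `merge_research` function, its system prompt, and the model choice are illustrative, not prescribed:

```python
def merge_research(topic: str, research: dict[str, str]) -> str:
    """Merger: combine the parallel findings into one deduplicated brief."""
    combined = "\n\n".join(f"## {name}\n{notes}" for name, notes in research.items())
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Merge these expert notes into one deduplicated brief.",
        messages=[{"role": "user", "content": f"Topic: {topic}\n\n{combined}"}],
    )
    return response.content[0].text

summary = merge_research("LLMs in production", research)
```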
Agent Handoffs with State
```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Shared state object passed from agent to agent."""
    topic: str
    research: str = ""
    outline: str = ""
    draft: str = ""
    errors: list[str] = field(default_factory=list)

def research_agent(state: PipelineState) -> PipelineState:
    """First stage: fill in state.research, recording failures instead of raising."""
    try:
        resp = client.messages.create(
            model="claude-haiku-4-5", max_tokens=512,
            messages=[{"role": "user", "content": f"Research: {state.topic}"}],
        )
        state.research = resp.content[0].text
    except Exception as e:
        state.errors.append(f"research: {e}")
    return state
```
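Later stages (outline, draft) follow the same signature: take the state, fill in their own field, append to `errors` on failure. A minimal sketch of how the handoff could be driven; `run_pipeline` and its fail-fast check are illustrative, not part of the original code:

```python
def run_pipeline(topic: str, stages) -> PipelineState:
    """Thread one PipelineState through each stage in order."""
    state = PipelineState(topic=topic)
    for stage in stages:
        state = stage(state)
        if state.errors:  # fail fast: stop at the first stage that errored
            break
    return state

state = run_pipeline("retrieval-augmented generation", [research_agent])
print(state.errors or state.research)
```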
When to Use Multi-Agent vs Single Agent
| Scenario | Approach |
|---|---|
| Task fits in one prompt | Single agent |
| Clear sequential steps | Pipeline |
| Complex, dynamic steps | Orchestrator |
| Independent sub-tasks | Parallel fan-out |
Cost Tips
- Use `claude-haiku-4-5` for worker agents, `claude-sonnet-4-6` for synthesis
- Parallelize independent agents to cut wall-clock time
- Enable prompt caching for shared system prompts (see the sketch after this list)
- Cap per-agent `max_tokens` — researchers don't need 4096
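Prompt caching lets repeated worker calls reuse a long shared system prompt instead of re-processing it every time. A sketch of what that might look like with the Anthropic SDK's `cache_control` block; check the current docs for model support and the minimum cacheable prompt length:

```python
# Placeholder for a long system prompt shared by every worker call.
SHARED_SYSTEM = "You are a research worker. Return concise, sourced bullet points. <...long style guide...>"

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    system=[{
        "type": "text",
        "text": SHARED_SYSTEM,
        "cache_control": {"type": "ephemeral"},  # cache this prefix across worker calls
    }],
    messages=[{"role": "user", "content": "Research: LLMs in production"}],
)
```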
Summary
Multi-agent systems let you build AI pipelines that are smarter, faster, and easier to debug:
- Split work by skill: researcher, writer, editor
- Run independent agents in parallel with `ThreadPoolExecutor`
- Pass structured state between agents
- Use cheap models for workers, powerful models for synthesis
- Fail fast and log errors per agent
Related: How to Build an AI Agent with Python and LangChain for Beginners.