When One Agent Isn't Enough
The previous eight articles built single-agent systems: one LLM, one set of tools, one conversation history. That architecture handles most problems well.
But some tasks are inherently multi-expert:
- Writing a technical article needs a researcher to gather facts, a writer to draft, and an editor to polish — three roles, three ways of thinking
- Handling a support ticket needs intent classification, knowledge base lookup, and reply generation — three stages, independently testable
- Code review needs static analysis, security scanning, and readability review — three dimensions, none interfering with the others
A single agent can handle these, but you'll find the System Prompt growing uncontrollably and output quality becoming erratic — because you're forcing one role to play everyone.
The core value of multi-agent: separation of concerns. Each agent does one thing well.
Two Common Architecture Patterns
Multi-agent systems have several topologies. Two dominate in practice:
Supervisor Pattern (dynamic routing):
classify → supervisor → researcher
                      ↘ writer
                      ↘ reviewer
                      ↘ FINISH
Key: one "control center" decides which agent to call next
Pipeline Pattern (fixed sequence):
outline_agent → draft_agent → polish_agent → END
Key: execution path is hardwired; each agent sees only its own context
These patterns aren't competitors — they fit different scenarios.
Demo 1: Supervisor Pattern
Design Approach
The core challenge of the Supervisor pattern is routing reliability. If the LLM decides "who to call next" at every step, it tends to:
- Call the same worker multiple times
- Forget which workers have already been called
- Fail to recognize when to terminate
A better design is a two-phase hybrid:
Phase 1: LLM classifies the task once (simple_fact vs full_article)
Phase 2: Python routes deterministically based on classification + called list
LLM handles "understanding what kind of task this is." Python handles "executing the right sequence." Each does what it's good at.
LangGraph Implementation
from typing import Annotated, TypedDict

from langgraph.graph.message import add_messages

# _ask(instruction, content) is this series' LLM helper (defined elsewhere); it returns the reply text.

class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    task_type: str  # "simple_fact" or "full_article"
    called: list[str]  # workers that have already run
    next: str  # routing decision written by supervisor_node

def classify_node(state: SupervisorState) -> SupervisorState:
    """LLM classifies once; result persists in state for all subsequent routing."""
    decision = _ask(
        "Classify this task:\n"
        "  simple_fact - a factual question with a direct short answer\n"
        "  full_article - needs research, writing, and editorial review\n"
        "Output one word only: simple_fact / full_article",
        f"Task: {state['task']}",
    ).strip().lower()
    # Anything that is not clearly "full_article" falls back to the cheaper path.
    task_type = "full_article" if "full_article" in decision else "simple_fact"
    return {**state, "task_type": task_type}

def supervisor_node(state: SupervisorState) -> SupervisorState:
    """Pure Python routing: no LLM call, no risk of infinite loops."""
    called = state["called"]
    task_type = state["task_type"]
    if "researcher" not in called:
        next_worker = "researcher"
    elif task_type == "simple_fact":
        next_worker = "FINISH"  # simple questions stop after research
    elif "writer" not in called:
        next_worker = "writer"
    elif "reviewer" not in called:
        next_worker = "reviewer"
    else:
        next_worker = "FINISH"
    return {**state, "next": next_worker}
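The routing table above can be checked without LangGraph or an LLM. The following is a minimal sketch under stated assumptions: `supervisor_next` restates the same decision logic on plain values, and `make_stub_worker` is a hypothetical worker that does nothing except append its own name to `called` (the article does not show how the real workers update that field).

```python
from typing import Callable

def supervisor_next(called: list[str], task_type: str) -> str:
    """Same decision table as supervisor_node, expressed on plain values."""
    if "researcher" not in called:
        return "researcher"
    if task_type == "simple_fact":
        return "FINISH"
    if "writer" not in called:
        return "writer"
    if "reviewer" not in called:
        return "reviewer"
    return "FINISH"

def make_stub_worker(name: str) -> Callable[[dict], dict]:
    """Hypothetical worker: does no real work, only records that it ran."""
    def worker(state: dict) -> dict:
        return {**state, "called": state["called"] + [name]}
    return worker

WORKERS = {n: make_stub_worker(n) for n in ("researcher", "writer", "reviewer")}

def run(task_type: str, max_steps: int = 10) -> list[str]:
    """Alternate supervisor and worker until FINISH; fail loudly if it never comes."""
    state = {"task_type": task_type, "called": []}
    for _ in range(max_steps):
        nxt = supervisor_next(state["called"], state["task_type"])
        if nxt == "FINISH":
            return state["called"]
        state = WORKERS[nxt](state)
    raise RuntimeError("routing did not terminate")

print(run("full_article"))  # ['researcher', 'writer', 'reviewer']
print(run("simple_fact"))   # ['researcher']
```

Because every worker adds itself to `called` and the router never selects a name already in that list, each run makes at most three worker calls before reaching FINISH.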
Graph Topology
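from langgraph.graph import StateGraph, END

g = StateGraph(SupervisorState)
g.add_node("classify", classify_node)
g.add_node("supervisor", supervisor_node)
# The researcher, writer, and reviewer nodes are registered the same way (worker functions not shown).
g.set_entry_point("classify")
g.add_edge("classify", "supervisor")
# route_supervisor (not shown) returns the key that selects the next node from the mapping below.
g.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {"researcher": "researcher", "writer": "writer",
     "reviewer": "reviewer", "FINISH": END},
)
# Every worker hands control back to the supervisor.
g.add_edge("researcher", "supervisor")
g.add_edge("writer", "supervisor")
g.add_edge("reviewer", "supervisor")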
classify → supervisor → [workers] → supervisor → ... → FINISH forms a controlled loop. The classify node runs exactly once; the supervisor node acts as a lightweight state machine.
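The topology snippet passes a `route_supervisor` function that is not shown in this article. One plausible shape, assuming it only reads the `next` field that `supervisor_node` writes, is sketched below. The `ROUTES` set and the fallback to FINISH for unexpected values are my additions for illustration, not the author's code.

```python
# Keys that the conditional-edge mapping knows how to resolve.
ROUTES = {"researcher", "writer", "reviewer", "FINISH"}

def route_supervisor(state: dict) -> str:
    """Return the routing key for the conditional edge out of the supervisor."""
    nxt = state.get("next", "FINISH")
    # Assumption: an unknown or missing value ends the run instead of raising.
    return nxt if nxt in ROUTES else "FINISH"

print(route_supervisor({"next": "writer"}))  # writer
print(route_supervisor({"next": "oops"}))    # FINISH
```

Keeping this function free of logic beyond the lookup means all routing decisions stay in `supervisor_node`, where they are easy to unit test.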
Measured Results
Task: "Write a short article about Python list comprehensions"
[classify] task_type = full_article
[supervisor] → researcher
[researcher] working...
[supervisor] → writer
[writer] working...
[supervisor] → reviewer
[reviewer] working...
[supervisor] → FINISH
Workers called: ['researcher', 'writer', 'reviewer']
Task type : full_article
Routing costs one LLM call (the classify step); every decision after that is pure Python, while the workers still make their own LLM calls. The execution chain is clean and predictable: researcher → writer → reviewer → FINISH.
Demo 2: Pipeline Pattern
Pipeline requires significantly less code than Supervisor — there's no routing logic to write:
from typing import TypedDict

class PipelineState(TypedDict):
    topic: str
    outline: str
    draft: str
    polished: str
    stage_log: list[str]

# Each agent reads one upstream field and writes one field of its own.
def outline_agent(state: PipelineState) -> PipelineState:
    outline = _ask("Create a 5-point outline...", state["topic"])
    return {**state, "outline": outline, "stage_log": [...]}

def draft_agent(state: PipelineState) -> PipelineState:
    draft = _ask("Expand the outline into a 200-word draft...", state["outline"])
    return {**state, "draft": draft, "stage_log": [...]}

def polish_agent(state: PipelineState) -> PipelineState:
    polished = _ask("Polish the draft...", state["draft"])
    return {**state, "polished": polished, "stage_log": [...]}

# Topology: outline → draft → polish → END
# (assumes g = StateGraph(PipelineState) with the three agents added as nodes)
g.add_edge("outline_agent", "draft_agent")
g.add_edge("draft_agent", "polish_agent")
g.add_edge("polish_agent", END)
Measured Results
[outline_agent] 957 chars
[draft_agent] 1846 chars
[polish_agent] 2168 chars
Final output (first 300 chars):
"### Unveiling the Power of List Comprehensions in Python
Python's lists are dynamic and powerful data containers..."
Each stage's output becomes the next stage's input. Content grows progressively: outline 957 chars → draft 1846 → polished 2168.
No LLM routing decisions at all — the path was determined when the graph was written.
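The hand-off between stages can be illustrated without LangGraph or an LLM. This is a sketch only: `fake_ask` is a hypothetical stand-in for the series' `_ask` helper, `make_stage` is an illustrative factory, and appending the stage name to `stage_log` is my assumption about what the elided log entries contain.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]

def fake_ask(instruction: str, text: str) -> str:
    """Hypothetical stand-in for _ask: wraps the input so the chain is visible."""
    return f"{instruction}({text})"

def make_stage(name: str, instruction: str, src: str, dst: str) -> Stage:
    """Build a stage that reads state[src], writes state[dst], and logs its name."""
    def stage(state: State) -> State:
        out = fake_ask(instruction, state[src])
        return {**state, dst: out, "stage_log": state["stage_log"] + [name]}
    return stage

# The order of this list is the whole "routing" logic of a pipeline.
STAGES = [
    make_stage("outline_agent", "outline", "topic", "outline"),
    make_stage("draft_agent", "draft", "outline", "draft"),
    make_stage("polish_agent", "polish", "draft", "polished"),
]

def run_pipeline(topic: str) -> State:
    state = {"topic": topic, "outline": "", "draft": "", "polished": "", "stage_log": []}
    for stage in STAGES:
        state = stage(state)
    return state

result = run_pipeline("list comprehensions")
print(result["stage_log"])  # ['outline_agent', 'draft_agent', 'polish_agent']
print(result["polished"])   # polish(draft(outline(list comprehensions)))
```

The nested output makes the data flow explicit: each stage consumes exactly the field the previous stage produced.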
Demo 3: Same Graph, Different Paths
The most compelling advantage of the Supervisor pattern: different task types take different execution paths through the same graph — no code changes needed.
Running the same Supervisor graph on a simple factual question:
Task: "What year was Python created?"
[classify] task_type = simple_fact
[supervisor] → researcher
[researcher] working...
[supervisor] → FINISH
Workers called : ['researcher']
Task type : simple_fact
Result : researcher → FINISH (writer + reviewer skipped)
Side-by-side comparison:
Task                             Workers Called                Steps
──────────────────────────────────────────────────────────────────────────
Write article (full_article)     researcher→writer→reviewer    3
Factual question (simple_fact)   researcher                    1
Same Supervisor graph. Different paths based on LLM classification.
Pipeline can't do this — its path is hardcoded.
Pattern Selection Matrix
Dimension        Pipeline              Supervisor
─────────────────────────────────────────────────────────────────────
Execution path   Fixed, hardwired      Dynamic, classification-driven
Best for         ETL, doc processing   Research, open Q&A, mixed tasks
Debuggability    High (linear trace)   Medium (path varies per task)
LLM calls/turn   N (one per stage)     N + 1 (one classify call extra)
Flexibility      Low                   High
Predictability   High                  Lower
Implementation   Trivial               Medium
Rule of thumb:
- Know exactly what steps you need → Pipeline
- Need to adapt the steps per task → Supervisor
Multi-Agent Design Checklist
Pattern Selection
- [ ] Fixed, known steps → Pipeline; dynamic decision needed → Supervisor
- [ ] With ≤ 3 workers and clear responsibilities, either pattern works
- [ ] With > 5 workers, consider hierarchical Supervisor (Supervisor of Supervisors)
State Design
- [ ] Each worker reads only the fields it needs, writes only the fields it produces
- [ ] Supervisor state must include called: list[str] to prevent duplicate invocations
- [ ] Pipeline state uses stage-named fields (outline, draft, polished) for easy debugging
Routing Reliability
- [ ] Avoid pure LLM routing (LLMs cannot reliably track call history)
- [ ] Recommended: LLM for one-time classification + Python for deterministic routing
- [ ] Set recursion_limit (20–30 is a good range) to guard against accidental loops
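In LangGraph the limit is supplied per run through the config, for example `app.invoke(inputs, config={"recursion_limit": 25})`, where `app` is assumed to be the compiled graph. The effect of such a budget can be shown with a plain-Python analogue. The names below (`StepLimitExceeded`, `run_routed`, `buggy_route`, `good_route`) are hypothetical and exist only for this sketch.

```python
from typing import Callable

class StepLimitExceeded(RuntimeError):
    """Raised when the router has not reached FINISH within the step budget."""

def run_routed(route: Callable[[list[str]], str], limit: int = 25) -> list[str]:
    """Call workers chosen by `route` until it returns FINISH or the budget runs out."""
    called: list[str] = []
    for _ in range(limit):
        nxt = route(called)
        if nxt == "FINISH":
            return called
        called.append(nxt)
    raise StepLimitExceeded(f"no FINISH after {limit} steps")

def buggy_route(called: list[str]) -> str:
    # Deliberately broken router: ignores history and never finishes.
    return "researcher"

def good_route(called: list[str]) -> str:
    return "FINISH" if "researcher" in called else "researcher"

print(run_routed(good_route))  # ['researcher']
try:
    run_routed(buggy_route, limit=5)
except StepLimitExceeded as exc:
    print("stopped:", exc)  # stopped: no FINISH after 5 steps
```

The limit does not fix a routing bug; it converts a silent endless loop into a fast, visible failure.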
Worker Design
- [ ] Each worker does exactly one thing, with clear input/output contract
- [ ] Workers communicate via State, never by calling each other directly
- [ ] Write clear Worker System Prompts — don't make workers guess the context
Observability
- [ ] Log each node execution (worker name, input/output summary)
- [ ] Record the called list to trace routing decisions post-hoc
- [ ] Add warning logs on unusual branches (e.g., premature FINISH)
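One way to cover these three items is a decorator applied to every node function. The sketch below uses only the standard library; the `logged` decorator, the "agents" logger name, and the rule for what counts as a premature FINISH (FINISH while `called` is still empty) are assumptions made for illustration.

```python
import functools
import logging

logger = logging.getLogger("agents")

def logged(name: str):
    """Wrap a node so each run logs its name, the called list, and changed fields."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            logger.info("[%s] start called=%s", name, state.get("called", []))
            result = fn(state)
            # Summarize output as the set of fields whose value changed.
            changed = sorted(k for k in result if result[k] != state.get(k))
            logger.info("[%s] done changed=%s", name, changed)
            # Assumed definition of "premature": finishing before any worker ran.
            if result.get("next") == "FINISH" and not result.get("called"):
                logger.warning("[%s] FINISH before any worker ran", name)
            return result
        return wrapper
    return decorator

@logged("supervisor")
def premature_supervisor(state: dict) -> dict:
    # Hypothetical faulty router used to trigger the warning branch.
    return {**state, "next": "FINISH"}

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    premature_supervisor({"called": [], "next": ""})
```

Since the decorator takes and returns an ordinary state-in, state-out function, it can wrap nodes before they are registered on the graph without changing the nodes themselves.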
Summary
Five core takeaways:
- Multi-agent is about separation of concerns: not for complexity's sake, but because a single Agent's System Prompt starts breaking down when it has to play too many roles
- Pipeline wins on simplicity and predictability: the execution path lives in code, traces are linear, testing and debugging cost is minimized
- Supervisor wins on adaptability: the same graph handles a one-step factual question and a three-step full article — no code change required
- LLM classification + Python execution is the best pairing: LLM does what it's good at (understanding task type), Python does what it's good at (reliable sequencing)
- The called list is the critical field in Supervisor State: it is the foundation of routing determinism, and without it the Supervisor is prone to duplicate calls and infinite loops
References
- LangGraph Multi-Agent Concepts
- LangGraph Supervisor Tutorial
- Full demo code for this series: agent-08-multi-agent