WonderLab

Agent Series (9): Multi-Agent Architecture Design Patterns — Supervisor vs Pipeline

When One Agent Isn't Enough

The previous eight articles built single-agent systems: one LLM, one set of tools, one conversation history. That architecture handles most problems well.

But some tasks are inherently multi-expert:

  • Writing a technical article needs a researcher to gather facts, a writer to draft, and an editor to polish — three roles, three ways of thinking
  • Handling a support ticket needs intent classification, knowledge base lookup, and reply generation — three stages, independently testable
  • Code review needs static analysis, security scanning, and readability review — three dimensions, none interfering with the others

A single agent can handle these, but the system prompt grows uncontrollably and output quality becomes erratic, because one role is being forced to play every part.

The core value of multi-agent: separation of concerns. Each agent does one thing well.


Two Common Architecture Patterns

Multi-agent systems have several topologies. Two dominate in practice:

Supervisor Pattern (dynamic routing):
  classify → supervisor → researcher
                       ↘ writer
                       ↘ reviewer
                       ↘ FINISH

  Key: one "control center" decides which agent to call next

Pipeline Pattern (fixed sequence):
  outline_agent → draft_agent → polish_agent → END

  Key: execution path is hardwired; each agent sees only its own context

These patterns aren't competitors — they fit different scenarios.


Demo 1: Supervisor Pattern

Design Approach

The core challenge of the Supervisor pattern is routing reliability. If the LLM decides "who to call next" at every step, it tends to:

  • Call the same worker multiple times
  • Forget which workers have already been called
  • Fail to recognize when to terminate

A better design is a two-phase hybrid:

Phase 1: LLM classifies the task once (simple_fact vs full_article)
Phase 2: Python routes deterministically based on classification + called list

LLM handles "understanding what kind of task this is." Python handles "executing the right sequence." Each does what it's good at.

LangGraph Implementation

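from typing import Annotated, TypedDict

from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages


class SupervisorState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    task_type: str   # "simple_fact" or "full_article"
    called: list[str]   # workers that have already run, in order
    next: str           # routing decision written by supervisor_node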


def classify_node(state: SupervisorState) -> SupervisorState:
    """LLM classifies once; result persists in state for all subsequent routing."""
    decision = _ask(
        "Classify this task:\n"
        "  simple_fact  — a factual question with a direct short answer\n"
        "  full_article — needs research, writing, and editorial review\n"
        "Output one word only: simple_fact / full_article",
        f"Task: {state['task']}",
    ).strip().lower()
    task_type = "full_article" if "full_article" in decision else "simple_fact"
    return {**state, "task_type": task_type}


def supervisor_node(state: SupervisorState) -> SupervisorState:
    """Pure Python routing — no LLM call, no risk of infinite loops."""
    called = state["called"]
    task_type = state["task_type"]

    if "researcher" not in called:
        next_worker = "researcher"
    elif task_type == "simple_fact":
        next_worker = "FINISH"          # simple questions stop after research
    elif "writer" not in called:
        next_worker = "writer"
    elif "reviewer" not in called:
        next_worker = "reviewer"
    else:
        next_worker = "FINISH"

    return {**state, "next": next_worker}

Graph Topology

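g = StateGraph(SupervisorState)
# Register nodes before wiring edges. The worker functions are not shown in
# this article; researcher_node, writer_node, and reviewer_node are assumed names.
g.add_node("classify", classify_node)
g.add_node("supervisor", supervisor_node)
g.add_node("researcher", researcher_node)
g.add_node("writer", writer_node)
g.add_node("reviewer", reviewer_node)
g.set_entry_point("classify")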
g.add_edge("classify", "supervisor")
g.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {"researcher": "researcher", "writer": "writer",
     "reviewer": "reviewer", "FINISH": END},
)
g.add_edge("researcher", "supervisor")
g.add_edge("writer", "supervisor")
g.add_edge("reviewer", "supervisor")

classify → supervisor → [workers] → supervisor → ... → FINISH forms a controlled loop. The classify node runs exactly once; the supervisor node acts as a lightweight state machine.
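The graph references route_supervisor and three worker nodes that are not shown in this article. The sketch below reproduces the controlled loop in plain Python, without LangGraph, so the routing can be checked in isolation. It assumes that route_supervisor simply returns the "next" field and that each worker appends its own name to "called"; make_worker and run are hypothetical helpers, and the workers are stubs that do no LLM work.

```python
from typing import Callable

State = dict  # plain dict standing in for SupervisorState


def supervisor_node(state: State) -> State:
    """Same deterministic routing rules as the article's supervisor_node."""
    called, task_type = state["called"], state["task_type"]
    if "researcher" not in called:
        nxt = "researcher"
    elif task_type == "simple_fact":
        nxt = "FINISH"
    elif "writer" not in called:
        nxt = "writer"
    elif "reviewer" not in called:
        nxt = "reviewer"
    else:
        nxt = "FINISH"
    return {**state, "next": nxt}


def route_supervisor(state: State) -> str:
    """Assumed router: read back the decision the supervisor stored."""
    return state["next"]


def make_worker(name: str) -> Callable[[State], State]:
    """Hypothetical stub worker: records itself in `called` and nothing else."""
    def worker(state: State) -> State:
        return {**state, "called": state["called"] + [name]}
    return worker


WORKERS = {n: make_worker(n) for n in ("researcher", "writer", "reviewer")}


def run(task_type: str) -> list[str]:
    """Drive supervisor -> worker -> supervisor until the router says FINISH."""
    state: State = {"task_type": task_type, "called": [], "next": ""}
    while True:
        state = supervisor_node(state)
        target = route_supervisor(state)
        if target == "FINISH":
            return state["called"]
        state = WORKERS[target](state)


print(run("full_article"))  # ['researcher', 'writer', 'reviewer']
print(run("simple_fact"))   # ['researcher']
```

Because every worker updates "called" before control returns to the supervisor, each pass through the loop moves the state machine one step closer to FINISH.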

Measured Results

Task: "Write a short article about Python list comprehensions"

[classify] task_type = full_article
[supervisor] → researcher
[researcher] working...
[supervisor] → writer
[writer] working...
[supervisor] → reviewer
[reviewer] working...
[supervisor] → FINISH

Workers called: ['researcher', 'writer', 'reviewer']
Task type     : full_article

Routing costs a single LLM call (the classify step); every later routing decision is plain Python. The execution chain is clean and predictable: researcher → writer → reviewer → FINISH.


Demo 2: Pipeline Pattern

Pipeline requires significantly less code than Supervisor — there's no routing logic to write:

class PipelineState(TypedDict):
    topic: str
    outline: str
    draft: str
    polished: str
    stage_log: list[str]


def outline_agent(state: PipelineState) -> PipelineState:
    outline = _ask("Create a 5-point outline...", state["topic"])
    return {**state, "outline": outline, "stage_log": [...]}

def draft_agent(state: PipelineState) -> PipelineState:
    draft = _ask("Expand the outline into a 200-word draft...", state["outline"])
    return {**state, "draft": draft, "stage_log": [...]}

def polish_agent(state: PipelineState) -> PipelineState:
    polished = _ask("Polish the draft...", state["draft"])
    return {**state, "polished": polished, "stage_log": [...]}

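# Topology: outline → draft → polish → END
g = StateGraph(PipelineState)
g.add_node("outline_agent", outline_agent)
g.add_node("draft_agent", draft_agent)
g.add_node("polish_agent", polish_agent)
g.set_entry_point("outline_agent")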
g.add_edge("outline_agent", "draft_agent")
g.add_edge("draft_agent", "polish_agent")
g.add_edge("polish_agent", END)

Measured Results

[outline_agent] 957 chars
[draft_agent] 1846 chars
[polish_agent] 2168 chars

Final output (first 300 chars):
"### Unveiling the Power of List Comprehensions in Python
Python's lists are dynamic and powerful data containers..."

Each stage's output becomes the next stage's input. Content grows progressively: outline 957 chars → draft 1846 → polished 2168.

No LLM routing decisions at all — the path was determined when the graph was written.
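Since the path is fixed, the same idea can be illustrated without a graph library: a pipeline is a list of stage functions applied in order. This is a minimal sketch, not the article's implementation. fake_ask is a stand-in for the _ask helper (no LLM is called), and stage and run_pipeline are hypothetical names.

```python
from functools import reduce
from typing import Callable

State = dict  # plain dict standing in for PipelineState


def fake_ask(instruction: str, content: str) -> str:
    """Stand-in for the article's _ask helper; it only tags the input."""
    return f"[{instruction}] {content}"


def stage(name: str, instruction: str, src: str, dst: str) -> Callable[[State], State]:
    """Build a stage that reads one field, writes one field, and logs itself."""
    def run(state: State) -> State:
        out = fake_ask(instruction, state[src])
        return {**state, dst: out, "stage_log": state["stage_log"] + [name]}
    return run


# The order of this list is the whole "routing" logic.
PIPELINE = [
    stage("outline_agent", "outline", "topic", "outline"),
    stage("draft_agent", "draft", "outline", "draft"),
    stage("polish_agent", "polish", "draft", "polished"),
]


def run_pipeline(topic: str) -> State:
    """Fold the stages over the initial state, left to right."""
    initial: State = {"topic": topic, "stage_log": []}
    return reduce(lambda state, step: step(state), PIPELINE, initial)


result = run_pipeline("list comprehensions")
print(result["stage_log"])  # ['outline_agent', 'draft_agent', 'polish_agent']
print(result["polished"])   # [polish] [draft] [outline] list comprehensions
```

Reordering or removing a stage means editing the list, which is exactly the trade-off of this pattern: the path is easy to read and test, but it cannot change per task.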


Demo 3: Same Graph, Different Paths

The most compelling advantage of the Supervisor pattern: different task types take different execution paths through the same graph — no code changes needed.

Running the same Supervisor graph on a simple factual question:

Task: "What year was Python created?"

[classify] task_type = simple_fact
[supervisor] → researcher
[researcher] working...
[supervisor] → FINISH

Workers called : ['researcher']
Task type      : simple_fact
Result         : researcher → FINISH  (writer + reviewer skipped)

Side-by-side comparison:

Task                                  Workers Called                Steps
──────────────────────────────────────────────────────────────────────────
Write article (full_article)          researcher→writer→reviewer    3
Factual question (simple_fact)        researcher                    1

Same Supervisor graph. Different paths based on LLM classification.

Pipeline can't do this — its path is hardcoded.


Pattern Selection Matrix

Dimension           Pipeline                    Supervisor
─────────────────────────────────────────────────────────────────────
Execution path      Fixed, hardwired            Dynamic, classification-driven
Best for            ETL, doc processing         Research, open Q&A, mixed tasks
Debuggability       High (linear trace)         Medium (path varies per task)
LLM calls/turn      N (one per stage)           Up to N + 1 (workers run + one classify call)
Flexibility         Low                         High
Predictability      High                        Lower
Implementation      Trivial                     Medium

Rule of thumb:

  • Know exactly what steps you need → Pipeline
  • Need to adapt the steps per task → Supervisor

Multi-Agent Design Checklist

Pattern Selection

  • [ ] Fixed, known steps → Pipeline; dynamic decision needed → Supervisor
  • [ ] With ≤ 3 workers and clear responsibilities, either pattern works
  • [ ] With > 5 workers, consider hierarchical Supervisor (Supervisor of Supervisors)

State Design

  • [ ] Each worker reads only the fields it needs, writes only the fields it produces
  • [ ] Supervisor state must include called: list[str] to prevent duplicate invocations
  • [ ] Pipeline state uses stage-named fields (outline, draft, polished) for easy debugging
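One way to make the "reads only what it needs, writes only what it produces" rule enforceable is to declare each worker's fields and check them at the boundary. The sketch below is an illustration under that assumption; the scoped decorator and sloppy_agent are hypothetical and not part of LangGraph or the article's code.

```python
from typing import Callable

State = dict


def scoped(reads: tuple[str, ...], writes: tuple[str, ...]):
    """Hypothetical decorator: the worker sees only `reads`, may return only `writes`."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[State], State]:
        def node(state: State) -> State:
            view = {k: state[k] for k in reads}      # narrow the input
            update = fn(view)
            extra = set(update) - set(writes)        # detect undeclared writes
            if extra:
                raise KeyError(f"{fn.__name__} wrote undeclared fields: {sorted(extra)}")
            return {**state, **update}
        node.__name__ = fn.__name__
        return node
    return wrap


@scoped(reads=("outline",), writes=("draft",))
def draft_agent(view: dict) -> dict:
    return {"draft": f"Draft based on: {view['outline']}"}


@scoped(reads=("draft",), writes=("polished",))
def sloppy_agent(view: dict) -> dict:
    # Deliberate bug: also overwrites a field this worker does not own.
    return {"polished": view["draft"].upper(), "outline": "oops"}


state = {"topic": "t", "outline": "1. intro", "draft": "", "polished": ""}
state = draft_agent(state)
print(state["draft"])  # Draft based on: 1. intro

try:
    sloppy_agent(state)
except KeyError as err:
    print("rejected:", err)
```

A check like this turns a state-design convention into a failure that shows up in tests instead of in a confusing downstream output.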

Routing Reliability

  • [ ] Avoid pure LLM routing (LLMs cannot reliably track call history)
  • [ ] Recommended: LLM for one-time classification + Python for deterministic routing
  • [ ] Set recursion_limit (20–30 is a good range) to guard against accidental loops
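In LangGraph the limit is supplied at invocation time through the config dict (for example config={"recursion_limit": 25}), and the run fails with an error instead of looping forever. The plain-Python sketch below shows the same guard with a step counter; run_with_limit, StepLimitExceeded, and the two route functions are hypothetical names used only for illustration.

```python
from typing import Callable


class StepLimitExceeded(RuntimeError):
    """Raised when the loop runs more steps than the budget allows."""


def run_with_limit(route: Callable[[list[str]], str], limit: int = 25) -> list[str]:
    """Call `route` repeatedly until it returns FINISH or the budget is spent."""
    called: list[str] = []
    for _ in range(limit):
        target = route(called)
        if target == "FINISH":
            return called
        called.append(target)
    raise StepLimitExceeded(f"no FINISH after {limit} steps; last calls: {called[-3:]}")


def good_route(called: list[str]) -> str:
    """Consults the call history, so it terminates."""
    return "researcher" if "researcher" not in called else "FINISH"


def buggy_route(called: list[str]) -> str:
    """Ignores the call history, so it would loop forever without the guard."""
    return "researcher"


print(run_with_limit(good_route))  # ['researcher']

try:
    run_with_limit(buggy_route, limit=5)
except StepLimitExceeded as err:
    print("stopped:", err)
```

The limit is a safety net, not a routing mechanism: a run that hits it indicates a bug in the router that should be fixed rather than masked by a larger number.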

Worker Design

  • [ ] Each worker does exactly one thing, with clear input/output contract
  • [ ] Workers communicate via State, never by calling each other directly
  • [ ] Write clear Worker System Prompts — don't make workers guess the context

Observability

  • [ ] Log each node execution (worker name, input/output summary)
  • [ ] Record the called list to trace routing decisions post-hoc
  • [ ] Add warning logs on unusual branches (e.g., premature FINISH)
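The three observability items can be covered with the standard logging module and a thin wrapper around each node. This is a sketch under assumptions: logged and warn_on_premature_finish are hypothetical helpers, and the "premature FINISH" rule shown here (a full_article run finishing before the reviewer ran) is one possible definition, not the article's.

```python
import logging
from typing import Callable

logger = logging.getLogger("agents")
State = dict


def logged(name: str, fn: Callable[[State], State]) -> Callable[[State], State]:
    """Wrap a node so each run logs its name, the called list, and changed fields."""
    def node(state: State) -> State:
        new_state = fn(state)
        changed = sorted(k for k in new_state if new_state[k] != state.get(k))
        logger.info("node=%s called=%s changed=%s", name, new_state.get("called"), changed)
        return new_state
    return node


def warn_on_premature_finish(state: State) -> None:
    """Assumed rule: a full_article run should not finish before the reviewer ran."""
    if (
        state["next"] == "FINISH"
        and state["task_type"] == "full_article"
        and "reviewer" not in state["called"]
    ):
        logger.warning("premature FINISH: called=%s", state["called"])


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

# Stub worker that only records itself in `called`.
researcher = logged("researcher", lambda s: {**s, "called": s["called"] + ["researcher"]})

state = researcher({"task_type": "full_article", "called": [], "next": ""})
state["next"] = "FINISH"          # simulate a routing bug
warn_on_premature_finish(state)   # emits a WARNING record
```

Logging the called list on every node keeps the routing history in the log stream, so a surprising path can be reconstructed after the fact without re-running the task.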

Summary

Five core takeaways:

  1. Multi-agent is about separation of concerns: not for complexity's sake, but because a single Agent's System Prompt starts breaking down when it has to play too many roles
  2. Pipeline wins on simplicity and predictability: the execution path lives in code, traces are linear, testing and debugging cost is minimized
  3. Supervisor wins on adaptability: the same graph handles a one-step factual question and a three-step full article — no code change required
  4. LLM classification + Python execution is the best pairing: LLM does what it's good at (understanding task type), Python does what it's good at (reliable sequencing)
  5. The called list is the critical field in the Supervisor state: it is the foundation of routing determinism, and without it the Supervisor is prone to duplicate calls and infinite loops
