WonderLab

Posted on Jun 1

Agent Series (9): Multi-Agent Architecture Design Patterns — Supervisor vs Pipeline

#ai #agents #multiagent #langchain

When One Agent Isn't Enough

The previous eight articles built single-agent systems: one LLM, one set of tools, one conversation history. That architecture handles most problems well.

But some tasks are inherently multi-expert:

Writing a technical article needs a researcher to gather facts, a writer to draft, and an editor to polish — three roles, three ways of thinking
Handling a support ticket needs intent classification, knowledge base lookup, and reply generation — three stages, independently testable
Code review needs static analysis, security scanning, and readability review — three dimensions, none interfering with the others

A single agent can handle these, but its System Prompt keeps growing and output quality becomes erratic, because one role is being forced to play every part.

The core value of multi-agent: separation of concerns. Each agent does one thing well.

Two Common Architecture Patterns

Multi-agent systems have several topologies. Two dominate in practice:

Supervisor Pattern (dynamic routing):
  classify → supervisor → researcher
                       ↘ writer
                       ↘ reviewer
                       ↘ FINISH

  Key: one "control center" decides which agent to call next

Pipeline Pattern (fixed sequence):
  outline_agent → draft_agent → polish_agent → END

  Key: execution path is hardwired; each agent sees only its own context

These patterns aren't competitors — they fit different scenarios.

Demo 1: Supervisor Pattern

Design Approach

The core challenge of the Supervisor pattern is routing reliability. If the LLM decides "who to call next" at every step, it tends to:

Call the same worker multiple times
Forget which workers have already been called
Fail to recognize when to terminate

A better design is a two-phase hybrid:

Phase 1: LLM classifies the task once (simple_fact vs full_article)
Phase 2: Python routes deterministically based on classification + called list

LLM handles "understanding what kind of task this is." Python handles "executing the right sequence." Each does what it's good at.

LangGraph Implementation

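from typing import Annotated, TypedDict

from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages

# _ask(system_prompt, user_message) -> str is the LLM helper; its definition is not shown in this article.


class SupervisorState(TypedDict):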
    messages: Annotated[list, add_messages]
    task: str
    task_type: str   # "simple_fact" or "full_article"
    called: list[str]
    next: str


def classify_node(state: SupervisorState) -> SupervisorState:
    """LLM classifies once; result persists in state for all subsequent routing."""
    decision = _ask(
        "Classify this task:\n"
        "  simple_fact  — a factual question with a direct short answer\n"
        "  full_article — needs research, writing, and editorial review\n"
        "Output one word only: simple_fact / full_article",
        f"Task: {state['task']}",
    ).strip().lower()
    task_type = "full_article" if "full_article" in decision else "simple_fact"
    return {**state, "task_type": task_type}


def supervisor_node(state: SupervisorState) -> SupervisorState:
    """Pure Python routing: no LLM call. Terminates as long as each worker appends its name to `called`."""
    called = state["called"]
    task_type = state["task_type"]

    if "researcher" not in called:
        next_worker = "researcher"
    elif task_type == "simple_fact":
        next_worker = "FINISH"          # simple questions stop after research
    elif "writer" not in called:
        next_worker = "writer"
    elif "reviewer" not in called:
        next_worker = "reviewer"
    else:
        next_worker = "FINISH"

    return {**state, "next": next_worker}

Graph Topology

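# Assumed definition: the article references route_supervisor without showing it.
def route_supervisor(state: SupervisorState) -> str:
    return state["next"]   # key looked up in the mapping passed to add_conditional_edges


g = StateGraph(SupervisorState)
# Register nodes with g.add_node(...) before compiling: classify, supervisor,
# and the three workers (worker bodies are not shown in this article).
g.set_entry_point("classify")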
g.add_edge("classify", "supervisor")
g.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {"researcher": "researcher", "writer": "writer",
     "reviewer": "reviewer", "FINISH": END},
)
g.add_edge("researcher", "supervisor")
g.add_edge("writer", "supervisor")
g.add_edge("reviewer", "supervisor")

classify → supervisor → [workers] → supervisor → ... → FINISH forms a controlled loop. The classify node runs exactly once; the supervisor node acts as a lightweight state machine.
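
To see the controlled loop without installing LangGraph, here is a minimal plain-Python sketch of the same state machine. The worker stubs, the `run` driver, and the `max_steps` budget are assumptions made for illustration, since the article does not show the worker bodies; only the routing rules are taken from `supervisor_node`. The detail worth noticing is that each worker appends its own name to `called`, which is what lets the supervisor make progress.

```python
from typing import Callable

State = dict  # plain dict standing in for SupervisorState


def supervisor(state: State) -> str:
    """Same decision table as supervisor_node, returning the next hop."""
    called, task_type = state["called"], state["task_type"]
    if "researcher" not in called:
        return "researcher"
    if task_type == "simple_fact":
        return "FINISH"
    if "writer" not in called:
        return "writer"
    if "reviewer" not in called:
        return "reviewer"
    return "FINISH"


def make_worker(name: str) -> Callable[[State], State]:
    """Hypothetical worker stub: a real worker would call the LLM here."""
    def worker(state: State) -> State:
        # Recording the call is what guarantees the loop terminates.
        return {**state, "called": state["called"] + [name]}
    return worker


WORKERS = {n: make_worker(n) for n in ("researcher", "writer", "reviewer")}


def run(task_type: str, max_steps: int = 10) -> list[str]:
    """Drive supervisor -> worker -> supervisor until FINISH or the step budget."""
    state: State = {"task_type": task_type, "called": []}
    for _ in range(max_steps):
        nxt = supervisor(state)
        if nxt == "FINISH":
            return state["called"]
        state = WORKERS[nxt](state)
    raise RuntimeError("step budget exhausted: a worker is not updating `called`")


print(run("full_article"))  # ['researcher', 'writer', 'reviewer']
print(run("simple_fact"))   # ['researcher']
```

If a worker forgets to update `called`, the supervisor keeps choosing it and the step budget turns the silent loop into an explicit error.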

Measured Results

Task: "Write a short article about Python list comprehensions"

[classify] task_type = full_article
[supervisor] → researcher
[researcher] working...
[supervisor] → writer
[writer] working...
[supervisor] → reviewer
[reviewer] working...
[supervisor] → FINISH

Workers called: ['researcher', 'writer', 'reviewer']
Task type     : full_article

One LLM call for classification, then pure Python routing; after that, only the workers themselves call the LLM. The execution chain is clean and predictable: researcher → writer → reviewer → FINISH.

Demo 2: Pipeline Pattern

Pipeline requires significantly less code than Supervisor — there's no routing logic to write:

class PipelineState(TypedDict):
    topic: str
    outline: str
    draft: str
    polished: str
    stage_log: list[str]


def outline_agent(state: PipelineState) -> PipelineState:
    outline = _ask("Create a 5-point outline...", state["topic"])
    return {**state, "outline": outline, "stage_log": [...]}

def draft_agent(state: PipelineState) -> PipelineState:
    draft = _ask("Expand the outline into a 200-word draft...", state["outline"])
    return {**state, "draft": draft, "stage_log": [...]}

def polish_agent(state: PipelineState) -> PipelineState:
    polished = _ask("Polish the draft...", state["draft"])
    return {**state, "polished": polished, "stage_log": [...]}

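# Topology: outline → draft → polish → END
g = StateGraph(PipelineState)
g.add_node("outline_agent", outline_agent)
g.add_node("draft_agent", draft_agent)
g.add_node("polish_agent", polish_agent)
g.set_entry_point("outline_agent")
g.add_edge("outline_agent", "draft_agent")
g.add_edge("draft_agent", "polish_agent")
g.add_edge("polish_agent", END)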

Measured Results

[outline_agent] 957 chars
[draft_agent] 1846 chars
[polish_agent] 2168 chars

Final output (first 300 chars):
"### Unveiling the Power of List Comprehensions in Python
Python's lists are dynamic and powerful data containers..."

Each stage's output becomes the next stage's input. Content grows progressively: outline 957 chars → draft 1846 → polished 2168.

No LLM routing decisions at all — the path was determined when the graph was written.
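
The fixed hand-off can also be sketched in plain Python as a fold over a list of stages. Everything here is an assumption for illustration: the `stage` factory, the stub lambdas that stand in for the LLM calls, and the format of the `stage_log` entries (the article elides what it logs). What it shows is that each stage reads one field, writes one field, and the order is fixed by the list itself.

```python
from functools import reduce
from typing import Callable

State = dict  # plain dict standing in for PipelineState


def stage(name: str, src: str, dst: str, fn: Callable[[str], str]) -> Callable[[State], State]:
    """Build a stage that reads one field, writes one field, and logs its output size."""
    def run_stage(state: State) -> State:
        out = fn(state[src])
        log = state["stage_log"] + [f"{name}: {len(out)} chars"]
        return {**state, dst: out, "stage_log": log}
    return run_stage


# Stub transforms standing in for the LLM calls (not the article's prompts).
PIPELINE = [
    stage("outline_agent", "topic", "outline", lambda t: f"1. Intro to {t}"),
    stage("draft_agent", "outline", "draft", lambda o: o + "\nDraft body."),
    stage("polish_agent", "draft", "polished", lambda d: d.strip() + " (polished)"),
]


def run_pipeline(topic: str) -> State:
    """Apply the stages in list order; there is no routing decision anywhere."""
    initial = {"topic": topic, "outline": "", "draft": "", "polished": "", "stage_log": []}
    return reduce(lambda s, f: f(s), PIPELINE, initial)


result = run_pipeline("list comprehensions")
print(result["stage_log"])
```

Reordering or removing a stage means editing `PIPELINE`, which is exactly the trade-off: the path is easy to read and test, but it cannot adapt per task.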

Demo 3: Same Graph, Different Paths

The most compelling advantage of the Supervisor pattern: different task types take different execution paths through the same graph — no code changes needed.

Running the same Supervisor graph on a simple factual question:

Task: "What year was Python created?"

[classify] task_type = simple_fact
[supervisor] → researcher
[researcher] working...
[supervisor] → FINISH

Workers called : ['researcher']
Task type      : simple_fact
Result         : researcher → FINISH  (writer + reviewer skipped)

Side-by-side comparison:

Task                                  Workers Called                Steps
──────────────────────────────────────────────────────────────────────────
Write article (full_article)          researcher→writer→reviewer    3
Factual question (simple_fact)        researcher                    1

Same Supervisor graph. Different paths based on LLM classification.

Pipeline can't do this — its path is hardcoded.

Pattern Selection Matrix

Dimension           Pipeline                    Supervisor
─────────────────────────────────────────────────────────────────────
Execution path      Fixed, hardwired            Dynamic, classification-driven
Best for            ETL, doc processing         Research, open Q&A, mixed tasks
Debuggability       High (linear trace)         Medium (path varies per task)
LLM calls/turn      N (one per stage)           N + 1 (N workers run, plus one classify call)
Flexibility         Low                         High
Predictability      High                        Lower
Implementation      Trivial                     Medium

Rule of thumb:

Know exactly what steps you need → Pipeline

Need to adapt the steps per task → Supervisor

Multi-Agent Design Checklist

Pattern Selection

[ ] Fixed, known steps → Pipeline; dynamic decision needed → Supervisor
[ ] With ≤ 3 workers and clear responsibilities, either pattern works
[ ] With > 5 workers, consider hierarchical Supervisor (Supervisor of Supervisors)

State Design

[ ] Each worker reads only the fields it needs, writes only the fields it produces
[ ] Supervisor state must include called: list[str] to prevent duplicate invocations
[ ] Pipeline state uses stage-named fields (outline, draft, polished) for easy debugging
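
One way to make the read/write rule checkable rather than a convention is a small wrapper around each worker. This is a sketch under stated assumptions: the `contract` decorator is hypothetical (it is not part of LangGraph), and the `draft_agent` body is a stub in place of the LLM call.

```python
from typing import Callable


def contract(reads: set[str], writes: set[str]) -> Callable:
    """Hypothetical decorator enforcing a worker's read/write field contract."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        def node(state: dict) -> dict:
            view = {k: state[k] for k in reads}   # worker sees only declared inputs
            update = fn(view)
            extra = set(update) - writes
            if extra:
                raise ValueError(f"{fn.__name__} wrote undeclared fields: {sorted(extra)}")
            return {**state, **update}            # merge the partial update
        return node
    return wrap


@contract(reads={"outline"}, writes={"draft"})
def draft_agent(view: dict) -> dict:
    # Stub in place of the LLM call.
    return {"draft": f"Draft based on: {view['outline']}"}


state = {"topic": "x", "outline": "1. Intro", "draft": "", "polished": ""}
state = draft_agent(state)
print(state["draft"])  # Draft based on: 1. Intro
```

A worker that tries to write a field outside its declared set fails immediately, which keeps responsibilities from drifting as the graph grows.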

Routing Reliability

[ ] Avoid pure LLM routing (LLMs cannot reliably track call history)
[ ] Recommended: LLM for one-time classification + Python for deterministic routing
[ ] Set recursion_limit (20–30 is a good range) to guard against accidental loops
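
For reference, `recursion_limit` is passed in the run config when invoking a compiled graph. The fragment below is a sketch: `app` and `initial_state` are assumed names for the compiled graph and its input, so the invocation is left commented out.

```python
# `app` would be the compiled graph, e.g. app = g.compile(); it is not built here.
config = {"recursion_limit": 25}  # upper bound on graph steps for a single run

# result = app.invoke(initial_state, config=config)
# If the bound is hit, LangGraph raises GraphRecursionError instead of looping forever.
```

The full-article run above takes eight node executions (classify, four supervisor visits, three workers), so a limit in the 20–30 range leaves headroom while still catching a runaway loop quickly.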

Worker Design

[ ] Each worker does exactly one thing, with clear input/output contract
[ ] Workers communicate via State, never by calling each other directly
[ ] Write clear Worker System Prompts — don't make workers guess the context

Observability

[ ] Log each node execution (worker name, input/output summary)
[ ] Record the called list to trace routing decisions post-hoc
[ ] Add warning logs on unusual branches (e.g., premature FINISH)
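
The three observability items can be covered by one wrapper applied to every node. This is a minimal sketch using the standard `logging` module; the `logged` decorator, the logger name, and the rule for what counts as a premature FINISH (finishing with an empty `called` list) are assumptions for illustration.

```python
import logging
from functools import wraps
from typing import Callable

logger = logging.getLogger("agents")


def logged(name: str) -> Callable:
    """Hypothetical wrapper: log a node's name plus short input/output summaries."""
    def wrap(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @wraps(fn)
        def node(state: dict) -> dict:
            logger.info("[%s] in: called=%s", name, state.get("called"))
            out = fn(state)
            logger.info("[%s] out: called=%s next=%s", name, out.get("called"), out.get("next"))
            if out.get("next") == "FINISH" and not out.get("called"):
                # Unusual branch: finishing before any worker has run.
                logger.warning("[%s] premature FINISH", name)
            return out
        return node
    return wrap


@logged("supervisor")
def supervisor_stub(state: dict) -> dict:
    # Stub that always finishes, to trigger the warning path.
    return {**state, "next": "FINISH"}


logging.basicConfig(level=logging.INFO)
result = supervisor_stub({"called": []})
```

Because the wrapper only reads the state, it can be added to or removed from any node without changing routing behavior.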

Summary

Five core takeaways:

Multi-agent is about separation of concerns: not for complexity's sake, but because a single Agent's System Prompt starts breaking down when it has to play too many roles
Pipeline wins on simplicity and predictability: the execution path lives in code, traces are linear, testing and debugging cost is minimized
Supervisor wins on adaptability: the same graph handles a one-step factual question and a three-step full article — no code change required
LLM classification + Python execution is the best pairing: LLM does what it's good at (understanding task type), Python does what it's good at (reliable sequencing)
called list is the critical field in Supervisor State: the foundation of routing determinism — without it, Supervisor is prone to duplicate calls and infinite loops

References

LangGraph Multi-Agent Concepts
LangGraph Supervisor Tutorial
Full demo code for this series: agent-08-multi-agent


Top comments (3)

Becomer.net • Jun 2

Good comparison of the patterns. One thing both architectures hit eventually: how do agents share state without passing it through every message?

Supervisor pattern especially — if the supervisor crashes or restarts, agents lose shared context.

The pattern that works for us:

from becomer_agents import MultiAgentPipeline

pipeline = MultiAgentPipeline(
    api_key="bcm_key",
    task_id="task-001",
    agents=["researcher", "analyst", "writer"]
)

# Researcher stores — analyst reads without message passing
pipeline.researcher.store("Found: API rate limit is 100/min, OAuth2 required")
pipeline.shared.store("Decision: use streaming for responses > 500 tokens")

# Next run — or after a crash — full context available
findings = pipeline.analyst.recall("what constraints did researcher find?")

Each agent has its own namespace. Shared namespace for cross-agent coordination.
Open source: pip install becomer-agents

Mehmet Can Farsak • Jun 14

Great breakdown of supervisor vs pipeline patterns for multi-agent systems. One gap I've seen even with clear role separation: agents still blur the line between thinking and acting. I put together Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) — it uses PreToolUse hooks to block tool calls during brainstorming phases. Three modes (divergent, actionable, academic) keep each agent in the right headspace within your architecture.

Mehmet Can Farsak • Jun 14

Great breakdown of supervisor vs pipeline patterns. One thing I've noticed in multi-agent setups: agents don't have a 'thinking mode' vs 'action mode' toggle. That's why I built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) — it adds three modes (divergent, actionable, academic) via PreToolUse hooks so the agent stays in brainstorming instead of jumping to code. Fits nicely as a supplementary pattern alongside the supervisor/pipeline approaches.