Akhilesh

Posted on May 30

102. Multi-Agent Systems: When One Agent Is Not Enough

#multiagent #tools #ai #beginners

One agent is powerful but limited.

Ask it to research a topic, write an article, review that article, check the code examples, and format everything for publishing. It has to do everything sequentially. When it makes a mistake in step 2, it might not catch it until step 7. It has one perspective. One "voice." One set of strengths and weaknesses.

Now imagine three specialized agents working on the same task. A research agent that searches exhaustively and compiles sources. A writing agent that takes those sources and drafts the article with a clear structure. A review agent that reads the draft critically and flags errors, gaps, and unsupported claims. Each one knows its job deeply. They check each other's work. They have different system prompts that give them different strengths.

This is how complex knowledge work actually gets done. Not one person doing everything. A team of specialists coordinated toward a shared goal.

Multi-agent systems bring this pattern to AI.

The Core Patterns

import os
import json
import time
from typing import List, Dict, Callable, Optional, Any
from dataclasses import dataclass, field
from enum import Enum
import anthropic

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

class Pattern(Enum):
    ORCHESTRATOR_WORKER = "orchestrator_worker"
    SEQUENTIAL_PIPELINE = "sequential_pipeline"
    PARALLEL_EXECUTION  = "parallel_execution"
    DEBATE              = "debate"
    CRITIC_REVIEW       = "critic_review"

print("Multi-Agent Patterns:")
print()

patterns = {
    "Orchestrator-Worker": {
        "description": "One LLM breaks down tasks, delegates to specialized workers, aggregates results",
        "best_for":    "Complex tasks that can be decomposed into subtasks",
        "example":     "Research assistant: orchestrator delegates to researcher, writer, editor"
    },
    "Sequential Pipeline": {
        "description": "Output of one agent becomes input to the next in a fixed chain",
        "best_for":    "Multi-stage transformation: draft → edit → format → publish",
        "example":     "Content pipeline: researcher → writer → fact-checker → publisher"
    },
    "Parallel Execution": {
        "description": "Multiple agents work simultaneously on independent subtasks",
        "best_for":    "Tasks with independent components that can run concurrently",
        "example":     "Market research: agent A covers Asia, agent B covers Europe simultaneously"
    },
    "Debate/Adversarial": {
        "description": "Two agents argue opposing positions, a judge evaluates and decides",
        "best_for":    "Decision-making, fact-checking, reducing overconfidence",
        "example":     "Agent A argues for approach X, Agent B argues against, judge decides"
    },
    "Critic-Review": {
        "description": "Creator agent produces output, critic agent evaluates and gives feedback",
        "best_for":    "Quality assurance, catching blind spots, improving output quality",
        "example":     "Writer produces article, critic identifies weaknesses, writer revises"
    },
}

for name, info in patterns.items():
    print(f"  {name}:")
    print(f"    {info['description']}")
    print(f"    Best for: {info['best_for']}")
    print(f"    Example:  {info['example']}")
    print()
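
The "best for" column above can be read as a rough decision rule. The sketch below is one hypothetical way to encode it; `suggest_pattern` and its four boolean flags are illustrative names and not part of any library, and the order of the checks is an assumption rather than a rule.

```python
from enum import Enum

class Pattern(Enum):
    ORCHESTRATOR_WORKER = "orchestrator_worker"
    SEQUENTIAL_PIPELINE = "sequential_pipeline"
    PARALLEL_EXECUTION  = "parallel_execution"
    DEBATE              = "debate"
    CRITIC_REVIEW       = "critic_review"

def suggest_pattern(fixed_stages: bool = False,
                    independent_parts: bool = False,
                    contested_decision: bool = False,
                    needs_revision: bool = False) -> Pattern:
    """Hypothetical heuristic: map coarse task traits to a pattern."""
    if contested_decision:      # two sides plus a judge
        return Pattern.DEBATE
    if needs_revision:          # create, critique, revise
        return Pattern.CRITIC_REVIEW
    if independent_parts:       # subtasks do not depend on each other
        return Pattern.PARALLEL_EXECUTION
    if fixed_stages:            # each stage feeds the next
        return Pattern.SEQUENTIAL_PIPELINE
    # Default: let an orchestrator decide how to split the work
    return Pattern.ORCHESTRATOR_WORKER

print(suggest_pattern(independent_parts=True).value)
print(suggest_pattern(fixed_stages=True).value)
```

Real tasks often mix traits, so treat the result as a starting point and not a verdict.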

Building a Base Agent Class

@dataclass
class AgentMessage:
    from_agent: str
    to_agent:   str
    content:    str
    message_type: str = "task"
    metadata:   Dict  = field(default_factory=dict)

class BaseAgent:
    """Foundation agent that all specialized agents inherit from."""

    def __init__(self, name: str, role: str, system_prompt: str,
                 model: str = "claude-3-5-haiku-20241022",
                 tools: Optional[List[Dict]] = None):
        self.name          = name
        self.role          = role
        self.system_prompt = system_prompt
        self.model         = model
        self.tools         = tools or []
        self.history: List[AgentMessage] = []

    def think(self, message: str,
              context: Optional[List[Dict]] = None,
              max_tokens: int = 1000) -> str:
        messages = list(context or [])
        messages.append({"role": "user", "content": message})

        kwargs = {
            "model":      self.model,
            "max_tokens": max_tokens,
            "system":     self.system_prompt,
            "messages":   messages,
        }
        if self.tools:
            kwargs["tools"] = self.tools

        response = client.messages.create(**kwargs)

        if response.stop_reason == "tool_use":
            return self._handle_tool_use(response, messages, max_tokens)

        # Join every text block; the first block is not guaranteed to be text
        return "".join(b.text for b in response.content if b.type == "text")

    def _handle_tool_use(self, response, messages, max_tokens):
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []

        for block in response.content:
            if block.type == "tool_use":
                result = self._execute_tool(block.name, block.input)
                tool_results.append({
                    "type":        "tool_result",
                    "tool_use_id": block.id,
                    "content":     json.dumps(result)
                })

        messages.append({"role": "user", "content": tool_results})

        final = client.messages.create(
            model=self.model, max_tokens=max_tokens,
            system=self.system_prompt, messages=messages,
            tools=self.tools
        )
        # Only one tool round is handled here; return whatever text came back
        return "".join(b.text for b in final.content if b.type == "text")

    def _execute_tool(self, tool_name: str, tool_input: Dict) -> Any:
        return {"error": f"Tool {tool_name} not implemented in {self.name}"}

    def __repr__(self):
        return f"Agent({self.name}, role={self.role})"

print("BaseAgent class built.")
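
The default `_execute_tool` only returns an error, so a subclass has to supply the real dispatch. A minimal sketch of one way to do that is below, assuming tools are plain Python functions keyed by name. `ToolRegistry` and `word_count` are hypothetical names introduced for illustration, and the block runs without any API calls.

```python
import json
from typing import Any, Callable, Dict

class ToolRegistry:
    """Maps tool names to plain Python callables (hypothetical helper)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def execute(self, name: str, tool_input: Dict[str, Any]) -> Any:
        fn = self._tools.get(name)
        if fn is None:
            return {"error": f"Unknown tool: {name}"}
        try:
            # Tool input arrives as a dict, so unpack it into keyword arguments
            return fn(**tool_input)
        except Exception as exc:
            # Report the failure as data so the model can react to it
            return {"error": f"{type(exc).__name__}: {exc}"}

def word_count(text: str) -> Dict[str, int]:
    """Example tool: count whitespace-separated words."""
    return {"words": len(text.split())}

registry = ToolRegistry()
registry.register("word_count", word_count)

print(json.dumps(registry.execute("word_count", {"text": "one agent is not enough"})))
print(json.dumps(registry.execute("missing_tool", {})))
```

A subclass of `BaseAgent` could hold such a registry and have `_execute_tool` return `registry.execute(tool_name, tool_input)`. Returning errors as dictionaries keeps a failing tool from crashing the whole agent run.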

Pattern 1: Orchestrator-Worker

class OrchestratorAgent(BaseAgent):
    """Breaks down complex goals and delegates to specialized workers."""

    def __init__(self, workers: List[BaseAgent]):
        super().__init__(
            name   = "Orchestrator",
            role   = "coordinator",
            system_prompt = f"""You are an orchestrator that delegates tasks to specialized agents.

Available workers:
{self._format_workers(workers)}

To delegate a task, respond with JSON:
{{
  "delegations": [
    {{"agent": "agent_name", "task": "specific task description", "priority": 1}},
    ...
  ],
  "execution": "sequential" or "parallel"
}}

After receiving worker results, synthesize them into a final coherent answer."""
        )
        self.workers = {w.name: w for w in workers}

    def _format_workers(self, workers):
        return "\n".join(f"- {w.name} ({w.role}): handles {w.role} tasks"
                         for w in workers)

    def run(self, goal: str, verbose: bool = True) -> str:
        if verbose:
            print(f"\n{'='*60}")
            print(f"Orchestrator Goal: {goal}")
            print(f"{'='*60}")

        plan_prompt = f"""Goal: {goal}

Create a delegation plan. Which agents should handle which parts?
Respond with the JSON delegation format."""

        plan_json = self.think(plan_prompt)

        import re
        # Default plan: hand the whole goal to the first worker
        plan = {"delegations": [{"agent": next(iter(self.workers)),
                                  "task": goal, "priority": 1}],
                "execution": "sequential"}
        # Models often wrap JSON in prose, so pull out the outermost {...} span
        match = re.search(r'\{.*\}', plan_json, re.DOTALL)
        if match:
            try:
                plan = json.loads(match.group())
            except json.JSONDecodeError:
                pass  # unparseable plan: keep the default

        if verbose:
            print(f"\nPlan: {plan.get('execution', 'sequential')} execution")
            for d in plan.get("delegations", []):
                print(f"  → {d.get('agent')}: {str(d.get('task', ''))[:60]}")

        # Note: delegations run one at a time here, whatever "execution" says
        worker_results = {}
        for delegation in plan.get("delegations", []):
            agent_name = delegation.get("agent")
            task       = delegation.get("task", goal)

            # The model may name a worker that does not exist; skip it visibly
            if agent_name not in self.workers:
                if verbose:
                    print(f"\n[skip] unknown agent: {agent_name}")
                continue

            if verbose:
                print(f"\n[{agent_name}] working on: {task[:50]}...")
            result = self.workers[agent_name].think(task)
            worker_results[agent_name] = result
            if verbose:
                print(f"[{agent_name}] done: {result[:100]}...")

        synthesis_prompt = f"""Original goal: {goal}

Worker results:
{json.dumps(worker_results, indent=2)}

Synthesize these results into a single, coherent, well-structured answer."""

        final_answer = self.think(synthesis_prompt)
        return final_answer

research_agent = BaseAgent(
    name          = "Researcher",
    role          = "research",
    system_prompt = """You are a research specialist. Your job is to find and synthesize information.
Always cite sources, be thorough, and organize findings clearly.
Present information as bullet points with key facts highlighted."""
)

writer_agent = BaseAgent(
    name          = "Writer",
    role          = "writing",
    system_prompt = """You are a technical writer. Your job is to turn research into clear, engaging prose.
Write in an accessible but precise style.
Structure content with clear headings and logical flow.
Target audience: developers and data scientists."""
)

critic_agent = BaseAgent(
    name          = "Critic",
    role          = "review",
    system_prompt = """You are a critical reviewer. Your job is to find flaws and gaps.
Be constructive but rigorous. Identify:
- Factual errors or unsupported claims
- Missing important information
- Unclear or confusing passages
- Structural improvements needed
Score quality 1-10 and explain your rating."""
)

orchestrator = OrchestratorAgent(
    workers = [research_agent, writer_agent, critic_agent]
)

print("\nOrchestrator-Worker system ready.")
print(f"Workers: {list(orchestrator.workers.keys())}")

result = orchestrator.run(
    "Explain the key differences between BERT and GPT, including their architectures, "
    "training objectives, and best use cases.",
    verbose=True
)
print(f"\nFinal Answer:\n{result[:500]}...")
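
The plan the orchestrator gets back is free text, so it helps to validate it before acting on it. The sketch below shows one possible approach: extract the JSON, drop delegations that name unknown workers, and fall back to a single-step plan. `parse_plan` is a hypothetical helper, and the sample reply is made up for illustration.

```python
import json
import re
from typing import Any, Dict, List

def parse_plan(raw: str, known_agents: List[str], goal: str) -> Dict[str, Any]:
    """Extract a delegation plan from model text, or return a one-step fallback."""
    fallback = {"delegations": [{"agent": known_agents[0], "task": goal}],
                "execution": "sequential"}

    # Models often surround JSON with prose, so grab the outermost {...} span
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return fallback
    try:
        plan = json.loads(match.group())
    except json.JSONDecodeError:
        return fallback

    items = plan.get("delegations")
    if not isinstance(items, list):
        return fallback

    # Keep only well-formed delegations that target a worker we actually have
    valid = [d for d in items
             if isinstance(d, dict)
             and d.get("agent") in known_agents
             and d.get("task")]
    if not valid:
        return fallback
    return {"delegations": valid,
            "execution": plan.get("execution", "sequential")}

reply = ('Here is my plan: {"delegations": ['
         '{"agent": "Researcher", "task": "collect facts"}, '
         '{"agent": "Ghost", "task": "does not exist"}], '
         '"execution": "parallel"}')

plan = parse_plan(reply, ["Researcher", "Writer"], "Explain BERT vs GPT")
print([d["agent"] for d in plan["delegations"]], plan["execution"])
```

Dropping unknown agents silently is a design choice. Logging them, or sending the plan back to the model for repair, may be preferable in a real system.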

Pattern 2: Sequential Pipeline

class Pipeline:
    """Agents run in sequence, output flows to next agent as input."""

    def __init__(self, agents: List[BaseAgent], verbose: bool = True):
        self.agents  = agents
        self.verbose = verbose
        self.outputs = {}

    def run(self, initial_input: str) -> str:
        current = initial_input

        for i, agent in enumerate(self.agents):
            if self.verbose:
                print(f"\n[Stage {i+1}/{len(self.agents)}] {agent.name}")
                print(f"  Input:  {current[:80]}...")

            prompt = (
                f"Previous stage output:\n{current}\n\nYour task: {agent.role}"
                if i > 0 else current
            )
            current = agent.think(prompt)
            self.outputs[agent.name] = current

            if self.verbose:
                print(f"  Output: {current[:80]}...")

        return current

draft_agent = BaseAgent(
    name          = "Drafter",
    role          = "Write a first draft. Do not worry about perfection, focus on getting ideas down.",
    system_prompt = "You are a first-draft writer. Write quickly and completely. Cover all the key points."
)

editor_agent = BaseAgent(
    name          = "Editor",
    role          = "Edit the draft for clarity, concision, and flow. Fix any awkward sentences.",
    system_prompt = "You are a skilled editor. Improve clarity and remove redundancy while preserving meaning."
)

formatter_agent = BaseAgent(
    name          = "Formatter",
    role          = "Format the edited content with proper markdown, headers, and structure.",
    system_prompt = "You are a content formatter. Add appropriate markdown formatting, headers, and bullet points."
)

pipeline = Pipeline(
    agents  = [draft_agent, editor_agent, formatter_agent],
    verbose = True
)

print("\nSequential Pipeline: Draft → Edit → Format")
final = pipeline.run(
    "Write a brief explanation of how neural networks learn through backpropagation.")
print(f"\nFinal formatted output:\n{final[:400]}...")
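
The data flow of a pipeline can be checked without spending tokens by swapping the agents for plain functions. This is a minimal sketch under that assumption: `run_pipeline` and the three lambda stages are hypothetical stand-ins for `agent.think`, and the empty-output check is an added safeguard that the class above does not have.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]

def run_pipeline(stages: List[Stage], text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Feed each stage's output into the next and keep a trace of every step."""
    trace: List[Tuple[str, str]] = []
    current = text
    for name, fn in stages:
        output = fn(current)
        # Stop early: an empty result would silently poison every later stage
        if not output.strip():
            raise ValueError(f"Stage '{name}' returned empty output")
        trace.append((name, output))
        current = output
    return current, trace

# Stub stages that mimic draft -> edit -> format with string operations
stages: List[Stage] = [
    ("draft",  lambda t: f"draft of: {t}"),
    ("edit",   lambda t: t.replace("draft of: ", "").capitalize()),
    ("format", lambda t: f"# {t}"),
]

final, trace = run_pipeline(stages, "how backprop works")
print(final)
for name, output in trace:
    print(f"{name:>6}: {output}")
```

Keeping the trace makes it easier to see which stage introduced a problem, which is the main debugging cost of a fixed chain.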

Pattern 3: Parallel Execution

import concurrent.futures
import threading

class ParallelAgentRunner:
    """Run multiple agents simultaneously on independent subtasks."""

    def __init__(self, agents_and_tasks: List[tuple],
                 max_workers: int = 4, verbose: bool = True):
        self.agents_and_tasks = agents_and_tasks
        self.max_workers      = max_workers
        self.verbose          = verbose
        self._lock            = threading.Lock()

    def run(self) -> Dict[str, str]:
        results = {}
        start   = time.time()

        def run_agent(agent_task_pair):
            agent, task = agent_task_pair
            if self.verbose:
                with self._lock:
                    print(f"  → [{agent.name}] started: {task[:50]}...")
            result = agent.think(task)
            if self.verbose:
                with self._lock:
                    print(f"  ✓ [{agent.name}] done ({time.time()-start:.1f}s)")
            return agent.name, result

        with concurrent.futures.ThreadPoolExecutor(
            max_workers=self.max_workers
        ) as executor:
            futures = {executor.submit(run_agent, pair): pair
                       for pair in self.agents_and_tasks}
            for future in concurrent.futures.as_completed(futures):
                name, result = future.result()
                results[name] = result

        elapsed = time.time() - start
        if self.verbose:
            print(f"\nAll agents completed in {elapsed:.1f}s total")
        return results

asia_agent  = BaseAgent("Asia_Researcher",   "researcher",
    "You research the Asian tech market. Focus on China, Japan, South Korea, India.")
europe_agent = BaseAgent("Europe_Researcher", "researcher",
    "You research the European tech market. Focus on UK, Germany, France, Nordics.")
us_agent    = BaseAgent("US_Researcher",     "researcher",
    "You research the US tech market. Focus on Silicon Valley, NYC, emerging hubs.")

topic = "the adoption and trends in AI/ML technology in 2024"

parallel_runner = ParallelAgentRunner(
    agents_and_tasks = [
        (asia_agent,   f"Research {topic} in Asia"),
        (europe_agent, f"Research {topic} in Europe"),
        (us_agent,     f"Research {topic} in the United States"),
    ],
    verbose = True
)

print("\nParallel Execution: 3 regional researchers running simultaneously")
parallel_results = parallel_runner.run()

synthesizer = BaseAgent(
    name          = "Synthesizer",
    role          = "synthesis",
    system_prompt = "You synthesize multiple research reports into one coherent global overview."
)

global_report = synthesizer.think(
    f"Synthesize these regional research reports into a global overview:\n\n" +
    "\n\n".join(f"=== {name} ===\n{result}"
                for name, result in parallel_results.items())
)
print(f"\nGlobal synthesis:\n{global_report[:400]}...")
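
In the runner above, `future.result()` re-raises any exception from a worker, so one failed API call aborts the whole batch. A possible variation is sketched below that records failures per agent instead. `run_isolated` is a hypothetical helper, and the tasks are stub functions so the block runs offline.

```python
import concurrent.futures
from typing import Callable, Dict

def run_isolated(tasks: Dict[str, Callable[[], str]],
                 max_workers: int = 4) -> Dict[str, Dict[str, str]]:
    """Run tasks concurrently; a failing task is recorded, not propagated."""
    results: Dict[str, Dict[str, str]] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in concurrent.futures.as_completed(futures):
            name = futures[future]
            try:
                results[name] = {"status": "ok", "output": future.result()}
            except Exception as exc:
                # Keep the error with the agent that produced it
                results[name] = {"status": "error", "output": str(exc)}
    return results

def failing_task() -> str:
    raise RuntimeError("rate limited")

results = run_isolated({
    "asia":   lambda: "asia report",
    "europe": failing_task,
})
print(results["asia"]["status"], results["europe"]["status"])
```

The synthesizer can then work from the successful reports and mention which regions are missing, instead of the whole run being lost.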

Pattern 4: Debate

class DebateSystem:
    """Two agents argue opposing sides, a judge evaluates."""

    def __init__(self, model: str = "claude-3-5-haiku-20241022"):
        self.proposer = BaseAgent(
            name          = "Proposer",
            role          = "advocate",
            system_prompt = """You are an advocate for the proposition.
Make the strongest possible case FOR the position you are assigned.
Use evidence, logic, and compelling arguments. Be persuasive.""",
            model=model
        )
        self.opponent = BaseAgent(
            name          = "Opponent",
            role          = "critic",
            system_prompt = """You are a critic of the proposition.
Make the strongest possible case AGAINST the position presented.
Find flaws, gaps, counterexamples, and alternative views. Be rigorous.""",
            model=model
        )
        self.judge = BaseAgent(
            name          = "Judge",
            role          = "arbitrator",
            system_prompt = """You are an impartial judge evaluating a debate.
Assess both sides fairly. Identify the strongest arguments from each side.
Make a reasoned final verdict with clear justification.
Format: [FOR arguments] [AGAINST arguments] [Verdict] [Reasoning]""",
            model=model
        )

    def debate(self, proposition: str, rounds: int = 2,
                verbose: bool = True) -> Dict:
        if verbose:
            print(f"\nDebate: '{proposition}'")
            print("=" * 60)

        # Each history stores the user prompt followed by the assistant reply,
        # so it starts with a user turn and the roles alternate across rounds
        context_p, context_o = [], []
        for_args, against_args = [], []

        for round_num in range(1, rounds + 1):
            if verbose:
                print(f"\n--- Round {round_num} ---")

            prop_prompt = f"Round {round_num}: Argue FOR: '{proposition}'"
            if against_args:
                # From round 2 on, show the proposer the opponent's last reply
                prop_prompt += f"\n\nOpponent says: {against_args[-1]}"
            prop_arg = self.proposer.think(prop_prompt, context=context_p)
            context_p.append({"role": "user", "content": prop_prompt})
            context_p.append({"role": "assistant", "content": prop_arg})
            for_args.append(prop_arg)
            if verbose:
                print(f"FOR:     {prop_arg[:150]}...")

            opp_prompt = (f"Round {round_num}: Argue AGAINST '{proposition}'. "
                          f"Counter this argument:\n{prop_arg}")
            opp_arg = self.opponent.think(opp_prompt, context=context_o)
            context_o.append({"role": "user", "content": opp_prompt})
            context_o.append({"role": "assistant", "content": opp_arg})
            against_args.append(opp_arg)
            if verbose:
                print(f"AGAINST: {opp_arg[:150]}...")

        all_args = "\n\n".join(
            [f"FOR:\n{arg}" for arg in for_args] +
            [f"AGAINST:\n{arg}" for arg in against_args]
        )

        verdict = self.judge.think(
            f"Proposition: '{proposition}'\n\nDebate arguments:\n{all_args}\n\nDeliver your verdict.")

        if verbose:
            print(f"\nJudge's Verdict:\n{verdict[:300]}...")

        return {
            "proposition":       proposition,
            "for_arguments":     for_args,
            "against_arguments": against_args,
            "verdict":           verdict
        }

debate = DebateSystem()
result = debate.debate(
    proposition = "Large Language Models will replace most software engineering jobs within 10 years",
    rounds      = 1,
    verbose     = True
)
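
Each debater carries its own chat history between rounds, and that history is easy to get subtly wrong. The sketch below assumes the chat API expects a history that starts with a user turn and alternates roles (an assumption worth checking against the API documentation). `record_turn` and `is_alternating` are hypothetical helpers for building and checking such a history.

```python
from typing import Dict, List

Message = Dict[str, str]

def record_turn(history: List[Message], prompt: str, reply: str) -> None:
    """Append one user prompt together with the assistant reply it produced."""
    history.append({"role": "user", "content": prompt})
    history.append({"role": "assistant", "content": reply})

def is_alternating(history: List[Message]) -> bool:
    """True if roles go user, assistant, user, ... starting with user."""
    expected = ("user", "assistant")
    return all(msg["role"] == expected[i % 2] for i, msg in enumerate(history))

proposer_history: List[Message] = []
record_turn(proposer_history, "Round 1: argue FOR X", "X is good because ...")
record_turn(proposer_history, "Round 2: respond to the opponent", "Even so, ...")
print(is_alternating(proposer_history))

# A history that stores only the reply, then a follow-up, starts with the wrong role
broken = [
    {"role": "assistant", "content": "X is good because ..."},
    {"role": "user", "content": "Opponent says: ..."},
]
print(is_alternating(broken))
```

Running a check like this before each call makes multi-round failures show up in your own code instead of as an API error.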

Pattern 5: Critic-Review Loop

class CriticReviewLoop:
    """Creator produces, critic evaluates, loop until quality threshold met."""

    def __init__(self, creator: BaseAgent, critic: BaseAgent,
                 max_iterations: int = 3, quality_threshold: float = 8.0):
        self.creator           = creator
        self.critic            = critic
        self.max_iterations    = max_iterations
        self.quality_threshold = quality_threshold

    def run(self, task: str, verbose: bool = True) -> Dict:
        history  = []
        feedback = ""

        for iteration in range(1, self.max_iterations + 1):
            if verbose:
                print(f"\n--- Iteration {iteration} ---")

            creation_prompt = (
                f"{task}\n\nFeedback from previous attempt:\n{feedback}\nImprove accordingly."
                if feedback else task
            )
            content = self.creator.think(creation_prompt)
            history.append({"iteration": iteration, "content": content})

            if verbose:
                print(f"[{self.creator.name}]: {content[:120]}...")

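            critique = self.critic.think(
                "Evaluate this content. Give feedback, then end with a line "
                f"of the form 'SCORE: <number from 1 to 10>'.\n\n{content}"
            )
            if verbose:
                print(f"[{self.critic.name}]: {critique[:120]}...")

            import re
            # Read the explicit SCORE line; the first standalone digit in the
            # text may be a list number or part of the content under review
            score_match = re.search(r'SCORE:\s*(\d+(?:\.\d+)?)', critique, re.IGNORECASE)
            # No parseable score: assume 7.0, which is below the default threshold
            score = float(score_match.group(1)) if score_match else 7.0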

            if score >= self.quality_threshold:
                if verbose:
                    print(f"\n✓ Quality threshold reached (score={score})")
                break

            feedback = critique

        return {
            "final_content": content,
            "iterations":    iteration,
            "history":       history
        }

code_writer = BaseAgent(
    name="CodeWriter", role="code_creator",
    system_prompt="You write clean, well-documented Python code. Include docstrings and type hints.")

code_reviewer = BaseAgent(
    name="CodeReviewer", role="code_critic",
    system_prompt="""You review Python code rigorously. Check for:
- Correctness and edge cases
- Code clarity and documentation  
- PEP 8 compliance
- Error handling
Score 1-10 and give specific actionable feedback.""")

review_loop = CriticReviewLoop(
    creator           = code_writer,
    critic            = code_reviewer,
    max_iterations    = 3,
    quality_threshold = 8.0
)

print("\nCritic-Review Loop: write and improve code iteratively")
result = review_loop.run(
    "Write a Python function that finds the longest palindrome substring in a string.")
print(f"\nFinal code after {result['iterations']} iteration(s):")
print(result["final_content"][:400])
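
Score parsing is the fragile part of this loop, because the critic replies in free text. A minimal sketch of a stricter standalone parser follows, assuming the critic writes something like "Score: 8" or "8/10". `extract_score` is a hypothetical helper, and it returns `None` instead of guessing when no score is found.

```python
import re
from typing import Optional

def extract_score(critique: str) -> Optional[float]:
    """Return a 0-10 score found in the critique, or None if there is none."""
    patterns = [
        r"score\s*[:=]?\s*(\d+(?:\.\d+)?)",   # "Score: 8" or "score 7.5"
        r"(\d+(?:\.\d+)?)\s*/\s*10\b",        # "8/10" or "8.5 / 10"
    ]
    for pattern in patterns:
        match = re.search(pattern, critique, re.IGNORECASE)
        if match:
            value = float(match.group(1))
            # Ignore numbers outside the rating scale
            if 0 <= value <= 10:
                return value
    return None

print(extract_score("1. Missing docstring\n2. No tests\nScore: 6/10"))
print(extract_score("Solid work, 8.5/10 overall"))
print(extract_score("I found 3 issues"))
```

Returning `None` lets the caller decide what an unscored critique means, for example running another iteration or asking the critic to restate its rating.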

When Multi-Agent Adds Real Value

print("\nWhen to Use Multi-Agent Systems:")
print()

use_cases = {
    "Use multi-agent when": [
        "Tasks naturally decompose into specialized subtasks",
        "Quality requires multiple independent perspectives",
        "Parallel execution would save significant time",
        "Different parts of the task need different 'personalities' or constraints",
        "One agent's output quality is not good enough and critique helps",
        "Tasks exceed a single context window",
    ],
    "Stick with single agent when": [
        "Task is straightforward and fits one context window",
        "Coordination overhead would outweigh the benefits",
        "You need predictable, debuggable behavior",
        "Latency is critical (multi-agent adds round trips)",
        "Budget is tight (each agent call costs tokens)",
        "You are still prototyping (complexity kills iteration speed)",
    ],
}

for category, points in use_cases.items():
    print(f"  {category}:")
    for point in points:
        print(f"    {'✓' if 'Use' in category else '✗'} {point}")
    print()
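
The latency and budget points above can be made concrete by counting model calls. The sketch below derives the counts from the five classes in this post, ignoring extra calls triggered by tool use and assuming every delegation, round, and iteration actually runs. `estimate` is a hypothetical helper, and `n` means workers, stages, rounds, or iterations depending on the pattern.

```python
from typing import Dict, Tuple

def estimate(pattern: str, n: int) -> Tuple[int, int]:
    """Return (total LLM calls, calls that must run one after another)."""
    table: Dict[str, Tuple[int, int]] = {
        # plan + n workers run one by one + synthesis
        "orchestrator_worker": (n + 2, n + 2),
        # one call per stage, strictly in order
        "sequential_pipeline": (n, n),
        # n concurrent workers (assuming enough threads) + one synthesis call
        "parallel_execution":  (n + 1, 2),
        # proposer and opponent each round + one judge call
        "debate":              (2 * n + 1, 2 * n + 1),
        # creator and critic each iteration, worst case with no early stop
        "critic_review":       (2 * n, 2 * n),
    }
    return table[pattern]

for name in ("orchestrator_worker", "sequential_pipeline",
             "parallel_execution", "debate", "critic_review"):
    total, in_sequence = estimate(name, 3)
    print(f"{name:<20} calls={total}  sequential={in_sequence}")
```

With three units of work, only parallel execution keeps the sequential count low. Every other pattern multiplies the latency of a single call.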

Reference Links

print("Essential Multi-Agent Reference Links:")
print()

refs = {
    "Papers": [
        ("Society of Mind (Minsky, 1986)",      "en.wikipedia.org/wiki/Society_of_Mind"),
        ("LLM-based Multi-Agent Survey",         "arxiv.org/abs/2402.01680"),
        ("AutoGen: Multi-agent conversations",   "arxiv.org/abs/2308.08155"),
        ("MetaGPT: Meta programming agents",     "arxiv.org/abs/2308.00352"),
        ("ChatDev: Software development agents", "arxiv.org/abs/2307.07924"),
    ],
    "Frameworks": [
        ("AutoGen (Microsoft)",          "github.com/microsoft/autogen"),
        ("CrewAI",                       "crewai.com"),
        ("LangGraph (stateful graphs)",  "langchain-ai.github.io/langgraph"),
        ("Semantic Kernel (Microsoft)",  "learn.microsoft.com/semantic-kernel"),
        ("Agency Swarm",                 "github.com/VRSEN/agency-swarm"),
        ("Camel-AI",                     "github.com/camel-ai/camel"),
    ],
    "Tutorials": [
        ("Anthropic multi-agent cookbook",       "github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents"),
        ("DeepLearning.AI Multi-agent course",   "learn.deeplearning.ai/multi-ai-agent-systems"),
        ("LangGraph multi-agent tutorial",       "langchain-ai.github.io/langgraph/tutorials"),
        ("AutoGen docs and examples",            "microsoft.github.io/autogen"),
    ],
    "Blog Posts": [
        ("Lilian Weng: LLM Powered Autonomous Agents", "lilianweng.github.io/posts/2023-06-23-agent"),
        ("Andrej Karpathy: Software 2.0",              "karpathy.medium.com/software-2-0-a64152b37c35"),
        ("Anthropic: Building effective agents",        "anthropic.com/research/building-effective-agents"),
    ],
}

for category, links in refs.items():
    print(f"  {category}:")
    for name, url in links:
        print(f"    • {name:<48} {url}")
    print()

Try This

Create multi_agent_practice.py.

Part 1: implement the orchestrator-worker pattern from scratch. Create three specialized agents: a researcher (mock web search), a summarizer, and a formatter. Give the orchestrator a goal like "Research and summarize the key concepts of reinforcement learning." Verify it delegates appropriately.

Part 2: build a sequential pipeline with four stages. Stage 1: brainstorm 10 ideas for a blog post on a technical topic. Stage 2: select the best three and outline each. Stage 3: write one paragraph for each. Stage 4: format into a complete post with headings.

Part 3: implement the critic-review loop. Write a code generation task (sort algorithm, data structure, utility function). Run 3 iterations of write-critique-improve. Does the code quality measurably improve across iterations?

Part 4: debate two real technical positions. Example: "Python is better than JavaScript for backend development." Run two rounds. Print both sides' arguments and the judge's verdict. Does the debate surface arguments you had not considered?
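
For Part 1, the mock web search can be a plain function over a small in-memory corpus. The sketch below is one way to set that up. `MOCK_CORPUS`, `mock_web_search`, and `SEARCH_TOOL` are hypothetical names, the corpus entries are made up, and the tool definition uses the `name` / `description` / `input_schema` shape that the `tools` argument expects.

```python
from typing import Dict, List

# Hypothetical in-memory corpus standing in for real search results
MOCK_CORPUS: List[Dict[str, str]] = [
    {"title": "Reinforcement learning basics",
     "snippet": "An agent learns by maximizing cumulative reward."},
    {"title": "Q-learning",
     "snippet": "A value-based reinforcement learning algorithm."},
    {"title": "Backpropagation",
     "snippet": "Gradients flow backward through a neural network."},
]

def mock_web_search(query: str, max_results: int = 2) -> List[Dict[str, str]]:
    """Rank documents by how many query words appear in them."""
    words = set(query.lower().split())
    scored = []
    for doc in MOCK_CORPUS:
        text = f"{doc['title']} {doc['snippet']}".lower()
        hits = sum(1 for word in words if word in text)
        if hits:
            scored.append((hits, doc))
    # Sort on the hit count only; ties keep their corpus order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:max_results]]

# Tool definition the researcher agent could pass as tools=[SEARCH_TOOL]
SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search a small mock corpus and return matching documents.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search terms"},
        },
        "required": ["query"],
    },
}

results = mock_web_search("reinforcement learning")
print([doc["title"] for doc in results])
```

A researcher agent would route `web_search` tool calls to `mock_web_search(**tool_input)` from its `_execute_tool` override, which keeps the exercise deterministic and free of network access.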

What's Next

Agents need memory to be truly useful across sessions. The next post covers agent memory systems: how to store past actions, how to recall relevant past experience, and how to build agents that improve over time rather than starting fresh every conversation.

Top comments (2)

Harjot Singh • May 31

The "when one agent is not enough" framing is important because multi-agent isn't automatically better - it's better for a specific reason: scoped context. One agent juggling a big task drags an ever-growing context and loses the thread; splitting into specialists means each operates on a small, relevant slice, which improves both quality and cost. The win isn't "more agents," it's "smaller, focused contexts per agent."

The flip side worth flagging for readers: multi-agent buys you focus but bills you coordination - handoff contracts, who arbitrates conflicts, failure isolation so one agent's bad output doesn't poison the rest. Single-agent is simpler until the task outgrows one context; multi-agent is more powerful but the orchestration is the new hard part. That trade is the whole design space of Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - many scoped agents, and the real engineering is the orchestration + gated handoffs between them (which also keeps a build ~$3 flat). Solid post in what looks like a great series. Where do you draw the line - at what point does a task justify the coordination overhead of going multi-agent vs just giving one agent better context?

Akhilesh • May 31

Excellent point. I agree that the real benefit is focused context, not simply adding more agents. My rule of thumb is to stay single-agent until context size, task complexity, or tool coordination starts hurting reliability. Once a workflow naturally splits into distinct stages, the coordination overhead of multi-agent systems becomes worth it. I also agree that orchestration and handoffs are where most of the engineering challenge actually lives.