Multi-agent systems are having a moment. Every conference talk shows a diagram with six agents pointing at each other. Every framework ships with coordination primitives. The implicit message is that more agents equals more power.
It does not. More agents equals more complexity, more failure modes, and more coordination overhead. The teams that build successful multi-agent systems are the ones who resisted adding agents until they had a specific, concrete reason to do so.
This post covers the legitimate reasons to go multi-agent, three patterns that actually work in production, the pitfalls that catch most teams the first time, and a decision rule you can apply to your own use case.
The Four Legitimate Reasons
There are exactly four reasons to use multiple agents. If none of these apply, you do not need the complexity.
Context window limits. Even with 1M-token context windows on Claude Opus 4.6, complex long-running tasks hit limits. A coding agent reasoning about a 500-file codebase while running tests, researching documentation, and tracking long task history will run out of context. The solution is giving each agent a bounded, focused slice of the problem.
Specialization. Different parts of a problem benefit from different expertise and prompting. A customer support system might need one agent for technical diagnosis (error logs and documentation), another for billing (transaction records), and another for account management (user data). Each agent can have a tightly focused system prompt, the right tools, and even a different model tuned for its task.
Parallelism. Some tasks decompose into independent subtasks that can run simultaneously. If you need to research five topics for a report, why run them sequentially? Five agents running in parallel finish in roughly the time a single agent needs for one topic.
Isolation. One agent's failure or hallucination should not crash the whole system. When agents run with separate context windows, a problem in one stays contained. The same argument you make for microservices.
If you cannot map your use case to one of these four reasons, you do not need multiple agents yet.
Pattern 1: Orchestrator and Workers
The most battle-tested pattern. One orchestrator plans and delegates; multiple worker agents execute specific tasks and report back.
The orchestrator understands the high-level goal, breaks it into subtasks, assigns each to the appropriate worker, and synthesizes results. Workers are specialists: narrow focus, the right tools for their domain, no awareness of the larger goal.
Here is this pattern applied to a code review system:
```typescript
// lib/agents/orchestrator.ts
const delegateToWorker = tool({
  description: "Delegate a specific task to a specialized worker agent.",
  inputSchema: z.object({
    worker: z.enum(["security", "performance", "style", "tests"]),
    task: z.string().describe("Detailed description of what the worker should do."),
    context: z.string().describe("Relevant code the worker needs."),
  }),
  execute: async ({ worker, task, context }) => {
    const { runWorker } = await import(`./workers/${worker}`);
    return { worker, result: await runWorker(task, context) };
  },
});

export const orchestratorAgent = new ToolLoopAgent({
  model: "anthropic/claude-opus-4.6", // expensive model for planning
  instructions: `You are a senior code reviewer. Coordinate a team of specialists.

When given code to review:
1. Analyze what types of review are needed (security, performance, style, tests).
2. Delegate each type to the appropriate specialist.
3. Compile all findings into a final report.

Be specific in task descriptions -- workers only know what you tell them.`,
  tools: { delegate_to_worker: delegateToWorker, compile_final_report: compileFinalReport },
  stopWhen: stepCountIs(15),
});
```
```typescript
// lib/agents/workers/security.ts
export async function runWorker(task: string, context: string) {
  const agent = new ToolLoopAgent({
    model: "anthropic/claude-sonnet-4.6", // cheaper model for execution
    instructions: `You are a security-focused code reviewer. Look for:
- SQL injection, XSS, CSRF vulnerabilities
- Hardcoded secrets or credentials
- Missing authentication or authorization checks
- Unsafe use of eval() or dynamic code execution

Be specific. Report the exact line that is problematic and explain why.`,
    tools: {},
    stopWhen: stepCountIs(3),
  });

  const result = await agent.generate({
    prompt: `Task: ${task}\n\nCode to review:\n${context}`,
  });

  return result.text;
}
```
Notice the model routing: the orchestrator uses claude-opus-4.6 for planning while workers use claude-sonnet-4.6 for execution. The expensive model decides what to do; the cheaper model does the work. On a complex review, this cuts costs by 60-70% with minimal quality loss.
The orchestrator's task descriptions to workers must be detailed. Workers have no awareness of the broader context -- they only know what the orchestrator tells them. "Review this for security issues" is a bad task description. "Look for SQL injection vulnerabilities in the user authentication flow, specifically in the login and password reset handlers" is a good one.
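The orchestrator also references a `compile_final_report` tool whose implementation is not shown. A minimal sketch of the compilation logic such a tool might wrap -- the finding shape and field names here are illustrative, not part of the system above:

```typescript
// Hypothetical shape for a worker finding; the real tool would define its own.
type Severity = "info" | "warning" | "critical";

interface Finding {
  worker: string;
  severity: Severity;
  detail: string;
}

// Sort the most severe findings to the top and render one line per finding.
export function compileReport(findings: Finding[]): string {
  const rank: Record<Severity, number> = { critical: 0, warning: 1, info: 2 };
  return [...findings]
    .sort((a, b) => rank[a.severity] - rank[b.severity])
    .map((f) => `[${f.severity.toUpperCase()}] (${f.worker}) ${f.detail}`)
    .join("\n");
}
```

Wrapping this in a `tool({ ... })` the same way `delegateToWorker` is wrapped lets the orchestrator call it as its final step.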
Pattern 2: Agent Teams
Where Orchestrator+Workers has a clear hierarchy, Agent Teams are peers. Each agent has equal authority, they share a common task list, and they self-organize rather than being explicitly assigned work.
This pattern excels when you have a pool of parallel independent tasks:
```typescript
// lib/teams/task-queue.ts
// Atomic claim prevents two agents from picking up the same task
export async function claimNextTask(agentId: string): Promise<Task | null> {
  // Retry on a lost race -- losing a claim to another agent should not
  // make this agent give up while tasks remain in the queue.
  while (true) {
    const claimed = await db.transaction(async (tx) => {
      const pending = await tx.query.tasks.findFirst({
        where: eq(tasks.status, "pending"),
      });
      if (!pending) return null; // queue is empty

      const updated = await tx.update(tasks)
        .set({ status: "claimed", claimedBy: agentId })
        .where(
          and(
            eq(tasks.id, pending.id),
            eq(tasks.status, "pending") // double-check in same transaction
          )
        )
        .returning();

      // Zero rows updated means another agent claimed it first.
      if (updated.length === 0) return "lost-race" as const;

      return { ...pending, status: "claimed" as TaskStatus, claimedBy: agentId };
    });

    if (claimed !== "lost-race") return claimed;
  }
}
```
```typescript
// A single team member -- run multiple instances in parallel
export async function runTeamAgent(agentId: string) {
  while (true) {
    const task = await claimNextTask(agentId);
    if (!task) break; // no tasks left

    try {
      const agent = new ToolLoopAgent({
        model: "anthropic/claude-sonnet-4.6",
        instructions: `You are research agent ${agentId}. Complete the assigned task.`,
        tools: { web_search: webSearchTool, read_page: readPageTool },
        stopWhen: stepCountIs(10),
      });

      const result = await agent.generate({
        prompt: `Task: ${task.type}\nData: ${JSON.stringify(task.payload)}`,
      });

      await completeTask(task.id, result.text);
    } catch (error) {
      await failTask(task.id, (error as Error).message);
    }
  }
}

// Launch a team of N agents to work through the queue in parallel
export async function launchTeam(teamSize: number = 5) {
  await Promise.all(
    Array.from({ length: teamSize }, (_, i) => runTeamAgent(`agent-${i}`))
  );
}
```
The critical detail is the atomic claim using a database transaction with a double-check on status. Without it, two agents can both read the same "pending" task and both start working on it. This is not a theoretical concern -- it happens in production as soon as you have more than one agent running.
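A related failure the queue above does not handle: an agent that crashes after claiming leaves its task stuck in "claimed" forever. A common fix is a periodic sweeper that resets stale claims to "pending". A sketch of the stale-detection logic, assuming tasks also record a claimedAt timestamp -- a field not in the schema above, and a timeout value you would tune per workload:

```typescript
// Assumed extra field: claimedAt, set in the same update that claims the task.
interface ClaimedTask {
  id: string;
  status: string;
  claimedAt: number; // epoch ms
}

const CLAIM_TIMEOUT_MS = 5 * 60 * 1000; // assumption: 5 minutes per task

// Return ids of claimed tasks whose agent has presumably died, so a periodic
// sweeper can reset them to "pending" for another agent to pick up.
export function staleClaims(tasks: ClaimedTask[], now: number): string[] {
  return tasks
    .filter((t) => t.status === "claimed" && now - t.claimedAt > CLAIM_TIMEOUT_MS)
    .map((t) => t.id);
}
```

Run the sweeper on a timer alongside `launchTeam`; the reset itself should use the same transactional double-check as the claim.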
Pattern 3: Hierarchical Delegation with Fallback
Hierarchical delegation extends Orchestrator+Workers with a feedback loop: the senior agent reviews junior output, corrects mistakes, and has a hard rule for when to stop delegating and take over directly.
```typescript
const MAX_JUNIOR_ATTEMPTS = 3;

async function delegateWithFallback(
  seniorAgent: ToolLoopAgent,
  juniorAgent: ToolLoopAgent,
  task: string,
  qualityCheck: (output: string) => Promise<boolean>
): Promise<{ output: string; attempts: number; takenOver: boolean }> {
  let attempts = 0;
  let lastOutput = "";

  while (attempts < MAX_JUNIOR_ATTEMPTS) {
    attempts++;
    const prompt = attempts === 1
      ? task
      : `Previous attempt was rejected. Improve on this:\n${lastOutput}\n\nOriginal task: ${task}`;

    const result = await juniorAgent.generate({ prompt });
    lastOutput = result.text;

    if (await qualityCheck(lastOutput)) {
      return { output: lastOutput, attempts, takenOver: false };
    }
  }

  // After 3 failures, senior takes over
  const result = await seniorAgent.generate({ prompt: task });
  return { output: result.text, attempts, takenOver: true };
}
```
The quality check itself is a model call -- a cheap, fast model scores the junior's output. This is sometimes called "LLM as judge" and is widely used in agent eval systems. Make the scoring rubric explicit and strict. A vague rubric produces inconsistent scores.
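Here is a sketch of such a judge, with the model call abstracted behind a `callJudge` parameter so the wiring is explicit -- the rubric wording and function shape are assumptions, not a framework API. The part worth hardening is verdict parsing: judges drift into prose, so accept only an explicit leading verdict.

```typescript
// Assumption: a strict binary rubric; tune the criteria to your task.
const RUBRIC = `Score the output against the task. Reply with exactly one line:
PASS -- the output fully answers the task with no factual errors
FAIL -- otherwise. Do not explain.`;

// Only a response that leads with PASS counts; anything ambiguous fails.
export function parseVerdict(judgeText: string): boolean {
  return judgeText.trim().toUpperCase().startsWith("PASS");
}

// callJudge stands in for whatever cheap model call you use
// (e.g. a one-step agent's generate, as in the patterns above).
export async function qualityCheck(
  task: string,
  output: string,
  callJudge: (prompt: string) => Promise<string>
): Promise<boolean> {
  const prompt = `${RUBRIC}\n\nTask: ${task}\n\nOutput:\n${output}`;
  return parseVerdict(await callJudge(prompt));
}
```

Failing closed on ambiguous judge output is deliberate: a retry costs one cheap call, while accepting a bad junior result poisons everything downstream.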
Communication Patterns
Agents need to exchange information. Three options, each with different tradeoffs:
Shared files. Write a file; another agent reads it. Good for structured outputs (JSON reports, task lists). Easily debuggable. Use atomic writes (write to temp file, then rename) to avoid partial reads.
Message passing. Structured JSON messages through a queue or database table. More explicit, supports async patterns. The database-backed task queue in Pattern 2 is an example.
Shared database. Multiple agents read and write the same structured tables. Richest form of communication. Requires careful attention to transactions and concurrent writes. Right for agents building a shared artifact.
The most important principle across all three: communication between agents is lossy. When an orchestrator writes a task, some original context is lost in translation. When a worker writes a summary back, detail is lost in compression. Design interfaces to include the minimum necessary information rather than trying to preserve everything.
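As a concrete example of the shared-files advice, here is a minimal atomic JSON write in Node (the temp-file naming scheme is illustrative): write to a temp file in the same directory, then rename, so a reader sees either the old file or the complete new one, never a partial write.

```typescript
import { writeFileSync, renameSync } from "node:fs";
import { join, dirname, basename } from "node:path";

export function atomicWriteJSON(path: string, data: unknown): void {
  // Temp file must live on the same filesystem as the target,
  // otherwise rename() degrades to a non-atomic copy.
  const tmp = join(dirname(path), `.${basename(path)}.${process.pid}.tmp`);
  writeFileSync(tmp, JSON.stringify(data, null, 2));
  renameSync(tmp, path); // atomic replace on POSIX filesystems
}
```

Including the process id in the temp name keeps two agents from clobbering each other's in-flight writes to the same target.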
The Context Isolation Principle
This is the most important architectural principle in multi-agent systems: each agent gets its own context window, and agents should not share context.
When you give multiple agents access to the same large context, you have not solved the context problem -- you have just distributed the same overloaded context to multiple agents. The benefit of multi-agent architecture is that each agent focuses on a bounded slice of the problem.
Think of agents like microservices: clear interfaces, isolated state, explicit message formats.
```typescript
// Good: extract only what the worker needs
const securityTask = {
  files: relevantAuthFiles, // not the entire codebase
  context: "Focus on authentication and authorization flows",
};

// Bad: dump everything
const badSecurityTask = {
  conversationHistory: fullHistory, // worker doesn't need most of this
  allFiles: entireCodebase,
};
```
The Four Pitfalls
Over-delegation. The most common mistake. Teams reach for multi-agent architecture before exhausting what a single well-designed agent can do. A single agent with good tools and a clear system prompt handles most production use cases. Add agents when you hit an actual limit, not in anticipation of one.
Under-specification. Workers need detailed task descriptions. Every task handed to a worker should include: what to do, what context to use, what format to respond in, and what "done" looks like. Treat it like writing a ticket for a junior engineer on their first day.
Coordination overhead. Every agent hop adds latency and cost. A task passing through an orchestrator, delegated to three workers, and synthesized might take 5-10 API calls where a single agent would take 1-2. Profile your pipelines. If coordination cost exceeds the benefit, simplify.
The telephone game. Information degrades with every agent hop. A fact that starts with the user becomes a summary in the orchestrator's context, a fragment in the worker's task, and a further-compressed mention in the worker's output. Mitigate by keeping task descriptions precise and having workers return structured JSON rather than free-form text.
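One way to enforce structured worker output: define the result shape and validate it before accepting, so a worker that drifts into free-form prose gets retried rather than silently compressed further. The finding shape here is illustrative, not a prescribed schema:

```typescript
// Hypothetical structured result a review worker is instructed to emit.
export interface WorkerFinding {
  file: string;
  line: number;
  issue: string;
  severity: "info" | "warning" | "critical";
}

// Returns the parsed findings, or null if the worker's output is not
// valid structured JSON -- the caller should retry or escalate.
export function parseWorkerOutput(raw: string): WorkerFinding[] | null {
  try {
    const parsed = JSON.parse(raw);
    if (!Array.isArray(parsed)) return null;
    const valid = parsed.every(
      (f) =>
        typeof f?.file === "string" &&
        typeof f?.line === "number" &&
        typeof f?.issue === "string" &&
        ["info", "warning", "critical"].includes(f?.severity)
    );
    return valid ? (parsed as WorkerFinding[]) : null;
  } catch {
    return null;
  }
}
```

The same idea works with a zod schema (already in the stack above) via `safeParse`; the point is that the interface rejects prose instead of summarizing it.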
The Decision Rule
Start with one agent. Add more agents when:
- You are hitting context window limits on real tasks (not hypothetical future tasks)
- You have distinct subtasks that genuinely benefit from different prompts, tools, or models
- You have independent subtasks that could run in parallel and the speedup is worth the coordination cost
- One agent's failures are cascading and you need isolation
If none of these are true, you do not need multi-agent architecture yet. The added complexity is real and the benefits are only worth it in specific scenarios. Every agent you add is another thing that can fail, another interface that can degrade information, another moving part to monitor.
The best multi-agent systems are built by teams who started with one agent and added the second one because they had no other choice.
This post is adapted from Production AI Agents: Build, Deploy, and Monetize Autonomous Systems, available on Amazon Kindle. The book goes deeper with 12 chapters of real code, battle-tested patterns, and a complete hands-on tutorial.
I build production AI systems. More at astraedus.dev.