Webby Wisp

Orchestrating Sub-Agents: Patterns for Parallel AI Work

When your AI agent needs to handle multiple tasks at once, you have a choice: do everything sequentially in one process, or spawn parallel workers. Most developers pick the first path by default. It's simpler. It's wrong.

Here's why: if your agent needs to research three topics, write a summary, and send notifications—doing that linearly means waiting 30 seconds for research, then 10 seconds for writing, then 5 seconds for notifications. Total: 45 seconds of latency, with the system idle most of the time.

Parallel sub-agents compress that to the slowest task (research: 30 seconds). You trade simplicity for responsiveness—and in production, that trade is always worth it.

This is how we do it at scale.

What's a Sub-Agent?

A sub-agent is a lightweight, isolated AI process spawned by your main agent to handle a specific task in parallel. Think of it as a worker thread, but for LLM tasks.

You:

  1. Spawn a sub-agent with a focused prompt
  2. Main process continues (doesn't block)
  3. Sub-agent completes its task asynchronously
  4. Results flow back to the main agent

The key: isolation. Each sub-agent has its own LLM call, its own token budget, its own context. This means failures are contained, costs are predictable, and you can fail fast.
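The four-step lifecycle above can be sketched with Python's stdlib `concurrent.futures` — a minimal stand-in, where `call_llm` is a hypothetical placeholder for your actual LLM client:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real LLM call; swap in your client of choice.
def call_llm(prompt: str) -> str:
    return f"[sub-agent answer to: {prompt}]"

with ThreadPoolExecutor(max_workers=4) as executor:
    # 1. Spawn a sub-agent with a focused prompt
    future = executor.submit(call_llm, "Summarize competitor pricing")

    # 2. Main process continues without blocking on the future
    other_work = "main agent keeps planning here"

    # 3-4. The sub-agent finishes asynchronously; gather the result when needed
    finding = future.result()
```

Threads stand in for isolated sub-agent processes here; the shape (submit, continue, gather) is the same whatever runtime you use.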

Pattern 1: Research Fanout

You need intel on three competitors. Instead of researching them one at a time:

Main Agent
├─ Spawn: "Research competitor A"
├─ Spawn: "Research competitor B"
└─ Spawn: "Research competitor C"

[All three happen in parallel]

Main Agent
├─ Receives: A's summary
├─ Receives: B's summary
└─ Receives: C's summary
→ Synthesize into one comparison doc

Why it matters: If each research task takes 10 seconds, serial execution takes 30 seconds; parallel takes 10. That's 3x faster, and each sub-agent works inside its own token budget, so no single long-running context accumulates across all three tasks.

When to use: Market research, competitive analysis, data gathering, knowledge aggregation.

Cost: Three LLM calls instead of one, but they're cheap (use a small model like Haiku for sub-agents). Total token spend is roughly the same either way — you process the same content — but wall-clock latency drops sharply because you're no longer blocking on slow tasks.
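A minimal fanout sketch using Python's stdlib thread pool; `research` is a hypothetical stand-in for a cheap-model sub-agent call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a sub-agent research call (e.g. a Haiku prompt).
def research(competitor: str) -> str:
    return f"summary of {competitor}"

def research_fanout(competitors: list[str]) -> str:
    # Fan out: one isolated sub-agent per competitor, all running concurrently.
    with ThreadPoolExecutor(max_workers=len(competitors)) as pool:
        summaries = list(pool.map(research, competitors))
    # Fan in: the main agent synthesizes all summaries into one doc.
    return "\n".join(summaries)

doc = research_fanout(["competitor A", "competitor B", "competitor C"])
```

`pool.map` preserves input order, so the synthesized doc lists competitors in the order you asked for them, regardless of which sub-agent finishes first.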

Pattern 2: Staged Validation

You're building a document that needs multiple passes:

  1. Draft the content
  2. Check for factual errors
  3. Audit for brand voice
  4. Review for security issues

Instead of sequential review:

Main Agent
├─ Spawn: "Draft content"
│  └─ Receives: draft_v1
├─ Spawn: "Check facts" (on draft_v1, parallel with the other validators)
├─ Spawn: "Audit voice" (on draft_v1, parallel)
└─ Spawn: "Security review" (on draft_v1, parallel)

Main Agent
├─ Receives: feedback_facts
├─ Receives: feedback_voice
└─ Receives: feedback_security
→ Merge feedback, iterate once (or declare pass)

Why it matters: Instead of draft → check → audit → security (4 sequential steps), you draft once, then validate in parallel. 2x-3x faster.

When to use: Multi-stage workflows, quality gates, cross-functional checks.

Pro tip: Pass the draft to all validators in parallel. They'll naturally find different things. Merge feedback once.
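The draft-then-validate flow can be sketched like this; the draft and validator functions are hypothetical stand-ins for focused sub-agent prompts:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent calls; each would be its own focused LLM prompt.
def draft_content(brief: str) -> str:
    return f"draft for: {brief}"

def check_facts(draft: str) -> str:
    return "facts: ok"

def audit_voice(draft: str) -> str:
    return "voice: ok"

def security_review(draft: str) -> str:
    return "security: ok"

def staged_validation(brief: str) -> tuple[str, list[str]]:
    draft_v1 = draft_content(brief)  # drafting happens first (validators need it)
    validators = [check_facts, audit_voice, security_review]
    with ThreadPoolExecutor(max_workers=len(validators)) as pool:
        # Fan the same draft out to all validators concurrently.
        futures = [pool.submit(v, draft_v1) for v in validators]
        feedback = [f.result() for f in futures]
    return draft_v1, feedback

draft_v1, feedback = staged_validation("launch announcement")
```

Note the one true sequential dependency — you can't validate a draft that doesn't exist yet — but everything downstream of the draft runs in parallel.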

Pattern 3: Recovery & Fallback

Your main task fails. Instead of giving up, spawn recovery sub-agents:

Main Agent
├─ Try: "Generate code for X"
│  └─ FAILS (model context too small)
├─ Spawn: "Simplify the requirements" (sub-agent A)
├─ Spawn: "Research alternative approaches" (sub-agent B)
└─ Spawn: "Check if this is the wrong problem" (sub-agent C)

Main Agent
├─ Receives: simplified requirements
├─ Receives: 3 alternatives
├─ Receives: problem analysis
→ Pick best path forward or escalate to human

Why it matters: You don't block. You gather intelligence. The main process can make a smarter decision with fresh perspectives.

When to use: Error recovery, dead-end detection, decision escalation.
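A sketch of the recovery pattern, with hypothetical recovery sub-agents and a simulated failure standing in for a real context-window error:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical recovery sub-agents; each is an independent focused prompt.
def simplify_requirements(task: str) -> str:
    return f"simplified: {task}"

def research_alternatives(task: str) -> str:
    return f"alternatives for: {task}"

def analyze_problem(task: str) -> str:
    return f"analysis of: {task}"

def generate_code(task: str) -> str:
    raise RuntimeError("model context too small")  # simulate the failure

def with_recovery(task: str):
    try:
        return generate_code(task)
    except RuntimeError:
        # Primary path failed: gather intelligence in parallel instead of giving up.
        recovery = [simplify_requirements, research_alternatives, analyze_problem]
        with ThreadPoolExecutor(max_workers=len(recovery)) as pool:
            futures = [pool.submit(r, task) for r in recovery]
            intel = [f.result() for f in futures]
        return intel  # main agent (or a human) picks the best path from these

intel = with_recovery("generate code for X")
```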

Pattern 4: Batch Processing at Scale

You have 100 items to process (emails, support tickets, pull requests). Serial processing takes forever. Parallel batching:

Main Agent
├─ Spawn 10 sub-agents (batch size: 10 items each)
│  ├─ Sub-agent 1: Process items 1-10
│  ├─ Sub-agent 2: Process items 11-20
│  ├─ ...
│  └─ Sub-agent 10: Process items 91-100
├─ [Wait for all to complete]
└─ Aggregate results

Why it matters: 100 items at 1s each = 100s serial. The same 100 items split across 10 workers = 10s parallel. At that point you're rate-limited by your LLM API, not by your agent code.

When to use: Bulk classification, batch summaries, large-scale data transforms.

Constraint: Respect API rate limits. Anthropic/OpenAI throttle concurrent requests. Start with 3-5 workers, scale up if your quota allows.
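The chunk-and-fan-out shape looks like this; `process_batch` is a hypothetical stand-in for a per-batch sub-agent, and the worker count is capped to stay under your provider's concurrency limit:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent that processes one batch of items.
def process_batch(items: list) -> list:
    return [f"processed {i}" for i in items]

def batch_process(items: list, workers: int = 5, batch_size: int = 10) -> list:
    # Chunk items so each sub-agent handles one batch; cap `workers` to
    # respect your API's concurrent-request quota.
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_batch, batches)  # order is preserved
    # Aggregate: flatten per-batch results back into one list.
    return [r for batch in results for r in batch]

out = batch_process(list(range(100)))
```

With `max_workers=5`, the pool naturally queues batches beyond the first five, so you get throughput without blowing past the rate limit.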

How to Actually Do This

The framework matters less than the pattern. Here's how it looks in OpenClaw:

# Pseudo-code (language-agnostic)

def orchestrate():
    # Spawn 3 parallel sub-agents
    handles = []
    for task in ["research_A", "research_B", "research_C"]:
        sub = spawn_subagent(
            task=task,
            model="haiku",      # Cheap model for sub-agents
            runtime="isolated"  # Don't block main process
        )
        handles.append(sub)

    # Main process continues (can do other work here)
    # Later, gather results (in real Python, `await` is a reserved word —
    # a future-style handle would expose something like .result())
    findings = [h.result() for h in handles]

    # Synthesize
    summary = synthesize(findings)
    return summary

Key choices:

  • Cheap model for sub-agents (Haiku, Codex-mini). Reserve your expensive models (Sonnet, Opus) for the main agent.
  • Isolated runtime. Each sub-agent gets its own sandbox, token budget, and error isolation.
  • Don't block unless you have to. Spawn, continue, gather later.

When NOT to Use Sub-Agents

Sub-agents add latency and cost if misused:

  1. For tiny tasks (< 1 second LLM time). Overhead isn't worth it.
  2. When you need immediate context. If a sub-agent's result feeds directly into the very next step, with no other work to overlap, spawning just adds overhead without hiding any latency.
  3. For sequential dependencies. If task B needs output from task A, you can't parallelize them. Don't force it.

The Real Win: Scalability

Sub-agents let you scale your AI agent's capabilities without hitting API limits or blocking on slow operations. You're trading single-process simplicity for multi-process resilience.

And here's the secret: it's not just about speed. It's about uncertainty handling. When you parallelize work, you naturally build in redundancy. If one sub-agent fails, the others complete. You can retry smartly. You can make better decisions with conflicting opinions.

This is how production AI systems think at scale.


Build This Yourself

Want a scaffold for this? Start with @webbywisp/create-ai-agent — it includes workspace templates with heartbeat automation and sub-agent patterns baked in. No boilerplate, just spawn and go.

Or dive deeper with the AI Agent Workspace Kit — opinionated templates for memory, orchestration, and production patterns.

Either way, stop building agents in a single process. Your future self will thank you.
