When your AI agent needs to handle multiple tasks at once, you have a choice: do everything sequentially in one process, or spawn parallel workers. Most developers pick the first path by default. It's simpler. It's wrong.
Here's why: if your agent needs to research three topics, write a summary, and send notifications—doing that linearly means waiting 30 seconds for research, then 10 seconds for writing, then 5 seconds for notifications. Total: 45 seconds of latency, with the system idle most of the time.
Parallel sub-agents compress that to the slowest task (research: 30 seconds). You trade simplicity for responsiveness—and in production, that trade is almost always worth it.
This is how we do it at scale.
What's a Sub-Agent?
A sub-agent is a lightweight, isolated AI process spawned by your main agent to handle a specific task in parallel. Think of it as a worker thread, but for LLM tasks.
You:
- Spawn a sub-agent with a focused prompt
- Main process continues (doesn't block)
- Sub-agent completes its task asynchronously
- Results flow back to the main agent
The key: isolation. Each sub-agent has its own LLM call, its own token budget, its own context. This means failures are contained, costs are predictable, and you can fail fast.
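That spawn-continue-gather lifecycle can be sketched in Python with a thread pool. Here, `run_subagent` is a hypothetical stand-in for a real LLM call; the point is the shape of the flow, not the call itself:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real sub-agent LLM call.
def run_subagent(prompt: str) -> str:
    return f"result for: {prompt}"

with ThreadPoolExecutor() as pool:
    # Spawn: the main thread is free to keep working while this runs.
    future = pool.submit(run_subagent, "Summarize the release notes")
    # ... main agent does other work here ...
    # Gather: block only at the moment the result is actually needed.
    result = future.result()

print(result)  # result for: Summarize the release notes
```

The same shape works with `asyncio` tasks or separate processes; what matters is that spawning and gathering are two distinct steps with useful work in between.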
Pattern 1: Research Fanout
You need intel on three competitors. Instead of researching them one at a time:
Main Agent
├─ Spawn: "Research competitor A"
├─ Spawn: "Research competitor B"
└─ Spawn: "Research competitor C"
[All three happen in parallel]
Main Agent
├─ Receives: A's summary
├─ Receives: B's summary
└─ Receives: C's summary
→ Synthesize into one comparison doc
Why it matters: If each research task takes 10 seconds, serial takes 30 seconds; parallel takes 10. That's 3x faster for the same total token spend.
When to use: Market research, competitive analysis, data gathering, knowledge aggregation.
Cost: the same 3 LLM calls, just concurrent—and they're cheap if you use a small model (like Haiku) for sub-agents. Token spend stays roughly flat; what you save is wall-clock time spent blocking on slow tasks.
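A minimal sketch of the fanout in Python, using a thread pool. The `research` function is a stub standing in for the cheap-model sub-agent call:

```python
from concurrent.futures import ThreadPoolExecutor

COMPETITORS = ["competitor A", "competitor B", "competitor C"]

# Hypothetical stand-in for a cheap-model sub-agent research call.
def research(topic: str) -> str:
    return f"summary of {topic}"

def fanout(topics: list[str]) -> list[str]:
    # One worker per topic: total wall time ~= the slowest single call.
    with ThreadPoolExecutor(max_workers=len(topics)) as pool:
        return list(pool.map(research, topics))

summaries = fanout(COMPETITORS)
# The synthesis step belongs to the main agent, e.g.:
comparison = "\n".join(summaries)
```

`pool.map` preserves input order, so the summaries line up with the topics you spawned—handy when you synthesize them into one comparison doc.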
Pattern 2: Staged Validation
You're building a document that needs multiple passes:
- Draft the content
- Check for factual errors
- Audit for brand voice
- Review for security issues
Instead of sequential review:
Main Agent
├─ Spawn: "Draft content"
│ └─ Receives: draft_v1
├─ Spawn: "Check facts" (on draft_v1, parallel to the other checks)
├─ Spawn: "Audit voice" (on draft_v1, parallel)
└─ Spawn: "Security review" (on draft_v1, parallel)
Main Agent
├─ Receives: feedback_facts
├─ Receives: feedback_voice
└─ Receives: feedback_security
→ Merge feedback, iterate once (or declare pass)
Why it matters: Instead of draft → check → audit → security (4 sequential steps), you draft once, then validate in parallel. 2x-3x faster.
When to use: Multi-stage workflows, quality gates, cross-functional checks.
Pro tip: Pass the draft to all validators in parallel. They'll naturally find different things. Merge feedback once.
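One way to sketch that parallel validation pass in Python. The three validators are hypothetical stubs; in practice each would be its own sub-agent prompt:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical validators; each would be a focused sub-agent prompt in practice.
def check_facts(draft: str) -> str:
    return "facts: ok"

def audit_voice(draft: str) -> str:
    return "voice: ok"

def review_security(draft: str) -> str:
    return "security: ok"

def validate(draft: str) -> list[str]:
    validators = [check_facts, audit_voice, review_security]
    # Fan the same draft out to every validator at once.
    with ThreadPoolExecutor(max_workers=len(validators)) as pool:
        futures = [pool.submit(v, draft) for v in validators]
        return [f.result() for f in futures]

feedback = validate("draft_v1 text")
```

Because every validator receives the same immutable draft, there's no shared mutable state to coordinate—the merge step happens once, back in the main agent.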
Pattern 3: Recovery & Fallback
Your main task fails. Instead of giving up, spawn recovery sub-agents:
Main Agent
├─ Try: "Generate code for X"
│ └─ FAILS (model context too small)
├─ Spawn: "Simplify the requirements" (sub-agent A)
├─ Spawn: "Research alternative approaches" (sub-agent B)
└─ Spawn: "Check if this is the wrong problem" (sub-agent C)
Main Agent
├─ Receives: simplified requirements
├─ Receives: 3 alternatives
└─ Receives: problem analysis
→ Pick best path forward or escalate to human
Why it matters: You don't block. You gather intelligence. The main process can make a smarter decision with fresh perspectives.
When to use: Error recovery, dead-end detection, decision escalation.
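A sketch of the recovery fanout, assuming hypothetical stubs throughout (`generate_code`, `simplify`, `alternatives`, `reframe` all stand in for real sub-agent calls):

```python
from concurrent.futures import ThreadPoolExecutor

def generate_code(spec: str) -> str:
    # Simulate the primary task failing.
    raise RuntimeError("context too small")

# Hypothetical recovery sub-agents.
def simplify(spec: str) -> str:
    return "simplified spec"

def alternatives(spec: str) -> str:
    return "3 alternative approaches"

def reframe(spec: str) -> str:
    return "problem analysis"

def attempt(spec: str) -> dict:
    try:
        return {"result": generate_code(spec)}
    except Exception as err:
        # Primary path failed: gather intelligence in parallel before deciding.
        with ThreadPoolExecutor(max_workers=3) as pool:
            futures = {
                "simplified": pool.submit(simplify, spec),
                "alternatives": pool.submit(alternatives, spec),
                "analysis": pool.submit(reframe, spec),
            }
            return {"error": str(err), **{k: f.result() for k, f in futures.items()}}

outcome = attempt("build feature X")
```

The main agent ends up with the error plus three fresh perspectives in one structure—enough to pick a path forward or escalate to a human with context attached.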
Pattern 4: Batch Processing at Scale
You have 100 items to process (emails, support tickets, pull requests). Serial processing takes forever. Parallel batching:
Main Agent
├─ Spawn 10 sub-agents (batch size: 10 items each)
│ ├─ Sub-agent 1: Process items 1-10
│ ├─ Sub-agent 2: Process items 11-20
│ ├─ ...
│ └─ Sub-agent 10: Process items 91-100
├─ [Wait for all to complete]
└─ Aggregate results
Why it matters: 100 items at 1s each = 100s serial. The same 100 items across 10 workers = 10s parallel. At that point the bottleneck is your LLM API's throughput, not your agent code.
When to use: Bulk classification, batch summaries, large-scale data transforms.
Constraint: Respect API rate limits. Anthropic/OpenAI throttle concurrent requests. Start with 3-5 workers, scale up if your quota allows.
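A sketch of the batching in Python, with a stubbed `process` function standing in for the per-item LLM call. `max_workers` caps concurrency so you stay under your provider's rate limits:

```python
from concurrent.futures import ThreadPoolExecutor

items = list(range(1, 101))  # 100 items to process

# Hypothetical per-item sub-agent call.
def process(item: int) -> str:
    return f"processed {item}"

def process_batch(batch: list[int]) -> list[str]:
    # One sub-agent handles one batch serially.
    return [process(i) for i in batch]

def run(items: list[int], workers: int = 5, batch_size: int = 10) -> list[str]:
    # Chunk the items, then cap concurrency at `workers` (start at 3-5,
    # scale up only if your API quota allows).
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results: list[str] = []
        for batch_result in pool.map(process_batch, batches):
            results.extend(batch_result)
        return results

results = run(items)
```

Separating batch size from worker count lets you tune the two independently: batch size controls per-sub-agent cost, worker count controls API pressure.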
How to Actually Do This
The framework matters less than the pattern. Here's how it looks in OpenClaw:
# Pseudo-code (language-agnostic)
def orchestrate():
    # Spawn 3 parallel sub-agents
    results = []
    for task in ["research_A", "research_B", "research_C"]:
        sub = spawn_subagent(
            task=task,
            model="haiku",       # Cheap model for sub-agents
            runtime="isolated"   # Don't block main process
        )
        results.append(sub)

    # Main process continues (can do other work here)

    # Later, gather results (.result() blocks until each sub-agent finishes)
    findings = [r.result() for r in results]

    # Synthesize
    summary = synthesize(findings)
    return summary
Key choices:
- Cheap model for sub-agents (Haiku, Codex-mini). Reserve your expensive models (Sonnet, Opus) for the main agent.
- Isolated runtime. Each sub-agent gets its own sandbox, token budget, and error isolation.
- Don't block unless you have to. Spawn, continue, gather later.
When NOT to Use Sub-Agents
Sub-agents add latency and cost if misused:
- For tiny tasks (< 1 second LLM time). Overhead isn't worth it.
- When you need immediate context. If the sub-agent's result feeds directly into the next step, you block on it anyway—the parallelism buys you nothing.
- For sequential dependencies. If task B needs output from task A, you can't parallelize them. Don't force it.
The Real Win: Scalability
Sub-agents let you scale your AI agent's capabilities without hitting API limits or blocking on slow operations. You're trading single-process simplicity for multi-process resilience.
And here's the secret: it's not just about speed. It's about uncertainty handling. When you parallelize work, you naturally build in redundancy. If one sub-agent fails, the others complete. You can retry smartly. You can make better decisions with conflicting opinions.
This is how production AI systems think at scale.
Build This Yourself
Want a scaffold for this? Start with @webbywisp/create-ai-agent — it includes workspace templates with heartbeat automation and sub-agent patterns baked in. No boilerplate, just spawn and go.
Or dive deeper with the AI Agent Workspace Kit — opinionated templates for memory, orchestration, and production patterns.
Either way, stop building agents in a single process. Your future self will thank you.