# Why Multi-Agent Matters for Code Development
Single-agent AI has a ceiling. When you ask one Claude instance to analyze an entire codebase, write tests, and generate documentation at once, you hit two hard walls:
- **Context exhaustion** — large codebases blow past the context window, forcing truncation and degrading quality.
- **Sequential bottlenecks** — each task waits for the previous one to finish. Analyzing 10 files one-by-one takes 10x longer than analyzing them in parallel.
Multi-agent architecture solves both. Instead of one overloaded agent, you run a team: an orchestrator that divides work and workers that execute tasks concurrently. Real-world result: tasks that took 8 minutes sequentially finish in under 90 seconds with parallel workers.
## The Agent Tool Pattern
Claude Code's Agent tool lets you spawn subagents programmatically from within a skill or prompt. Each subagent runs with its own context window, tools, and model.
### Model Selection
The key to cost-efficient multi-agent systems is using the right model for each role:
| Role | Model | Use Case |
|---|---|---|
| Orchestrator | opus | Task decomposition, final synthesis, design decisions |
| Worker (standard) | sonnet | Implementation, code analysis, test writing |
| Worker (lightweight) | haiku | File search, grep, status checks, simple transforms |
### Spawning Subagents
In a SKILL.md or orchestrator prompt, you direct Claude Code to use the Agent tool like this:
```
Use the Agent tool to spawn 3 parallel workers:
- Worker 1 (model: haiku): list all Python files in src/
- Worker 2 (model: sonnet): analyze src/api/ for security issues
- Worker 3 (model: sonnet): generate unit tests for src/models/
Wait for all 3 to complete, then synthesize results.
```
The `run_in_background` parameter lets workers run without blocking the orchestrator's main thread — useful when you want to kick off slow tasks and check results later.
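Conceptually, the fan-out behaves like a thread pool: submit every worker up front, then collect results as they finish. A rough Python sketch, where `call_agent` is a hypothetical stand-in for the real Agent tool invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def call_agent(model: str, task: str) -> str:
    # Placeholder for a real subagent call; it just echoes the
    # assignment so the fan-out structure can be demonstrated.
    return f"[{model}] done: {task}"

tasks = [
    ("haiku", "list all Python files in src/"),
    ("sonnet", "analyze src/api/ for security issues"),
    ("sonnet", "generate unit tests for src/models/"),
]

# Submit all workers without blocking, then gather results in order.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = [pool.submit(call_agent, model, task) for model, task in tasks]
    results = [f.result() for f in futures]
```

The orchestrator blocks only at the gather step, which is exactly the "wait for all 3, then synthesize" instruction above.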
## Three Practical Patterns
### Pattern 1: Parallel Research
**Problem:** You need to understand 20 files before refactoring. Reading them sequentially wastes time.

**Solution:** Launch N Haiku workers, one per file or URL, and consolidate findings in the orchestrator.
```
Orchestrator (Opus):
├── Agent(haiku): read and summarize src/auth/login.py
├── Agent(haiku): read and summarize src/auth/session.py
├── Agent(haiku): read and summarize src/auth/middleware.py
└── ... (all parallel)
→ Synthesize: "Auth module uses JWT with 3 known issues: ..."
```
Each Haiku agent is cheap and fast. The orchestrator only pays Opus pricing for the final synthesis — the expensive thinking step.
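The fan-out-then-synthesize shape is an ordinary map/reduce. A toy sketch, with `summarize` standing in for a Haiku worker:

```python
from concurrent.futures import ThreadPoolExecutor

FILES = [
    "src/auth/login.py",
    "src/auth/session.py",
    "src/auth/middleware.py",
]

def summarize(path: str) -> str:
    # Stand-in for one Haiku worker reading and summarizing one file.
    return f"{path}: summary"

# Map step: all files summarized concurrently; map() preserves input order.
with ThreadPoolExecutor() as pool:
    summaries = list(pool.map(summarize, FILES))

# Reduce step: the single expensive (Opus) synthesis over worker outputs.
report = "\n".join(summaries)
```

Only `report` ever reaches the orchestrator's context, which is what keeps the expensive model's token count small.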
### Pattern 2: Parallel Code Generation
**Problem:** You need a new feature with frontend, backend, and tests. Sequential generation means the test writer is blocked until the backend is done.

**Solution:** Each component is independent. Run them concurrently.
```
Orchestrator (Opus):
Designs interfaces and passes specs to workers →
├── Agent(sonnet): implement REST API endpoint (spec: ...)
├── Agent(sonnet): implement React component (spec: ...)
└── Agent(sonnet): write integration tests (spec: ...)
→ All three work simultaneously, no context collision
```
Each worker sees only its own spec — no risk of one agent's partial output contaminating another's context.
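The isolation property can be sketched directly: each worker function receives nothing but its own spec string. The specs and the `worker` helper here are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-component specs written by the Opus orchestrator.
specs = {
    "api": "POST /users returns 201 with the created id",
    "ui": "UserForm component submits to POST /users",
    "tests": "integration test: create a user, expect 201",
}

def worker(item):
    name, spec = item
    # Each Sonnet worker receives only its own spec string; nothing
    # from the other workers can leak into its context.
    return name, f"{name} built against spec: {spec}"

with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = dict(pool.map(worker, specs.items()))
```

Because workers share no mutable state, one agent's partial output can never contaminate another's input, which is the whole point of the pattern.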
### Pattern 3: Pipeline Execution
**Problem:** Some tasks have strict ordering — you can't test code that doesn't exist yet.

**Solution:** Chain agents, passing structured results between stages.
```
Stage 1: Agent A (Sonnet) analyzes the codebase
  → outputs: JSON list of issues + affected files
Stage 2: Agent B (Sonnet) reads Agent A's output, implements fixes
  → outputs: patch diff
Stage 3: Agent C (Haiku) runs linter + test commands, reports pass/fail
```
Each stage is isolated. If Stage 2 fails, you restart only Stage 2 — not the entire pipeline.
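The stages chain naturally as functions passing JSON between them. A toy sketch (the file name and finding are made up; each function stands in for one agent):

```python
import json

def stage_analyze(root: str) -> str:
    # Stage 1 (Sonnet): emit structured findings as JSON.
    findings = {"issues": [{"file": "src/app.py", "problem": "no input validation"}]}
    return json.dumps(findings)

def stage_fix(analysis_json: str) -> str:
    # Stage 2 (Sonnet): parse Stage 1's output, produce a patch summary.
    issues = json.loads(analysis_json)["issues"]
    return f"patched {len(issues)} issue(s)"

def stage_verify(patch: str) -> bool:
    # Stage 3 (Haiku): run checks on the patch, report pass/fail.
    return patch.startswith("patched")

result = stage_verify(stage_fix(stage_analyze("src/")))
```

Passing JSON rather than free text between stages is what makes restarts cheap: a failed Stage 2 can be re-run against Stage 1's saved output.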
## Real Example: Parallel Security Audit
This is exactly how the Security Pack's `/security-audit` skill works.
The OWASP Top 10 defines ten risk categories (A01-A10): Broken Access Control, Cryptographic Failures, Injection, Insecure Design, and so on. Auditing all ten sequentially in a large codebase takes 15+ minutes.
With parallel agents:
```
Orchestrator (Opus):
├── Agent(sonnet): scan for A01 — Broken Access Control patterns
├── Agent(sonnet): scan for A02 — Cryptographic Failures (weak hashes, plain storage)
├── Agent(sonnet): scan for A03 — Injection (SQL, command, LDAP)
├── Agent(sonnet): scan for A04 — Insecure Design (missing rate limits, no input bounds)
├── Agent(haiku): scan for A05 — Security Misconfiguration (debug flags, default creds)
├── Agent(haiku): scan for A06 — Vulnerable Dependencies (requirements.txt audit)
├── Agent(sonnet): scan for A07 — Auth failures (session fixation, weak passwords)
├── Agent(haiku): scan for A08 — Data Integrity (unsigned packages, missing checksums)
├── Agent(haiku): scan for A09 — Logging gaps (missing audit trail, sensitive data in logs)
└── Agent(haiku): scan for A10 — SSRF (unvalidated URLs, internal service calls)
→ Opus consolidates: severity-ranked report with file:line references
```
Result: full OWASP audit in ~2 minutes instead of 15. The Opus orchestrator only runs once at start and once at the end — workers do the heavy scanning at Sonnet/Haiku rates.
## Cost Optimization
Multi-agent does not mean higher cost. The opposite is true when you assign models correctly.
Rule: pay Opus prices only for decisions that require Opus-level reasoning.
- **Expensive (use sparingly):** Opus → architecture decisions, security policy, final report synthesis
- **Moderate (most implementation):** Sonnet → code analysis, file reading, implementation, test writing
- **Cheap (bulk tasks):** Haiku → grep searches, file listing, format checks, simple transforms
A typical 10-worker parallel audit might use Opus for 200 input tokens (the orchestration prompt) and 500 output tokens (the final report), while 8 Haiku workers and 2 Sonnet workers handle all the scanning. The marginal cost per audit run is well under $0.05.
Compare that to a single Opus agent trying to do everything in one massive context — you'd pay 10-20x more for slower, lower-quality results.
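The arithmetic is easy to check yourself. In this sketch both the per-million-token prices and the worker token counts are assumptions for illustration; check current published pricing before relying on any figure:

```python
# Illustrative USD prices per 1M tokens as (input, output). Assumptions
# only; verify against current published pricing.
PRICE = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}

def cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one call at the assumed per-million-token prices."""
    p_in, p_out = PRICE[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# Orchestrator: 200 input + 500 output tokens on Opus (fixed per run).
total = cost("opus", 200, 500)
# Assumed worker loads: 2k input / 300 output tokens each.
total += 8 * cost("haiku", 2_000, 300)   # 8 lightweight scanners
total += 2 * cost("sonnet", 2_000, 300)  # 2 standard scanners
```

With these assumed prices and loads the run totals roughly $0.08; the Opus share is fixed per run, so the bill scales almost entirely with how many tokens the workers actually consume.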
## Getting Started
You can implement multi-agent patterns today without any special setup. Just write orchestration instructions in your SKILL.md or Claude Code prompts using natural language — Claude Code handles spawning, context isolation, and result aggregation.
The learning curve is in task decomposition: identifying which subtasks are truly independent (safe to parallelize) vs. which have data dependencies (must be pipelined).
Start with Pattern 1 (parallel research) — it's zero-risk and immediately shows 5-10x speedups on any codebase exploration task.
I've pre-built multi-agent skills for security auditing, code review, and dependency checking in the Security Pack — available at PromptWorks for ¥1,480. Drop-in SKILL.md files, no setup required.