Batty

The Real Cost of Running 5 AI Coding Agents in Parallel

"Doesn't running 5 agents at once cost 5x as much?"

Yes and no. The token costs are real, but the math isn't as simple as multiplying by the number of agents. Here's what actually happens when you scale from one agent to five — and how to keep the cost sane.

The Baseline: One Agent

A typical Claude Code session on a medium-sized codebase:

  • Startup context: 50-180K tokens (loads project files, CLAUDE.md, recent git history)
  • Per-task tokens: 20-80K (depending on task complexity)
  • Total per session: 70-260K tokens

At Anthropic's API pricing for Opus 4.6, that's roughly $0.50-$2.00 per task. For simple tasks it's less. For complex refactors with multiple iterations, it can be more.
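The arithmetic above can be sketched as a small cost model. The blended $5 per million tokens used here is an illustrative assumption for back-of-envelope math, not Anthropic's actual rate card — real pricing splits input and output tokens and changes over time, so check the published rates before relying on these numbers.

```python
def session_cost(startup_tokens, task_tokens, usd_per_mtok=5.0):
    """Estimate one session's cost from token counts.

    usd_per_mtok is a blended input/output price (an assumption,
    not official pricing).
    """
    total = startup_tokens + task_tokens
    return total / 1_000_000 * usd_per_mtok

# Light session: 50K startup + 20K task
print(f"${session_cost(50_000, 20_000):.2f}")    # $0.35
# Heavy session: 180K startup + 80K task
print(f"${session_cost(180_000, 80_000):.2f}")   # $1.30
```

With a higher blended rate the heavy session lands closer to the $2.00 end of the article's range; the point is that startup context dominates the bill.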

The Naive Math: 5 Agents = 5x Cost

If you just launch 5 agents and let them run:

  • 5 × startup context: 250-900K tokens just loading the project
  • 5 × per-task: 100-400K tokens for actual work
  • Total: 350K-1.3M tokens per round

That's $2.50-$10+ before anyone's written useful code. The startup context is the killer — each agent independently loads the entire project, and most of that loaded context is irrelevant to the specific task.
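The naive multiplication is easy to model: every agent independently pays the full startup context. The blended per-token price below is an illustrative assumption, not official pricing.

```python
def naive_round_cost(n_agents, startup_tokens, task_tokens, usd_per_mtok=5.0):
    """Token total and cost when every agent loads the project itself.

    usd_per_mtok is a blended input/output price (an assumption).
    """
    # The startup context is multiplied by the agent count -- this
    # term, not the per-task work, is what makes naive parallelism expensive.
    tokens = n_agents * (startup_tokens + task_tokens)
    return tokens, tokens / 1_000_000 * usd_per_mtok

tokens, cost = naive_round_cost(5, 180_000, 80_000)
print(tokens, f"${cost:.2f}")   # 1300000 $6.50
```

At the low end (50K startup, 20K per task) the same call gives 350K tokens, matching the article's 350K-1.3M range.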

The Reality: Smart Coordination Costs Less

With proper task scoping and agent management:

1. Narrow the context window

Each agent doesn't need to read everything. A .claudeignore file that excludes irrelevant directories cuts startup context by 40-70%:

```
# .claudeignore
node_modules/
target/
*.lock
docs/
tests/fixtures/
```

An agent working on the auth module doesn't need to load the entire frontend. Startup drops from 180K to 50-80K tokens per agent.

2. Scope tasks tightly

"Refactor the backend" loads the entire codebase into context. "Add JWT validation to the auth middleware" loads three files. The difference in token consumption is 10x.

Well-decomposed tasks don't just produce better code — they're dramatically cheaper because the agent reads less and writes less before reaching the answer.

3. Reset sessions between tasks

Don't carry a 200K-token conversation history into the next task. When an agent finishes, kill the session and start fresh. The new session loads only the context needed for the next task, not the accumulated history of everything the agent has seen.

This is counterintuitive — doesn't the agent lose context? Yes, but the per-task context from the CLAUDE.md file and the task description is usually enough. The accumulated conversation history is mostly noise from the previous task.
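The effect of resets is easy to see in a toy model of token billing. The 60K-tokens-per-task figure below is an illustrative assumption; the mechanism — carried history is re-billed as input on every subsequent turn — is what matters.

```python
def tokens_billed(n_tasks, per_task=60_000, reset=True):
    """Total input tokens billed across n_tasks sequential tasks.

    With reset=True each task starts from a fresh per-task context.
    Without resets, the previous conversation is carried forward and
    re-billed as input context on every new task (a simplification).
    """
    billed = 0
    history = 0
    for _ in range(n_tasks):
        context = per_task if reset else history + per_task
        billed += context
        history = context  # carried forward only when not resetting
    return billed

print(tokens_billed(5, reset=True))   # 300000
print(tokens_billed(5, reset=False))  # 900000 -- history compounds
```

In this model, five tasks without resets bill 3x the tokens of five fresh sessions, which is the whole argument for killing sessions between tasks.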

4. Mix models by role

Not every role needs the most expensive model:

| Role | Model | Why |
| --- | --- | --- |
| Architect (planning) | Opus 4.6 | Needs deep reasoning for task decomposition |
| Engineers (execution) | Codex / Sonnet | Faster, cheaper, good enough for scoped tasks |
| Reviewer | Opus 4.6 | Needs judgment for code quality assessment |

Using Codex for execution at roughly 1/3 the cost of Opus cuts your engineer token costs significantly. The architect and reviewer — which run less frequently — use the expensive model where reasoning quality matters.

5. Restrict agent communication

Without communication constraints, 5 agents create up to 20 communication channels (n × (n-1)). Each message burns tokens on both the sender and receiver.

Restricting communication to a hierarchy (engineers → manager → architect) keeps the channel count to about 8 for a 6-agent team. Linear cost scaling instead of quadratic.
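The channel counts can be checked directly. Here a "channel" is a directed sender-to-receiver pair (an assumption; counting undirected links halves the numbers), and the hierarchy assumes engineers talk only to a manager, who talks to an architect.

```python
def all_to_all_channels(n):
    """Directed channels when every agent can message every other."""
    return n * (n - 1)

def hierarchy_channels(n_engineers):
    """Directed channels for engineers <-> manager <-> architect.

    Each engineer has two channels (to and from the manager), plus
    two between the manager and the architect.
    """
    return 2 * n_engineers + 2

print(all_to_all_channels(5))   # 20
print(hierarchy_channels(3))    # 8
```

The exact hierarchy count depends on how many engineers you run, but it grows linearly with team size instead of quadratically, which is the point.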

Realistic Cost Comparison

For a 4-hour work session with 5 tasks:

| Approach | Tokens | Estimated Cost | Time |
| --- | --- | --- | --- |
| Sequential (1 agent) | 400K-1M | $3-$8 | 4 hours |
| Naive parallel (5 agents) | 1.5M-4M | $12-$30 | 1 hour |
| Optimized parallel (5 agents) | 600K-1.5M | $5-$12 | 1 hour |

The optimized parallel approach costs 1.5-2x what sequential costs, but finishes in 1/4 of the time. The cost per task is roughly the same — you're just running them concurrently.

The naive approach costs 3-4x because every agent loads everything and communicates with everyone. The optimization is in what you don't load and who doesn't talk to whom.
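The trade-off can be reduced to two ratios. The ranges are from the table above; taking their midpoints is my own rough reading, not a measurement.

```python
# Midpoints of the sequential and optimized-parallel ranges above.
seq_cost, seq_hours = (3 + 8) / 2, 4.0    # $3-$8, 4 hours
opt_cost, opt_hours = (5 + 12) / 2, 1.0   # $5-$12, 1 hour

cost_ratio = opt_cost / seq_cost   # ~1.5x the cost...
speedup = seq_hours / opt_hours    # ...for 4x the speed
print(f"{cost_ratio:.2f}x cost, {speedup:.0f}x faster")
```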

When Parallel Agents Save Money

Parallel agents are cost-effective when:

  • Tasks are independent. No shared state means no coordination overhead.
  • Your time has value. If you're a $150/hour contractor, saving 3 hours is worth $450 — far more than the $5-10 extra in token costs.
  • Tasks are well-decomposed. Narrow scope = narrow context = fewer tokens.

Parallel agents waste money when:

  • Tasks are coupled. Agents waiting on each other's output burn tokens on idle context.
  • Decomposition is poor. Vague tasks cause agents to explore broadly, reading (and billing for) irrelevant code.
  • You're not supervising. An agent stuck in a retry loop can burn through $20 of tokens before you notice.

The Supervision Tax

The supervision layer itself (the daemon, the kanban, the message routing) adds zero token cost. Batty runs locally — it polls tmux panes, reads files, and manages state. No API calls, no token consumption. The only tokens consumed are by the agents themselves.

The talks_to constraint is the most cost-effective feature: it prevents the O(n²) communication explosion that turns "5 agents" into "20 conversations."
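The idea behind a talks_to constraint can be sketched as a simple allow-list check on message routes. This is a hypothetical illustration of the concept — Batty's actual configuration format and enforcement code may look nothing like this.

```python
# Hypothetical talks_to allow-list: engineers may only message the
# manager; the manager mediates between engineers and the architect.
ALLOWED = {
    "engineer-1": {"manager"},
    "engineer-2": {"manager"},
    "manager": {"engineer-1", "engineer-2", "architect"},
    "architect": {"manager"},
}

def can_send(sender, receiver):
    """Return True only if the hierarchy permits this route."""
    return receiver in ALLOWED.get(sender, set())

print(can_send("engineer-1", "manager"))       # True
print(can_send("engineer-1", "engineer-2"))    # False -- peer chat blocked
```

Dropping messages at routing time means the tokens for disallowed conversations are never spent in the first place.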

Bottom Line

Running 5 agents in parallel doesn't cost 5x. With proper scoping, session resets, model mixing, and communication constraints, it costs 1.5-2x while finishing 4x faster. The optimization isn't in the agents — it's in the coordination layer that controls what they load, who they talk to, and when they stop.


Try it: `cargo install batty-cli`

GitHub | Demo
