I've been running multiple AI coding agents (Claude Code, Codex, Aider) on the same repo for months. Not sequentially — simultaneously. Three to five agents, each working on a different task, all touching the same codebase.
It took a while to stop everything from catching fire. Here's what I learned.
1. Task decomposition matters more than agent count
Throwing five agents at vague tasks gives you five different interpretations of the same problem. You don't get 5x the output — you get 5x the mess.
What failed: "Refactor the backend" assigned to three agents. Each one refactored a different part with an incompatible approach. One converted the handlers to async. Another restructured the error types. The third renamed half the functions. Nothing merged cleanly.
What works: Break work into tasks that are independent, specific, and testable. "Add JWT validation middleware to the auth route" is a task. "Improve the backend" is not. The quality of the task decomposition determines whether parallel agents help or hurt.
I've found that spending 20 minutes writing precise task descriptions saves hours of conflict resolution. The architect prompt matters more than the number of engineers.
2. Test gating prevents cascading failures
Without a quality gate, agents declare tasks "done" when the code looks right. They're optimized for plausibility, not correctness.
What failed: Agent finishes a task, says "Done! All changes committed." I merge it. Tests fail on the next agent's branch because the first agent's "done" work broke a shared interface. Now two branches are broken instead of one.
What works: Nothing merges until the test suite passes in the agent's worktree. The check is simple — run cargo test (or npm test, pytest) and check the exit code. Zero means merge. Non-zero means the agent gets the failure output and tries again.
This single rule cut my "agent broke something" rate by roughly 80%. The failures that remain trace back to gaps in test coverage — a testing problem, not an agent problem.
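The gate above is small enough to sketch as a shell function. Everything here is illustrative — the function name, the branch layout, and the default of cargo test (swap in npm test or pytest as your suite requires):

```shell
# A minimal merge gate, assuming one branch per agent and a test
# command that signals pass/fail via its exit code. All names here
# are hypothetical, not Batty's actual interface.
gate_and_merge() {
  worktree="$1"              # the agent's worktree directory
  branch="$2"                # the agent's branch
  test_cmd="${3:-cargo test}"  # default suite; override per project

  # Run the suite inside the worktree; merge only on exit code 0.
  if (cd "$worktree" && sh -c "$test_cmd"); then
    git merge --no-ff "$branch" && echo "merged $branch"
  else
    echo "tests failed on $branch; returning output to the agent" >&2
    return 1
  fi
}
```

On failure, whatever the test command printed is what you hand back to the agent as its next prompt.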
3. Worktree isolation is non-negotiable
Two agents editing the same file at the same time is the fastest way to lose work. One saves, the other saves, and whoever saved last wins. No conflict markers. No merge dialog. Just silent data loss.
What failed: Two agents in the same directory, both editing src/lib.rs. Agent A adds a function at the top. Agent B adds a function at the bottom. Agent B's save overwrites Agent A's changes. Agent A's work is gone with no trace.
What works: Git worktrees. Each agent gets its own directory on its own branch. They can edit the same files simultaneously without interference. Conflicts only surface at merge time, where git can show you exactly what overlaps and you resolve it once.
Batty creates a persistent worktree per agent and handles the merge serialization. But even without a tool, git worktree add is all you need to start.
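A runnable sketch of the bare-git version, under the assumption of one branch per agent. The scratch repo, directory names, and branch names are all made up for the demo — in a real repo you'd only run the git worktree commands:

```shell
# Scratch repo so the demo is self-contained; skip this part in
# a real project.
scratch=$(mktemp -d)
mkdir -p "$scratch/repo" && cd "$scratch/repo"
git init -q -b main
git config user.email agent@example.com
git config user.name "worktree-demo"
git commit --allow-empty -qm "initial commit"

# One worktree per agent, each on its own branch.
git worktree add ../agent-a -b agent-a/jwt-middleware
git worktree add ../agent-b -b agent-b/error-types

# Both agents can now edit the same paths without clobbering each
# other; overlaps surface only when the branches merge.
git worktree list

# Once a branch has merged, reclaim the directory.
git worktree remove ../agent-a
```

Each agent process is simply started with its worktree directory as its working directory; it never sees the others.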
4. Agents need supervision, not autonomy
The pitch for AI coding agents is "fire and forget." The reality is closer to managing a team of enthusiastic junior developers. They're fast, they're capable, and they will confidently build the wrong thing if you don't check in.
What I expected: Define tasks, start agents, come back to merged PRs.
What actually happens: Agent A goes off-track 20 minutes in and spends an hour on the wrong approach. Agent B gets stuck in a retry loop. Agent C finishes perfectly. You needed to catch A early and restart B.
The leverage isn't "I don't have to do anything." It's "I supervise five workstreams instead of doing one task myself." That's still a massive productivity gain — but it requires attention, not absence.
5. The kanban pattern works for agent dispatch
Agents need a clear answer to "what should I work on next?" Without it, you either manually assign every task or agents pick up whatever seems interesting (often the same task).
What failed: Telling multiple agents "work on the backlog." Two agents picked up the same feature. One finished first. The other's work was wasted.
What works: A simple kanban board. Todo, In Progress, Done. One task per agent. When an agent finishes (and tests pass), it picks up the next item from Todo.
I use a Markdown file for this — cat board.md shows the state, git diff board.md shows what changed, and any agent can read it without a special API. The format matters less than the constraint: one task per agent, visible assignment, no ambiguity.
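A sketch of what that looks like end to end. The board layout and task names are invented for the example; the only real constraint is the one stated above — one task per agent, visible assignment:

```shell
cd "$(mktemp -d)"   # scratch dir so the demo board lives on its own

# A hypothetical board.md with the three columns as sections.
cat > board.md <<'EOF'
## Todo
- [ ] Add JWT validation middleware to the auth route
- [ ] Restructure the error types in src/error.rs

## In Progress
- [ ] (agent-b) Migrate request handlers to async

## Done
- [x] Rename the config loader
EOF

# Dispatch rule: an idle agent takes the first unchecked item
# under Todo (and nothing from any other section).
next=$(sed -n '/^## Todo/,/^## In Progress/{/^- \[ \]/p;}' board.md \
  | head -n 1 | sed 's/^- \[ \] //')
echo "agent-a picks up: $next"
```

Because it's plain Markdown in git, every claim and completion shows up in git diff board.md, which doubles as an audit log of who did what.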
These five lessons come down to one insight: more agents without more structure equals more chaos. The structure doesn't have to be complex — worktree isolation, test gating, and a task board cover 90% of the coordination problems.
The agents will keep getting better at writing code. What won't change is the need for someone (or something) to make sure the code is correct, the work is organized, and the agents aren't stepping on each other.
Try it: cargo install batty-cli — GitHub | 2-min demo