One Agent or Five? What I Learned Running a Team of AI Coders

#claudecode #ai #productivity #llm

For about two weeks I was convinced more agents meant more output. If one AI coder is good, five running in parallel must be five times better, right? So I started fanning everything out — spin up a team, hand them a task list, let them race.

What I actually got was five agents editing the same three files, three of them "fixing" a bug the first one had already fixed, two of them disagreeing about the function signature, and a merge situation that took me longer to untangle than just doing the task once would have. More agents didn't give me more throughput. It gave me a standup meeting where everyone talks at once.

So here's the honest version of what I learned about when to use one agent and when to use many — after doing it the dumb way first.

The question isn't "how many agents." It's "how many independent pieces."

This is the whole thing. Parallel agents are a win if and only if the work decomposes into chunks that don't touch each other. The number of agents should equal the number of genuinely independent work-streams — not your optimism, not the size of the task, not how cool it feels to watch five terminals scroll.

A quick test I now run before fanning out: could two different people do these pieces in two different repos without talking?

"Migrate 40 call sites of a renamed function" → yes, shard it, fan out. Each site is independent.
"Add auth, then add the dashboard that uses auth" → no. That's a pipeline, not a parallel job. The second depends on the first.
"Refactor this one 300-line module" → no. One agent. Five agents on one file is just a merge conflict with extra steps.

If the pieces share state or order, more agents adds coordination cost with zero parallelism benefit. You pay all the overhead and get none of the speedup.

The shape that actually works: fan out, then funnel

The pattern that stopped the chaos wasn't "more agents" or "fewer agents." It was giving the swarm a structure instead of a free-for-all. Three roles:

        ┌─ executor (file A) ─┐
plan ───┼─ executor (file B) ─┼──► verifier ──► done / loop back
        └─ executor (file C) ─┘
   (split into        (work in           (one agent, fresh
    disjoint chunks)   parallel)          context, checks it all)

One planner splits the work into chunks that don't overlap. This is the step I used to skip, and skipping it is exactly why my agents collided. Decomposition is the job, not the preamble to the job.
N executors, one per chunk, each owning its own files. No shared files = no merge war.
One verifier with a fresh context checks the combined result against the original goal.

That last role is the one people drop, and it's the most important.

You cannot let an agent grade its own homework

Early on, my executors would finish and declare victory: "Done! All tests passing." Sometimes true. Often the agent was just pattern-matching the shape of success — it ran something green-ish and called it.

The fix is a hard rule I now never break: the agent that wrote the code is never the agent that approves it. Authoring and reviewing are separate passes, in separate contexts. The author is invested in "I finished." A fresh verifier with no ego in the diff asks the uncomfortable question — does this actually meet the goal, and what's the evidence? — and loops back if not.

This maps to a simple bounded loop:

plan → execute (parallel) → verify
                              │
              fail ◄──────────┤
               │              │
               └─ fix (bounded retries) ──► verify ──► pass → ship

The bound matters. "Fix and re-verify until it passes" with no limit is how you get an agent thrashing on the same wall at 2am. Cap the retries; if it blows the budget, that's a signal for me to look, not for the loop to spin forever.

Match the model to the job

The other thing I overspent on: running my most expensive model for everything. Most of a coding task is not deep reasoning — it's lookups, mechanical edits, running tests. So I route by difficulty:

Cheap/fast model → "where is X defined," "rename these," "run the suite."
Mid model → standard implementation, the bulk of executor work.
Top model → planning the decomposition, architecture calls, the verifier's judgment on hard changes.

The expensive model's judgment is worth it for splitting the work and checking the work. It's wasted on "grep for this string." Spending it everywhere just means paying premium rates to run ls.

So: one or many?

My actual answer now, most days, is one — one agent, working a well-scoped task, with a second agent reviewing at the end. I reach for the swarm only when the work is genuinely wide: a migration across many files, a batch of independent fixes, a fan-out search. The moment the pieces depend on each other, parallelism is a lie and I drop back to a pipeline.

More agents is a tool for width, not for depth. If your problem is deep, one good agent beats five confused ones every time. If it's wide, split it cleanly first — and never, ever let the worker sign off on its own work.