Last Friday afternoon the quality-gate agent reviewed a PR from backend-developer and rejected it with a 312-word critique. Fair feedback. The PR went back, backend-developer rewrote three functions, re-submitted. quality-gate rejected it again. Same 312-word critique. Same three functions.
I was watching this and realized what had happened. In the previous turn, quality-gate had told backend-developer to "improve test coverage". backend-developer wrote the tests, and quality-gate's second pass then complained that those tests existed, because they overlapped with what backend-developer had earlier been instructed to skip. The agents were in a loop. Neither was wrong. Both were operating on the spec they had been handed.
This is what happens when you let 35 specialized agents act on the same codebase without rules. They don't fight humans. They fight each other.
The orchestration problem
I keep 35 agents in ~/.claude/agents/. They include backend-developer, frontend-developer, postgres-pro, golang-pro, quality-gate, flow-architect, security-engineer, test-automator, client-communicator, cfo, cto, ceo, inbox-monitor, and 22 others. Most invocations involve 2 to 4 of them in a chain. About 1 in 7 sessions hits a real conflict like the one above.
The problem is not that any agent is wrong. The problem is that 35 specialists with 35 specs will pull the codebase in 35 directions if you do not constrain who decides what.
This is a writeup of the three patterns that mostly work, the three that do not, and the one problem I have not solved.
Three patterns that work
1. Single source of truth, per concern, written down
For every concern that more than one agent touches, there is exactly one file that owns the answer.
- CLAUDE.md at the project root owns: build commands, deploy folder convention, test framework. Every agent reads this. None of them argue with it.
- masterings/secure-code-patterns.md owns the 40 rules about input validation, secrets handling, SQL safety. security-engineer and quality-gate both reference the same file. They cannot disagree about a pattern because they are reading the same checklist.
- FreelanceOS/baseline-form.md owns the 22 test cases that any form must pass. frontend-developer implements them. quality-gate verifies them. The list of 22 is the contract.
The original loop I described above happened because there was no source of truth for "what's the minimum test coverage for backend code". quality-gate had its opinion. backend-developer had its opinion. Once I wrote the rule into CLAUDE.md ("statement coverage minimum 85% on new code, integration tests over unit tests for DB-touching paths"), the loop stopped on the next session.
2. Explicit ownership: one agent per task class
If two agents could plausibly own a task, neither does. I assign it to a third agent who is a layer up.
Concrete: who owns Postgres performance? Could be backend-developer (the SQL is part of the API code). Could be postgres-pro (it is a database concern). If I dispatch a slow-query investigation to backend-developer, the answer comes back with application-layer caching. If I dispatch it to postgres-pro, the answer comes back with an index rewrite. Both are correct. Neither is the right level.
The fix is to dispatch the question to flow-architect first. flow-architect reads the trace, decides whether this is an app-layer fix or a database-layer fix, and then dispatches the specific work to the right specialist with a clear scope. The specialists never fight because they are receiving non-overlapping work.
This is a router pattern, not a coordination pattern. The router is itself an agent.
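One way to picture the router pattern is as an ownership table in which every task class resolves to exactly one agent. The following is a minimal sketch, not the actual dispatcher: the class labels app-layer and db-layer and the function names owner_for and dispatch are hypothetical, and only the agent names come from the text above. In the real setup the classification step is done by flow-architect reading the trace, not by a lookup.

```shell
#!/bin/sh
# Hypothetical ownership table: each task class maps to exactly one agent.
# The class labels ("app-layer", "db-layer") are illustrative assumptions.
owner_for() {
  case "$1" in
    app-layer) echo "backend-developer" ;;
    db-layer)  echo "postgres-pro" ;;
    # Unclassified work goes back up to the router instead of to a guess.
    *)         echo "flow-architect" ;;
  esac
}

# Emit a scoped work order: one owner, one task class, one description.
dispatch() {
  task_class="$1"
  description="$2"
  printf 'owner=%s class=%s task=%s\n' \
    "$(owner_for "$task_class")" "$task_class" "$description"
}

dispatch db-layer "add index for slow orders query"
```

The point of the fallback branch is that an ambiguous task never lands on two specialists at once; it returns to the layer above.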
3. Locked context per agent invocation
Before dispatching an agent that will write to the codebase, I cache the relevant context in Redis with an explicit key:
redis-cli SET "agent:ctx:backend-developer:2026-05-20-feature-x" \
"$(cat current-task.md schema.sql relevant-files.txt)" EX 3600
The agent reads from that key at the start of its run. If several agents are dispatched in parallel, they all see the same frozen context. What used to make them fight was the floor moving while they walked on it. What stops the fighting is the floor not moving.
This pattern came out of a 2026-04 incident where quality-gate ran simultaneously with refactor-agent, and the file refactor-agent was rewriting got reviewed by quality-gate mid-rewrite. quality-gate flagged the half-finished code as broken. It was. Once locked-context was enforced, that class of bug disappeared.
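The read side of the locked-context pattern could look roughly like this. It is a sketch under assumptions: the helper name load_frozen_context and the REDIS_CLI override variable are hypothetical, and the key format simply mirrors the SET example above. The helper refuses to run when the key has expired, so an agent never falls back to reading the live, moving codebase.

```shell
#!/bin/sh
# Hypothetical helper: load the frozen context for one dispatch, or fail loudly.
# REDIS_CLI is an assumed override hook (useful for stubbing); default is redis-cli.
load_frozen_context() {
  key="$1"
  redis="${REDIS_CLI:-redis-cli}"
  # EXISTS replies 1 when the key is present. redis-cli prints the bare
  # integer when its stdout is not a terminal, as in command substitution.
  if [ "$("$redis" EXISTS "$key")" != "1" ]; then
    echo "no frozen context at $key (expired or never set)" >&2
    return 1
  fi
  "$redis" GET "$key"
}

# Usage, with the key from the SET example:
#   load_frozen_context "agent:ctx:backend-developer:2026-05-20-feature-x" > ctx.txt
```

Failing on a missing key is a design choice: a run that aborts is easier to notice than one that silently reviews half-rewritten files.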
Three patterns that do NOT work
1. "Let the agents negotiate"
I tried this for two weeks. backend-developer proposes, quality-gate reviews, they go back and forth until they agree. In theory clean. In practice the agents do not negotiate. They restate their original position more politely. After three or four turns, one of them gives in not because the argument was better but because it was running out of context window.
The decision quality from "exhausted agent gives in" is worse than the decision quality from "router agent decides upfront". Negotiation is the expensive way to lose.
2. "Run multiple agents in parallel and pick the best output"
This sounds safer. Run backend-developer-A, backend-developer-B, backend-developer-C in parallel, take the version with the highest quality-gate score.
Three problems. Token cost is 3x. The quality gain is limited because the three runs share most of the same biases (they are the same agent reading the same spec; they tend to converge on similar answers, not diverge). And the picker becomes a single point of failure. If quality-gate has a blind spot, all three "winners" share it.
I keep one specialist per task. Cheaper. The output quality difference vs the parallel-and-pick version is within noise on the projects I run.
3. "Agent voting"
Same problem as parallel-and-pick, with extra coordination cost. Skipped after one week.
What I was wrong about
I assumed that as agent count grew the coordination overhead would scale linearly. More agents = more rules to write = more fights to mediate.
The real curve has a kink. Up to about 10 agents, a flat dispatcher works. You hold the dispatch logic in your head and assign work by intuition. Above 12 agents you cannot hold the roster in working memory anymore, and a flat dispatcher loses to a routing agent. So coordination overhead does not go up linearly with agent count. It is roughly flat from 1 to 10, then steps up at the routing-agent threshold, then is roughly flat again from 12 to whatever ceiling.
I jumped from 8 to 22 to 35 agents in three months. The middle period was painful. The jump from 22 to 35 was much easier because the routing infrastructure was already there.
The one problem I have not solved
Agents occasionally regress each other's work. A new instance of backend-developer, dispatched two weeks after the last one, sometimes deletes a workaround the previous instance had added (with no comment because comments are noise). The trace looks like the workaround "appeared from nowhere" and the new agent removes it as dead code. The workaround was load-bearing.
I have partially mitigated this with structured commit messages that explain why a workaround exists, and by adding a test that breaks if the workaround is removed. But the gap is real. The agents do not yet read git history before deletion. The discipline lives in the prompt and the tests, and prompts get truncated.
If I solve this it will probably be by making one of the agents read git blame on any line it touches before recommending deletion. That is on the list.
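A pre-deletion check along those lines might be as small as the sketch below. This is an untested idea and not something the agents run today: the function name why_is_this_here is hypothetical, and it assumes the line has already been committed (an uncommitted line blames to an all-zero hash, which git log cannot resolve).

```shell
#!/bin/sh
# Hypothetical pre-deletion check: show the commit that introduced a line,
# so the structured commit message is read before the line is removed.
why_is_this_here() {
  file="$1"
  line="$2"
  # In --porcelain output the first line starts with the full commit hash.
  sha=$(git blame -L "$line,$line" --porcelain -- "$file" | head -n 1 | cut -d' ' -f1)
  # Print short hash and subject, then the body explaining the workaround.
  git log -1 --format='%h %s%n%n%b' "$sha"
}

# Usage: why_is_this_here path/to/file.go 42
```

This only helps if the commit message actually explains the workaround, which is why it pairs with the structured commit messages rather than replacing them.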
The shape that emerged
What looks like 35 independent agents is, in practice, a layered system:
- 1 router (flow-architect) decides what kind of work a task is.
- 5 to 7 specialists per layer (backend, frontend, DB, security, devops, testing) execute scoped work.
- 1 reviewer (quality-gate) verifies against the agreed checklist.
- Three orthogonal C-levels (cto, cfo, ceo) handle cross-cutting strategy questions that should not block the engineering loop.
- The remaining 18 to 20 are domain agents that are dispatched rarely (e.g., postgres-pro only for hard DB problems, tls-config-agent only when certs come up).
The pattern that prevents fights is not "more rules". It is "fewer overlapping responsibilities, an explicit router, a frozen context per dispatch". Three things written down. Most of the conflicts you would otherwise spend a week debugging do not happen.
If you are starting an agent stack today, the order I would build it in is: write CLAUDE.md first, then 5 specialists, then one router, then add the rest. Trying to coordinate 35 specialists without a router and a written source of truth is the slow way to learn the same lesson.