Most multi-agent demos optimize the wrong metric
More agents is not a flex. It is a coordination bill.
A lot of multi-agent demos still lead with the same number: how many workers ran at once. Four. Eight. A swarm. That is mostly theater if nobody can say what each worker owned, what it changed, and what still needs verification before merge.
Parallelism only helps when intent survives the handoff. If the assignment evaporates when the chat window closes, you do not have a workflow. You have several agents improvising in parallel.
Chat history is not a coordination layer
This is the first thing people get wrong.
A big transcript can drag one session through one task. The moment work splits, chat memory stops being a system and starts being a liability. Missing assumptions multiply. Scope drifts. Two agents solve different versions of the same problem and both think they were clear.
The fix is boring and effective: write the contract down.
That contract does not need to be huge. It just needs to be real.
- what the worker is building
- what is out of scope
- which files or surfaces it owns
- what "done" means
- how the result will be checked
Put that in a spec, a task file, AGENTS.md, a ticket brief, whatever fits your repo. Just do not pretend a long prompt is the same thing.
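The contract above can be sketched as a short task file. The headings and task here are illustrative, not a standard; shape it to fit your repo:

```markdown
# Task: extract-retry-logic

## Building
Pull the duplicated HTTP retry loop into one shared helper.

## Out of scope
Changing retry policy. Touching the auth module.

## Owns
src/http/retry.py, tests/test_retry.py

## Done means
Both call sites use the helper. No behavior change.

## Verified by
pytest tests/test_retry.py, plus a diff review against "Owns".
```

A worker handed this file does not need the planning conversation that produced it.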
The real speedup comes from separating roles
Parallel workflows get better the moment planning, implementation, and verification stop sharing the same muddy context.
One layer figures out the task and the boundaries. Another worker executes a narrow assignment. A later pass verifies. That separation is not process theater. It is how you stop every session from re-deciding the whole project from scratch.
Files are the right handoff format because files survive session boundaries. They can be reviewed. They can be updated mid-run. They do not depend on someone remembering what paragraph 34 of a transcript said two hours ago.
That is the actual leverage. Not more chatter. Cleaner state transfer.
Isolation matters more than swarm size
Most coordination failures are not model failures. They are boundary failures.
Parallel workers need narrow ownership, smaller tool surfaces, fresh context, and, where possible, isolated places to operate. Sandboxes help. Separate worktrees help. Curated tools help. Smaller ownership slices definitely help.
Skip that part and "more parallelism" usually means "larger blast radius."
This is why so many multi-agent setups feel impressive in a demo and exhausting in a real repo. Coordination cost rises faster than people expect. Past a certain point, extra workers mostly generate extra merge risk.
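Part of the boundary check is mechanical: compare what a worker actually changed against the paths it owns. A minimal sketch, assuming workers declare ownership as path prefixes (the paths here are made up):

```python
def ownership_violations(changed_files, owned_prefixes):
    """Return the changed files that fall outside the worker's owned slice."""
    def owned(path):
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in owned_prefixes)
    return [f for f in changed_files if not owned(f)]

# A worker that owns src/http/ but edited an auth file gets flagged.
print(ownership_violations(
    ["src/http/retry.py", "src/auth/session.py"],
    ["src/http/"],
))  # -> ['src/auth/session.py']
```

Run against each worker's diff before merge, this turns "larger blast radius" from a vibe into a list.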
Messaging is part of the system
Once agents can keep working asynchronously, messaging stops being cleanup. It becomes infrastructure.
Priorities change. A reviewer spots a bad assumption. Another task finishes early and frees up capacity. Someone needs to redirect a running worker without tearing the whole flow down.
That only works if the communication lane has rules.
Who can send the message? Which sessions accept outside input? What kinds of interruption are allowed? When is it worth paying the cost of context switching a worker mid-run?
If you do not answer those questions, mid-run steering becomes random interference.
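Those rules can be made concrete as a small gate in front of each worker's inbox. A hypothetical sketch; the roles and message kinds are invented for illustration:

```python
# Hypothetical policy: who may interrupt a running worker, and with what.
ALLOWED = {
    "planner":  {"reprioritize", "cancel"},
    "reviewer": {"flag_assumption", "cancel"},
    # implementers may not steer each other mid-run
}

def accept(sender_role, kind, worker_interruptible):
    """Decide whether a mid-run message reaches the worker at all."""
    if not worker_interruptible:  # this session opted out of outside input
        return False
    return kind in ALLOWED.get(sender_role, set())

print(accept("reviewer", "flag_assumption", True))   # True
print(accept("implementer", "reprioritize", True))   # False
```

The point is not this particular table. It is that the table exists in code instead of in someone's head.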
Verification is where fake parallelism gets exposed
This is the step people keep trying to compress into vibes.
"The agents finished" is not a quality signal. It means output exists. That is all.
Real parallel workflows make verification explicit. Somebody checks the result. Somebody confirms the contract was met. Somebody makes sure the changes still belong together and did not quietly widen scope on the way to the branch.
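Made explicit, that check can be a pass/fail verdict against the written contract. A minimal sketch, assuming the contract records owned path prefixes and required checks (the field names are illustrative):

```python
def verify(contract, changed_files, checks_passed):
    """Collect the reasons a result is not yet safe to merge."""
    problems = []
    outside = [f for f in changed_files
               if not any(f.startswith(p) for p in contract["owns"])]
    if outside:
        problems.append(f"scope widened: {outside}")
    missing = set(contract["verified_by"]) - set(checks_passed)
    if missing:
        problems.append(f"checks not run or failing: {sorted(missing)}")
    return problems  # empty list means the contract was met

contract = {"owns": ["src/http/"],
            "verified_by": ["pytest tests/test_retry.py"]}
print(verify(contract, ["src/http/retry.py"],
             ["pytest tests/test_retry.py"]))  # -> []
```

"The agents finished" is the input to this function, not its output.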
I would take fewer workers and one honest verification lane over a bigger swarm with no real review model.
Because once implementation and verification collapse into the same vague gesture, the workflow starts lying to you. Everything looks fast. Nobody can say what is actually safe to merge.
The coordination ceiling shows up early
People like to imagine the ceiling is model intelligence or context length. Usually it is human synthesis.
More workers mean more review load, more handoffs, more context switching, more chances for conflicting edits, and more places for intent to degrade. At some point the bottleneck is simple: can a human still recover the plot?
That is the number worth optimizing for. Not the maximum agent count. The maximum number of parallel changes a team can still explain, review, and merge cleanly.
Final thoughts
Parallel coding is a workflow design problem before it is a model problem.
Specs. AGENTS.md-style instructions. Checkpoints. Isolated execution. Mid-run messaging. Dedicated verification.
Those are not side quests around the real system. They are the real system.
If the handoff is fuzzy, the parallelism is fake.