Multi-agent systems are having a moment. Kimi K2.6 ships with Claw Groups supporting 300 parallel sub-agents. Hermes Agent crossed 100K GitHub stars in under two months. Everyone is building agents that spawn agents.
Nobody is asking if they should.
The dirty secret is that coordination is harder than execution. You can get two LLMs to call tools. Getting them to share context, resolve conflicts, and not duplicate work requires infrastructure most teams have not built. The result? Agents that talk past each other, replay the same API calls three times, and occasionally deadlock.
I have spent the last year debugging multi-agent pipelines in production. The failures are not dramatic. They are subtle: context windows stuffed with redundant summaries, tool traces that do not propagate, final answers that contradict each other. These are not model problems. They are architecture problems.
The Three Coordination Failures
First: state bleed. Most agent frameworks treat memory as a shared dump. Every sub-agent appends to the same context window. By the third agent, your prompt is 80% noise. The cleaner approach: stateless ephemeral units with explicit skip_memory flags. The parent decides what the child needs to know.
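As a minimal sketch of that idea (the names SubAgentCall, spawn, and skip_memory are illustrative, not any framework's API), the parent filters its own history down to what the child actually needs instead of letting the child inherit the whole window:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgentCall:
    """A stateless, ephemeral unit of work. The parent decides the context."""
    task: str
    context: list = field(default_factory=list)  # explicitly selected messages
    skip_memory: bool = True                     # never appends to the shared window

def spawn(parent_history: list, task: str, relevant: set) -> SubAgentCall:
    # Pass only the messages the child needs, selected by index.
    picked = [msg for i, msg in enumerate(parent_history) if i in relevant]
    return SubAgentCall(task=task, context=picked)
```

The point is the direction of control: the child never reaches into shared memory, so the third agent's prompt is as clean as the first's.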
Second: failure isolation. When one agent hangs on a long API call, what happens to the others? In most setups: everything waits. There is no circuit breaker. Kimi K2.6's claim of 12+ hour continuous runs with 4,000+ tool calls sounds impressive until you realize there is no graceful degradation path.
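A graceful degradation path does not take much: give each agent a time budget so one hung call cannot stall the group. A sketch with a thread pool (run_with_breaker is a hypothetical helper, not any framework's API):

```python
import concurrent.futures as cf

def run_with_breaker(agents: dict, timeout_s: float) -> dict:
    """Run agents in parallel; one that exceeds its budget is marked
    timed out while the rest of the group keeps going."""
    with cf.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in agents.items()}
        results = {}
        for name, fut in futures.items():
            try:
                results[name] = ("ok", fut.result(timeout=timeout_s))
            except cf.TimeoutError:
                results[name] = ("timed_out", None)  # degrade, don't deadlock
        return results
```

A real system would also need to reap the hung worker; Python threads cannot be force-killed, which is itself an argument for process-level isolation between agents.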
Third: plan consensus. Multiple agents need to agree on a plan. Most implementations punt this to an LLM judge. That is fine until the judge hallucinates a step that never happened. The better pattern: strict planning constraints with deterministic checkpoints. Agents do not negotiate. They follow a DAG.
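"They follow a DAG" can be taken literally. A sketch using the standard library's graphlib, where the plan is plain data and each completed step is a deterministic checkpoint:

```python
from graphlib import TopologicalSorter

def execute_plan(dag: dict, run) -> list:
    """dag maps each step to the set of steps it depends on.
    The execution order comes from the graph, not from agent negotiation."""
    order = list(TopologicalSorter(dag).static_order())
    for step in order:
        run(step)  # checkpoint: recorded as done before the next step starts
    return order
```

Given `{"report": {"summarize"}, "summarize": {"fetch"}, "fetch": set()}`, this runs fetch, summarize, report in that order, every time. No judge, nothing to hallucinate.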
What Actually Works
Explicit handoffs, not shared memory. Each agent receives exactly what it needs, validates it, and returns structured output. No browsing a shared blob hoping your context survived.
Structured failure metadata. When agents fail, they return status, exit reason, tool trace, retry count. This lets the orchestrator decide: retry, escalate, or abort.
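A sketch of that contract (the field names and decision policy here are illustrative, not a standard):

```python
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    RETRY = "retry"
    ESCALATE = "escalate"
    ABORT = "abort"

@dataclass
class AgentResult:
    status: str                          # "ok" or "error"
    exit_reason: str                     # e.g. "rate_limited", "invalid_output"
    tool_trace: list = field(default_factory=list)
    retry_count: int = 0

TRANSIENT = {"rate_limited", "tool_timeout"}

def decide(result: AgentResult, max_retries: int = 3) -> Decision:
    """The orchestrator's call: transient failures retry, exhausted
    retries escalate to a human, everything else aborts the branch."""
    if result.exit_reason in TRANSIENT and result.retry_count < max_retries:
        return Decision.RETRY
    if result.retry_count >= max_retries:
        return Decision.ESCALATE
    return Decision.ABORT
```

The decision is boring and rule-based on purpose: the one place you do not want an LLM improvising is failure handling.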
Deterministic replay. Multi-agent systems are non-deterministic by default. For debugging, you need the ability to replay exactly: which agent ran when, what tools it called, what context it saw.
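The minimum viable version is an append-only event log: every tool call gets a sequence number, the calling agent, and a hash of the context it saw. A sketch (class and field names are mine, not any framework's):

```python
import hashlib
import json

class RunLog:
    """Append-only record: enough to answer which agent ran when,
    what it called, and what context it saw."""
    def __init__(self):
        self.events = []

    def record(self, agent: str, tool: str, args: dict, context: str):
        self.events.append({
            "seq": len(self.events),
            "agent": agent,
            "tool": tool,
            "args": args,
            "ctx_sha": hashlib.sha256(context.encode()).hexdigest()[:12],
        })

    def dump(self) -> str:
        return json.dumps(self.events)

def replay(serialized: str):
    """Yield events in the exact order they originally ran."""
    yield from json.loads(serialized)
```

Hashing the context rather than storing it keeps the log small while still letting you detect that two runs diverged, and where.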
The Infrastructure Gap
Here is what is missing: a runtime that treats agents as first-class citizens. Not just a harness that calls agent.run(), but something that understands agent lifecycles, health monitoring, persistence, and permissions. We have this for containers. We do not have it for agents.
The pattern emerging in Hermes Agent's ecosystem: agents as ephemeral compute units with explicit contracts. They declare inputs, outputs, side effects. The orchestrator schedules them like jobs. This is how you get to 300 parallel agents without chaos: treat them like cattle, not pets.
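A sketch of that contract shape (the AgentContract type and the greedy scheduler below are my illustration, not Hermes Agent's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContract:
    """What an agent declares up front, so it can be scheduled like a job."""
    name: str
    inputs: frozenset        # artifacts it consumes
    outputs: frozenset       # artifacts it produces
    side_effects: frozenset = frozenset()

def schedule(contracts: list) -> list:
    """Greedy scheduler: run any agent whose declared inputs already exist.
    Agents become interchangeable units -- cattle, not pets."""
    produced, order, pending = set(), [], list(contracts)
    while pending:
        ready = [c for c in pending if c.inputs <= produced]
        if not ready:
            names = ", ".join(c.name for c in pending)
            raise RuntimeError(f"unsatisfiable contracts: {names}")
        for c in ready:
            order.append(c.name)
            produced |= c.outputs
            pending.remove(c)
    return order
```

Because the contract is data, the orchestrator can batch every agent in a `ready` wave in parallel; that is the shape that scales to hundreds of sub-agents without shared state.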
Google's Skills in Chrome inverts the model: many small agents each doing one thing well, triggered by explicit user intent. No coordination needed because the user is the coordinator. Less autonomy, more reliability.
The Hard Part
The real challenge is not technical. It is organizational. Teams want agents that just work together. They do not want to design coordination protocols. So they use frameworks that paper over complexity, until it leaks.
The future is not smarter individual agents. It is agent collectives that actually coordinate. We are not there yet.