Everyone building agents eventually hits the same wall: one agent calls another, which calls another, and suddenly you have a chain of models all hallucinating off each other.
It's the telephone game, but every participant is confidently making things up.
The naive approach that fails:
Agent A extracts data. Agent B summarizes. Agent C formats. Agent D sends.
Each step compounds error. By the time Agent D acts, the original intent has mutated beyond recognition.
This is why most multi-agent demos work great in controlled scenarios but fall apart in production.
What actually works:
The fix isn't smarter models. It's grounded handoffs.
Structured state, not natural language. Agents should pass JSON schemas or typed objects, not paragraphs of text. Natural language is lossy. Structured data is verifiable.
Single source of truth. All agents read from and write to the same context object. No telephone chains. Each agent sees the canonical state.
Explicit failure modes. If Agent B receives garbage input, it should reject, not guess. Guessing is where confidence spirals begin.
Human checkpoints. Multi-step agent chains need review gates. The longer the chain, the more likely you need a human in the loop.
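The first and third points can be sketched together in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `ExtractionResult` type, its fields, and the `validate_handoff` helper are invented names for the example. The handoff is a typed object, and malformed input is rejected outright instead of guessed at.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractionResult:
    """Typed handoff object: Agent A writes this, Agent B reads it."""
    source_id: str
    entities: list[str]
    confidence: float

def validate_handoff(payload: dict) -> ExtractionResult:
    """Reject garbage input instead of guessing (explicit failure mode)."""
    required = {"source_id", "entities", "confidence"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"handoff rejected, missing fields: {sorted(missing)}")
    if not 0.0 <= payload["confidence"] <= 1.0:
        raise ValueError("handoff rejected, confidence out of range")
    return ExtractionResult(
        source_id=payload["source_id"],
        entities=list(payload["entities"]),
        confidence=payload["confidence"],
    )
```

The point is that the rejection happens at the boundary, before the downstream agent ever sees the data. A schema check is cheap; a confident hallucination three steps later is not.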
The MCP insight:
Model Context Protocol isn't just about tools. It's about shared context. When every agent reads from the same MCP server, you eliminate drift.
The model doesn't need to remember what the previous agent said. It reads the current state.
Practical pattern:
Instead of:
Agent A -> Agent B -> Agent C -> Output
Do:
Agent A -> State
Agent B -> State
Agent C -> State
Output -> State
All agents read from State. All agents write to State. The chain becomes a hub.
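The hub can be sketched in a few lines of Python. The `State` class and `run_pipeline` helper are hypothetical names for this example, not part of any library: agents never call each other; they only read and write the shared state, which keeps an audit trail of who wrote what.

```python
class State:
    """Single source of truth: every agent reads from and writes to this."""
    def __init__(self):
        self._data: dict = {}
        self.history: list[tuple[str, str]] = []  # (agent, key) audit trail

    def write(self, agent: str, key: str, value) -> None:
        self._data[key] = value
        self.history.append((agent, key))

    def read(self, key: str, default=None):
        return self._data.get(key, default)

def run_pipeline(state: State, agents: list) -> State:
    """Hub, not chain: each agent gets the canonical state, never a peer's output."""
    for name, fn in agents:
        fn(state)
    return state
```

Because every write is recorded against the shared object, "what did Agent B actually see?" becomes a lookup in `history` rather than an archaeology exercise across prompt logs.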
The real lesson:
Multi-agent systems are not about coordination. They are about state management. Get that right and the orchestration follows.
Most agent failures aren't model failures. They are context failures. Fix the context, fix the system.
Top comments (8)
the "pre-resolve upstream" insight maps exactly onto what I've been doing with Claude skills — encoding the upstream resolution as the skill itself. instead of each agent invocation re-resolving entities from raw docs, you bake a resolve-entities-v2.md skill that references the canonical knowledge graph file. every agent that loads that skill gets the same grounding for free, no per-run extraction tax. been sharing mine on tokrepo.com (open source registry) for anyone building multi-agent pipelines hitting the same problem.

The hub pattern makes sense when you own every agent in the chain. But there is a step zero that gets overlooked: how does Agent A discover that Agent B exists?
Within a single org, you hardcode the list. Across org boundaries, that breaks. Your MCP server knows about tools you registered, but it has no mechanism to discover tools hosted elsewhere. A2A Agent Cards describe capabilities but require knowing the endpoint URL already.
The IETF tried to solve this with agents.txt (draft-srijal-agents-policy-00), basically robots.txt for agent discovery and permissions. That draft expires April 10 with no working group adoption. Meanwhile 8+ other drafts have appeared: ARDP for multi-protocol resolution, AID using DNS TXT records, AINS with HTTPS-based lookup, even an agent:// URI scheme. None interoperate.
So the shared context pattern you describe works inside your controlled environment. The moment you need to orchestrate agents you do not own, there is no standard way to find them. State management is the orchestration problem, agreed, but discovery is the problem upstream of state management.
Has anyone here dealt with cross-org agent discovery in practice? Every solution I have seen is either a proprietary registry or manual config.
The hub pattern is right. But there's a layer upstream of this that determines whether the hub actually works: the quality of the state object itself.
Most teams fix the orchestration architecture and still get bad outputs. The reason is usually that the data going into State was never structured correctly to begin with. Agent A reads from a raw document, extracts what it can, writes a JSON blob to State — and that blob carries all the ambiguity and noise from the source. Agent B reads clean-looking JSON but the values inside it are wrong or incomplete. Structured handoffs don't fix bad extraction.
The problem I kept running into: agents were passing typed state correctly, but the state was built from unstructured sources — SEC filings, regulatory submissions, patent records — where the entities and relationships were implicit in the text. Garbage in, well-formatted garbage out.
The fix was to move that resolution upstream, before any agent touches the data. Pre-built knowledge graph files where entities are resolved, relationships are explicit, and the structure is encoded once. When Agent A reads from that, it's not extracting — it's reading. The State object it writes is grounded because the source was already grounded.
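The "reading, not extracting" move could look something like this. The JSON shape and the `read_grounded` helper are assumptions for illustration, not the commenter's actual format: the point is that entity resolution happened once, offline, and agent time is just a file read.

```python
import json

def read_grounded(kg_path: str) -> dict:
    """Agent A reads pre-resolved entities; no extraction happens at agent time."""
    with open(kg_path) as f:
        kg = json.load(f)  # assumed shape: {"entities": [{"id": ..., "name": ...}]}
    # Index by id so downstream agents do lookups, not fuzzy matching
    return {e["id"]: e for e in kg["entities"]}
```

Any ambiguity that survives this step was resolved (or flagged) when the knowledge graph was built, which is a much better place to catch it than mid-pipeline.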
Your point about explicit failure modes is underrated. An agent that rejects bad input is far more valuable than one that confidently processes it. The hardest part of building that is knowing what "bad input" looks like — which is only possible if you have a schema for what good input looks like. Most teams don't define that schema until after the first production failure.
the human checkpoint point is underrated. most teams skip it because there's no clean pattern for "pause this chain, show a human what's about to happen, wait for approval, then resume." so they just... don't.
structured state helps with the drift problem but it doesn't help you answer "should agent 3 actually execute this action?" after agents 1 and 2 already made questionable decisions upstream.
The session boundary problem is the one that bites hardest in practice. We run an autonomous agent on a 4-hour cron cycle — each session it reads markdown logs from the last session, checks external APIs (Telegram, email, metrics), decides what to do, executes, then logs everything before it exits.
The orchestration challenge isn't coordinating multiple agents simultaneously — it's maintaining coherent multi-session intent when each session starts from scratch. Session 41 might decide to pivot strategy, but session 42 needs to understand why the pivot happened, not just what changed.
Our solution is embarrassingly simple: three markdown files. activity.md (what happened), decisions.md (why we chose it), and metrics.md (what the numbers say). Every session reads all three before acting. No vector databases, no embeddings. Just flat files with timestamps.
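The three-file scheme could be sketched like this. The file names come from the comment; the `orient`/`log_decision` helpers and the exact log format are assumptions for illustration:

```python
from datetime import datetime, timezone
from pathlib import Path

MEMORY_FILES = ["activity.md", "decisions.md", "metrics.md"]

def orient(memory_dir: Path) -> str:
    """Read all three logs before acting; a missing file reads as empty."""
    parts = []
    for name in MEMORY_FILES:
        f = memory_dir / name
        parts.append(f"## {name}\n" + (f.read_text() if f.exists() else ""))
    return "\n".join(parts)

def log_decision(memory_dir: Path, what: str, why: str) -> None:
    """Append the 'why' with every decision so later sessions don't re-litigate it."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (memory_dir / "decisions.md").open("a") as f:
        f.write(f"- [{stamp}] {what} | why: {why}\n")
```

Flat files with timestamps are easy to inspect and diff by hand, which matters when you are debugging what an autonomous agent thought it was doing at 3am.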
The failure mode we hit most: the agent forgets a decision's reasoning and re-evaluates from scratch, sometimes reversing a good call. Adding the "why" to every decision log entry mostly fixed this.
The hub pattern fixes state. But there's no proof of which agent wrote to State, or whether the agent that wrote at step 3 is the same instance that was authorized at step 1.
AgentID adds an identity layer to each agent in the hub. Every write to State comes with a dual-signed receipt (Ed25519 + HMAC). If Agent B's model or context changed between reads, session continuity detection catches it — server-side, the agent can't suppress it.
Structured handoffs + verified identity = you know what was passed AND who passed it.
pip install getagentid — getagentid.dev
This hits close to home. I run an autonomous agent that orchestrates across Telegram, Gmail IMAP, Resend, Stripe, Cloudflare Workers, GitHub, dev.to API, and a Playwright browser -- all from a single Claude Code session on a 4-hour cron loop. It is the CEO of my side project, a desktop Gmail client.
The orchestration problem I keep running into is not technical, it is contextual. The agent can call any API, but deciding WHICH action matters right now requires understanding state across all those systems simultaneously. Is there an unanswered support email? Did an outreach reply come in? Is the website deploy broken? The prioritization logic is the hard part, not the tool calling.
My hack: a simple text-based session protocol. Orient (read all state), Decide (pick 1-3 actions), Execute, Log. Forcing the agent through this loop every session prevents it from tunnel-visioning on one integration. Low-tech but surprisingly effective.
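The Orient/Decide/Execute/Log loop might look like this. The `run_session` shape, the `needs_action`/`priority` fields, and the cap of three actions are assumptions drawn from the comment, not a real protocol:

```python
def run_session(integrations: dict, act, log) -> list:
    """One cron-cycle session: Orient -> Decide -> Execute -> Log."""
    # Orient: snapshot state from every integration before deciding anything
    snapshot = {name: check() for name, check in integrations.items()}
    # Decide: rank candidates, cap at three actions to prevent tunnel vision
    candidates = [(name, s) for name, s in snapshot.items() if s.get("needs_action")]
    chosen = sorted(candidates, key=lambda c: c[1].get("priority", 0), reverse=True)[:3]
    # Execute, then Log each result before the session exits
    results = []
    for name, s in chosen:
        results.append(act(name, s))
        log(f"acted on {name}: {results[-1]}")
    return results
```

Forcing the full orientation pass before any decision is what keeps the agent from answering the first email it sees while the deploy is on fire.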