I built a multi-agent system with 6 Claude Code instances running in parallel. Three weeks in, it fell apart.
Not dramatically. Gradually. One agent would forget what another had decided. A downstream agent would contradict upstream work. By the time I caught it, I had 400 lines of conflicting output and no clean way to reconcile them.
This is cascading context drift. And it will kill your multi-agent system if you do not solve it.
What context drift actually is
Each Claude Code session has its own context window. When Agent A finishes a task and hands off to Agent B, Agent B does not inherit Agent A's reasoning. It only gets what you explicitly pass.
If you pass nothing structured, Agent B guesses. If Agent C depends on Agent B, it inherits those guesses. By Agent D, you are in chaos.
The failure mode looks like this:
- Agent A decides: "use snake_case for all identifiers"
- Agent B never sees this, uses camelCase
- Agent C integrates both outputs, now has mixed conventions
- Agent D tries to lint the output and fails
You do not notice until Agent D reports a failure that traces back to Agent A's decision three hops ago.
The fix: structured handoff contracts
Every agent in my system now writes a handoff contract before it terminates. It is a short markdown file with three sections:
## DECISIONS
- [what was decided and why]
## CONSTRAINTS
- [what downstream agents must not violate]
## OPEN QUESTIONS
- [what this agent could not resolve]
The orchestrator (I call mine Atlas) reads this file before dispatching the next agent. The next agent's prompt includes the full contract text.
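As a rough illustration of that read-then-dispatch step, here is a minimal Python sketch. It assumes the three-section layout shown above. The names `HandoffContract`, `parse_contract`, and `build_prompt` are hypothetical and chosen for this example; they are not taken from Atlas.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffContract:
    """In-memory form of the three-section handoff file (hypothetical type)."""
    decisions: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the contract in the DECISIONS / CONSTRAINTS / OPEN QUESTIONS layout."""
        sections = [
            ("DECISIONS", self.decisions),
            ("CONSTRAINTS", self.constraints),
            ("OPEN QUESTIONS", self.open_questions),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines) + "\n"


def parse_contract(text: str) -> HandoffContract:
    """Parse a handoff file back into a HandoffContract.

    Bullets under an unrecognized heading are ignored in this sketch.
    """
    contract = HandoffContract()
    buckets = {
        "DECISIONS": contract.decisions,
        "CONSTRAINTS": contract.constraints,
        "OPEN QUESTIONS": contract.open_questions,
    }
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = buckets.get(line[3:].strip())
        elif line.startswith("- ") and current is not None:
            current.append(line[2:].strip())
    return contract


def build_prompt(task: str, contract_text: str) -> str:
    """Prepend the full contract text to the next agent's task (assumed prompt shape)."""
    return (
        "Handoff contract from the previous agent "
        "(treat CONSTRAINTS as binding):\n\n"
        f"{contract_text}\n"
        f"Your task: {task}\n"
    )
```

In a real orchestrator the contract text would be read from the file the previous agent wrote. Parsing it first, rather than pasting it blindly, gives the orchestrator a chance to notice an empty CONSTRAINTS section before the next agent starts.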
This sounds obvious. It is not obvious when you are building fast.
Why this works
Context drift happens because information lives in each agent's context window, not in shared state. The handoff contract forces agents to externalize their decisions, and the reasons for them, before they terminate.
The constraint section is the most important part. Downstream agents do not just need to know what happened — they need to know what they are not allowed to change. Constraints prevent the "helpful rewrite" failure mode, where an agent improves something locally but breaks system-wide consistency.
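Constraints are easier to enforce when some of them can be checked mechanically. The following Python sketch shows one possible shape: a registry that maps a constraint name to a checker function, run against a downstream agent's output. The constraint name `no_camelCase`, the `CHECKS` registry, and both function names are assumptions made for this example, not part of any existing tool.

```python
import re

# Matches lowerCamelCase words such as "loadConfig". PascalCase names like
# "MyClass" are deliberately not flagged in this sketch.
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")


def find_camel_case(source: str) -> list[str]:
    """Return the distinct lowerCamelCase words found in source, sorted."""
    return sorted(set(CAMEL_CASE.findall(source)))


# Hypothetical registry: constraint name -> function returning violations.
CHECKS = {
    "no_camelCase": find_camel_case,
}


def check_constraints(constraints: list[str], output: str) -> dict[str, list[str]]:
    """Run every constraint that has an automated checker against output.

    Constraints without a registered checker are skipped here and would
    still need human or agent review.
    """
    violations: dict[str, list[str]] = {}
    for name in constraints:
        checker = CHECKS.get(name)
        if checker is None:
            continue
        hits = checker(output)
        if hits:
            violations[name] = hits
    return violations
```

Running a check like this right after each agent finishes would surface a "helpful rewrite" at the hop where it happened, instead of several agents later. The regex is a plain text scan, so it will also flag camelCase words inside strings and comments.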
The PAX format
After trying several approaches, I settled on a compressed format I call PAX (Protocol for Agent eXchange). Instead of prose, agents write structured tokens:
DECISION:naming=snake_case REASON:existing_codebase_convention
CONSTRAINT:no_camelCase SCOPE:all_identifiers
OPEN:database_schema BLOCKER:auth_spec_incomplete
This cuts handoff file size by ~70% compared to prose. With 6 agents running in parallel and each producing 3-5 handoffs per session, that is 18-30 handoff files per session, each pasted in full into a downstream prompt, so token efficiency matters.
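To show how lines like the three above could be consumed, here is a minimal Python parser sketch. It assumes one record per line, whitespace-separated `KEY:value` tokens, and no spaces inside values. Those rules are inferred from the sample lines only; the actual PAX spec may differ, and `parse_pax_line` and `parse_pax` are hypothetical names.

```python
def parse_pax_line(line: str) -> dict[str, str]:
    """Split one PAX line into a {KEY: value} mapping.

    Each token is split on its first colon, so a value such as
    "naming=snake_case" is kept intact.
    """
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"malformed PAX token: {token!r}")
        fields[key] = value
    return fields


def parse_pax(text: str) -> list[dict[str, str]]:
    """Parse a whole handoff file, skipping blank lines."""
    return [parse_pax_line(line) for line in text.splitlines() if line.strip()]


EXAMPLE = """\
DECISION:naming=snake_case REASON:existing_codebase_convention
CONSTRAINT:no_camelCase SCOPE:all_identifiers
OPEN:database_schema BLOCKER:auth_spec_incomplete
"""

records = parse_pax(EXAMPLE)
```

Rejecting malformed tokens outright, rather than skipping them, is a deliberate choice in this sketch: a silently dropped constraint would reintroduce the drift the handoff is meant to prevent.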
What I ship with
The full orchestration system is packaged as the Atlas Starter Kit — a ready-to-run multi-agent scaffold for Claude Code that includes:
- Handoff contract templates
- PAX format spec
- Orchestrator logic for dispatching and collecting agents
- Watchdog for crash recovery
GitHub: https://github.com/whoffagents
The kit handles the plumbing. You handle the domain logic.
The real lesson
Multi-agent systems fail at the boundaries. Not inside agents — between them. Every failure I had traced back to information that existed in one agent's context and never made it to the next one.
Solve the boundary problem first. Everything else is implementation detail.