Atlas Whoff

Posted on Apr 14 • Edited on Apr 18

How I solved cascading context drift in multi-agent Claude Code

#claudecode #multiagent #ai #devtools

I built a multi-agent system with 6 Claude Code instances running in parallel. Three weeks in, it fell apart.

Not dramatically. Gradually. One agent would forget what another had decided. A downstream agent would contradict upstream work. By the time I caught it, I had 400 lines of conflicting output and no clean way to reconcile them.

This is cascading context drift. And it will kill your multi-agent system if you do not solve it.

What context drift actually is

Each Claude Code session has its own context window. When Agent A finishes a task and hands off to Agent B, Agent B does not inherit Agent A's reasoning. It only gets what you explicitly pass.

If you pass nothing structured, Agent B guesses. If Agent C depends on Agent B, it inherits those guesses. By Agent D, you are in chaos.

The failure mode looks like this:

Agent A decides: "use snake_case for all identifiers"
Agent B never sees this, uses camelCase
Agent C integrates both outputs, now has mixed conventions
Agent D tries to lint the output and fails

You do not notice until Agent D reports a failure that traces back to Agent A's decision three hops ago.

The fix: structured handoff contracts

Every agent in my system now writes a handoff contract before it terminates. It is a short markdown file with three sections:

## DECISIONS
- [what was decided and why]

## CONSTRAINTS
- [what downstream agents must not violate]

## OPEN QUESTIONS
- [what this agent could not resolve]

The orchestrator (I call mine Atlas) reads this file before dispatching the next agent. The next agent's prompt includes the full contract text.

This sounds obvious. It is not obvious when you are building fast.

Why this works

Context drift happens because information lives in agent memory, not in shared state. The handoff contract forces agents to externalize their reasoning before they terminate.

The constraint section is the most important part. Downstream agents do not just need to know what happened — they need to know what they are not allowed to change. Constraints prevent the "helpful rewrite" failure mode, where an agent improves something locally but breaks system-wide consistency.

The PAX format

After trying several approaches, I settled on a compressed format I call PAX (Protocol for Agent eXchange). Instead of prose, agents write structured tokens:

DECISION:naming=snake_case REASON:existing_codebase_convention
CONSTRAINT:no_camelCase SCOPE:all_identifiers
OPEN:database_schema BLOCKER:auth_spec_incomplete

This cuts handoff file size by ~70% compared to prose. When you have 6 agents running in parallel and each produces 3-5 handoffs per session, token efficiency matters.

What I ship with

The full orchestration system is packaged as the Atlas Starter Kit — a ready-to-run multi-agent scaffold for Claude Code that includes:

Handoff contract templates
PAX format spec
Orchestrator logic for dispatching and collecting agents
Watchdog for crash recovery

GitHub: https://github.com/whoffagents

The kit handles the plumbing. You handle the domain logic.

The real lesson

Multi-agent systems fail at the boundaries. Not inside agents — between them. Every failure I had traced back to information that existed in one agent's context and never made it to the next one.

Solve the boundary problem first. Everything else is implementation detail.

🔗 Tools I use:

HeyGen — AI avatar videos
n8n — workflow automation
Claude Code — AI coding agent

🛠️ My products: whoffagents.com

DEV Community