marwood

I Built a Production Agent Orchestrator. Then Claude Code's Source Leaked and I Saw the Same Architecture

On March 31st, someone discovered that version 2.1.88 of Claude Code's npm package shipped with an unobfuscated source map pointing to Anthropic's entire TypeScript codebase. Around 1,900 files. Over half a million lines of code. Everything.

I've spent the better part of a year building a production system that uses Claude as the brain of a voice-controlled automation tool. Five specialist agents, an Opus orchestrator, over 1,400 tests across 22 major versions. When I read through what the leak revealed, I wasn't looking for secrets. I wanted to know: did they arrive at the same patterns? Turns out the answer was yes.

The orchestrator and specialists pattern

The most important architectural decision in any agent system is how you split work between a coordinator and its specialists. A high capability model (Opus) handles strategic decisions: what needs to happen, which agent should do it, how to combine results. Lower cost models (Sonnet) handle tactical execution: read these files, run these tests, update this documentation. The orchestrator thinks. The specialists do.

The leaked coordinator code confirms Anthropic built the same thing. Their orchestrator delegates to specialists with focused tool access and scoped responsibilities. The coordinator prompt even echoes guidance I'd written independently: don't be vaguely deferential when delegating; specify exactly what to do. This isn't a coincidence. It's convergent engineering.

Orchestrators need to preserve their context window for coordination. Specialists need to burn through context freely and return only a concise summary. If the specialist pollutes the orchestrator's context, the orchestrator drowns. If the orchestrator tries to do the specialist's job, it runs out of room for strategy.
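The division of labor above can be sketched in a few lines. This is a hypothetical illustration, not Claude Code's actual implementation: the names (`SpecialistResult`, `run_specialist`, `Orchestrator`) are mine, and the specialist here is a stand-in for a real model call. The point is the boundary: the specialist consumes as much context as it likes, but only a short summary crosses back into the orchestrator.

```python
from dataclasses import dataclass, field

@dataclass
class SpecialistResult:
    summary: str          # the only thing the orchestrator ever sees
    tokens_consumed: int  # context the specialist burned privately

def run_specialist(task: str, files: list[str]) -> SpecialistResult:
    # Stand-in for a Sonnet-class worker: reads files, does the work,
    # and compresses everything into one summary line.
    detail = " ".join(f"analyzed {f}" for f in files)
    return SpecialistResult(
        summary=f"{task}: done, {len(files)} file(s)",
        tokens_consumed=len(detail.split()),
    )

@dataclass
class Orchestrator:
    context: list[str] = field(default_factory=list)

    def delegate(self, task: str, files: list[str]) -> str:
        result = run_specialist(task, files)
        # Only the summary enters the orchestrator's context,
        # never the specialist's working detail.
        self.context.append(result.summary)
        return result.summary

orch = Orchestrator()
orch.delegate("run tests", ["test_a.py", "test_b.py"])
orch.delegate("update docs", ["README.md"])
print(orch.context)  # two short summaries, no specialist detail
```

If `run_specialist` instead returned its full working transcript and the orchestrator appended that, the coordinator's window would fill with tactical noise. Keeping the summary boundary explicit in code is what makes the pattern hold under load.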

Subagents don't inherit your configuration

Here's the most important thing I learned the hard way. Custom subagents do not inherit your project configuration. I spent weeks debugging why specialists ignored conventions documented in my main configuration file. The orchestrator followed them perfectly. The specialists acted like they'd never seen them. Because they hadn't. Each specialist gets its own context window. It starts fresh.

The leaked code confirms this is by design. Each specialist prompt needs to be a complete behavioral contract: who it is, what it can do, what files it owns, what format to return results in, and when to stop. If a specialist needs your coding conventions, those conventions go in the specialist prompt itself.
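A minimal sketch of what "complete behavioral contract" means in practice. Everything here is illustrative, assuming a setup where you assemble specialist prompts yourself: the function and the convention strings are hypothetical, but the structure shows the fix for the inheritance gap, restating your project conventions inside every specialist prompt rather than assuming they carry over.

```python
# Conventions that live in your main configuration (e.g. CLAUDE.md).
# Spawned specialists never see that file, so we restate them per prompt.
PROJECT_CONVENTIONS = [
    "Use 4-space indentation.",
    "Never commit directly to main.",
]

def build_specialist_prompt(role: str, allowed_tools: list[str],
                            owned_files: list[str], output_format: str) -> str:
    # A complete behavioral contract: identity, capabilities, ownership,
    # return format, and the conventions the subagent will not inherit.
    sections = [
        f"You are the {role} specialist.",
        "Tools you may use: " + ", ".join(allowed_tools) + ".",
        "Files you own: " + ", ".join(owned_files) + ".",
        f"Return results as: {output_format}.",
        "Project conventions (restated, since you do not inherit them):",
        *[f"- {c}" for c in PROJECT_CONVENTIONS],
    ]
    return "\n".join(sections)

prompt = build_specialist_prompt(
    role="testing",
    allowed_tools=["read_file", "run_tests"],
    owned_files=["tests/"],
    output_format="one-line pass/fail summary",
)
print(prompt)
```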

This is the number one mistake in agent systems. People write beautiful CLAUDE.md files and wonder why their spawned agents ignore every rule. Now you know why.

Guardrails that actually work

Early on, I tried to enforce critical rules through prompt wording. Don't modify files outside your scope. Don't add features that weren't requested. I capitalised things. I wrote "MUST" and "NEVER" in important places.

It didn't work. Not reliably. The agent would follow the rules most of the time, then occasionally blow past them without warning. The failure rate was low enough to be dangerous.

The leaked permission engine shows Anthropic solved this the same way I did: with code, not words. Critical restrictions are checked in code before every operation, not left to prompt instructions the model might or might not follow. Hooks fire deterministically. Deny rules block operations regardless of what the model wants to do.

I call this deterministic guardrails over probabilistic compliance. You cannot rely on the model remembering to do something consistently. A linter that blocks bad imports beats a coding guideline. A verification script that runs automatically beats a prompt that says "remember to run tests." Rules enforced by tools are always followed. Rules enforced by discipline are followed until something goes wrong.
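Here's what a deterministic guardrail looks like at its smallest. This is a hedged sketch, not Claude Code's permission engine: the rule table and `check_permission` are hypothetical names. The property that matters is that the check runs in ordinary code before every operation, so it fires whether or not the model "remembers" the rule.

```python
import fnmatch

# Deny rules for one hypothetical specialist. These live in code,
# not in the prompt, so they cannot be forgotten or talked around.
DENY_RULES = [
    ("edit", "src/core/*"),  # no edits to critical infrastructure
    ("execute", "*"),        # this specialist may not run commands
]

def check_permission(operation: str, target: str) -> bool:
    # Fires deterministically before every tool call: a matching deny
    # rule blocks the operation regardless of what the model intended.
    for op, pattern in DENY_RULES:
        if operation == op and fnmatch.fnmatch(target, pattern):
            return False
    return True

assert check_permission("read", "src/core/engine.py")      # reading is fine
assert not check_permission("edit", "src/core/engine.py")  # blocked by rule
assert not check_permission("execute", "pytest")           # blocked by rule
```

The prompt can still explain the rules so the model plans around them, but compliance no longer depends on it.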

The most critical rules in Claude Code are not in the system prompt. They're in the permission engine.

Graduated autonomy

I assign freedom levels to each specialist based on risk. Low-freedom agents that touch critical infrastructure get exact formats, narrow file ownership, and explicit boundaries. My testing agent can read source code and run tests, but cannot edit source files. If it finds a bug, it reports it. It does not fix it. High-freedom agents like the orchestrator get goals, constraints, and heuristics.

Anthropic's agent architecture maps to the same spectrum. A doc reviewer gets read access only, a test runner gets read and execute but no edit, a code modifier gets full read and write but no command execution. The leak further confirms that subagents cannot spawn other subagents, preventing scope creep and infinite nesting.
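The spectrum above reduces to a capability table. This is an illustrative sketch of the idea, not Anthropic's code; the role names mirror the examples in this section, and the `can_spawn_subagents` flag encodes the no-nesting rule the leak describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capabilities:
    read: bool
    edit: bool
    execute: bool
    can_spawn_subagents: bool = False  # subagents never spawn subagents

# Each role gets a capability set matched to its risk, instead of one
# blanket permission level for every agent.
ROLES = {
    "doc_reviewer":  Capabilities(read=True, edit=False, execute=False),
    "test_runner":   Capabilities(read=True, edit=False, execute=True),
    "code_modifier": Capabilities(read=True, edit=True,  execute=False),
    "orchestrator":  Capabilities(read=True, edit=False, execute=False,
                                  can_spawn_subagents=True),
}

def allowed(role: str, action: str) -> bool:
    return getattr(ROLES[role], action)

assert allowed("test_runner", "execute")
assert not allowed("test_runner", "edit")       # reports bugs, doesn't fix them
assert not allowed("code_modifier", "execute")  # writes code, runs nothing
```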

Most people give every agent the same autonomy. That's like giving every employee the same security clearance.

What surprised me

Not everything confirmed what I already knew. Some of it sent me back to rethink my own architecture.

The leaked code references KAIROS, an autonomous daemon mode with "autoDream" memory consolidation that periodically reviews, consolidates, and prunes stored memories to prevent context bloat. I had been handling memory reactively. This proactive approach is something I'm planning to adopt.
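To make the reactive-versus-proactive distinction concrete, here is a toy consolidation pass loosely inspired by that idea. Everything in it is my own assumption, not the leaked implementation: the memory shape, the `consolidate` function, and the retention window are all illustrative. The point is that it runs periodically, merging duplicates and pruning stale entries before context bloat forces the issue.

```python
import time

def consolidate(memories: list[dict], now: float,
                max_age_seconds: float = 7 * 24 * 3600) -> list[dict]:
    seen: dict[str, dict] = {}
    for m in memories:
        # Prune: drop memories older than the retention window.
        if now - m["created_at"] > max_age_seconds:
            continue
        # Consolidate: keep only the newest memory per topic.
        key = m["topic"]
        if key not in seen or m["created_at"] > seen[key]["created_at"]:
            seen[key] = m
    return list(seen.values())

now = time.time()
memories = [
    {"topic": "build", "text": "old note",     "created_at": now - 8 * 24 * 3600},
    {"topic": "tests", "text": "flaky test_x", "created_at": now - 3600},
    {"topic": "tests", "text": "test_x fixed", "created_at": now - 60},
]
kept = consolidate(memories, now)
print(kept)  # only the newest "tests" memory survives
```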

The source apparently injects fake tool definitions into system prompts to poison competitor training data. Whatever you think about the ethics, it's fascinating adversarial engineering. If you're building systems that call LLM APIs, responses you receive might contain deliberate noise designed for purposes that have nothing to do with your query.

And then there's "Undercover Mode," which instructs Claude to never reveal it's an AI when contributing to public repositories. This has generated the most controversy, understandably. But from an engineering perspective, it reveals that Anthropic treats agent identity as a configurable property, not a fixed trait.

Interesting architectural choice regardless of whether you agree with how they used it.

The real lesson

Most coverage of this leak has focused on the controversial parts. But for anyone building agent systems, the deeper story is quieter and more useful.

The patterns in Claude Code are not novel. Orchestrator plus specialists. Context isolation. Deterministic enforcement. Graduated autonomy. These are the engineering reality of building systems where AI agents need to coordinate reliably, and anyone working seriously with agent orchestration arrives at them through trial and error.

What's encouraging is that the leak lowers the barrier. Before, you had to discover these patterns through months of mistakes. Now the reference implementation is on GitHub with 30,000 stars. You can read how Anthropic's team solved the same problems you're going to face, and skip the worst of the dead ends.

The patterns aren't the hard part though. Knowing when to apply them, which tradeoffs matter for your system, and how to debug failures that don't show up in any architecture diagram — that's still something you learn by building. The leak gives you a map. You still have to walk the territory.
