DEV Community

Kang

I Read Claude Code's 510K Lines of Source Code — Here's How It Actually Works

I spent the last few weeks reading through Claude Code's source — all 510,000 lines of TypeScript across 1,903 files. The code became available through an accidental npm source map leak, and my team and I documented our findings in a full teardown on GitHub.

Here are the five architectural decisions that stuck with me most.

1. The Entire Agent Runs From a Single 1,729-Line File

The brain of Claude Code is src/query.ts — one file, 1,729 lines, running the entire agentic loop. No state machine. No event-driven architecture. Just a while(true) loop:

while (true) {
  // Trim context (4-layer cascade)
  // Pre-fetch memory + skills
  // Call Claude API (streaming)
  // While receiving stream → detect tool_use blocks
  //   → start executing tools IMMEDIATELY
  // Tools called? → append results → continue loop
  // No tools?    → return response → exit
}

This file handles input processing, API calls, streaming parsing, tool dispatch, error recovery, and context management. It's the textbook definition of a God Object.

Why did Anthropic do this? The agentic loop is fundamentally sequential: the model speaks, tools execute, the model speaks again. Ninety percent of the time there are only two states, "waiting for model" and "executing tools." A state machine adds formality without adding clarity. They chose pragmatism over architectural purity and shipped.

The cost is real though. Any cross-cutting change touches everything. I'd bet the team reviews PRs to this file with extreme caution. If I were leading their next architecture review, I'd split it into three modules: a conversation orchestrator, a tool dispatcher, and a context manager. Keep the loop, but make it a thin coordination layer.
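To make the proposed split concrete, here is a minimal sketch of what that thin coordination layer could look like. All interface and function names are my own invention for illustration, not taken from the actual source:

```typescript
// Hypothetical module split; interface and function names are
// illustrative, not from Claude Code's source.
interface ContextManager {
  trim(history: string[]): string[];            // the 4-layer cascade would live here
}
interface ToolDispatcher {
  run(toolCalls: string[]): Promise<string[]>;  // streaming dispatch would live here
}
interface ModelClient {
  turn(history: string[]): Promise<{ toolCalls: string[]; text: string }>;
}

// The while(true) loop survives, but as a thin coordination layer.
async function agentLoop(
  ctx: ContextManager,
  tools: ToolDispatcher,
  model: ModelClient,
  history: string[],
): Promise<string> {
  while (true) {
    history = ctx.trim(history);
    const { toolCalls, text } = await model.turn(history);
    if (toolCalls.length === 0) return text;     // no tools → final answer
    const results = await tools.run(toolCalls);
    history = [...history, text, ...results];    // append results, loop again
  }
}
```

Each cross-cutting concern now has one owner, and a change to context trimming no longer risks the dispatch logic.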

2. Four Layers of Context Management (This Is the Good Stuff)

Most AI agents handle context limits with a single strategy — summarize and truncate. Claude Code uses four mechanisms, applied in cascade:

Layer 1 — HISTORY_SNIP: Surgical deletion. Removes irrelevant messages from conversation history. Zero information loss. This is the cheapest, safest operation.

Layer 2 — Microcompact: Cache-level editing. The API tells the model to ignore certain cached tokens without actually modifying the content. The conversation stays intact; the model just stops paying attention to parts of it.

Layer 3 — CONTEXT_COLLAPSE: Structured archival. Compresses conversation segments into git-commit-log style summaries. You lose detail, but the structure survives.

Layer 4 — Autocompact: The nuclear option. Full compression of the entire context. Last resort.

The design principle: lossless before lossy, local before global.

Here's what makes this genuinely clever. Layer 1 costs nothing — you're removing "file saved successfully" messages that nobody needs. Layer 2 is a trick I hadn't seen before — it exploits the caching API to make tokens invisible without deleting them, so the cache stays warm. Only when those cheap options are exhausted do you start the expensive, destructive compression at Layers 3 and 4.

The weakness? Compression is irreversible and unauditable. After L3/L4, the model doesn't know what it forgot. It can't tell you "I may have lost context on this" — it just answers confidently based on incomplete information. That's worse than forgetting. It's not knowing that you forgot.
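The cascade shape itself is easy to reuse. Here is a simplified model of "lossless before lossy": an ordered list of strategies tried cheapest-first, stopping as soon as the context fits. The layer names mirror the ones above, but the logic is my own illustration:

```typescript
// Illustrative cascade: try cheap/lossless strategies first,
// escalate only while the context still exceeds the budget.
type Message = { role: string; text: string; relevant: boolean };

const size = (msgs: Message[]) =>
  msgs.reduce((n, m) => n + m.text.length, 0);

// Layer 1 stand-in: drop messages flagged irrelevant (lossless).
const historySnip = (msgs: Message[]) => msgs.filter((m) => m.relevant);

// Layers 3/4 stand-in: collapse older messages into a summary (lossy).
const collapse = (msgs: Message[]): Message[] => [
  { role: "system", text: `[summary of ${msgs.length - 1} messages]`, relevant: true },
  msgs[msgs.length - 1],
];

function trimContext(msgs: Message[], budget: number): Message[] {
  for (const strategy of [historySnip, collapse]) {
    if (size(msgs) <= budget) break;  // stop as soon as we fit
    msgs = strategy(msgs);
  }
  return msgs;
}
```

The key property: lossy compression never runs if the cheap deletion already got you under budget.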

3. 18 Virtual Pet Species Hidden in a Coding Agent

Yes, really. Claude Code ships a full Tamagotchi-style virtual pet system in production.

18 species. 5 rarity tiers (Common at 60% down to Legendary at 1%). RPG stats including DEBUGGING, PATIENCE, CHAOS, WISDOM, and SNARK. Your pet can wear hats — crown, top hat, propeller hat, wizard hat. There's a 1% chance of getting a "shiny" variant.

The species: duck, goose, blob, cat, dragon, octopus, owl, penguin, turtle, snail, ghost, axolotl, capybara, cactus, robot, rabbit, mushroom, chonk.

Every species name is hex-encoded in the source:

const duck = String.fromCharCode(0x64,0x75,0x63,0x6b)

The comment in the code says "one species name collides with a model-codename canary." So one of those 18 names is apparently the codename for Anthropic's next model. My money's on goose or axolotl, but that's pure speculation.
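The obfuscation is trivially reversible, by design or not. A quick round-trip illustration (my own code, not from the source):

```typescript
// Reverse the hex obfuscation: char codes back to a string,
// and the encoding direction for comparison.
const decode = (...codes: number[]): string => String.fromCharCode(...codes);
const encode = (s: string): string[] =>
  [...s].map((c) => "0x" + c.charCodeAt(0).toString(16));

const duck = decode(0x64, 0x75, 0x63, 0x6b); // "duck"
```

So the hex encoding keeps names out of a naive `grep`, but nothing more.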

This probably started as a team morale project or hackathon experiment. But it ships in the binary. The feature flag system (more on that below) can remove it at compile time, so it's not a security risk per se. Still — when you run a coding agent with elevated permissions and it has an entire RPG hidden inside, you do have to wonder what else might be in tools you're running with sudo.

4. StreamingToolExecutor — Why Claude Code Feels Fast

When most agents call tools, they wait for the model to finish generating, then start executing. Claude Code doesn't wait.

The StreamingToolExecutor starts executing tools the moment they appear in the streaming response, while the model is still generating. If the model says "let me grep for that pattern" and then continues thinking about the next step, the grep is already running.

The concurrency model is a reader-writer lock:

  • Read-only tools (grep, file read, search) run in parallel with each other
  • Write tools (file write, bash with side effects) get an exclusive lock
  • Results buffer in receive order and get assembled once the stream ends

It's a textbook RWLock applied to tool dispatch, and it works. The perceived speed improvement is significant because file reads and searches — the most common operations — never block each other.

The subtle risk: if a tool is incorrectly marked as read-only but actually has side effects (say, a search tool that creates cache files), parallel execution could cause race conditions. Claude Code accepts this risk. The window is small and the model self-corrects on the next turn.

There's another edge case worth noting: two read tools read different parts of the same file, and you run git pull in another terminal between the reads. The model now sees a file state that never existed atomically. Again, accepted risk: pragmatism over correctness guarantees.
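The reader-writer gate itself fits in a few dozen lines. This is a simplified sketch of the concurrency model described above, not the actual StreamingToolExecutor (which also handles stream parsing and result buffering):

```typescript
// Simplified reader-writer gate over tool dispatch:
// read-only tools run concurrently, write tools take an exclusive turn.
type Tool = { name: string; readOnly: boolean; run: () => Promise<string> };

class RWToolGate {
  private tail: Promise<unknown> = Promise.resolve();
  private readers: Promise<string>[] = [];

  dispatch(tool: Tool): Promise<string> {
    if (tool.readOnly) {
      // Readers wait only for pending writes, then run in parallel.
      const p = this.tail.then(() => tool.run());
      this.readers.push(p);
      return p;
    }
    // Writers wait for all in-flight readers and prior writers.
    const p = Promise.all([this.tail, ...this.readers]).then(() => tool.run());
    this.tail = p.catch(() => {}); // don't let one failure poison the chain
    this.readers = [];
    return p;
  }
}
```

Dispatching tools through the gate as they appear in the stream gives you the "reads never block each other" property for free.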

5. Security Model: Real Trade-offs, Not Theater

Claude Code's security approach is interesting because of what it doesn't do as much as what it does.

On macOS, BashTool runs commands inside Apple's sandbox-exec sandbox. There's an allowlist-based permission system where users approve tool actions. Commands that block for more than 15 seconds get auto-moved to background execution.
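The auto-backgrounding behavior can be approximated with a race against a timer. A sketch under my own assumptions; the real threshold handling and background plumbing are internal to Claude Code:

```typescript
// Sketch: run a command, but if it hasn't finished within a threshold,
// return a "moved to background" handle instead of blocking the loop.
type Outcome =
  | { kind: "done"; output: string }
  | { kind: "background"; handle: Promise<string> };

function runWithTimeout(
  cmd: () => Promise<string>,
  thresholdMs = 15_000,
): Promise<Outcome> {
  const handle = cmd(); // the command keeps running either way
  const timer = new Promise<Outcome>((res) =>
    setTimeout(() => res({ kind: "background", handle }), thresholdMs),
  );
  const done = handle.then((output): Outcome => ({ kind: "done", output }));
  return Promise.race([done, timer]);
}
```

The agent loop can then report "still running in the background" and await the handle on a later turn.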

But here's the thing: Claude Code is locked to Anthropic's API. No provider choice. The feature flag system uses bun:bundle compile-time macros to physically remove unreleased features from the binary — security researchers literally can't find code that doesn't exist. That's smart.

The trade-off: you get a polished, tightly integrated experience, but you can't use it with other models. Compare this with Goose (30+ providers, MCP-native extensions, 5-inspector pipeline) or DeerFlow (any provider via LangGraph). Claude Code chose depth over breadth and bet that being the best at one integration beats being mediocre at thirty.

The multi-agent system has a similar philosophy. Workers can't spawn sub-workers — hard ban, not a depth limit. This prevents resource explosion but limits recursive decomposition. You can't tell a worker to refactor a module and have it spin up per-file sub-workers. Safe? Yes. Flexible? Not particularly.

The Architecture Diagram

Here's how it all fits together:

[Diagram: Claude Code Architecture]

The flow goes: CLI entry (Bun runtime) → Session layer (auth, config, memory) → the agentic core in query.ts (the while-true loop with the 4-layer context cascade) → tool execution (40+ tools via buildTool() factories, no inheritance) → results feed back into the loop.

What I'd Steal for My Own Agent

If I were building an agent from scratch today, three patterns from Claude Code would go straight into the design:

  1. The 4-layer context cascade. Progressive degradation beats one-shot summarization every time. Start cheap and lossless, escalate to expensive and lossy.

  2. Streaming tool execution with RWLock. The implementation is maybe 200 lines of code and the UX improvement is immediately noticeable.

  3. buildTool() factories over class hierarchies. At 40 tools with minimal shared behavior, composition wins. At 100+ tools with shared concerns, you'd want lightweight per-family factories — still functions, not classes.
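As an illustration of the factory style, here is roughly what composition over inheritance looks like for tools. The real buildTool() signature isn't reproduced here; these names and fields are assumptions:

```typescript
// Illustrative tool factory: shared behavior composed in a plain
// function, no base class. Field names are assumptions, not the
// actual buildTool() API.
interface ToolSpec<I> {
  name: string;
  description: string;
  readOnly: boolean;
  run: (input: I) => Promise<string>;
}

function buildTool<I>(spec: ToolSpec<I>) {
  return {
    ...spec,
    // Cross-cutting concerns (here: error wrapping) live in the
    // factory instead of an inherited method.
    invoke: async (input: I): Promise<string> => {
      try {
        return await spec.run(input);
      } catch (e) {
        return `Error in ${spec.name}: ${(e as Error).message}`;
      }
    },
  };
}

const grepTool = buildTool({
  name: "grep",
  description: "Search file contents for a pattern",
  readOnly: true,
  run: async ({ pattern }: { pattern: string }) => `matches for ${pattern}`,
});
```

Adding a fortieth tool is one object literal; there is no hierarchy to slot it into.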

What I'd skip: the 1,729-line God Object. Yes, it worked for shipping v1. No, it won't age well. And the hard ban on nested workers feels like solving the "runaway agents" problem with a hammer when a budget-based approach (depth limit + global worker count) would be more flexible.
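The budget-based alternative is simple enough to sketch. This is my proposal, not anything in Claude Code's source:

```typescript
// Sketch of a budget-based spawn policy: instead of a hard ban on
// nested workers, enforce a depth limit plus a global worker cap.
class WorkerBudget {
  private active = 0;
  constructor(
    private maxDepth: number,
    private maxWorkers: number,
  ) {}

  // A worker at `depth` asks permission to spawn a child.
  trySpawn(depth: number): boolean {
    if (depth >= this.maxDepth || this.active >= this.maxWorkers) return false;
    this.active++;
    return true;
  }

  release(): void {
    this.active--; // worker finished; free a slot
  }
}
```

A refactor worker could then spawn per-file children at depth 1 while the global cap still bounds total resource use.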


The full teardown — including the Mermaid diagrams, feature flag analysis, unreleased voice mode (codename: Amber Quartz), and a cross-project comparison with DeerFlow, Goose, and others — is on GitHub:

NeuZhou/awesome-ai-anatomy → Claude Code teardown

We've published 11 teardowns so far (Dify, DeerFlow, Goose, Lightpanda, and more), with Cursor next on the list. Star the repo if you want to see the next one drop.
