A source map shipped in the v2.1.88 npm release: about 1,884 files under src/, original names and comments intact. So I walked the core modules and checked what everyone "knows" about how Claude Code works against what the code actually does.
Half of it was wrong. Including things I'd repeated myself.
I did not read it line by line. Nobody reads 1,884 files line by line. I walked the key modules and tied every claim to something concrete: a function, a constant. So you'll see names like queryLoop and AUTOCOMPACT_BUFFER_TOKENS below. Real identifiers, so every claim is checkable against the public teardowns, not vibes.
This is a map of how it works, not a dump of its guts. I don't quote internal prompts, and anything that only runs in Anthropic's internal builds is flagged as such.
Myth 1: "The agent recursively calls itself on every tool result"
The picture everyone has: model replies, tool runs, the agent calls itself again, deeper down the stack.
There's no recursion.
// src/query.ts
async function* queryLoop(state) {
  while (true) {
    // ...run model, run tools...
    state = { ...state } // overwrite the loop's single state variable
    continue // not a nested call
  }
}
One while (true) inside an async generator. It keeps a single State value, overwrites it at the end of each pass, and continues. The stack never grows deeper.
Why it matters: every budget, timeout and turn limit you set is counted per loop pass, not per stack frame. "One turn" is literally "one pass." Once I stopped picturing a recursive agent and started picturing a long-running stateful loop, the same tricks I'd use on any long loop applied: count the budget per step, watch what changed between steps.
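To make "count the budget per step" concrete, here is a minimal sketch of a loop-shaped agent with a turn budget. It is not the code from src/query.ts: LoopState, runLoop and maxTurns are names I made up, the model call is a stub, and I dropped the async generator to keep the example synchronous.

```typescript
// Hypothetical sketch: budget is checked once per loop pass, never per stack frame.
// None of these names come from the Claude Code source.
interface LoopState {
  turn: number;
  pendingTools: number; // tool calls the stubbed "model" still wants to make
}

function runLoop(initial: LoopState, maxTurns: number): LoopState[] {
  const trace: LoopState[] = [];
  let state = initial;
  while (true) {
    if (state.turn >= maxTurns) break; // turn limit, checked per pass
    if (state.pendingTools === 0) break; // nothing left to do
    // Stand-in for "run model, run tools": one tool call is consumed per pass.
    state = { turn: state.turn + 1, pendingTools: state.pendingTools - 1 };
    trace.push(state); // what changed between steps is easy to inspect
  }
  return trace;
}

const trace = runLoop({ turn: 0, pendingTools: 5 }, 3);
console.log(trace.map((s) => s.turn)); // turns 1, 2, 3; two tool calls are left over
```

The trace is the useful part: with a loop you can diff the state between pass N and pass N+1, which is much harder to do with a picture of nested calls.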
Myth 2: "When the context fills up, it just truncates"
This is the interesting one. Context isn't one "drop the old stuff" function. It's five mechanisms, ordered cheapest to most expensive:
snip -> microcompact -> context-collapse -> autocompact -> reactive
The order is deliberate. Each later stage sits after the earlier one precisely so that if a cheap stage already freed space, the expensive one does nothing. The comment says it outright: run collapse before autocompact so autocompact often never fires.
Cheap stages drop old tool results, surgically. The expensive one, autocompact, makes a separate model call to summarize the whole history. It kicks in at the effective context window minus AUTOCOMPACT_BUFFER_TOKENS, a 13,000-token reserve for the summary itself.
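The ordering and the threshold arithmetic can be sketched like this. Only AUTOCOMPACT_BUFFER_TOKENS and the stage names are from the source. The 200,000-token window, the amount each stage frees, and the idea that every stage compares against the same limit are my simplifications, and I left reactive out to keep it short.

```typescript
// Hypothetical sketch of "cheapest stage first, stop as soon as there is room".
const AUTOCOMPACT_BUFFER_TOKENS = 13000;

// Threshold described in the article: effective window minus the reserve.
function autocompactThreshold(effectiveWindow: number): number {
  return effectiveWindow - AUTOCOMPACT_BUFFER_TOKENS;
}

interface Stage {
  label: string;
  freedTokens: number; // invented numbers, for illustration only
}

function runStages(usedTokens: number, effectiveWindow: number, stages: Stage[]): string[] {
  const limit = autocompactThreshold(effectiveWindow);
  const ran: string[] = [];
  let tokens = usedTokens;
  for (const stage of stages) {
    if (tokens < limit) break; // an earlier, cheaper stage already freed enough
    tokens -= stage.freedTokens;
    ran.push(stage.label);
  }
  return ran;
}

const stages: Stage[] = [
  { label: "snip", freedTokens: 2000 },
  { label: "microcompact", freedTokens: 6000 },
  { label: "context-collapse", freedTokens: 20000 },
  { label: "autocompact", freedTokens: 120000 },
];

console.log(autocompactThreshold(200000)); // 187000
console.log(runStages(190000, 200000, stages)); // snip and microcompact only
```

In this run the two cheapest stages bring usage from 190,000 down to 182,000, which is under the 187,000 limit, so the expensive summarization call never happens.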
Here's what sent me digging. autocompact has a silent fuse:
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
// after 3 failed compactions in a row, autocompact
// shuts off for the rest of the session. silently.
The comment explains why it's there: sessions were hitting 50+ consecutive failures, up to 3,272 in one session, burning roughly 250,000 extra API calls a day across all users.
Translation: a session that "felt fine for hours" could have spent part of that time running on surgical drops alone, with no real compaction, and the UI would never tell you.
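A fuse like that is a few lines of state. This is my own sketch, not the source: only the constant name and its value are real. The class, its methods, and the assumption that a success resets the streak are mine ("consecutive" suggests it, but I did not verify that detail).

```typescript
// Hypothetical sketch of a consecutive-failure fuse.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class AutocompactFuse {
  private consecutiveFailures = 0;
  private disabled = false;

  // True while autocompact may still be attempted in this session.
  canAttempt(): boolean {
    return !this.disabled;
  }

  record(success: boolean): void {
    if (success) {
      this.consecutiveFailures = 0; // assumption: a success resets the streak
      return;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
      this.disabled = true; // stays off for the rest of the session
    }
  }
}

const fuse = new AutocompactFuse();
fuse.record(false);
fuse.record(false);
fuse.record(true); // streak broken
fuse.record(false);
fuse.record(false);
console.log(fuse.canAttempt()); // true: only two failures in a row
fuse.record(false);
console.log(fuse.canAttempt()); // false: three in a row, fuse blown
```

Note what is missing: nothing here tells the user. If you build something similar, surfacing the blown fuse is one extra line.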
Myth 3: "A full compaction always keeps the last few messages verbatim"
I was sure compaction kept the most recent messages word for word and only touched older ones. For a full autocompact, no.
On a full compaction the message array is rebuilt from scratch: a boundary marker, the summary, and a few files pulled back in. messagesToKeep is empty. The verbatim tail survives only in the other modes (partial, reactive, session-memory compaction), which carry a note that recent messages are kept as-is. Full mode doesn't.
The uncomfortable part: after a full compaction the model does not remember your last couple of messages word for word. It remembers a retelling of your conversation that it wrote itself.
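Here is roughly what that rebuild looks like as data. Treat it as a sketch: messagesToKeep is the only name from the source. The entry shapes, the function, the tail size, and the position of the kept messages in the rebuilt array are my guesses.

```typescript
// Hypothetical sketch: what the message array looks like after compaction.
type CompactedEntry =
  | { kind: "boundary" }
  | { kind: "summary"; text: string }
  | { kind: "file"; path: string }
  | { kind: "message"; text: string };

function rebuildAfterCompaction(
  mode: "full" | "partial",
  transcript: string[],
  summary: string,
  restoredFiles: string[],
  tailSize: number,
): CompactedEntry[] {
  // Full mode keeps nothing verbatim; other modes keep a recent tail.
  const messagesToKeep: string[] =
    mode === "full" ? [] : transcript.slice(Math.max(0, transcript.length - tailSize));

  const entries: CompactedEntry[] = [{ kind: "boundary" }, { kind: "summary", text: summary }];
  for (const path of restoredFiles) entries.push({ kind: "file", path });
  for (const text of messagesToKeep) entries.push({ kind: "message", text });
  return entries;
}

const chat = ["fix the login bug", "done, see auth.ts", "now rename the helper"];
const full = rebuildAfterCompaction("full", chat, "User fixed login, wants a rename.", ["auth.ts"], 2);
const partial = rebuildAfterCompaction("partial", chat, "User fixed login.", ["auth.ts"], 2);

console.log(full.map((e) => e.kind)); // boundary, summary, file
console.log(partial.map((e) => e.kind)); // boundary, summary, file, message, message
```

In the full result, "now rename the helper" exists only as whatever the summary says about it.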
Myth 4: "Tools run after the model finishes talking"
No. The tool starts before the model is done with its sentence.
The moment a tool_use block shows up in the stream, StreamingToolExecutor starts it, unless you hit cancel in that split second. The model is still typing, and the edit on disk has already happened. Parallelism is capped by CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY (default 10): calls that are safe to run together go in a batch, the rest queue.
One detail I lost half an hour to. The executor has its own child abort controller. If one parallel tool fails (say Bash), it instantly kills its siblings but does not abort the turn. The killed sibling gets something like "cancelled because a neighbor call failed." So you sit there wondering why a command that depended on nothing didn't run. It depended on a neighbor.
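The sibling behavior is easier to see with two abort controllers side by side. A sketch, with caveats: this is a sequential simulation of a parallel batch, runBatch and the tool list are invented, and the cancellation message is my paraphrase. The only real API here is the standard AbortController.

```typescript
// Hypothetical sketch: a failure aborts the batch's child controller, not the turn.
interface ToolCall {
  label: string;
  fails: boolean;
}

function runBatch(turn: AbortController, tools: ToolCall[]): string[] {
  const batch = new AbortController(); // child: scoped to this batch of siblings
  // Aborting the turn propagates down to the batch, never the other way round.
  turn.signal.addEventListener("abort", () => batch.abort());

  const results: string[] = [];
  for (const tool of tools) {
    if (batch.signal.aborted) {
      results.push(`${tool.label}: cancelled because a sibling call failed`);
      continue;
    }
    if (tool.fails) {
      results.push(`${tool.label}: failed`);
      batch.abort(); // kills the siblings only
      continue;
    }
    results.push(`${tool.label}: ok`);
  }
  return results;
}

const turn = new AbortController();
const results = runBatch(turn, [
  { label: "Read", fails: false },
  { label: "Bash", fails: true },
  { label: "Grep", fails: false },
]);

console.log(results); // Read ok, Bash failed, Grep cancelled
console.log(turn.signal.aborted); // false: the turn itself goes on
```

Grep depended on nothing and still did not run. That is the half hour I lost.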
Myth 5: "One model message is one reply, and stop_reason tells you if a tool was called"
Two facts that save hours of debugging.
First, the easy one: Claude Code sends a separate assistant message per block, not one for the whole reply. Text, thinking, tool call: each ships as its own message the moment it's finished.
Second, the nasty one. When a block closes, stop_reason is always null. The real value arrives later, as a separate event, and gets written in after the fact by editing the message that was already sent. The code is honest about it:
// stop_reason === 'tool_use' is unreliable
So the loop doesn't trust it. To decide whether a tool was called, it checks the fact: did a tool_use block arrive or not. If you've written a wrapper over a stream like this and hit races on stop_reason, now you know where they came from.
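If you write your own wrapper, the safe check looks something like this. The block shapes follow the public Anthropic Messages API in simplified form. StreamedAssistantMessage and toolWasCalled are names I made up, not identifiers from the source.

```typescript
// Hypothetical sketch: decide "was a tool called?" from the blocks, not from stop_reason.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface StreamedAssistantMessage {
  content: ContentBlock[];
  stop_reason: string | null; // still null when the block closes; patched in later
}

function toolWasCalled(messages: StreamedAssistantMessage[]): boolean {
  return messages.some((m) => m.content.some((b) => b.type === "tool_use"));
}

// One message per block, all with stop_reason still null.
const streamed: StreamedAssistantMessage[] = [
  { content: [{ type: "text", text: "Let me check the file." }], stop_reason: null },
  {
    content: [{ type: "tool_use", id: "toolu_1", name: "Read", input: { path: "a.ts" } }],
    stop_reason: null,
  },
];

console.log(streamed.some((m) => m.stop_reason === "tool_use")); // false: the racy check misses it
console.log(toolWasCalled(streamed)); // true: the block is there
```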
Myth 6: "Permissions are just a chain: user -> project -> local -> policy"
The layers exist, and it sounds logical: higher layer wins. I thought so too. The order isn't by source. It's by strictness.
deny beats everything, including bypassPermissions. After that, in decreasing strictness: targeted ask rules and safety checks, then the bypass itself, then allow rules. Only what none of these matched falls through to "ask the user."
The bigger surprise is the zones the bypass doesn't pierce at all. Even under bypassPermissions (which supposedly means allow everything, don't ask), edits to .git/, .claude/, .vscode/ and shell config files still hit a confirmation. The logic is simple: let the agent edit its own settings unprompted and it writes itself a pass out of the permission sandbox. Defaults lean the same way: until a tool declares it's read-only and safe to parallelize, the system assumes it writes and can't be parallelized.
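Written out as a decision function, the strictness order reads like this. It is my reconstruction of the order described above, not the real permission code: every name is invented, and the shell config list holds two illustrative entries because I did not copy the real one.

```typescript
// Hypothetical sketch of "ordered by strictness, not by source".
type Verdict = "deny" | "ask" | "allow";

interface PermissionInput {
  denyRule: boolean;
  askRuleOrSafetyCheck: boolean;
  bypassPermissions: boolean;
  allowRule: boolean;
}

function decide(p: PermissionInput): Verdict {
  if (p.denyRule) return "deny"; // beats everything, bypass included
  if (p.askRuleOrSafetyCheck) return "ask"; // the bypass does not pierce this
  if (p.bypassPermissions) return "allow";
  if (p.allowRule) return "allow";
  return "ask"; // nothing matched: ask the user
}

const PROTECTED_DIRS = [".git", ".claude", ".vscode"];
const SHELL_CONFIGS = [".bashrc", ".zshrc"]; // illustrative, not the real list

function isProtectedPath(path: string): boolean {
  const parts = path.split("/");
  const file = parts[parts.length - 1];
  return parts.some((s) => PROTECTED_DIRS.indexOf(s) !== -1) || SHELL_CONFIGS.indexOf(file) !== -1;
}

const editSettings = decide({
  denyRule: false,
  askRuleOrSafetyCheck: isProtectedPath("repo/.claude/settings.json"),
  bypassPermissions: true,
  allowRule: false,
});
console.log(editSettings); // "ask", even under bypassPermissions
```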
Myth 7: "A subagent is just another Claude running next to you"
A subagent is an isolated fork, and the isolation is harder than it looks. It gets its own agentId, its own copy of the read-files list, and empty memory.
And the part that bites. A normal subagent's setAppState is a no-op by default, so it can't change application state. Because it can't, it's immediately handed the "don't ask" permission flag, and the rest follows on its own: a background subagent physically can't show a permission dialog, so any ask it makes silently turns into deny. Hand a subagent a task that hits a confirmation and it won't wait for you. It gets a no and drives on, as if you'd said no yourself.
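The ask-to-deny collapse fits in one function. A sketch with invented names (resolveForAgent, canShowDialog), just to pin down the rule:

```typescript
// Hypothetical sketch: an "ask" with nobody to ask becomes a "deny".
type RuleOutcome = "allow" | "deny" | "ask";
type FinalOutcome = "allow" | "deny" | "prompt-user";

function resolveForAgent(outcome: RuleOutcome, canShowDialog: boolean): FinalOutcome {
  if (outcome !== "ask") return outcome; // allow and deny pass through unchanged
  return canShowDialog ? "prompt-user" : "deny"; // no dialog available: silent no
}

console.log(resolveForAgent("ask", true)); // "prompt-user": the main session waits for you
console.log(resolveForAgent("ask", false)); // "deny": the background subagent drives on
```

So before handing a task to a background subagent, it is worth asking whether any step could hit the "ask" branch at all.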
One more: in the public build a subagent can't spawn subagents. Multi-level agents live only in Anthropic's internal builds. For everyone else the hierarchy is flat.
Myth 8: "Claude Code's extensions are a handful of standard hooks"
Five extension mechanisms: MCP servers, plugins, skills, hooks, and slash commands. The plugin is the odd one out, an umbrella you can stuff the other four under. Then come the numbers I was actually digging for.
There aren't five "canonical" hooks (SessionStart, PreToolUse, PostToolUse, Stop, UserPromptSubmit). There are 28, including events for teammate collaboration, tasks, working-directory changes, and file changes. And the hook contract isn't "non-zero means error." There are three outcomes:
exit 0 -> fine, proceed
exit 2 -> hard block: action cancelled, model told why
other -> soft error: stderr shown to you, session continues
The exit 2 case is why there's a slightly funny guard: before running a plugin hook, the code separately checks that the plugin folder even exists. Otherwise the hook would run python3 <missing file>.py, which dies with code 2, and one missing file would wedge Stop and UserPromptSubmit permanently. The session would never be able to end.
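The contract plus the guard, as a sketch. The exit-code mapping is the one listed above. The function names are mine, and the folder check is reduced to a boolean so the example runs without touching the file system.

```typescript
// Hypothetical sketch of the three-outcome hook contract and the missing-folder guard.
type HookOutcome = "proceed" | "block" | "soft-error" | "skipped";

function classifyHookExit(exitCode: number): HookOutcome {
  if (exitCode === 0) return "proceed";
  if (exitCode === 2) return "block"; // hard block: action cancelled
  return "soft-error"; // anything else: stderr shown, session continues
}

function runPluginHook(pluginDirExists: boolean, run: () => number): HookOutcome {
  if (!pluginDirExists) return "skipped"; // the guard: never even launch the command
  return classifyHookExit(run());
}

// python3 exits with code 2 when the script file does not exist.
const missingScript = (): number => 2;

console.log(runPluginHook(true, missingScript)); // "block": a missing file looks like a veto
console.log(runPluginHook(false, missingScript)); // "skipped": the guard catches it first
console.log(classifyHookExit(1)); // "soft-error"
```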
Skills are their own story, and one constant explains all of it:
const SKILL_BUDGET_CONTEXT_PERCENT = 0.01 // 1% of context for the whole skill list
Only the header fits in that 1% (name, description, trigger). The SKILL.md body loads only when the skill is actually called. That's why you can pile on dozens of skills and barely pay for them in context: until called, they're effectively not there.
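The arithmetic, for a sense of scale. The constant is from the source. The 200,000-token window, the 40 tokens per header, and treating the budget as tokens are my assumptions.

```typescript
// Hypothetical sketch: how much room 1% of context leaves for skill headers.
const SKILL_BUDGET_CONTEXT_PERCENT = 0.01;

function skillListBudget(contextWindowTokens: number): number {
  return Math.floor(contextWindowTokens * SKILL_BUDGET_CONTEXT_PERCENT);
}

function headersThatFit(contextWindowTokens: number, tokensPerHeader: number): number {
  return Math.floor(skillListBudget(contextWindowTokens) / tokensPerHeader);
}

console.log(skillListBudget(200000)); // 2000
console.log(headersThatFit(200000, 40)); // 50
```

Fifty headers in 2,000 tokens, and not one SKILL.md body loaded until a skill is actually called.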
And the small thing that kills the isolation illusion. The MCP tool "namespace" is a fancy word for a plain string prefix, mcp__server__tool. The server name and tool name are glued together, anything that isn't a letter, digit, _ or - becomes an underscore, and permissions are handed out by that glued string. There's no real isolation behind the "namespace."
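A sketch of the gluing, assuming "letter" means ASCII letter. The function names are mine. The mcp__server__tool shape and the replacement rule are as described above.

```typescript
// Hypothetical sketch: the MCP "namespace" as plain string normalization.
function normalizePart(part: string): string {
  // Anything that is not a letter, digit, "_" or "-" becomes an underscore.
  return part.replace(/[^a-zA-Z0-9_-]/g, "_");
}

function mcpToolName(server: string, tool: string): string {
  return `mcp__${normalizePart(server)}__${normalizePart(tool)}`;
}

console.log(mcpToolName("my.server", "read file")); // "mcp__my_server__read_file"

// Two different server names collapse into the same permission string.
console.log(mcpToolName("a.b", "x") === mcpToolName("a_b", "x")); // true
```

That last line is the whole point: if permissions hang off the glued string, two servers whose names normalize to the same thing are indistinguishable at that level.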
One last thing, and I don't like it
I assumed the "do you trust this folder?" prompt was the very first thing Claude Code does on startup. It isn't. That prompt shows up noticeably later. By then startup has already run a good chunk of code (about a thousand lines, by the source), right next to an honest comment that security here is delicate.
The delicate bit: .claude/settings.json has already been read by that point, and it lives in the same folder you haven't trusted yet.
So settings from an untrusted folder get to influence Claude Code before you've said yes. It's not quite a hole: the most sensitive modes double-check against the trust flag. But it sits wrong with me. I'd still rather know it than not.
What I actually changed
- I count budgets and timeouts per loop pass now, not per abstract "agent call" (Myth 1).
- I stopped believing the model remembers my last messages verbatim. After a full compaction it's working from a retelling it wrote itself (Myth 3).
- I'm careful with background subagents. Since any confirmation they hit becomes a deny, I don't hand them tasks where a prompt is even possible (Myth 7).
- On compaction, I just try not to reach it. It's cheaper to clear context one extra time.
None of this makes Claude Code worse. The opposite: behind almost every oddity in the code is an incident that happened, or a guard against one. You can read it in the comments. The picture most of us carry (mine included, until last week) is just drawn at altitude.
If you build on Claude Code or the Anthropic API: which of these would have saved you a debugging session?
If you just use Claude Code day to day: which one rewrites how you'll drive it tomorrow?
And if you dug into the leaked source yourself: what did I get wrong?