Most Claude Code tips are surface-level. "Write better prompts." "Be specific." "Give it context." Fine advice, but it misses the point.
The real leverage comes from understanding why Claude Code was built the way it was. So I read the source — all 80,000 lines of it, with four parallel subagents doing the exploration — and extracted the 15 design decisions the authors encoded into the TypeScript.
Here's what I found.
## The Core Insight That Changes Everything
One turn in Claude Code is not a function call. It's an AsyncGenerator.
```typescript
// QueryEngine.ts · line 209
export class QueryEngine {
  async *submitMessage(
    prompt: string | ContentBlockParam[],
    options?: { uuid?: string; isMeta?: boolean },
  ): AsyncGenerator<SDKMessage, void, unknown> {
    // … generator body elided
  }
}
```
That single design decision explains dozens of behaviors. Every LLM response token, every tool execution update, every file change notification — all of it flows through one stream. The queryLoop runs five levels deep with yield* chains at every stage.
Why? Because the authors had a specific belief: the longer the LLM response, the more users want to change direction mid-flight.
Traditional CLIs use complete-then-display. Interrupt and you lose everything. Claude Code is the opposite: interrupt and files already written stay on disk, half-executed tools leave their partial results intact, context up to the last yield is preserved.
The architecture around a single turn looks like this:
```
User Input → submitMessage
     ↓
queryLoop (while true)
 ├─ 1. Context compression
 ├─ 2. LLM streaming call      ← yield at every token
 ├─ 3. Parallel tool execution ← yield per tool
 ├─ 4. Stop hooks
 └─ 5. Continue or exit?
     ↓
Terminal UI (Ink render)

[Ctrl+C possible at any stage — previous yields are preserved]
```
The consumer pulls with for await, so cancellation, streaming, and backpressure come for free.
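The shape is easy to sketch. This is not the real `QueryEngine` — just a minimal illustration of why a generator-per-turn gives you interruption for free (every name below is invented):

```typescript
// Minimal sketch of a generator-based turn; names are illustrative,
// not taken from the Claude Code source.
type SDKMessage = { type: "token" | "tool" | "done"; data: string };

async function* runTurn(prompt: string): AsyncGenerator<SDKMessage, void, unknown> {
  // Each stage yields as soon as it has something, so a consumer that
  // stops pulling (Ctrl+C) still keeps everything yielded so far.
  for (const token of ["Reading ", "auth.ts", " ..."]) {
    yield { type: "token", data: token };
  }
  yield { type: "tool", data: "Read(auth.ts) -> ok" };
  yield { type: "done", data: prompt };
}

(async () => {
  const seen: string[] = [];
  for await (const msg of runTurn("fix validation")) {
    seen.push(msg.data);
    if (msg.type === "tool") break; // simulate an interrupt mid-turn
  }
  // Everything up to the break survives; nothing after it ever ran.
  console.log(seen.join("|"));
})();
```

Breaking out of `for await` calls the generator's `return()`, so cleanup runs but completed yields stay put — the same reason an interrupted turn leaves already-written files on disk.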
## The "Observe First, Prompt Later" Rule
Once you understand the streaming architecture, one habit becomes obvious: stop trying to craft perfect prompts upfront.
With traditional CLIs, you had to be careful before firing. Claude Code flips that. Watch the first 5–10 seconds. If the model starts reading the wrong files, hit Ctrl+C immediately. The later you interrupt, the more you end up with a "half right, half wrong" context state that contaminates the next turn.
Don't:

> "Could you read the file first, understand the structure, then carefully make changes? Keep the existing style, and if there are related files please check those too…"

Do:

> "Fix the validation at `auth.ts:42` to RFC 5322."

If the first Read hits the right file, keep going — otherwise interrupt and redirect.
The streaming interrupt model makes mid-run correction cheap. Prompt perfectionism defeats the entire architecture.
## The 4-Stage Compression System
Another thing that jumps out from the source: Claude Code has four layers of token-saving mechanisms, running in sequence from lightest to most aggressive.
| Stage | Name | What it does |
|---|---|---|
| 01 | `snipCompact` | Remove stale snippets |
| 02 | `microcompact` | Cached transforms, tombstone cleanup |
| 03 | `contextCollapse` | Parallel summarization of read-only segments |
| 04 | `autocompact` | Full LLM summarization call |
After all four stages, `postCompactCleanup` re-injects the 5 most recently modified files — so the context of what was just done isn't lost.

There's also a circuit breaker: after 3 consecutive `autocompact` failures, it throws `CompactionError` and flips the session to read-only. No infinite loops.
The implication for users: run `/compact` manually, right after finishing a feature. More predictable than auto, and you control when the compression happens.
## All 15 Design Intents (Teasers)
The streaming architecture is just the first. Here's the full map of what the source reveals:
- AsyncGenerator throughout — interrupt at any point, work is preserved
- All tool defaults are conservative — `isReadOnly`, `isConcurrencySafe`, `isDestructive` all default to `false`
- Schema lazy-loading via ToolSearch — schemas are tokens too; load only what's needed
- Subagent isolation via AgentTool — protect the main context; results come back as summaries only
- memdir on the filesystem — memory in markdown you can open and edit, not a black-box database
- Compact 4 stages + 3-failure circuit breaker — no infinite loops, no silent failures
- Hook's `updatedInput` — hooks can rewrite tool inputs before execution, not just block/allow
- Plan mode as a separate state — exploration and execution use different thinking modes
- PKCE OAuth + Keychain storage — API keys in the system keychain, not `.env` files
- VCR fixture system — LLM responses cached by message hash for deterministic tests
- Ink-based React terminal UI — the terminal is treated as an app, not a pipe
- Sessions stored as NDJSON — one line per message; trim the tail to recover a broken session
- Stop hook blocking/non-blocking split — background memory extraction is fire-and-forget
- Build-time feature flag elimination — internal experiment features become dead code in prod
- Ask when permission is uncertain — slow is better than silently wrong
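To make the second bullet concrete, here's a guess at what conservative-by-default tool flags could look like — the interface and helper are invented; only the three flag names come from the source:

```typescript
// Hypothetical sketch of conservative tool defaults; the real Claude Code
// tool interface is richer than this.
interface ToolSafety {
  isReadOnly: boolean;        // false by default: assume the tool mutates state
  isConcurrencySafe: boolean; // false by default: run serialized, not in parallel
  isDestructive: boolean;     // false by default until a tool declares otherwise
}

// A tool gets safety only by explicitly opting in; omissions stay false.
function withDefaults(partial: Partial<ToolSafety>): ToolSafety {
  return {
    isReadOnly: false,
    isConcurrencySafe: false,
    isDestructive: false,
    ...partial,
  };
}

const readTool = withDefaults({ isReadOnly: true, isConcurrencySafe: true });
const bashTool = withDefaults({}); // everything false: serialized and permission-gated

console.log(readTool.isConcurrencySafe, bashTool.isConcurrencySafe);
```

The spread-over-defaults pattern means forgetting to declare a flag can only make a tool *more* restricted, never less — the "slow is better than silently wrong" bullet, encoded in types.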
Each of these has direct, practical implications for how you use the tool. Not "here's a tip" — but "here's why the tool behaves the way it does, and here's how to use that."
## Why This Matters
The gap between an experienced Claude Code user and a newcomer isn't prompt skill. It's whether you know what shape of input the tool was built to digest.
Match your prompts, memory, and sessions to that shape, and you get two or three times the output for the same cost. Fight it — by editing CLAUDE.md constantly, pasting massive error logs, running everything in the main thread — and you pay in slow responses, inflated bills, and contaminated context.
The 15 design intents are the shape.
*This post is part of Claude Code: 80K Lines Dissected — a 14-chapter ebook reverse-engineered from Claude Code's TypeScript source.*