A comprehensive synthesis of the claude-code-from-source project (and companion site claude-code-from-source.com), distilled into core principles, techniques, and actionable guidelines for builders who want to ship a coding agent of comparable quality. The source repo is an 18-chapter educational reverse-engineering of Claude Code derived from npm source maps. No proprietary code is reproduced, only architectural pseudocode and design rationale. This guide does the same.
Table of Contents
- TL;DR: the whole agent in one mental picture
- What you are actually building
- The six core abstractions
- State: two tiers, one source of truth
- The agent loop: AsyncGenerator as control plane
- Tools: self-describing, fail-closed, parameterized
- Concurrency and speculative execution
- Permissions: modes, rules, and bubbling
- Context engineering: the 4-layer compression pipeline
- The API layer: prompt caching as architecture
- Sub-agents and fork agents
- Multi-agent coordination patterns
- Memory: file-based persistence + LLM recall
- Skills, hooks, plugins: the extensibility surface
- MCP: the universal external-tool protocol
- Bootstrap, startup, and rendering performance
- The 10 foundational patterns (cheat sheet)
- Build-your-own: a 14-step roadmap
- Anti-patterns and pitfalls
- Glossary
0. TL;DR: the whole agent in one mental picture
Before the details, hold this picture in your head. Everything else is elaboration.
┌───────────────────────────────────────────────────────────────┐
│ query() - async generator, the only place control flows       │
│                                                               │
│   while not done:                                             │
│     state = compress(state)          # 4 layers               │
│     response = await stream(model, state)                     │
│     yield response.messages          # to UI                  │
│     if no tool_calls: return completed                        │
│     batches = partition(response.tool_calls)                  │
│     for batch in batches:                                     │
│       results = run(batch)           # parallel-safe          │
│       yield results.messages                                  │
│       state += results                                        │
└───────────────────────────────────────────────────────────────┘
        ▲               ▲                ▲               ▲
        │               │                │               │
  Memory (files)   Tools (self-    Hooks (27       Sub-agents
  loaded into      describing,     lifecycle       (recursive
  system prompt    fail-closed,    events)         query() with
  at session       partitioned by                  isolated state)
  start            safety per-call)
Five rules carry 80% of the design:
- The loop is an async generator. Backpressure, cancellation, and typed terminal states fall out for free.
- Every tool is self-describing (schema, permissions, concurrency safety). The loop never special-cases tools.
- Safety is per invocation, not per tool type. Bash("ls") ≠ Bash("rm -rf").
- Prompt cache is architecture, not optimization. Static-then-dynamic boundary, sticky flags, byte-identical fork prefixes.
- Memory is files. A small LLM picks which to load. No database, no embeddings. Trust through transparency.
If you only build those five things well, you have ~80% of Claude Code. The rest is layering and polish.
1. What you are actually building
A production coding agent is not a chat loop with tool calls bolted on. It is a streaming, cancellable, recursive state machine that has to:
- Survive token-budget exhaustion mid-task without losing the user's work.
- Run dozens of tools per turn safely, often in parallel, sometimes speculatively.
- Spawn child agents that cost ~10% of a normal call thanks to prompt cache reuse.
- Persist semantic knowledge across sessions without a database.
- Allow third parties to extend it (skills, hooks, MCP) without crashing the host.
- Boot in under 300 ms and stream the first token in well under a second.
If your design omits any of these, you will hit a wall later. Build for them on day one: most of them are cheap when planned, expensive when retrofitted.
The closing principle of the source book: push complexity to the boundaries. Protocol translation, state reconciliation, external tool invocation, permission checking: these belong at the edges. The interior (loop, memory, tool composition) stays clean and exhaustively typed.
2. The six core abstractions
Every part of Claude Code reduces to one of these. Implement them as first-class modules, not as helpers attached to a god object.
| # | Abstraction | Responsibility | Approx LoC in CC |
|---|---|---|---|
| 1 | Query Loop | Async generator that streams model output, runs tools, appends results, decides when to stop. Returns a typed Terminal discriminated union (10 reasons). | ~1,700 |
| 2 | Tool System | Self-describing tools with schema, permissions, concurrency, rendering. Batched into concurrent/serial groups. Speculative execution during streaming. | n/a |
| 3 | Tasks | Background units following `pending → running → completed \| failed \| killed`. | n/a |
| 4 | State | Two layers: a mutable singleton STATE (~80 fields, infrastructure) + a 34-line reactive store (UI: messages, approvals, progress). | n/a |
| 5 | Memory | File-tier persistence (CLAUDE.md, ~/.claude/MEMORY.md, team symlinks). LLM picks relevant memories at session start. | n/a |
| 6 | Hooks | Lifecycle interceptors at 27 events, in 4 forms: shell command, single-shot prompt, agent loop, HTTP webhook. | n/a |
Why this carving
- The Query Loop is the only place control flow lives. Tools, hooks, sub-agents: they all yield through it.
- State is split because infrastructure mutates rarely but reads constantly; UI is the opposite. One subscription model can't serve both.
- Memory is its own primitive (not a tool) because it is read on every system-prompt build, before any tool can run.
- Hooks are first-class because the permission system itself runs partially as PreToolUse hooks. They are not an afterthought.
3. State: two tiers, one source of truth
State design is where most agent codebases collapse. Claude Code splits it into two tiers with strict layering:
| Tier | What it holds | Mutability | Reachable from |
|---|---|---|---|
| Bootstrap state (STATE) | ~80 fields: originalCwd, sessionId, model overrides, cost accumulators, telemetry handles, prompt-cache allowlists | Mutable through ~100 typed setters | Everywhere; a DAG leaf that depends on nothing but the Node.js stdlib |
| AppState (reactive store) | Messages, input mode, tool approvals, progress indicators, todos | Immutable snapshots; updater functions only | Inside React components |
Why split them
- Availability: session ID, telemetry, and cost trackers must exist before React mounts. A reactive store cannot serve them.
- Access pattern: bootstrap state is read constantly, mutated rarely, with no subscribers. AppState is read by render subscribers on every change. One subscription model can't serve both.
- Dependency direction: bootstrap depends on nothing → AppState imports bootstrap → React imports AppState. Enforce this with a lint rule. Cycles will sneak in otherwise.
The reactive store in 34 lines
function makeStore(initial, onTransition) {
let current = initial
const subs = new Set()
return {
read: () => current,
update: (fn) => {
const next = fn(current)
if (Object.is(next, current)) return // skip noop
const prev = current; current = next
onTransition?.(prev, next) // side effects FIRST
subs.forEach(cb => cb()) // then UI
},
subscribe: (cb) => { subs.add(cb); return () => subs.delete(cb) },
}
}
Three deliberate choices:
- Updater-only mutations. No set(value) API, so stale-closure bugs vanish.
- Object.is guard. Identical references skip re-renders and side effects.
- onTransition fires before listeners. Side effects (e.g. persist to disk, notify remote session) complete before the UI flips.
The sticky latch pattern (write-once flags)
A pattern worth memorizing; it applies any time a value influences a server-side cache key:
type Latch = boolean | null // null = "not yet evaluated"
function shouldSendBetaHeader(featureCurrentlyActive: boolean): boolean {
  const latched = getAfkLatch()
  if (latched === true) return true // already on → keep sending
  if (featureCurrentlyActive) {
    setAfkLatch(true) // first activation → latch
    return true
  }
  return false // never activated
}
The three-state type self-documents intent: null says "we haven't decided yet." Once true, never returns to false. Five such latches in Claude Code prevent mid-session feature toggles from busting 50-70K tokens of cached prompt.
Centralizing side effects on diffs
A real production bug: permission mode was synced to the remote session by 2 of 8+ mutation paths. Eventually one drifted. The fix was a single onChangeAppState(prev, next) callback that detects field changes structurally β every mutation path is automatically covered. Side effects scale much more slowly than mutation sites; centralize on diffs, not events.
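A minimal sketch of that diff-driven callback. Field and helper names here are illustrative, not the actual Claude Code AppState shape; the point is that the effect is keyed to the field, not to any particular mutation site:

```typescript
// Simplified stand-in for the real AppState; fields are illustrative.
type AppState = { permissionMode: string; todos: string[]; inputMode: string }

// Register one effect per field. Every mutation path is covered because
// the diff runs on the transition itself, not at each call site.
function makeTransitionDiffer<S extends object>(
  effects: Partial<{ [K in keyof S]: (prev: S[K], next: S[K]) => void }>,
) {
  return (prev: S, next: S) => {
    for (const key of Object.keys(effects) as (keyof S)[]) {
      if (!Object.is(prev[key], next[key])) {
        effects[key]?.(prev[key], next[key])
      }
    }
  }
}

// Usage: the remote-session sync fires no matter which setter changed the mode.
const synced: string[] = []
const onTransition = makeTransitionDiffer<AppState>({
  permissionMode: (_old, mode) => { synced.push(mode) }, // e.g. sync to remote
})
```

Plug `onTransition` into the store's transition callback and the drift class of bug disappears: a new setter added next year is covered automatically.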
Cost tracking (a concrete example)
Every API response runs through addToTotalSessionCost:
- Accumulates per-model usage in bootstrap state.
- Reports to OpenTelemetry.
- Recursively processes nested model calls (sub-agents, recall queries).
- Persists to project config on process exit.
- Restores on next session only if the persisted session ID matches.
Histograms use reservoir sampling (Algorithm R) with 1,024 entries to compute p50/p95/p99. Averages hide tail latency, and tail latency is what users feel.
Actionable: even in v0, instrument cost and latency. You cannot decide what to optimize from feel.
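The percentile machinery mentioned above is small enough to sketch. This is a generic Algorithm R reservoir with a 1,024-entry cap, not the actual Claude Code implementation:

```typescript
// Algorithm R reservoir sampling: keep a uniform sample of all latencies
// seen so far, then read p50/p95/p99 from the sorted reservoir.
class LatencyReservoir {
  private reservoir: number[] = []
  private seen = 0
  constructor(private capacity = 1024) {}

  record(valueMs: number): void {
    this.seen++
    if (this.reservoir.length < this.capacity) {
      this.reservoir.push(valueMs)
    } else {
      // Keep the new value with probability capacity/seen (Algorithm R).
      const slot = Math.floor(Math.random() * this.seen)
      if (slot < this.capacity) this.reservoir[slot] = valueMs
    }
  }

  percentile(p: number): number {
    const sorted = [...this.reservoir].sort((a, b) => a - b)
    if (sorted.length === 0) return 0
    const idx = Math.ceil((p / 100) * sorted.length) - 1
    return sorted[Math.min(sorted.length - 1, Math.max(0, idx))]
  }
}
```

Memory stays O(1) no matter how many requests the session makes, and the sample stays statistically uniform over the whole stream.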
4. The agent loop: AsyncGenerator as control plane
The loop is an async function*: not a while with callbacks, not an event emitter, not an RxJS pipeline. There are three concrete reasons to choose generators:
- Backpressure for free. A generator yields only when the consumer calls .next(). The REPL pulls via for await, naturally pausing if the UI can't render fast enough.
- Typed terminal states. The generator's return is a discriminated union of why execution stopped: completed, max_turns, error, aborted_streaming, aborted_tools, prompt_too_long, image_error, model_error, stop_hook_prevented, hook_stopped, blocking_limit. The compiler enforces exhaustive handling.
- Composability. Inner generators delegate via yield*. No callback nesting, no promise plumbing.
Loop skeleton
async function* query(initialState):
state = initialState
while true:
    state = compress(state)            // 4-layer pipeline (§8)
response = await callModel(state) // streaming
yield* response.messages // surface to UI
if response.error and recoverable:
state = recover(state, error)
continue
if response.error and not recoverable:
return { kind: 'model_error', error }
if not response.toolCalls:
if stopHookBlocks(state):
state = applyHookFeedback(state)
continue
return { kind: 'completed' }
batches = partitionToolCalls(response.toolCalls)
for batch in batches:
results = await executeBatch(batch, state)
yield* results.messages
state = appendToolResults(state, results)
// re-enter with new state
Continue states (don't return, just continue)
collapse_drain_retry, reactive_compact_retry, max_output_tokens_escalate, max_output_tokens_recovery, stop_hook_blocking, token_budget_continuation, next_turn. Naming each one is what makes the loop testable: every test asserts which transition fired.
Error recovery is a ladder, not a fallback
Order matters. From least to most aggressive:
| Trigger | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| prompt_too_long (413) | drain staged collapse summaries | reactive compact | surface to user |
| max_output_tokens | escalate cap 8K → 64K | multi-turn recovery (≤3 attempts) | surface |
| media_size_error | reactive compact | n/a | surface |
Guards prevent infinite loops: hasAttemptedReactiveCompact one-shot flags, hard caps on recovery attempts, circuit breakers. Never run stop hooks on an error response; that creates "error → hook blocks → retry → error" spirals.
Cancellation
Aborts can hit during streaming or during tool execution. In both cases, the executor must drain remaining requests by emitting synthetic tool_result blocks for queued/running tools. The Anthropic API rejects an assistant message containing a tool_use block without a matching tool_result. signal.reason distinguishes hard aborts from "submit interrupts" (a new user message), so you skip redundant interruption stubs in the latter case.
Actionable: every tool_use your agent emits must have a paired tool_result in message history before the next API call. Make this an invariant your loop enforces, not a hope.
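One way to enforce that invariant is a drain step that backfills a synthetic result for every tool call the abort left unresolved. A sketch with deliberately simplified message shapes (these are not the SDK's types):

```typescript
// Simplified block shapes for illustration.
type ToolUse = { type: 'tool_use'; id: string; name: string }
type ToolResult = { type: 'tool_result'; tool_use_id: string; content: string }

// After an abort, every tool_use in the last assistant message must get a
// tool_result before the next API call. Completed tools keep their real
// output; queued/running ones get a synthetic stub whose text depends on
// whether this was a hard abort or a "submit interrupt" (new user message).
function drainAbortedTools(
  toolUses: ToolUse[],
  completed: Map<string, string>, // tool_use id -> real result content
  reason: 'abort' | 'submit_interrupt',
): ToolResult[] {
  return toolUses.map((tu) => ({
    type: 'tool_result',
    tool_use_id: tu.id,
    content:
      completed.get(tu.id) ??
      (reason === 'abort'
        ? 'Interrupted by user'
        : 'Skipped: superseded by new user message'),
  }))
}
```

Run this in the loop's abort path and the "tool_use without tool_result" API rejection becomes structurally impossible.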
5. Tools: self-describing, fail-closed, parameterized
Interface
A tool is parameterized by three types: Input, Output, and Progress. The Input doubles as a Zod schema and the JSON Schema given to the model.
The full Tool interface in Claude Code has ~45 members. Five are critical:
- call(input, ctx): runs the work.
- inputSchema: Zod schema (validated, plus auto-generated JSON Schema).
- isConcurrencySafe(parsedInput): evaluated per invocation, not per type.
- checkPermissions(parsedInput, ctx): returns allow | deny | ask | passthrough with an optional updatedInput.
- validateInput(parsedInput, ctx): semantic checks beyond schema (e.g. reject no-op edits).
The buildTool() factory pattern (fail-closed)
Never construct a tool literal directly. Wrap it in a factory that fills in dangerous defaults conservatively:
const SAFE_DEFAULTS = {
isEnabled: () => true,
isParallelSafe: () => false, // serial unless proven otherwise
isReadOnly: () => false, // assume writes
isDestructive: () => false,
checkPermissions: (input) => ({ behavior: 'allow', updatedInput: input }),
}
If a tool author forgets isConcurrencySafe, they get serial execution β slow, but never corrupting. The opposite default would silently produce race conditions.
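A minimal sketch of the factory. Names follow the text above, but the real interface has ~45 members; this shows only the spread-over-defaults mechanism:

```typescript
// A tool author supplies at minimum a name and a call(); everything else
// is optional and falls back to a conservative default.
type ToolSpec<I> = {
  name: string
  call: (input: I) => Promise<unknown>
  isConcurrencySafe?: (input: I) => boolean
  isReadOnly?: (input: I) => boolean
}

// Fail-closed factory: defaults first, author overrides win.
function buildTool<I>(spec: ToolSpec<I>) {
  return {
    isConcurrencySafe: (_input: I) => false, // serial unless proven otherwise
    isReadOnly: (_input: I) => false,        // assume writes
    ...spec,
  }
}

// A read tool that declares safety, and a lazy tool that forgot to.
const readTool = buildTool({
  name: 'Read',
  call: async () => 'ok',
  isConcurrencySafe: () => true,
})
const lazyTool = buildTool({ name: 'Mystery', call: async () => 'ok' })
```

The lazy tool still works; it just runs serially until its author proves it safe, which is exactly the failure mode you want.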
Tool result shape
type ToolResult<T> = {
data: T
newMessages?: Message[] // e.g. AgentTool injects sub-agent transcript
contextModifier?: (ctx: ToolUseContext) => ToolUseContext // e.g. EnterPlanMode
}
Context modifiers only apply to serial tools. Concurrent tools queue modifiers until the batch completes β otherwise data dependencies and shared state become race-condition territory.
The 14-step execution pipeline (checkPermissionsAndCallTool())
This is the choreography every tool call goes through. Implement it as a single function that returns a ToolResult or ToolError. Skipping any of these steps will hurt later.
| # | Step | Why it matters |
|---|---|---|
| 1 | Tool lookup (with alias map) | Old transcripts may reference renamed tools |
| 2 | Abort check | Don't waste compute on cancelled queued calls |
| 3 | Zod validation | Catch type errors; hint to call ToolSearch for deferred tools |
| 4 | Semantic validation | E.g. reject no-op edits, block sleep if a Monitor tool exists |
| 5 | Speculative classifier start | Fire auto-mode permission classifier in parallel for Bash |
| 6 | Input backfill | Expand ~/foo → absolute paths for hooks/permissions but keep originals for transcript stability |
| 7 | PreToolUse hooks | Hooks decide / modify / block |
| 8 | Permission resolution | Rule match → tool method → mode default → prompt → classifier |
| 9 | Permission denied path | Build error, fire PermissionDenied hook |
| 10 | Execute call() | The actual work |
| 11 | Result budgeting | Persist oversized output to disk; replace with preview |
| 12 | PostToolUse hooks | Modify MCP output, possibly block continuation |
| 13 | Append newMessages | Sub-agent transcripts, system reminders |
| 14 | Error classification | Telemetry, OTel events |
Result budgeting
Per-tool size caps prevent runaway output:
| Tool | maxResultSizeChars | Rationale |
|---|---|---|
| Bash | 30,000 | Most useful output fits |
| Edit | 100,000 | Diffs need room |
| Grep | 100,000 | Search results accumulate |
| Read | none | Self-bounded by token limit; persisting would create circular Read loops |
Above the cap, the system writes the full content to a <persisted-output> file and returns a preview pointing to it. An aggregate ContentReplacementState tracks per-conversation budgets so multiple near-cap results cannot blow context together.
Deferred loading
Tools marked shouldDefer: true send only { name, description, defer_loading: true } to the API. The model has to call ToolSearch to load full schemas. Three benefits:
- Smaller initial prompt.
- Adding/removing a deferred tool changes the prompt by a few tokens, not hundreds, so the prompt cache stays warm.
- Less tool-soup confusion for the model.
Tool registry assembly order matters
final = sort(builtins, alpha) ++ sort(mcpTools, alpha)
Sort within each partition, then concatenate. A flat sort across all tools would interleave MCP tools into built-in positions, busting cache breakpoints whenever MCP servers are added/removed.
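The assembly rule fits in a few lines; a sketch assuming tool names as plain strings:

```typescript
// Partition-then-sort: built-ins keep stable prefix positions, so adding
// or removing an MCP server never reorders the cached tool prefix.
function assembleRegistry(builtins: string[], mcpTools: string[]): string[] {
  const alpha = (a: string, b: string) => a.localeCompare(b)
  return [...[...builtins].sort(alpha), ...[...mcpTools].sort(alpha)]
}
```

A single flat sort over the concatenated list would interleave `mcp__*` names among the built-ins and move every cache breakpoint each time a server appears or disappears.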
6. Concurrency and speculative execution
The core insight
Safety is determined per invocation, not per tool type. Bash("ls -la") is concurrency-safe. Bash("rm -rf build/") is not. Same tool. Different inputs. Different verdict.
The partition algorithm
partitionToolCalls(calls):
batches = []
current = { kind: 'concurrent', tools: [] }
for call in calls:
tool = lookup(call.name)
parsed = tool.inputSchema.safeParse(call.input)
safe = parsed.success and tool.isConcurrencySafe(parsed.data)
if safe and current.kind == 'concurrent':
current.tools.push(call)
else if safe:
batches.push(current); current = { kind: 'concurrent', tools: [call] }
else:
if current.tools: batches.push(current)
batches.push({ kind: 'serial', tools: [call] })
current = { kind: 'concurrent', tools: [] }
if current.tools: batches.push(current)
return batches
Example: [Read, Read, Grep, Edit, Read] → [concurrent[Read, Read, Grep], serial[Edit], concurrent[Read]].
Parsing failure → serial. Safety-check exception → serial. Always fail closed.
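The partition pseudocode above translates almost directly to TypeScript. A sketch with simplified Tool and ToolCall types (not the real interfaces):

```typescript
type ToolCall = { name: string; input: unknown }
type Tool = {
  parse: (input: unknown) => { success: boolean; data?: unknown }
  isConcurrencySafe: (parsed: unknown) => boolean
}
type Batch = { kind: 'concurrent' | 'serial'; tools: ToolCall[] }

function partitionToolCalls(calls: ToolCall[], lookup: (name: string) => Tool): Batch[] {
  const batches: Batch[] = []
  let current: Batch = { kind: 'concurrent', tools: [] }
  for (const call of calls) {
    let safe = false
    try {
      const tool = lookup(call.name)
      const parsed = tool.parse(call.input)
      // Parse failure or a throwing safety check both land here as unsafe.
      safe = parsed.success && tool.isConcurrencySafe(parsed.data)
    } catch {
      safe = false // fail closed
    }
    if (safe) {
      current.tools.push(call) // extend the running concurrent batch
    } else {
      if (current.tools.length) batches.push(current)
      batches.push({ kind: 'serial', tools: [call] }) // unsafe runs alone
      current = { kind: 'concurrent', tools: [] }
    }
  }
  if (current.tools.length) batches.push(current)
  return batches
}
```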
Speculative streaming execution
The StreamingToolExecutor watches the model stream. The moment a tool_use block is fully parsed (often seconds before the response finishes), it starts that tool β provided admission rules allow.
Admission rule: a tool can start executing iff no tool is currently running, or both the new tool and all currently-running tools are concurrency-safe.
Sequential timeline: stream 2.5s + 3 serial tools = 3.1s
Speculative: stream 2.5s overlapped with tools 1-2; total 2.6s
Tool states: Queued → Executing → Completed → Yielded. Yield in submission order, not completion order: even if c.ts finishes before a.ts, the conversation history must remain a, b, c.
Error cascade policy
- Bash errors cascade within a batch. Shell commands form implicit pipelines; running cp after a failing mkdir is pointless.
- Read/Grep errors isolate. One file read failure has no bearing on a sibling grep.
Cancelled siblings get synthetic results: "Cancelled: parallel tool call Bash(mkdir build) errored".
Interrupt behavior
Each tool declares interruptBehavior(): 'cancel' | 'block'. The executor treats an executing batch as interruptible only when all tools in it support cancel. A single block tool blocks user Esc for the whole batch.
7. Permissions: modes, rules, and bubbling
Seven modes (most to least permissive)
| Mode | Behavior |
|---|---|
| bypassPermissions | No checks (testing only) |
| dontAsk | Auto-deny prompts (background agents must never block on user input) |
| auto | Lightweight LLM classifier evaluates each call against the transcript |
| acceptEdits | File edits auto-allowed; other mutations prompt |
| default | Standard interactive; user approves each action |
| plan | Read-only; all writes denied |
| bubble | Sub-agent escalates the decision to its parent |
Sub-agents default to bubble. Background agents default to dontAsk (they can't block on a prompt that has no UI).
Resolution chain
1. Hook decision? → final
2. allowedRules / deniedRules / askRules match? → final
3. tool.checkPermissions() → allow | deny | ask | passthrough
4. Mode default
5. (interactive only) prompt user
6. (auto only) classifier
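The chain can be expressed as data: each stage either decides or defers, and the first decisive stage wins. A sketch (the stage wiring below is illustrative, not the real hook/rule plumbing):

```typescript
type Verdict = 'allow' | 'deny' | 'ask'
// A stage returns a verdict, or 'passthrough'/undefined to defer.
type Stage = () => Verdict | 'passthrough' | undefined

function resolvePermission(stages: Stage[], modeDefault: Verdict): Verdict {
  for (const stage of stages) {
    const v = stage()
    if (v === 'allow' || v === 'deny' || v === 'ask') return v
  }
  return modeDefault // nothing decided: fall back to the mode
}

// Hooks say nothing, a deny rule matches: the rule wins before the
// tool's own checkPermissions ever runs.
const verdict = resolvePermission(
  [
    () => undefined,      // 1. hooks: no decision
    () => 'deny',         // 2. deniedRules match
    () => 'allow',        // 3. tool.checkPermissions (never reached)
  ],
  'ask',                  // 4. mode default
)
```

Ordering as an array makes the precedence testable: each test installs stages and asserts which one fired.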
Rules
Three pieces: source (tracks provenance), ruleBehavior (allow/deny/ask), ruleValue (with optional content patterns).
- Bash(git *): Bash commands starting with git
- Edit(/src/**): file edits restricted to /src
- Fetch(domain:example.com): HTTP fetches limited to that domain
For Bash, parse the command via a real bash AST parser (parseForSecurity()), split on && || ; |, and classify each subcommand. If the parser fails, return fail-safe behavior: assume any command it can't parse is unsafe.
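For illustration only, a naive splitter that demonstrates the fail-closed property. A production system should use a real bash parser, since quoting, substitution, and expansion defeat regex splitting; the important part here is that the function refuses to judge anything it cannot confidently decompose:

```typescript
// Naive illustration: split on shell connectors, classify each subcommand,
// and fail closed on anything that looks like quoting or substitution.
function classifyCommand(cmd: string, isSafePrefix: (sub: string) => boolean): boolean {
  // Quotes, backticks, subshells, expansions: refuse to judge => unsafe.
  if (/["'`$()\\]/.test(cmd)) return false
  const subcommands = cmd.split(/&&|\|\||;|\|/).map((s) => s.trim())
  return subcommands.every((sub) => sub.length > 0 && isSafePrefix(sub))
}

// Example rule in the spirit of Bash(git *).
const gitOnly = (sub: string) => sub.startsWith('git ')
```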
8. Context engineering: the 4-layer compression pipeline
Run before every API call, in this strict order:
| Layer | What it does | Cost |
|---|---|---|
| 0. Tool result budget | Enforce per-message size caps; exempt tools without a finite maxResultSizeChars | Trivial |
| 1. Snip compact | Physically remove old messages; emit UI boundary marker; report tokens freed | Cheap |
| 2. Microcompact | Drop tool results by tool_use_id once unneeded; cache edits via deferred boundary messages | Cheap |
| 3. Context collapse | Replace conversation spans with summaries (granular) | Medium |
| 4. Auto-compact | Fork an entire Claude conversation to summarize history; circuit-break after 3 consecutive failures | Heavy |
Why ordering matters: if collapse alone gets tokens below the auto-compact threshold, auto-compact never runs, so you keep fine-grained recent history.
Budget thresholds
- Auto-compact triggers at effectiveContextWindow − 13,000 tokens.
- Hard blocking limit at effectiveContextWindow − 3,000.
- The 10K-token gap between them is where reactive compact runs if proactive compaction failed.
Token counting blends authoritative API usage numbers with rough estimates for messages added since the last response β biased conservative so compaction fires slightly early.
Actionable: instrument both estimated and authoritative token counts, log the delta. When the delta drifts, your estimator is broken and your safety margins are wrong.
9. The API layer: prompt caching as architecture
Prompt caching is not an optimization. It is an architectural constraint. Every design decision either preserves cache hits or busts them.
Multi-provider abstraction
A single getAnthropicClient() factory dispatches to one of:
- Direct API (key or OAuth)
- AWS Bedrock
- Google Vertex AI
- Azure Foundry
Provider chosen at boot from env vars + config. Stored in bootstrap state; never re-checked. SDKs dynamically imported (don't load Bedrock if you're on direct API).
A buildFetch wrapper injects an x-client-request-id UUID header on every request, so you can correlate client-side timeouts with server-side logs.
Cache scopes
| Scope | Where | TTL |
|---|---|---|
| Global | Static prompt prefix shared across all users | Long |
| 1-hour | Eligible users' extended cache | 60 min |
| Ephemeral (default) | Per-session | ~5 min |
The system prompt has a literal === DYNAMIC BOUNDARY === marker:
- Above (cacheScope: global): identity, system rules, task guidance, tool usage instructions, tone/style.
- Below (per-session): session guidance, CLAUDE.md, env info, language, MCP instructions (uncached, marked dangerous), output style.
Rule: every runtime if above the boundary doubles the cache key space. 3 conditionals = 8 prefixes. 5 = 32. Compile-time feature flags are fine; runtime checks must live below the boundary.
Global scope is disabled when MCP tools are present; user-specific tool definitions would fragment the global cache into millions of unique prefixes.
Sticky latches
Five session-scoped boolean flags that, once set, cannot be unset for the rest of the session. They control beta/feature headers. Reason: "mid-session toggles don't change the server-side cache key"; flipping a flag would bust 50-70K tokens of cached context.
Pattern: Once(value), a setter that throws or no-ops on a second call. Use this for any cache-influencing config.
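A minimal write-once latch in the spirit of that pattern (the no-op variant shown; a throwing variant is equally valid):

```typescript
// Write-once latch for cache-influencing flags: the first write wins,
// later writes are silently ignored, so the server-side cache key cannot
// change mid-session.
function makeOnce<T>(): { get: () => T | null; set: (v: T) => void } {
  let value: T | null = null // null = "not yet evaluated"
  return {
    get: () => value,
    set: (v: T) => {
      if (value === null) value = v // second call: no-op
    },
  }
}
```

Wrap every beta-header decision in one of these and mid-session feature toggles stop being able to bust the cached prefix.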
Output token slot reservation
Production p99 output = 4,911 tokens. Default SDK reservation = 32K-64K. Over-reservation = 8-16x.
Strategy: cap default max_tokens at 8K. On the rare truncation (<1% of requests), retry with 64K. This recovers 12-28% of the context window for free.
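The escalation strategy is a thin retry wrapper. A sketch where callModel is a stand-in for the real API call; only the retry shape matters:

```typescript
type ModelResult = { stopReason: 'end_turn' | 'max_tokens'; text: string }

// Reserve a small output slot by default; escalate only on actual truncation.
async function callWithReservation(
  callModel: (maxTokens: number) => Promise<ModelResult>,
): Promise<ModelResult> {
  const first = await callModel(8_000) // small cap keeps context window free
  if (first.stopReason !== 'max_tokens') return first
  return callModel(64_000) // rare truncation (<1%): retry with a big cap
}
```

The common case pays nothing extra; the rare truncation pays one retry, which is far cheaper than permanently sacrificing a quarter of the context window.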
Streaming: skip the SDK helper
The SDK's BetaMessageStream calls partialParse() on every input_json_delta, repeatedly re-parsing growing JSON from scratch (O(n²)). Use raw Stream<BetaRawMessageStreamEvent> and accumulate tool-input strings yourself.
Watchdog and fallback
- Idle watchdog: setTimeout(90s), reset on every chunk. At 45s, warn. At 90s, abort and retry non-streaming.
- Non-streaming fallback activates when streaming dies mid-response (network, stall, truncation, proxies returning 200 with non-SSE bodies).
- Disable fallback when streaming tool execution is active; duplicate tool runs would corrupt state.
10. Sub-agents and fork agents
Single-agent capability has a hard ceiling. The fix is recursive: spawn child agents that are the same loop with isolated state.
AgentTool input schema (dynamic)
| Field | Purpose |
|---|---|
| description | 3-5 word task summary |
| prompt | Full instructions |
| subagent_type | Specialization key (optional) |
| model | Override (haiku/sonnet/opus) |
| run_in_background | Async execution |
| name | For team addressability |
| isolation | worktree (filesystem clone) or remote |
Critical pattern: feature-gate the schema itself. "The model never sees fields it cannot use." Don't tell the model "don't use name here"; remove name from the schema in this context. The model cannot misuse what it cannot see.
Output (discriminated union)
- Sync: { status: 'completed', prompt, ...result }
- Async: { status: 'async_launched', agentId, outputFile }, where outputFile is a filesystem path that fills in when the background agent completes; parents poll it independently of process state.
The 15-step lifecycle (runAgent())
1. Model resolution: caller override > agent definition > parent model > default. Read-only agents default to Haiku.
2. Agent ID: agent-<hex>. An override path supports resuming a backgrounded agent.
3. Context preparation: fork agents clone parent history (after filterIncompleteToolCalls()); fresh agents start empty.
4. CLAUDE.md stripping: read-only agents (Explore, Plan) omit project instructions. Saves ~10.2% of fleet cache_creation tokens.
5. Permission isolation: per-agent getAppState() overlay. Permissive parent modes (bypass, acceptEdits) always win.
6. Tool resolution: fork agents reuse the parent's exact array byte-for-byte; normal agents apply allow/deny lists. General-purpose agents cannot spawn sub-agents (prevents exponential fan-out).
7. System prompt: fork agents inherit pre-rendered bytes; normal agents call agentDef.getSystemPrompt(ctx).
8. Abort controller: sync agents share the parent's controller (Esc kills both). Async agents get an independent one (they survive parent abort).
9. Hook registration: agent-id-scoped, with auto-cleanup on termination.
10. Skill preloading: declared in frontmatter, loaded concurrently to mask latency, prepended as a user message.
11. MCP initialization: inline servers (cleaned up on termination) or shared configs (memoized, persistent). Must complete before context creation so tools are in the pool when snapshotted.
12. Context creation: createSubagentContext() makes the isolation decisions:

    | Aspect | Sync | Async |
    |---|---|---|
    | setAppState | shared | isolated |
    | setAppStateForTasks | shared | shared |
    | readFileState | own cache | own cache |
    | abortController | parent's | independent |

13. Cache-safe params callback: for background agents; lets the summarization service fork the conversation with a cache-identical prefix.
14. Query loop: the same query() function. Yields back to the caller, records to a sidechain JSONL transcript, forwards metrics.
15. Cleanup (finally): MCP cleanup, hook clearing, agent tracking, file cache, message GC, killing orphan shell tasks, removing the agent's todos.
Fork agents: cache-driven subprocess design
The point of a fork is byte-identical request prefix to the parent, so children pay 10% input-token cost.
Three mechanisms make this work:
- System prompt threading: pass the parent's already-rendered bytes via override.systemPrompt. Don't regenerate; feature flags or the session date may have changed.
- Exact tool passthrough: useExactTools: true. No filtering, no reordering, no re-serialization. Even forbidden tools (like AgentTool itself) stay in the array; runtime guards prevent misuse.
- Placeholder tool results: buildForkedMessages() clones the parent's last assistant message. For each tool_use, it inserts a constant placeholder string "Fork started -- processing in background". Same string for every child, same bytes.
Resulting structure: [...shared_history, assistant(all_tool_uses), user(placeholders..., directive)].
Only the final directive differs across children. With a 48,500-token shared prefix and 5 children, savings exceed 90% on input tokens for children 2-5.
When fork is disabled
- Coordinator mode: coordinators have a structured-delegation prompt children would inappropriately inherit.
- Non-interactive: fork uses permissionMode: 'bubble', which needs a user-facing prompt.
- Explicit subagent_type: the user picked Explore/Plan/etc., so fork yields.
Recursive fork prevention (defense in depth)
- Primary: the child's context.options.querySource = 'agent:builtin:fork'. AgentTool checks this before allowing fork.
- Fallback: scan message history for the boilerplate XML tag, in case querySource was lost in transit.
Six built-in agent archetypes
| Archetype | Model | Tools | Notable |
|---|---|---|---|
| General-purpose | Default | All except Agent | Workhorse |
| Explore | Haiku | Read-only | Omits CLAUDE.md; one-shot prompt (saves 135 chars/invocation) |
| Plan | Inherit | Read-only | 4-step process, must end with "Critical Files" list |
| Verification | Inherit | Read-only, async | System prompt explicitly anti-rationalization; requires adversarial probe |
| Claude Code Guide | Haiku | dontAsk mode | Doc fetcher; system prompt injects user's configured skills/agents/MCP |
| Statusline Setup | Sonnet | Read + Edit only | Narrowly-scoped specialist |
Frontmatter format for user-defined agents
---
description: "When to use this"
tools: [Read, Bash]
disallowedTools: [FileWrite]
model: haiku
permissionMode: dontAsk
maxTurns: 50
skills: [my-skill]
mcpServers: [slack, {my-server: {command: node, args: [./server.js]}}]
hooks:
PreToolUse:
- command: "echo validating"
---
# System prompt body in markdown...
Trust hierarchy (least to most trusted): user agents < plugin agents < policy agents < built-in. User-agent hooks/MCP are silently skipped under strictPluginOnlyCustomization: graceful degradation, not error.
11. Multi-agent coordination patterns
Three distinct shapes:
A. Simple background delegation
Fire-and-forget. Tests, searches, lints. No coordination protocol.
B. Coordinator mode
Hierarchical manager-worker. The coordinator gets only three tools: Agent (spawn), SendMessage (talk), TaskStop (kill). That's it. By design.
"The coordinator's job is to think, plan, decompose, and synthesize. Workers do the work."
Critical principle: never delegate understanding. Coordinators must give workers exact file paths, exact line numbers, exact change descriptions, not "based on the research, fix the bug."
Workflow phases:
- Research: multiple workers explore in parallel
- Synthesis: the coordinator (not workers) integrates findings
- Implementation: workers receive precise instructions
- Verification: workers validate
C. Swarm teams
Peer-to-peer. Same process, isolated via AsyncLocalStorage, file-based mailboxes. Each message has metadata (sender, timestamp, color for UI).
Three interruption levels:
- Abort current work: cancel the turn, keep operating
- Shutdown request: cooperative graceful wind-down
- Kill: hard abort via controller
Task state machine (universal)
All background work (bash, sub-agents, remote sessions, teammates, dreams) flows through one state model:
pending → running → { completed | failed | killed }
Seven task types with single-char visual prefixes: local_bash (b), local_agent (a), remote_agent (r), in_process_teammate (t), local_workflow (w), monitor_mcp (m), dream (d).
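The state machine is small enough to encode as an explicit transition table; a sketch (task metadata omitted):

```typescript
type TaskState = 'pending' | 'running' | 'completed' | 'failed' | 'killed'

// One transition table shared by every background unit: bash, sub-agents,
// remote sessions, teammates. Terminal states allow no exits.
const TRANSITIONS: Record<TaskState, TaskState[]> = {
  pending: ['running'],
  running: ['completed', 'failed', 'killed'],
  completed: [],
  failed: [],
  killed: [],
}

function transition(from: TaskState, to: TaskState): TaskState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal task transition: ${from} -> ${to}`)
  }
  return to
}
```

Making illegal transitions throw (rather than silently no-op) means every task type inherits the same invariants: a completed task can never be resurrected into running by a stray message.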
SendMessage dispatch order
1. Bridge (bridge:<session-id>): cross-machine via Remote Control relays
2. UDS (uds:<socket-path>): local IPC via Unix domain sockets
3. In-process: agent IDs / names of running agents
4. Team mailbox: file-based queue
Killer feature: transparent agent resumption. Sending a message to a "dead" agent automatically resurrects it from its disk transcript. The conversation simply continues.
Command queue invariant
Messages are delivered between tool rounds, never mid-execution. The agent finishes the current turn, then receives new info. No race conditions, no corrupted state. Make this a hard rule – it's the cheapest way to get correctness in multi-agent comms.
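The invariant fits in a few lines. A minimal sketch, assuming a queue owned by the loop (class and method names are mine, not the real implementation):

```typescript
// Messages buffer while a tool round is in flight and are only handed over
// at the round boundary — never mid-execution.
class CommandQueue<T> {
  private pending: T[] = [];
  private inToolRound = false;

  beginToolRound() { this.inToolRound = true; }

  enqueue(msg: T) { this.pending.push(msg); }

  // Called by the loop when the current tool round finishes.
  endToolRound(): T[] {
    this.inToolRound = false;
    const delivered = this.pending;
    this.pending = [];
    return delivered;
  }

  // Mid-round reads always come back empty: that is the invariant.
  deliverable(): T[] {
    return this.inToolRound ? [] : this.endToolRound();
  }
}
```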
Pattern selection
| Scenario | Pattern |
|---|---|
| Single bg task | Delegation |
| Multi-file refactor with research phase | Coordinator |
| Long-running collaborative dev | Swarm |
Operational guardrail
A 50-message memory cap on in-process teammates exists because a real production incident reached 36.8 GB across 292 agents. Plan for unbounded fan-out from day one or it will hurt you.
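The simplest version of that cap is a bounded history that evicts from the front. A sketch (dropping oldest is one policy; summarize-then-drop is another):

```typescript
// Hard cap on per-teammate message history, e.g. the 50-message guardrail.
class CappedHistory<T> {
  private items: T[] = [];
  constructor(private readonly cap = 50) {}

  push(item: T) {
    this.items.push(item);
    if (this.items.length > this.cap) this.items.shift(); // evict oldest
  }

  get length() { return this.items.length; }
  last(): T | undefined { return this.items[this.items.length - 1]; }
}
```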
12. 🧠 Memory: file-based persistence + LLM recall
Why files, not a database
- Transparency – users open `.md` files and see exactly what the agent remembers. Trust through observability, not capability.
- Modification time is a built-in epistemological signal: "when was this observation recorded?"
- Zero infrastructure – no schema migrations, no indexes, no backups.
Layout
```
~/.claude/projects/<sanitized-git-root>/memory/
  MEMORY.md                  # always loaded; index only; ≤200 lines, ≤25 KB
  user_role.md               # one memory per file
  feedback_testing.md
  project_migration_q2.md
  team/                      # shared via symlink
  logs/YYYY/MM/YYYY-MM-DD.md # KAIROS append-only mode
```
Four-type taxonomy
| Type | Purpose |
|---|---|
| user | Role, expertise, preferences |
| feedback | Corrections + validated approaches (lead with rule, then Why: and How to apply: lines) |
| project | Active work context with absolute dates (always convert "Thursday" → 2026-03-05) |
| reference | Pointers to external systems (Linear, Slack channels) |
Derivability test: if git log / git blame / the code itself can answer it, don't memorize it. No code patterns, no architecture, no debug fix recipes.
Frontmatter contract
```
---
name: <title>
description: <one-line summary used by recall LLM>
type: user | feedback | project | reference
---
<body – for feedback/project, structure as: rule → **Why:** → **How to apply:**>
```
The description field carries the most weight β it's the LLM-recall index.
Two-tier retrieval
- Tier 1 (always loaded): the `MEMORY.md` index (~3,000 tokens for ~150 entries). Lines after 200 are truncated.
- Tier 2 (on-demand): an async Sonnet side-query gets the manifest (type, name, date, description), the user's current query, and recent tool history. It returns up to 5 filenames as structured JSON, validated against the file list to catch hallucination.
This trades a few hundred ms of latency for semantic precision that keyword matching cannot achieve – especially for negation ("do NOT use mocks").
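The validation step is mechanical: trust only filenames that exist in the manifest, deduplicated, capped at 5. A sketch (the function name is mine):

```typescript
// Guard the recall model's answer against hallucinated filenames:
// keep only names present in the manifest, unique, at most `max`.
function validateRecall(returned: string[], manifest: string[], max = 5): string[] {
  const known = new Set(manifest);
  const seen = new Set<string>();
  const valid: string[] = [];
  for (const name of returned) {
    if (known.has(name) && !seen.has(name)) {
      seen.add(name);
      valid.push(name);
      if (valid.length === max) break;
    }
  }
  return valid;
}
```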
Staleness policy
Don't expire. Annotate. Today/yesterday → no caveat. Older → a human-readable warning ("This memory is 47 days old – code claims may be outdated"). Models reason better about "47 days ago" than ISO timestamps.
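Annotation instead of expiry is a one-liner over the file's mtime. A sketch with an illustrative threshold (today/yesterday means no caveat, per the policy above):

```typescript
// Human-readable staleness annotation derived from a memory file's mtime.
function staleness(modified: Date, now: Date): string {
  const days = Math.floor((now.getTime() - modified.getTime()) / 86_400_000);
  if (days <= 1) return ""; // today or yesterday: no caveat
  return `This memory is ${days} days old – code claims may be outdated`;
}
```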
Write path (two-step)
1. Write `<type>_<topic>.md` with frontmatter + body.
2. Add a one-line pointer to `MEMORY.md`: `- [Title](file.md) – one-line hook.`
A background extraction agent runs at loop completion to catch memories the main agent missed.
KAIROS continuous mode
For long-lived sessions, replace two-step writes with append-only daily logs in `logs/YYYY/MM/`. A separate consolidation pass (after 24 h or 5+ modified sessions) merges logs into structured memories.
Security (team paths)
Three-layer validation, all fail-closed:
- Input sanitization (null bytes, traversal sequences, Unicode attacks)
- String-level path validation with trailing-separator checks
- Symlink resolution against the deepest existing ancestor
No partial-success fallbacks. Reject early, reject completely.
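Layer 1 of that validation is easy to make concrete. A minimal fail-closed sketch covering only the input-sanitization layer (the specific regex for Unicode attacks is my assumption; the real checks may differ):

```typescript
// Fail-closed sanitization of a single path segment (layer 1 of 3):
// reject null bytes, traversal, separators, and suspicious Unicode controls.
function sanitizeTeamPath(segment: string): string {
  if (segment.includes("\0")) throw new Error("null byte");
  if (segment.split(/[\\/]/).some(p => p === "..")) throw new Error("traversal");
  // Invisible / bidi-override characters used in Unicode path attacks (assumed set).
  if (/[\u202e\u200b-\u200f]/.test(segment)) throw new Error("unicode control");
  if (segment.includes("/") || segment.includes("\\")) throw new Error("separator");
  return segment; // only fully clean input survives
}
```

Note the shape: every branch throws; there is no "best effort" repair path. That is what fail-closed means here.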
13. 🔌 Skills, hooks, plugins – extensibility surface
Skills: two-phase loading
The killer pattern: 50 skills shouldn't cost 50 documents' worth of system-prompt tokens at startup.
- Phase 1 (startup): parse YAML frontmatter only – `name`, `description`, `when_to_use`. Inject into the system prompt as a directory.
- Phase 2 (invocation): load the full markdown body, substitute `$ARGUMENTS` and `${CLAUDE_SESSION_ID}`, execute inline shell commands, prepend as a user message.
You pay the token cost only when the skill actually runs.
Skill source priority (highest → lowest)
1. Managed (policy / enterprise)
2. User (`~/.claude/skills/`)
3. Project (`.claude/skills/`)
4. `--add-dir` flag
5. Legacy commands
6. Bundled
7. MCP (remote, untrusted)
Hard security boundary: MCP skills never execute inline shell commands. External MCP servers are content-only. No exceptions.
Frontmatter controls
```yaml
name: my-skill
description: ...
when_to_use: ...
disable-model-invocation: false   # block autonomous use
context: fork                     # run as sub-agent with own token budget
paths: ["src/**/*.ts"]            # conditional activation
hooks:
  PreToolUse: [...]
```
Hooks: 27 events, 6 types
User-configurable:
- `Command` – spawn a shell process, read stdout/exit code
- `Prompt` – lightweight LLM call
- `Agent` – multi-turn loop (max 50 turns)
- `HTTP` – POST to a remote policy server

Internal:
- `Callback` – programmatically registered
- `Function` – session-scoped TypeScript
Top 5 lifecycle points to know:
| Hook | Fires | Can do |
|---|---|---|
| `PreToolUse` | Before tool execution | Block / modify / approve / inject context |
| `PostToolUse` | After successful execution | Inject feedback, replace MCP output |
| `Stop` | Before Claude concludes | Force continuation (verification loops) |
| `SessionStart` | Session begin | Cannot block |
| `UserPromptSubmit` | User submits | Block (input validation) |
Other events span the tool lifecycle (`PostToolUseFailure`, `PermissionDenied`, `PermissionRequest`), session (`SessionEnd`, `Setup`), subagents (`SubagentStart`, `SubagentStop`), compaction (`PreCompact`, `PostCompact`), notifications, configuration, file watching, and task tracking – 27 in total.
Snapshot security model
`captureHooksConfigSnapshot()` freezes hook config at startup. If malicious code modifies `.claude/settings.json` mid-session, the snapshot prevents the change from taking effect. Only the `/hooks` command or the file watcher can update the live config.
Policy cascade: enterprise hooks cannot be disabled by users; allowManagedHooksOnly restricts to policy-approved hooks.
Exit code semantics (command hooks)
| Code | Meaning |
|---|---|
| 0 | success |
| 2 | blocking error (deliberately uncommon to prevent accidental enforcement) |
| other | non-blocking warning |
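The contract in the table above is small enough to encode directly. A sketch:

```typescript
// Exit-code semantics for command hooks: 0 = success, 2 = blocking error,
// anything else = non-blocking warning.
type HookOutcome = "success" | "block" | "warn";

function interpretExitCode(code: number): HookOutcome {
  if (code === 0) return "success";
  if (code === 2) return "block"; // deliberately uncommon: blocking is opt-in
  return "warn";
}
```

Making the blocking code a specific non-default value (2, not "any non-zero") is the point: a hook script that crashes with exit 1 warns instead of silently vetoing tools.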
Skill β hook integration
When a skill is invoked, its frontmatter hooks register as session-scoped. The skill directory becomes `CLAUDE_PLUGIN_ROOT` for those hook commands. `once: true` removes the hook after first execution. For sub-agents, `Stop` hooks auto-convert to `SubagentStop` to fire at the correct lifecycle point.
14. 🌐 MCP: the universal external-tool protocol
Skills and hooks extend the agent in-process. MCP (Model Context Protocol) is the standard way third parties extend it out-of-process β across servers, vendors, and trust boundaries. If you want a tool ecosystem you don't control, this is the layer that makes it possible.
Eight transports, three deployment shapes
| Shape | Transport | Use |
|---|---|---|
| Local process | `stdio` (default) | Subprocess; JSON-RPC over stdin/stdout; no auth |
| Remote server | `http` | Streamable HTTP; POST + optional SSE |
| | `sse` | Legacy (pre-2025) |
| | `ws` | WebSocket bidirectional |
| | `claudeai-proxy` | Routed via Claude.ai infrastructure |
| In-process | `sdk` | Control messages over stdin/stdout |
| | `InProcessTransport` | Direct function calls via `queueMicrotask()` (63 lines) |
| IDE | `sse-ide`, `ws-ide` | Runtime-specific |
Recommendation: start with `stdio` for local tools. Move to `http` only when you need remote. Use `InProcessTransport` for tools you control end-to-end – it eliminates subprocess overhead.
Tool wrapping (4 stages)
External MCP tools must merge into the same Tool interface as built-ins. Four transformations:
1. Name normalization – `mcp__{server}__{tool}`. Invalid characters become underscores. Must match `^[a-zA-Z0-9_-]{1,64}$`.
2. Description truncation at 2,048 chars. (Real-world: OpenAPI servers were dumping 15–60 KB descriptions.)
3. Schema passthrough – pass MCP input schemas straight through; do not transform.
4. Annotation mapping – `readOnlyHint: true` → enables concurrent execution. `destructiveHint: true` → triggers stricter permission checks.
After wrapping, MCP tools are indistinguishable from built-ins at the loop level. The same 14-step execution pipeline runs.
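Stages 1 and 2 are pure string work. A sketch under the constraints above (the helper names are mine):

```typescript
// Stage 1: normalize to mcp__{server}__{tool}, underscore invalid chars,
// and enforce the ^[a-zA-Z0-9_-]{1,64}$ contract.
const TOOL_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;

function normalizeMcpToolName(server: string, tool: string): string {
  const clean = (s: string) => s.replace(/[^a-zA-Z0-9_-]/g, "_");
  const name = `mcp__${clean(server)}__${clean(tool)}`.slice(0, 64);
  if (!TOOL_NAME_RE.test(name)) throw new Error(`unnormalizable: ${name}`);
  return name;
}

// Stage 2: cap runaway descriptions (seen from OpenAPI-backed servers).
const truncateDescription = (d: string, max = 2048) =>
  d.length <= max ? d : d.slice(0, max);
```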
Configuration scopes (7 sources, content-deduplicated)
| Scope | Source | Trust |
|---|---|---|
| `local` | `.mcp.json` in project | User approval required |
| `user` | `~/.claude.json` | User-managed |
| `project` | Project-level | Shared |
| `enterprise` | Org-managed | Pre-approved |
| `managed` | Plugin-provided | Auto-discovered |
| `claudeai` | Web interface | Pre-authorized |
| `dynamic` | SDK injection | Programmatic |
Servers with matching command/args (or URLs) are deduplicated by content, not by name. Two configs naming the same binary differently still merge.
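Content-based identity means the dedup key is built from the command and args (or the URL), never the display name. A sketch:

```typescript
// Deduplicate server configs by what they actually run, not what they're called.
interface McpServerConfig { name: string; command?: string; args?: string[]; url?: string }

function dedupeServers(configs: McpServerConfig[]): McpServerConfig[] {
  const seen = new Set<string>();
  return configs.filter(c => {
    // URL for remote servers; command + args for local processes.
    const key = c.url ?? JSON.stringify([c.command, ...(c.args ?? [])]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```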
OAuth (RFC 9728 + RFC 8414)
Discovery chain when a server returns 401:
1. Probe `/.well-known/oauth-protected-resource` for authorization-server metadata.
2. Fall back to RFC 8414 discovery against the MCP server itself.
3. Use a configured `authServerMetadataUrl` as the escape hatch.
Cross-App Access (XAA) enables federated token exchange via identity providers. Real-world spec violations are common – `normalizeOAuthErrorBody()` rewrites Slack's "200 with error body" responses to a proper HTTP 400. Plan for spec drift on day one.
Server lifecycle
- States: `connected`, `failed`, `needs-auth` (15-min TTL cache), `pending`, `disabled`.
- Spawn batching: local servers in batches of 3, remote in batches of 20 – protects against file-descriptor exhaustion.
- Session-expiry detection: Streamable HTTP returns `404` + JSON-RPC code `-32001` → reconnect + a single retry.
Timeout layers
| Layer | Duration | Why |
|---|---|---|
| Connection | 30 s | Unreachable / slow servers |
| Per-request | 60 s | Fresh AbortSignal per request |
| Tool call | ~27.8 h | Legitimate long-running operations |
| Auth | 30 s | Unreachable OAuth servers |
Trap: if you reuse a single `AbortSignal` across requests, it expires during idle periods. `wrapFetchWithTimeout()` creates a fresh signal per request. Memorize this.
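A minimal sketch of that rule, assuming a fetch-like function (the wrapper body is illustrative; only the name `wrapFetchWithTimeout` comes from the source):

```typescript
// A brand-new AbortSignal.timeout() per call. A reused signal would count
// idle time between requests against the timeout and fire spuriously.
type FetchInit = { signal?: AbortSignal; [key: string]: unknown };
type FetchLike = (url: string, init?: FetchInit) => Promise<unknown>;

function wrapFetchWithTimeout(fetchImpl: FetchLike, timeoutMs = 60_000): FetchLike {
  return (url, init = {}) =>
    fetchImpl(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
}
```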
Critical security rule
MCP skills never execute inline shell commands. External servers are content-only. Every other extension surface (user skills, project skills) can run shell; MCP cannot. This is the single most important MCP rule and the one you will be tempted to break.
InProcessTransport in 63 lines
Two key mechanics:
-
send()delivers viaqueueMicrotask()β prevents stack-depth blow-ups on synchronous request/response cycles. -
close()cascades to peer transport β no half-open connection states.
If you are wrapping an internal service as an MCP server, this is your reference. Don't subprocess what you can call directly.
15. 🚀 Bootstrap, startup, and rendering performance
The 5-phase pipeline (target: < 300 ms)
| Phase | File | What happens |
|---|---|---|
| 0. Fast-path dispatch | `cli.tsx` | Inspect args. `--version` / `--help` → dynamic-import only that handler, exit. Don't load React, telemetry, MCP. |
| 1. Module-level I/O | `main.tsx` | Side-effect-fire MDM (security policy) + keychain subprocesses during import evaluation. ~138 ms of module loading runs in parallel with subprocess I/O. |
| 2. Parse and trust | `init.ts` | Parse args, load config. Enforce a trust-boundary dialog. Before: only safe ops (TLS, themes, telemetry). After: env vars and git commands. |
| 3. Setup | `setup.ts` | Register everything in parallel: commands, agents, hooks, plugins, MCP. Hook config snapshot frozen here. |
| 4. Launch | `replLauncher.ts` | Seven entry paths converge: REPL, print, SDK, resume, continue, pipe, headless. All call the same `query()` loop. |
Other startup techniques
- API preconnection – fire a `HEAD` request to the Anthropic API during init. The TCP+TLS handshake (100–200 ms) overlaps with setup; the connection is warm by the time the user submits.
- Dynamic import for heavy libs – OpenTelemetry, provider SDKs, React for non-REPL paths.
- 50+ profiling checkpoints, sampled at 100% of internal users / 0.5% of external. Without instrumentation you can't tell what to optimize.
Search performance (270K+ paths)
Three layers:
1. Bitmap pre-filter – assign each path a 26-bit mask of the lowercase letters it contains. Reject a query with one integer comparison: `(charBits[i] & needleBitmap) !== needleBitmap`. Rejects 10–90% of paths at 4 bytes/entry.
2. Score-bound rejection – skip paths that can't beat the current top score before expensive scoring.
3. Async indexing with partial queryability – yield every ~4 ms. Search begins within 5–10 ms of index availability.
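The bitmap layer fits in a dozen lines. A sketch of the mask construction and the single-comparison rejection test:

```typescript
// 26-bit letter mask: bit i is set iff the string contains ('a' + i).
function letterMask(s: string): number {
  let bits = 0;
  for (const ch of s.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97; // 'a' → 0 … 'z' → 25
    if (i >= 0 && i < 26) bits |= 1 << i;
  }
  return bits;
}

// A path can only possibly match if it contains every letter of the needle.
function mightMatch(pathMask: number, needleMask: number): boolean {
  return (pathMask & needleMask) === needleMask;
}
```

The filter is conservative by construction: it never rejects a true match, only prunes paths that provably cannot match before the expensive fuzzy scorer runs.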
Rendering: patterns that transfer beyond the terminal
Claude Code forks Ink because stock Ink allocates one JS object per cell per frame – at 200×120 that's 24,000 GC'd objects every 16 ms. Whatever you're rendering, the lessons transfer:
- Double-buffer + atomic write. Two persistent `Frame` objects; render into the back, swap pointers (no allocation), write the diff in one syscall wrapped in BSU/ESU (Begin/End Synchronized Update). No tearing.
- Cell-level diffing with damage rectangles. Compute the bounding box of writes; diff only inside it. ~6× reduction in compare work for localized updates.
- Three interning pools (chars, styles, hyperlinks) – integer IDs everywhere. Style transitions become a single pre-cached string lookup. Pools generationally reset every 5 min.
- Frame throttling. 60 fps focused, 30 fps blurred (`throttle(deferredRender, FRAME_INTERVAL_MS)`). Scroll events get a tighter 4 ms schedule.
- Pack related data. Two `Int32` words per cell beat scattered objects – better cache behavior, faster compares, fewer allocations.
- Lazy expensive work. Syntax highlighting via React Suspense – code shows unstyled first, colors paint moments later.
- Separate hot paths from React. Direct DOM mutation + microtask scheduling for scroll. React handles the final paint, where it's already efficient.
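The interning idea generalizes well beyond terminals. A minimal sketch of one pool (a real renderer would keep three, as described above):

```typescript
// Intern repeated strings (styles, chars, hyperlinks) as small integer IDs
// so the hot diff path compares numbers instead of strings.
class InternPool {
  private ids = new Map<string, number>();
  private values: string[] = [];

  intern(value: string): number {
    let id = this.ids.get(value);
    if (id === undefined) {
      id = this.values.length;
      this.values.push(value);
      this.ids.set(value, id);
    }
    return id;
  }

  lookup(id: number): string { return this.values[id]; }

  // Generational reset (e.g. every 5 min) keeps the pool from growing forever.
  reset() { this.ids.clear(); this.values = []; }
}
```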
The thesis: performance is not making operations fast; it is eliminating operations entirely.
16. 📋 The 10 foundational patterns (cheat sheet)
| # | Pattern | Why it matters |
|---|---|---|
| 1 | AsyncGenerator-based loops | Natural backpressure, clean cancellation via `.return()`, typed terminal states |
| 2 | Speculative tool execution | Run safe read-only tools while the model is still streaming – a noticeable latency cut |
| 3 | Concurrent-safe batching | Partition by per-invocation safety; serial isolates side effects |
| 4 | Fork agents for cache sharing | Byte-identical prefixes → ~95% input-token savings on children |
| 5 | 4-layer context compression | snip → microcompact → collapse → autocompact, in that order |
| 6 | File-based memory + LLM recall | Beats embeddings for negation and intent-aware retrieval; zero infra |
| 7 | Two-phase skill loading | Frontmatter at startup, body on invocation |
| 8 | Sticky latches | Cache-influencing flags become write-once for the session |
| 9 | Slot reservation | 8K default output, 64K on demand – recovers 12–28% of context |
| 10 | Hook config snapshots | Freeze at boot; defense against mid-session injection from a malicious repo |
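Pattern 8 is the smallest of the ten, so it makes a good concrete example. A sketch (class name is mine):

```typescript
// Sticky latch: null until first observed, then write-once for the session,
// so a cache-influencing flag can never flip mid-run and split the cache key.
class StickyLatch {
  private value: boolean | null = null;

  observe(current: boolean): boolean {
    if (this.value === null) this.value = current; // latch on first observation
    return this.value; // later flips are ignored
  }
}
```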
17. 🗺️ Build-your-own: a 14-step roadmap
A pragmatic order to implement these in. Each step compiles and runs on its own.
1. Tool interface + factory. Define `Tool<I, O, P>`, `buildTool()` with safe defaults, and a `ToolResult` type. Ship one tool: `Read`. Test the Zod-based JSON Schema generation.
2. Query loop v0. Async generator. No tools, no compression, just stream the model and yield messages. Return a `Terminal` discriminated union.
3. Tool execution path. Add the 14-step pipeline as one function. Wire the loop to call it on `tool_use` blocks. Always pair `tool_use` with a `tool_result`, even on error.
4. Permission modes + rules. Implement `default`, `acceptEdits`, `plan`, `bypassPermissions`. Add the resolution chain. Skip `auto` (LLM classifier) for now.
5. Concurrency partition + executor. `partitionToolCalls()` + a serial/concurrent executor. Add `isConcurrencySafe()` to every tool. Yield results in submission order.
6. Hook system v0. Two events: `PreToolUse`, `PostToolUse`. Command hooks only (shell process, exit codes). Capture a snapshot at startup.
7. State split. Mutable singleton `STATE` for infra (cwd, model, session id). Tiny reactive store for UI (messages, approvals).
8. Multi-provider client factory. Direct API first. Stub the others. `buildFetch` wrapper for the client-request-id header.
9. Prompt caching architecture. System-prompt boundary marker. Static prefix (cache scope: global if no MCP). Dynamic suffix per session. Implement one sticky latch as proof.
10. Compression v1: snip + microcompact. Skip collapse and autocompact for now. Wire the budget thresholds.
11. Streaming tool executor. Watch the streaming SSE. Start safe tools when their `tool_use` is fully parsed. Buffer to preserve submission order.
12. AgentTool + sub-agent lifecycle. Re-enter `query()` with isolated context. Implement the cleanup `finally` block. Skip fork agents.
13. Memory. File layout, frontmatter contract, two-tier retrieval (index + LLM recall side-query). Four types only.
14. Skills (two-phase) + slash commands. Frontmatter at startup; body at invocation; `$ARGUMENTS` substitution. Add `EXTRA_DIRS` resolution order.
Save for later (don't build until step 14 lands): fork agents, swarm teams, remote tasks, KAIROS continuous mode, auto-mode permission classifier, MCP transport layer, terminal renderer optimization, bitmap search index.
18. ⚠️ Anti-patterns and pitfalls
Loop / control flow
- ❌ Callbacks or event emitters for the agent loop. You'll re-invent backpressure poorly. Use `async function*`.
- ❌ A single `error` terminal state. You lose information. Encode 10+ specific reasons in a discriminated union.
- ❌ Stop hooks on error responses. Creates `error → hook blocks → retry → error` infinite loops. Skip them.
- ❌ Forgetting to pair `tool_use` with `tool_result` on abort. The API will reject the next message. Drain queued tools with synthetic results on every cancellation path.
Tools
- ❌ A constructor literal instead of a factory. Defaults will be unsafe. Always go through `buildTool()`.
- ❌ Per-tool-type concurrency safety. `Bash` is sometimes safe, sometimes not. Pass the parsed input.
- ❌ Concatenating built-ins and MCP tools, then sorting flat. The cache breakpoint dies. Sort within each partition, then concat.
- ❌ Returning huge raw output. Cap with `maxResultSizeChars`. Persist to disk + return a preview.
- ❌ Using the SDK's `BetaMessageStream`. O(n²) JSON re-parsing. Read raw stream events.
Permissions
- ❌ Scattering `if mode === ...` checks throughout tool code. Centralize in modes + the resolution chain.
- ❌ Trusting a partial bash parse. If `parseForSecurity()` fails, treat the command as unsafe.
- ❌ Sub-agent default = `default` mode. It needs a UI to prompt; background agents have none. Default to `bubble` (sync) or `dontAsk` (async).
Caching / API
- ❌ Runtime conditionals in the static prompt prefix. Each one doubles the cache key space. Move them below the boundary.
- ❌ Mid-session feature toggles that change request headers. Use sticky latches.
- ❌ Reserving 64K output tokens by default. That over-reserves 8–16×. Cap at 8K, escalate on demand.
- ❌ Regenerating the system prompt for fork children. Feature flags or the session date may have moved. Pass the parent's bytes.
- ❌ Filtering tools per child agent in fork mode. Different array → different cache key. Use `useExactTools: true` and runtime guards.
Memory
- ❌ Storing what `git log` can answer. Code patterns, fix recipes, who-changed-what. Useless duplication that goes stale.
- ❌ Embedding-only retrieval. Misses negation ("do NOT mock the DB"). Use LLM recall over a manifest.
- ❌ Hard expiration. Annotate with age; let the model decide. Stale memories are still data.
- ❌ Letting `MEMORY.md` grow past 200 lines. It's truncated silently. Treat the index as a budget.
Multi-agent
- ❌ Coordinators with the full tool set. They'll do the work themselves. Restrict to `Agent`, `SendMessage`, `TaskStop`.
- ❌ Workers asked to "based on the research, implement X." They re-derive context, miss specifics, hallucinate paths. Synthesis is the coordinator's job.
- ❌ Mid-tool-execution message delivery. Race conditions. Queue at tool-round boundaries.
- ❌ Unbounded teammate state. 36.8 GB / 292 agents was a real production incident. Cap message history.
- ❌ General-purpose agents that can spawn `Agent`. Exponential fan-out. Block recursive spawning at the schema level.
Bootstrap / hooks
- ❌ Loading the world for `--version`. Fast-path dispatch first, full bootstrap second.
- ❌ Hook config that updates live mid-session. It lets a malicious repo redefine permissions after the trust dialog. Snapshot at startup; update only via an explicit user channel.
- ❌ Treating MCP skills like local skills. They are content-only. Never execute their inline shell commands.
🎯 Closing thought
The deepest principle in the source book is repeated at every layer: push complexity to the boundaries. Permission resolution, protocol translation, state reconciliation, tool I/O – these are the messy edges. Concentrate the mess there. Keep the loop, the tool composition, the memory recall, and the streaming logic clean and exhaustively typed.
If you remember nothing else: most of this system is generators yielding strongly-typed events through a series of small modules, with a few critical caches and a few critical safety doors. Build it in that order.
19. 📖 Glossary
Quick reference for the jargon used throughout this guide.
| Term | Meaning |
|---|---|
| AsyncGenerator | A JS function declared `async function*`. Yields values lazily, pausing at each `yield` until the consumer calls `.next()`. Provides backpressure and clean cancellation. |
| Backpressure | The producer pauses when the consumer can't keep up. Generators give it for free; event emitters do not. |
| Cache breakpoint | The byte position in the prompt where the prompt cache stops matching. Move volatile content after the breakpoint to maximize hit rate. |
| Concurrency-safe | A tool invocation that can run in parallel with others without observable side effects. Determined per-input, not per-tool-type. |
| Context window | The token budget for a single API call (prompt + output). When you exceed it the API rejects the request. |
| Discriminated union | A type made of variants tagged by a literal field (e.g. `{ kind: 'completed' } \| { kind: 'aborted' }`), so every case can be handled exhaustively. |
| Fork agent | A sub-agent that inherits the parent's byte-identical prompt prefix to maximize prompt-cache hits (~95% input-token discount on children 2…N). |
| Frontmatter | The YAML block at the top of a `.md` file (between two `---` lines). Used for skill/agent/memory metadata. |
| Hook | A user/plugin/policy interceptor at one of 27 lifecycle events. Can block, modify, or inject. |
| MCP | Model Context Protocol – the JSON-RPC standard for connecting external tool servers to an agent. Eight transports. |
| Microcompact | Layer 2 of context compression. Removes tool results by `tool_use_id` when no longer needed. |
| Prompt cache | Anthropic's server-side cache of prompt prefixes. ~90% discount on cached input tokens. The entire architecture revolves around preserving hits. |
| Reservoir sampling | Algorithm R. Maintain a fixed-size random sample of an unbounded stream. Used here for latency histograms (1,024 entries → accurate p50/p95/p99). |
| Slot reservation | The `max_tokens` value sent to the API. Default cap 8K, escalate to 64K on truncation (<1% of requests). Reclaims 12–28% of context. |
| Speculative execution | Starting tools while the model is still streaming, before the assistant message completes. Saves hundreds of ms when read-only tools dominate. |
| Sticky latch | A write-once boolean (`null \| true \| false`) that latches on first observation and ignores later flips, keeping cache-influencing flags stable for the session. |
| Sub-agent | A child agent spawned via `AgentTool`. A new `query()` generator with isolated message history. Sync (parent waits) or async (background). |
| Synthetic tool result | A fabricated `tool_result` block emitted on cancellation so the API doesn't see a `tool_use` without a matching result. |
| Terminal state | The discriminated-union value the agent loop returns (vs. yields). Encodes why execution stopped – 10 distinct reasons. |
| `tool_use` / `tool_result` | Anthropic API blocks. Every `tool_use` in an assistant message must be paired with a `tool_result` in the next user message. The single most common bug source. |
| Two-phase skill loading | Frontmatter loaded into the system prompt at startup; the full body loaded only on invocation. Lets you ship 50+ skills cheaply. |
Sources
- Repo: https://github.com/alejandrobalderas/claude-code-from-source (raw chapter markdown – primary source)
- Companion site: https://claude-code-from-source.com
- Chapters analyzed: 1 (Architecture), 2 (Bootstrap), 3 (State), 4 (API Layer), 5 (Agent Loop), 6 (Tools), 7 (Concurrency), 8 (Sub-Agents), 9 (Fork Agents), 10 (Coordination), 11 (Memory), 12 (Extensibility), 13 (Terminal UI), 15 (MCP), 17 (Performance), 18 (Epilogue).
The source repo is purely educational and contains no source code from Claude Code – only original pseudocode derived from npm source maps. This guide follows the same convention.
If you found this helpful, let me know by leaving a ❤️ or a comment, and if you think this post could help someone, feel free to share it. Thank you very much! 🙏