Truong Phung
πŸ—οΈ Building Agents Like Claude Code β€” A Source-Derived Blueprint πŸ“˜

A comprehensive synthesis of the claude-code-from-source project (and companion site claude-code-from-source.com) — distilled into core principles, techniques, and actionable guidelines for builders who want to ship a coding agent of comparable quality.

The source repo is an 18-chapter educational reverse-engineering of Claude Code derived from npm source maps. No proprietary code is reproduced — only architectural pseudocode and design rationale. This guide does the same.


Table of Contents

  0. 💡 TL;DR — the whole agent in one mental picture
  1. 🎯 What you are actually building
  2. 🧱 The six core abstractions
  3. 📦 State: two tiers, one source of truth
  4. ⚙️ The agent loop: AsyncGenerator as control plane
  5. 🔧 Tools: self-describing, fail-closed, parameterized
  6. ⚡ Concurrency and speculative execution
  7. 🔒 Permissions: modes, rules, and bubbling
  8. 🗜️ Context engineering: the 4-layer compression pipeline
  9. 🌐 The API layer: prompt caching as architecture
  10. 🤖 Sub-agents and fork agents
  11. 🕸️ Multi-agent coordination patterns
  12. 🧠 Memory: file-based persistence + LLM recall
  13. 🔌 Skills, hooks, plugins — extensibility surface
  14. 🔗 MCP: the universal external-tool protocol
  15. 🚀 Bootstrap, startup, and rendering performance
  16. 📋 The 10 foundational patterns (cheat sheet)
  17. 🗺️ Build-your-own: a 14-step roadmap
  18. ⚠️ Anti-patterns and pitfalls
  19. 📖 Glossary

0. 💡 TL;DR — the whole agent in one mental picture

Before the details, hold this picture in your head. Everything else is elaboration.

┌─────────────────────────────────────────────────────────────┐
│  query() — async generator, the only place control flows    │
│                                                             │
│   while not done:                                           │
│     state    = compress(state)              # 4 layers      │
│     response = await stream(model, state)                   │
│     yield response.messages                 # to UI         │
│     if no tool_calls:  return  completed                    │
│     batches  = partition(response.tool_calls)               │
│     for batch in batches:                                   │
│       results = run(batch)                  # parallel-safe │
│       yield results.messages                                │
│       state += results                                      │
└─────────────────────────────────────────────────────────────┘
        ▲                ▲                ▲             ▲
        │                │                │             │
   Memory (files)    Tools (self-      Hooks (27       Sub-agents
   loaded into       describing,       lifecycle       (recursive
   system prompt     fail-closed,      events)         query() with
   at session        partitioned by                    isolated state)
   start             safety per-call)

Five rules carry 80% of the design:

  1. 🔄 The loop is an async generator. Backpressure, cancellation, and typed terminal states fall out for free.
  2. 📝 Every tool is self-describing (schema, permissions, concurrency safety). The loop never special-cases tools.
  3. 🛡️ Safety is per invocation, not per tool type. Bash("ls") ≠ Bash("rm -rf").
  4. 💾 Prompt cache is architecture, not optimization. Static-then-dynamic boundary, sticky flags, byte-identical fork prefixes.
  5. 📁 Memory is files. A small LLM picks which to load. No database, no embeddings. Trust through transparency.

If you only build those five things well, you have ~80% of Claude Code. The rest is layering and polish.


1. 🎯 What you are actually building

A production coding agent is not a chat loop with tool calls bolted on. It is a streaming, cancellable, recursive state machine that has to:

  • Survive token-budget exhaustion mid-task without losing the user's work.
  • Run dozens of tools per turn safely, often in parallel, sometimes speculatively.
  • Spawn child agents that cost ~10% of a normal call thanks to prompt cache reuse.
  • Persist semantic knowledge across sessions without a database.
  • Allow third parties to extend it (skills, hooks, MCP) without crashing the host.
  • Boot in under 300 ms and stream the first token in well under a second.

If your design omits any of these, you will hit a wall later. Build for them on day one — most of them are cheap when planned, expensive when retrofitted.

The closing principle of the source book: push complexity to the boundaries. Protocol translation, state reconciliation, external tool invocation, permission checking — these belong at the edges. The interior (loop, memory, tool composition) stays clean and exhaustively typed.


2. 🧱 The six core abstractions

Every part of Claude Code reduces to one of these. Implement them as first-class modules, not as helpers attached to a god object.

| # | Abstraction | Responsibility | Approx LoC in CC |
|---|---|---|---|
| 1 | Query Loop | Async generator that streams model output, runs tools, appends results, decides when to stop. Returns a typed Terminal discriminated union (10 reasons). | ~1,700 |
| 2 | Tool System | Self-describing tools with schema, permissions, concurrency, rendering. Batched into concurrent/serial groups. Speculative execution during streaming. | — |
| 3 | Tasks | Background units following the `pending → running → {completed \| failed \| killed}` state machine. | — |
| 4 | State | Two layers: a mutable singleton STATE (~80 fields, infrastructure) + a 34-line reactive store (UI: messages, approvals, progress). | — |
| 5 | Memory | File-tier persistence (CLAUDE.md, ~/.claude/MEMORY.md, team symlinks). LLM picks relevant memories at session start. | — |
| 6 | Hooks | Lifecycle interceptors at 27 events, in 4 forms: shell command, single-shot prompt, agent loop, HTTP webhook. | — |

Why this carving

  • The Query Loop is the only place control flow lives. Tools, hooks, sub-agents — they all yield through it.
  • State is split because infrastructure mutates rarely but reads constantly; UI is the opposite. One subscription model can't serve both.
  • Memory is its own primitive (not a tool) because it is read on every system-prompt build, before any tool can run.
  • Hooks are first-class because the permission system itself runs partially as PreToolUse hooks. They are not an afterthought.

3. 📦 State: two tiers, one source of truth

State design is where most agent codebases collapse. Claude Code splits it into two tiers with strict layering:

| Tier | What it holds | Mutability | Reachable from |
|---|---|---|---|
| Bootstrap state (STATE) | ~80 fields: originalCwd, sessionId, model overrides, cost accumulators, telemetry handles, prompt-cache allowlists | Mutable through ~100 typed setters | Everywhere — DAG leaf, depends on nothing but Node.js stdlib |
| AppState (reactive store) | Messages, input mode, tool approvals, progress indicators, todos | Immutable snapshots; updater functions only | Inside React components |

Why split them

  • Availability: session ID, telemetry, and cost trackers must exist before React mounts. A reactive store cannot serve them.
  • Access pattern: bootstrap state is read constantly, mutated rarely, with no subscribers. AppState is read by render subscribers on every change. One subscription model can't serve both.
  • Dependency direction: bootstrap depends on nothing → AppState imports bootstrap → React imports AppState. Enforce this with a lint rule. Cycles will sneak in otherwise.

The reactive store in 34 lines

function makeStore(initial, onTransition) {
  let current = initial
  const subs = new Set()
  return {
    read:      () => current,
    update:    (fn) => {
      const next = fn(current)
      if (Object.is(next, current)) return        // skip noop
      const prev = current; current = next
      onTransition?.(prev, next)                  // side effects FIRST
      subs.forEach(cb => cb())                    // then UI
    },
    subscribe: (cb) => { subs.add(cb); return () => subs.delete(cb) },
  }
}

Three deliberate choices:

  • Updater-only mutations. No set(value) API. Stale-closure bugs vanish.
  • Object.is guard. Identical references skip re-renders and side effects.
  • onTransition fires before listeners. Side effects (e.g. persist to disk, notify remote session) complete before the UI flips.

The sticky latch pattern (write-once flags)

A pattern worth memorizing — it applies any time a value influences a server-side cache key:

type Latch = boolean | null   // null = "not yet evaluated"

function shouldSendBetaHeader(featureCurrentlyActive: boolean): boolean {
  const latched = getAfkLatch()
  if (latched === true) return true            // already on — keep sending
  if (featureCurrentlyActive) {
    setAfkLatch(true)                           // first activation — latch
    return true
  }
  return false                                  // never activated
}

The three-state type self-documents intent: null says "we haven't decided yet." Once true, never returns to false. Five such latches in Claude Code prevent mid-session feature toggles from busting 50–70K tokens of cached prompt.

Centralizing side effects on diffs

A real production bug: permission mode was synced to the remote session by only 2 of its 8+ mutation paths, and eventually one drifted. The fix was a single onChangeAppState(prev, next) callback that detects field changes structurally — every mutation path is automatically covered. Side effects scale much more slowly than mutation sites; centralize on diffs, not events.
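
The diff-centralization idea can be sketched in a few lines. The names onChangeAppState and the field set here are illustrative, not Claude Code's actual shape; reference comparison suffices because snapshots are immutable:

```typescript
// Hypothetical sketch: one central diff callback covers every mutation path.
type AppState = { permissionMode: string; todos: string[] };

const sideEffectLog: string[] = [];

// Called on every transition, regardless of which setter ran.
function onChangeAppState(prev: AppState, next: AppState): void {
  if (prev.permissionMode !== next.permissionMode) {
    // e.g. sync to the remote session -- stubbed as a log entry here
    sideEffectLog.push(`sync-permission:${next.permissionMode}`);
  }
  if (prev.todos !== next.todos) {
    // Immutable snapshots: a new array reference means the field changed.
    sideEffectLog.push("persist-todos");
  }
}

// Every mutation path funnels through update(); none needs to remember the sync.
function update(state: AppState, fn: (s: AppState) => AppState): AppState {
  const next = fn(state);
  if (!Object.is(next, state)) onChangeAppState(state, next);
  return next;
}
```

No individual setter carries sync logic; adding a ninth mutation path cannot reintroduce the drift.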

Cost tracking (a concrete example)

Every API response runs through addToTotalSessionCost:

  • Accumulates per-model usage in bootstrap state.
  • Reports to OpenTelemetry.
  • Recursively processes nested model calls (sub-agents, recall queries).
  • Persists to project config on process exit.
  • Restores on next session only if the persisted session ID matches.

Histograms use reservoir sampling (Algorithm R) with 1,024 entries to compute p50/p95/p99. Averages hide tail latency, and tail latency is what users feel.
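
Algorithm R itself is short. A sketch, with class and method names of my own choosing rather than Claude Code's:

```typescript
// Reservoir sampling (Algorithm R): keep a fixed-size uniform sample of a
// stream, then read percentiles off the sorted sample.
class Reservoir {
  private samples: number[] = [];
  private seen = 0;
  constructor(private capacity = 1024) {}

  add(value: number): void {
    this.seen++;
    if (this.samples.length < this.capacity) {
      this.samples.push(value);
    } else {
      // Replace a random slot with probability capacity/seen, keeping every
      // observation equally likely to survive.
      const j = Math.floor(Math.random() * this.seen);
      if (j < this.capacity) this.samples[j] = value;
    }
  }

  percentile(p: number): number {
    const sorted = [...this.samples].sort((a, b) => a - b);
    if (sorted.length === 0) return 0;
    const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
    return sorted[idx];
  }
}
```

Memory stays constant no matter how many requests a session makes, which is the point of using a reservoir instead of storing every latency.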

Actionable: even in v0, instrument cost and latency. You cannot decide what to optimize from feel.


4. βš™οΈ The agent loop: AsyncGenerator as control plane

The loop is an async function* — not a while loop with callbacks, not an event emitter, not an RxJS pipeline. There are three concrete reasons to choose generators:

  1. Backpressure for free. A generator yields only when the consumer calls .next(). The REPL pulls via for await, naturally pausing if the UI can't render fast enough.
  2. Typed terminal states. The generator's return is a discriminated union of why execution stopped: completed, max_turns, error, aborted_streaming, aborted_tools, prompt_too_long, image_error, model_error, stop_hook_prevented, hook_stopped, blocking_limit. The compiler enforces exhaustive handling.
  3. Composability. Inner generators delegate via yield*. No callback nesting, no promise plumbing.
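
A minimal sketch of the typed-terminal idea, showing only a few of the stop reasons listed above:

```typescript
// Discriminated union of stop reasons (abbreviated). The `never` check in the
// default branch makes forgetting a new kind a compile error, not a runtime bug.
type Terminal =
  | { kind: "completed" }
  | { kind: "max_turns"; turns: number }
  | { kind: "model_error"; error: string }
  | { kind: "aborted_tools" };

function describe(t: Terminal): string {
  switch (t.kind) {
    case "completed":     return "done";
    case "max_turns":     return `stopped after ${t.turns} turns`;
    case "model_error":   return `model failed: ${t.error}`;
    case "aborted_tools": return "user aborted during tools";
    default: {
      // Exhaustiveness check: unreachable while every kind is handled above.
      const _never: never = t;
      return _never;
    }
  }
}
```
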

Loop skeleton

async function* query(initialState):
  state = initialState
  while true:
    state = compress(state)              // 4-layer pipeline (§8)
    response = await callModel(state)    // streaming
    yield* response.messages             // surface to UI

    if response.error and recoverable:
      state = recover(state, error)
      continue
    if response.error and not recoverable:
      return { kind: 'model_error', error }
    if not response.toolCalls:
      if stopHookBlocks(state):
        state = applyHookFeedback(state)
        continue
      return { kind: 'completed' }

    batches = partitionToolCalls(response.toolCalls)
    for batch in batches:
      results = await executeBatch(batch, state)
      yield* results.messages
      state = appendToolResults(state, results)

    // re-enter with new state

Continue states (don't return, just continue)

collapse_drain_retry, reactive_compact_retry, max_output_tokens_escalate, max_output_tokens_recovery, stop_hook_blocking, token_budget_continuation, next_turn. Naming each one is what makes the loop testable β€” every test asserts which transition fired.

Error recovery is a ladder, not a fallback

Order matters. From least to most aggressive:

| Trigger | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| prompt_too_long (413) | drain staged collapse summaries | reactive compact | surface to user |
| max_output_tokens | escalate cap 8K → 64K | multi-turn recovery (≤3 attempts) | surface |
| media_size_error | reactive compact | — | surface |

Guards prevent infinite loops: hasAttemptedReactiveCompact one-shot flags, hard caps on recovery attempts, circuit breakers. Never run stop hooks on an error response — that creates "error → hook blocks → retry → error" spirals.

Cancellation

Aborts can hit during streaming or during tool execution. In both cases, the executor must drain remaining requests by emitting synthetic tool_result blocks for queued/running tools. The Anthropic API rejects an assistant message containing a tool_use block without a matching tool_result. signal.reason distinguishes hard aborts from "submit interrupts" (a new user message), so you skip redundant interruption stubs in the latter case.

Actionable: every tool_use your agent emits must have a paired tool_result in message history before the next API call. Make this an invariant your loop enforces, not a hope.
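
A sketch of enforcing that invariant. The block shapes are simplified stand-ins for the Anthropic content-block types, and the stub text is illustrative:

```typescript
// Before the next API call: every tool_use in the last assistant message must
// have a matching tool_result, real or synthetic.
type Block =
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

function drainPending(assistantBlocks: Block[], results: Block[]): Block[] {
  const answered = new Set(
    results
      .filter((b): b is Extract<Block, { type: "tool_result" }> => b.type === "tool_result")
      .map((b) => b.tool_use_id)
  );
  const synthetic: Block[] = [];
  for (const b of assistantBlocks) {
    if (b.type === "tool_use" && !answered.has(b.id)) {
      // Aborted/queued tool: emit a stub so the API accepts the history.
      synthetic.push({ type: "tool_result", tool_use_id: b.id, content: "Interrupted by user" });
    }
  }
  return [...results, ...synthetic];
}
```

Run this on every abort path; the loop can then assert pairing before each call rather than hoping for it.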


5. 🔧 Tools: self-describing, fail-closed, parameterized

Interface

A tool is parameterized by three types: Input, Output, and Progress. The Input doubles as a Zod schema and the JSON Schema given to the model.

The full Tool interface in Claude Code has ~45 members. Five are critical:

  1. call(input, ctx) — runs the work.
  2. inputSchema — Zod schema (validated, plus auto-generated JSON Schema).
  3. isConcurrencySafe(parsedInput) — per invocation, not per type.
  4. checkPermissions(parsedInput, ctx) — returns allow | deny | ask | passthrough with optional updatedInput.
  5. validateInput(parsedInput, ctx) — semantic checks beyond schema (e.g. reject no-op edits).

The buildTool() factory pattern (fail-closed)

Never construct a tool literal directly. Wrap it in a factory that fills in dangerous defaults conservatively:

const SAFE_DEFAULTS = {
  isEnabled:         () => true,
  isConcurrencySafe: () => false,  // serial unless proven otherwise
  isReadOnly:        () => false,  // assume writes
  isDestructive:     () => false,
  checkPermissions:  (input) => ({ behavior: 'allow', updatedInput: input }),
}

If a tool author forgets isConcurrencySafe, they get serial execution — slow, but never corrupting. The opposite default would silently produce race conditions.
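
A hedged sketch of such a factory; the field names are illustrative, not Claude Code's actual ~45-member interface:

```typescript
// Fail-closed buildTool(): unspecified safety fields get the conservative value.
type ToolSpec = {
  name: string;
  call: (input: unknown) => unknown;
  isConcurrencySafe?: (input: unknown) => boolean;
  isReadOnly?: (input: unknown) => boolean;
};

type Tool = Required<ToolSpec>;

function buildTool(spec: ToolSpec): Tool {
  return {
    // Conservative defaults: serial and write-assuming unless stated.
    isConcurrencySafe: () => false,
    isReadOnly: () => false,
    ...spec, // author-provided fields override the safe defaults
  } as Tool;
}
```
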

Tool result shape

type ToolResult<T> = {
  data: T
  newMessages?: Message[]                              // e.g. AgentTool injects sub-agent transcript
  contextModifier?: (ctx: ToolUseContext) => ToolUseContext  // e.g. EnterPlanMode
}

Context modifiers only apply to serial tools. Concurrent tools queue modifiers until the batch completes — otherwise data dependencies and shared state become race-condition territory.

The 14-step execution pipeline (checkPermissionsAndCallTool())

This is the choreography every tool call goes through. Implement it as a single function that returns a ToolResult or ToolError. Skipping any of these steps will hurt later.

| # | Step | Why it matters |
|---|---|---|
| 1 | Tool lookup (with alias map) | Old transcripts may reference renamed tools |
| 2 | Abort check | Don't waste compute on cancelled queued calls |
| 3 | Zod validation | Catch type errors; hint to call ToolSearch for deferred tools |
| 4 | Semantic validation | E.g. reject no-op edits, block sleep if a Monitor tool exists |
| 5 | Speculative classifier start | Fire auto-mode permission classifier in parallel for Bash |
| 6 | Input backfill | Expand ~/foo → absolute paths for hooks/permissions but keep originals for transcript stability |
| 7 | PreToolUse hooks | Hooks decide / modify / block |
| 8 | Permission resolution | Rule match → tool method → mode default → prompt → classifier |
| 9 | Permission denied path | Build error, fire PermissionDenied hook |
| 10 | Execute call() | The actual work |
| 11 | Result budgeting | Persist oversized output to disk; replace with preview |
| 12 | PostToolUse hooks | Modify MCP output, possibly block continuation |
| 13 | Append newMessages | Sub-agent transcripts, system reminders |
| 14 | Error classification | Telemetry, OTel events |

Result budgeting

Per-tool size caps prevent runaway output:

| Tool | maxResultSizeChars | Rationale |
|---|---|---|
| Bash | 30,000 | Most useful output fits |
| Edit | 100,000 | Diffs need room |
| Grep | 100,000 | Search results accumulate |
| Read | ∞ | Self-bounded by token limit; persisting would create circular Read loops |

Above the cap, the system writes the full content to a <persisted-output> file and returns a preview pointing to it. An aggregate ContentReplacementState tracks per-conversation budgets so multiple near-cap results cannot blow context together.
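
The budgeting step can be sketched like this; the preview length and file path here are hypothetical, not Claude Code's actual values:

```typescript
// Oversized tool output is swapped for a preview plus a pointer to the full
// content on disk; small output passes through inline.
type BudgetedResult =
  | { kind: "inline"; content: string }
  | { kind: "persisted"; preview: string; path: string };

function budgetResult(toolName: string, content: string, cap: number): BudgetedResult {
  if (content.length <= cap) return { kind: "inline", content };
  return {
    kind: "persisted",
    preview: content.slice(0, 500) + "\n[truncated: full output persisted to disk]",
    path: `/tmp/persisted-output/${toolName}-${Date.now()}.txt`, // hypothetical location
  };
}
```

The aggregate per-conversation budget described above would sit on top of this, tracking the sum of inline results.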

Deferred loading

Tools marked shouldDefer: true send only { name, description, defer_loading: true } to the API. The model has to call ToolSearch to load full schemas. Three benefits:

  • Smaller initial prompt.
  • Adding/removing a deferred tool changes the prompt by a few tokens, not hundreds — prompt cache stays warm.
  • Less tool-soup confusion for the model.

Tool registry assembly order matters

final = sort(builtins, alpha) ++ sort(mcpTools, alpha)

Sort within each partition, then concatenate. A flat sort across all tools would interleave MCP tools into built-in positions, busting cache breakpoints whenever MCP servers are added/removed.
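
The partition-then-sort rule is a one-liner; a sketch with assumed names:

```typescript
// Built-ins keep stable positions no matter which MCP tools come and go,
// so the cached prompt prefix covering built-ins survives MCP churn.
function assembleRegistry(builtins: string[], mcpTools: string[]): string[] {
  const byName = (a: string, b: string) => a.localeCompare(b);
  return [...[...builtins].sort(byName), ...[...mcpTools].sort(byName)];
}
```
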


6. ⚡ Concurrency and speculative execution

The core insight

Safety is determined per-invocation, not per-tool-type. Bash("ls -la") is concurrency-safe. Bash("rm -rf build/") is not. Same tool. Different inputs. Different verdict.

The partition algorithm

partitionToolCalls(calls):
  batches = []
  current = { kind: 'concurrent', tools: [] }
  for call in calls:
    tool = lookup(call.name)
    parsed = tool.inputSchema.safeParse(call.input)
    safe = parsed.success and tool.isConcurrencySafe(parsed.data)
    if safe and current.kind == 'concurrent':
      current.tools.push(call)
    else if safe:
      batches.push(current); current = { kind: 'concurrent', tools: [call] }
    else:
      if current.tools: batches.push(current)
      batches.push({ kind: 'serial', tools: [call] })
      current = { kind: 'concurrent', tools: [] }
  if current.tools: batches.push(current)
  return batches

Example: [Read, Read, Grep, Edit, Read] → [concurrent[Read, Read, Grep], serial[Edit], concurrent[Read]].

Parsing failure → serial. Safety-check exception → serial. Always fail closed.

Speculative streaming execution

The StreamingToolExecutor watches the model stream. The moment a tool_use block is fully parsed (often seconds before the response finishes), it starts that tool β€” provided admission rules allow.

Admission rule: a tool can start executing iff no tool is currently running, or both the new tool and all currently-running tools are concurrency-safe.
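
The admission rule reads directly as a predicate (a sketch, with assumed field names):

```typescript
// A candidate may start iff nothing is running, or the candidate and every
// currently-running tool are all concurrency-safe.
type RunningTool = { name: string; concurrencySafe: boolean };

function canStart(candidateSafe: boolean, running: RunningTool[]): boolean {
  if (running.length === 0) return true; // nothing running: anything may start
  return candidateSafe && running.every((t) => t.concurrencySafe);
}
```
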

Sequential timeline: stream 2.5s + 3 serial tools = 3.1s
Speculative: stream 2.5s overlapped with tools 1–2; total 2.6s

Tool states: Queued → Executing → Completed → Yielded. Yield in submission order, not completion order — even if c.ts finishes before a.ts, the conversation history must remain a, b, c.

Error cascade policy

  • Bash errors cascade within a batch. Shell commands form implicit pipelines; running cp after a failing mkdir is pointless.
  • Read/Grep errors isolate. One file read failure has no bearing on a sibling grep.

Cancelled siblings get synthetic results: "Cancelled: parallel tool call Bash(mkdir build) errored".

Interrupt behavior

Each tool declares interruptBehavior(): 'cancel' | 'block'. The executor treats an executing batch as interruptible only when all tools in it support cancel. A single block tool blocks user Esc for the whole batch.


7. 🔒 Permissions: modes, rules, and bubbling

Seven modes (most → least permissive)

| Mode | Behavior |
|---|---|
| bypassPermissions | No checks (testing only) |
| dontAsk | Auto-deny prompts (background agents — never block on user input) |
| auto | Lightweight LLM classifier evaluates each call against transcript |
| acceptEdits | File edits auto-allowed; other mutations prompt |
| default | Standard interactive — user approves each action |
| plan | Read-only; all writes denied |
| bubble | Sub-agent escalates the decision to its parent |

Sub-agents default to bubble. Background agents default to dontAsk (they can't block on a prompt that has no UI).

Resolution chain

1. Hook decision?         → final
2. allowedRules / deniedRules / askRules match?  → final
3. tool.checkPermissions()  → allow | deny | ask | passthrough
4. Mode default
5. (interactive only) prompt user
6. (auto only) classifier

Rules

Three pieces: source (tracks provenance), ruleBehavior (allow/deny/ask), ruleValue (with optional content patterns).

  • Bash(git *) — Bash commands starting with git
  • Edit(/src/**) — file edits restricted to /src
  • Fetch(domain:example.com) — HTTP fetches limited to that domain

For Bash, parse the command via a real bash AST parser (parseForSecurity()), split on && || ; |, and classify each subcommand. If the parser fails, return fail-safe behavior — assume any command it can't parse is unsafe.
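
A hedged sketch of the fail-closed shape only: a toy regex splitter stands in for the real bash AST parser, and anything it cannot confidently handle falls through to ask:

```typescript
// Toy classifier: split on shell connectors, require every subcommand to match
// an allowed prefix. Constructs a naive splitter cannot handle (subshells,
// substitution, redirection) are treated as unsafe -- fail closed.
function classifyCommand(cmd: string, allowedPrefixes: string[]): "allow" | "ask" {
  if (/[$`()<>]/.test(cmd)) return "ask"; // defeats naive splitting: bail out
  const subcommands = cmd.split(/&&|\|\||;|\|/).map((s) => s.trim());
  const everyAllowed = subcommands.every((sub) =>
    allowedPrefixes.some((p) => sub === p || sub.startsWith(p + " "))
  );
  return everyAllowed ? "allow" : "ask";
}
```

A production version needs a real parser; the point of the sketch is the default direction, not the parsing.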


8. πŸ—œοΈ Context engineering: the 4-layer compression pipeline

Run before every API call, in this strict order:

| Layer | What it does | Cost |
|---|---|---|
| 0. Tool result budget | Enforce per-message size caps; exempt tools without finite maxResultSizeChars | Trivial |
| 1. Snip compact | Physically remove old messages; emit UI boundary marker; report tokens freed | Cheap |
| 2. Microcompact | Drop tool results by tool_use_id once unneeded; cache edits via deferred boundary messages | Cheap |
| 3. Context collapse | Replace conversation spans with summaries (granular) | Medium |
| 4. Auto-compact | Fork an entire Claude conversation to summarize history; circuit-break after 3 consecutive failures | Heavy |

Why ordering matters: if collapse alone gets tokens below the auto-compact threshold, auto-compact never runs — so you keep fine-grained recent history.

Budget thresholds

  • Auto-compact triggers at effectiveContextWindow − 13,000 tokens.
  • Hard blocking limit at effectiveContextWindow − 3,000.
  • The 10K-token gap between them is where reactive compact runs if the proactive pass failed.

Token counting blends authoritative API usage numbers with rough estimates for messages added since the last response — biased conservative so compaction fires slightly early.

Actionable: instrument both estimated and authoritative token counts, log the delta. When the delta drifts, your estimator is broken and your safety margins are wrong.


9. 🌐 The API layer: prompt caching as architecture

Prompt caching is not an optimization. It is an architectural constraint. Every design decision either preserves cache hits or busts them.

Multi-provider abstraction

A single getAnthropicClient() factory dispatches to one of:

  • Direct API (key or OAuth)
  • AWS Bedrock
  • Google Vertex AI
  • Azure Foundry

Provider chosen at boot from env vars + config. Stored in bootstrap state; never re-checked. SDKs dynamically imported (don't load Bedrock if you're on direct API).

A buildFetch wrapper injects an x-client-request-id UUID header on every request, so you can correlate client-side timeouts with server-side logs.

Cache scopes

Scope Where TTL
Global Static prompt prefix shared across all users Long
1-hour Eligible users' extended cache 60 min
Ephemeral (default) Per-session ~5 min

The system prompt has a literal === DYNAMIC BOUNDARY === marker:

  • Above (cacheScope: global): identity, system rules, task guidance, tool usage instructions, tone/style.
  • Below (per-session): session guidance, CLAUDE.md, env info, language, MCP instructions (uncached, marked dangerous), output style.

Rule: every runtime if above the boundary doubles the cache key space. 3 conditionals = 8 prefixes. 5 = 32. Compile-time feature flags are fine; runtime checks must live below the boundary.

Global scope is disabled when MCP tools are present — user-specific tool definitions would fragment the global cache into millions of unique prefixes.

Sticky latches

Five session-scoped boolean flags that, once set, cannot be unset for the rest of the session. They control beta/feature headers. Reason: "mid-session toggles don't change the server-side cache key" — flipping a flag would bust 50–70K tokens of cached context.

Pattern: Once(value) — a setter that throws or no-ops on second call. Use this for any cache-influencing config.
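
A sketch of Once as a write-once cell. This variant no-ops rather than throws, and latches whatever value is written first:

```typescript
// Write-once cell: the first set() wins; later writes are silently ignored.
function once<T>(initial: T | null = null) {
  let value = initial;
  return {
    get: () => value,
    set: (v: T) => {
      if (value !== null) return value; // already latched -- no-op
      value = v;
      return value;
    },
  };
}
```

Any config that feeds a server-side cache key goes behind a cell like this, so a mid-session toggle physically cannot change the request prefix.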

Output token slot reservation

Production p99 output = 4,911 tokens. Default SDK reservation = 32K–64K. Over-reservation = 8–16×.

Strategy: cap default max_tokens at 8K. On the rare truncation (<1% of requests), retry with 64K. Recovers 12–28% of the context window for free.
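
The cap-then-escalate strategy in sketch form; ModelCall is a stand-in for the real streaming client, with the caps following the text:

```typescript
// First attempt with a tight cap covering ~p99 of real outputs; retry wide
// only on the rare truncation.
type ModelCall = (maxTokens: number) => Promise<{ text: string; truncated: boolean }>;

async function callWithEscalation(call: ModelCall): Promise<string> {
  const first = await call(8_000);   // tight cap: frees context-window slots
  if (!first.truncated) return first.text;
  const retry = await call(64_000);  // rare escalation path (<1% of requests)
  return retry.text;
}
```
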

Streaming: skip the SDK helper

The SDK's BetaMessageStream calls partialParse() on every input_json_delta — repeatedly re-parsing growing JSON from scratch (O(n²)). Use raw Stream<BetaRawMessageStreamEvent> and accumulate tool-input strings yourself.

Watchdog and fallback

  • Idle watchdog: setTimeout(90s) reset on every chunk. At 45s, warn. At 90s, abort and retry non-streaming.
  • Non-streaming fallback activates when streaming dies mid-response (network, stall, truncation, proxies returning 200 with non-SSE bodies).
  • Disable fallback when streaming tool execution is active — duplicate tool runs would corrupt state.

10. 🤖 Sub-agents and fork agents

Single-agent capability has a hard ceiling. The fix is recursive: spawn child agents that are the same loop with isolated state.

AgentTool input schema (dynamic)

| Field | Purpose |
|---|---|
| description | 3–5 word task summary |
| prompt | Full instructions |
| subagent_type | Specialization key (optional) |
| model | Override (haiku/sonnet/opus) |
| run_in_background | Async execution |
| name | For team addressability |
| isolation | worktree (filesystem clone) or remote |

Critical pattern: feature-gate the schema itself. "The model never sees fields it cannot use." Don't tell the model "don't use name here" — remove name from the schema in this context. The model cannot misuse what it cannot see.

Output (discriminated union)

  • Sync: { status: 'completed', prompt, ...result }
  • Async: { status: 'async_launched', agentId, outputFile } — outputFile is a filesystem path that fills in when the bg agent completes; parents poll independently of process state.

The 15-step lifecycle (runAgent())

  1. Model resolution — caller override > agent definition > parent model > default. Read-only agents default to Haiku.
  2. Agent ID — agent-<hex>. Override path supports resuming a backgrounded agent.
  3. Context preparation — fork agents clone parent history (after filterIncompleteToolCalls()); fresh agents start empty.
  4. CLAUDE.md stripping — read-only agents (Explore, Plan) omit project instructions. Saves ~10.2% of fleet cache_creation tokens.
  5. Permission isolation — per-agent getAppState() overlay. Permissive parent modes (bypass, acceptEdits) always win.
  6. Tool resolution — fork agents reuse parent's exact array byte-for-byte; normal agents apply allow/deny lists. General-purpose agents cannot spawn sub-agents (prevents exponential fan-out).
  7. System prompt — fork agents inherit pre-rendered bytes; normal agents call agentDef.getSystemPrompt(ctx).
  8. Abort controller — sync agents share parent's controller (Esc kills both). Async agents get an independent one (survive parent abort).
  9. Hook registration — agent-id-scoped, auto-cleanup on termination.
  10. Skill preloading — declared in frontmatter, loaded concurrently to mask latency, prepended as a user message.
  11. MCP initialization — inline servers (cleaned on termination) or shared configs (memoized, persistent). Must complete before context creation so tools are in the pool when snapshotted.
  12. Context creation — createSubagentContext() makes isolation decisions:

     | Aspect | Sync | Async |
     |---|---|---|
     | setAppState | shared | isolated |
     | setAppStateForTasks | shared | shared |
     | readFileState | own cache | own cache |
     | abortController | parent's | independent |
  13. Cache-safe params callback — for bg agents; lets the summarization service fork the conversation with cache-identical prefix.

  14. Query loop — same query() function. Yields back to caller, records to sidechain JSONL transcript, forwards metrics.

  15. Cleanup (finally) — MCP cleanup, hook clear, agent tracking, file cache, message GC, kill orphan shell tasks, remove agent's todos.

Fork agents: cache-driven subprocess design

The point of a fork is a byte-identical request prefix to the parent, so children pay ~10% of the normal input-token cost.

Three mechanisms make this work:

  1. System prompt threading — pass the parent's already-rendered bytes via override.systemPrompt. Don't regenerate; feature flags or the session date may have changed.
  2. Exact tool passthrough — useExactTools: true. No filtering, no reordering, no re-serialization. Even forbidden tools (like AgentTool itself) stay in the array — runtime guards prevent misuse.
  3. Placeholder tool results — buildForkedMessages() clones the parent's last assistant message. For each tool_use, it inserts a constant placeholder string "Fork started -- processing in background". Same string for every child → same bytes.

Resulting structure: [...shared_history, assistant(all_tool_uses), user(placeholders..., directive)].

Only the final directive differs across children. With a 48,500-token shared prefix and 5 children, savings exceed 90% on input tokens for children 2–5.
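
The placeholder mechanism can be sketched like this; Msg is a simplified message shape, with buildForkedMessages following the text's name:

```typescript
// Every child sees identical bytes up to the final directive, so only the
// first child pays full input-token price; the rest hit the prompt cache.
type Msg = { role: "assistant" | "user"; content: string[] };

function buildForkedMessages(shared: Msg[], lastAssistant: Msg, directive: string): Msg[] {
  // One constant placeholder per tool_use in the cloned assistant message.
  const placeholders = lastAssistant.content.map(
    () => "Fork started -- processing in background"
  );
  return [
    ...shared,
    lastAssistant,
    { role: "user", content: [...placeholders, directive] }, // only this differs
  ];
}
```
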

When fork is disabled

  • Coordinator mode — coordinators have a structured-delegation prompt children would inappropriately inherit.
  • Non-interactive — fork uses permissionMode: 'bubble', which needs a user-facing prompt.
  • Explicit subagent_type — the user picked Explore/Plan/etc., so fork yields.

Recursive fork prevention (defense in depth)

  1. Primary: child's context.options.querySource = 'agent:builtin:fork'. AgentTool checks this before allowing fork.
  2. Fallback: scan message history for the boilerplate XML tag if querySource was lost in transit.

Six built-in agent archetypes

| Archetype | Model | Tools | Notable |
|---|---|---|---|
| General-purpose | Default | All except Agent | Workhorse |
| Explore | Haiku | Read-only | Omits CLAUDE.md, one-shot prompt (saves 135 chars/invocation) |
| Plan | Inherit | Read-only | 4-step process, must end with "Critical Files" list |
| Verification | Inherit | Read-only, async | System prompt explicitly anti-rationalization; requires adversarial probe |
| Claude Code Guide | Haiku | dontAsk mode | Doc fetcher; system prompt injects user's configured skills/agents/MCP |
| Statusline Setup | Sonnet | Read + Edit only | Narrowly-scoped specialist |

Frontmatter format for user-defined agents

```yaml
---
description: "When to use this"
tools: [Read, Bash]
disallowedTools: [FileWrite]
model: haiku
permissionMode: dontAsk
maxTurns: 50
skills: [my-skill]
mcpServers: [slack, {my-server: {command: node, args: [./server.js]}}]
hooks:
  PreToolUse:
    - command: "echo validating"
---

# System prompt body in markdown...
```

Trust hierarchy (least to most trusted): user agents < plugin agents < policy agents < built-in. User-agent hooks/MCP are silently skipped under strictPluginOnlyCustomization — graceful degradation, not an error.


11. 🕸️ Multi-agent coordination patterns

Three distinct shapes:

A. Simple background delegation

Fire-and-forget. Tests, searches, lints. No coordination protocol.

B. Coordinator mode

Hierarchical manager-worker. The coordinator gets only three tools: Agent (spawn), SendMessage (talk), TaskStop (kill). That's it. By design.

"The coordinator's job is to think, plan, decompose, and synthesize. Workers do the work."

Critical principle: never delegate understanding. Coordinators must give workers exact file paths, exact line numbers, exact change descriptions — not "based on the research, fix the bug."

Workflow phases:

  1. Research — multiple workers explore in parallel
  2. Synthesis — coordinator (not workers) integrates findings
  3. Implementation — workers receive precise instructions
  4. Verification — workers validate

C. Swarm teams

Peer-to-peer. Same process, isolated via AsyncLocalStorage, file-based mailboxes. Each message has metadata (sender, timestamp, color for UI).

Three interruption levels:

  • Abort current work — cancel the turn, keep operating
  • Shutdown request — cooperative graceful wind-down
  • Kill — hard abort via controller

Task state machine (universal)

All background work — bash, sub-agents, remote sessions, teammates, dreams — flows through one state model:

```
pending → running → { completed | failed | killed }
```

Seven task types with single-char visual prefixes: local_bash (b), local_agent (a), remote_agent (r), in_process_teammate (t), local_workflow (w), monitor_mcp (m), dream (d).
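The state model above can be sketched as a discriminated transition table (a minimal illustration; the `TaskStatus` and `advance` names are mine, not the source's):

```typescript
// Universal task state machine: pending → running → { completed | failed | killed }.
type TaskStatus = 'pending' | 'running' | 'completed' | 'failed' | 'killed';

const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ['running', 'killed'],              // a task may be killed before it starts
  running: ['completed', 'failed', 'killed'],
  completed: [],                               // terminal
  failed: [],                                  // terminal
  killed: [],                                  // terminal
};

function advance(current: TaskStatus, next: TaskStatus): TaskStatus {
  if (!TRANSITIONS[current].includes(next)) {
    throw new Error(`illegal transition ${current} -> ${next}`);
  }
  return next;
}
```

Because every task type flows through the same five states, the UI, kill switches, and cleanup logic only ever need to handle this one model.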

SendMessage dispatch order

  1. Bridge (bridge:<session-id>) — cross-machine via Remote Control relays
  2. UDS (uds:<socket-path>) — local IPC via Unix Domain Sockets
  3. In-process — agent IDs / names of running agents
  4. Team mailbox — file-based queue

Killer feature: transparent agent resumption. Sending a message to a "dead" agent automatically resurrects it from its disk transcript. The conversation simply continues.

Command queue invariant

Messages are delivered between tool rounds, never mid-execution. The agent finishes the current turn, then receives new info. No race conditions, no corrupted state. Make this a hard rule — it's the cheapest way to get correctness in multi-agent comms.
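A minimal sketch of that invariant, assuming a hypothetical CommandQueue that the loop drains only at round boundaries:

```typescript
// Inbound messages queue up at any time, but are delivered to the agent
// only between tool rounds — never while a tool is executing.
class CommandQueue<T> {
  private pending: T[] = [];

  enqueue(msg: T): void {
    this.pending.push(msg); // safe to call at any time, even mid-tool-run
  }

  // Called by the agent loop *between* tool rounds.
  drain(): T[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

async function runRound(
  queue: CommandQueue<string>,
  tools: Array<() => Promise<void>>,
): Promise<string[]> {
  for (const tool of tools) await tool(); // no delivery happens in here
  return queue.drain();                   // new info arrives only now
}
```

Messages that arrive while a tool is running simply wait; the agent sees a consistent world at every decision point.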

Pattern selection

| Scenario | Pattern |
|---|---|
| Single bg task | Delegation |
| Multi-file refactor with research phase | Coordinator |
| Long-running collaborative dev | Swarm |

Operational guardrail

A 50-message memory cap on in-process teammates exists because a real production incident reached 36.8 GB across 292 agents. Plan for unbounded fan-out from day one or it will hurt you.
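A bounded mailbox is only a few lines; the cap value mirrors the guardrail above, everything else here is illustrative:

```typescript
// Teammate mailbox with a hard cap: oldest messages are evicted first,
// so memory stays bounded no matter how chatty the swarm gets.
class BoundedMailbox<T> {
  private messages: T[] = [];
  constructor(private readonly cap = 50) {} // mirrors the 50-message cap above

  push(msg: T): void {
    this.messages.push(msg);
    if (this.messages.length > this.cap) {
      this.messages.splice(0, this.messages.length - this.cap); // evict oldest
    }
  }

  size(): number {
    return this.messages.length;
  }
}
```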


12. 🧠 Memory: file-based persistence + LLM recall

Why files, not a database

  • Transparency — users open .md files and see exactly what the agent remembers. Trust through observability, not capability.
  • Modification time is a built-in epistemological signal: "when was this observation recorded?"
  • Zero infrastructure — no schema migrations, no indexes, no backups.

Layout

```
~/.claude/projects/<sanitized-git-root>/memory/
  MEMORY.md                        # always loaded; index only; ≤200 lines, ≤25 KB
  user_role.md                     # one memory per file
  feedback_testing.md
  project_migration_q2.md
  team/                            # shared via symlink
  logs/YYYY/MM/YYYY-MM-DD.md       # KAIROS append-only mode
```

Four-type taxonomy

| Type | Purpose |
|---|---|
| user | Role, expertise, preferences |
| feedback | Corrections + validated approaches (lead with the rule, then Why: and How to apply: lines) |
| project | Active work context with absolute dates (always convert "Thursday" → 2026-03-05) |
| reference | Pointers to external systems (Linear, Slack channels) |

Derivability test: if git log / git blame / the code itself can answer it, don't memorize it. No code patterns, no architecture, no debug fix recipes.

Frontmatter contract

```yaml
---
name: <title>
description: <one-line summary used by recall LLM>
type: user | feedback | project | reference
---

<body — for feedback/project, structure as: rule → **Why:** → **How to apply:**>
```

The description field carries the most weight — it's the LLM-recall index.

Two-tier retrieval

  • Tier 1 (always loaded): MEMORY.md index (~3,000 tokens for ~150 entries). Lines after 200 are truncated.
  • Tier 2 (on-demand): an async Sonnet side-query gets the manifest (type, name, date, description), the user's current query, and recent tool history. Returns up to 5 filenames as structured JSON. Validated against the file list to catch hallucination.

This trades a few hundred ms of latency for semantic precision that keyword matching cannot achieve — especially for negation ("do NOT use mocks").
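The hallucination guard at the end of Tier 2 can be sketched like this (names and shapes are mine; the source only specifies that returned filenames are validated against the file list):

```typescript
// Validate the recall model's JSON output: keep only filenames that
// actually exist in the manifest, capped at 5 — anything else is
// treated as hallucination and dropped.
interface MemoryEntry {
  name: string;
  type: string;
  description: string;
}

function validateRecall(rawJson: string, manifest: MemoryEntry[], max = 5): string[] {
  let picked: unknown;
  try {
    picked = JSON.parse(rawJson);
  } catch {
    return []; // malformed model output → recall nothing, fail closed
  }
  if (!Array.isArray(picked)) return [];
  const known = new Set(manifest.map((m) => m.name));
  return picked
    .filter((f): f is string => typeof f === 'string' && known.has(f))
    .slice(0, max);
}
```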

Staleness policy

Don't expire. Annotate. Today/yesterday → no caveat. Older → human-readable warning ("This memory is 47 days old — code claims may be outdated"). Models reason better about "47 days ago" than ISO timestamps.
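A sketch of the annotation rule, assuming modification time is available (the threshold and wording mirror the text; the function name is mine):

```typescript
// Annotate memory age instead of expiring: fresh memories get no caveat,
// older ones get a human-readable warning the model can reason about.
function stalenessCaveat(modifiedAt: Date, now: Date): string | null {
  const days = Math.floor((now.getTime() - modifiedAt.getTime()) / 86_400_000);
  if (days <= 1) return null; // today/yesterday: no caveat
  return `This memory is ${days} days old — code claims may be outdated`;
}
```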

Write path (two-step)

  1. Write <type>_<topic>.md with frontmatter + body.
  2. Add a one-line pointer to MEMORY.md: - [Title](file.md) — one-line hook.

A background extraction agent runs at loop completion to catch memories the main agent missed.

KAIROS continuous mode

For long-lived sessions, replace two-step writes with append-only daily logs in logs/YYYY/MM/. A separate consolidation pass (after 24h or 5+ modified sessions) merges logs into structured memories.

Security (team paths)

Three-layer validation, all fail-closed:

  1. Input sanitization (null bytes, traversal sequences, Unicode attacks)
  2. String-level path validation with trailing-separator checks
  3. Symlink resolution against the deepest existing ancestor

No partial-success fallbacks. Reject early, reject completely.


13. 🔌 Skills, hooks, plugins — extensibility surface

Skills: two-phase loading

The killer pattern. 50 skills shouldn't cost 50 docs of system-prompt tokens at startup.

  • Phase 1 (startup): parse YAML frontmatter only — name, description, when_to_use. Inject into the system prompt as a directory.
  • Phase 2 (invocation): load full markdown body, substitute $ARGUMENTS and ${CLAUDE_SESSION_ID}, execute inline shell commands, prepend as a user message.

You pay the token cost only when the skill actually runs.
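Phase 1 can be sketched with a deliberately tiny frontmatter scanner (flat `key: value` pairs only — an assumption for illustration; a real implementation would use a YAML parser):

```typescript
// Phase 1 of two-phase skill loading: read only the frontmatter fields
// needed for the system-prompt directory. The body stays on disk until
// the skill is actually invoked.
interface SkillStub {
  name: string;
  description: string;
  when_to_use: string;
}

function parseFrontmatterOnly(md: string): SkillStub | null {
  const m = md.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return null;
  const fields: Record<string, string> = {};
  for (const line of m[1].split('\n')) {
    const i = line.indexOf(':');
    if (i > 0) fields[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return {
    name: fields.name ?? '',
    description: fields.description ?? '',
    when_to_use: fields.when_to_use ?? '',
  };
}
```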

Skill source priority (highest → lowest)

  1. Managed (policy / enterprise)
  2. User (~/.claude/skills/)
  3. Project (.claude/skills/)
  4. --add-dir flag
  5. Legacy commands
  6. Bundled
  7. MCP (remote, untrusted)

Hard security boundary: MCP skills never execute inline shell commands. External MCP servers are content-only. No exceptions.

Frontmatter controls

```yaml
name: my-skill
description: ...
when_to_use: ...
disable-model-invocation: false   # block autonomous use
context: fork                     # run as sub-agent with own token budget
paths: ["src/**/*.ts"]            # conditional activation
hooks:
  PreToolUse: [...]
```

Hooks: 27 events, 6 types

User-configurable:

  • Command — spawn shell process, read stdout/exit code
  • Prompt — lightweight LLM call
  • Agent — multi-turn loop (max 50 turns)
  • HTTP — POST to remote policy server

Internal:

  • Callback — programmatically registered
  • Function — session-scoped TypeScript

Top 5 lifecycle points to know:

| Hook | Fires | Can do |
|---|---|---|
| PreToolUse | Before tool execution | Block / modify / approve / inject context |
| PostToolUse | After successful execution | Inject feedback, replace MCP output |
| Stop | Before Claude concludes | Force continuation (verification loops) |
| SessionStart | Session begin | Cannot block |
| UserPromptSubmit | User submits | Block (input validation) |

Other events span tool lifecycle (PostToolUseFailure, PermissionDenied, PermissionRequest), session (SessionEnd, Setup), subagents (SubagentStart, SubagentStop), compaction (PreCompact, PostCompact), notifications, configuration, file watching, task tracking — 27 in total.

Snapshot security model

captureHooksConfigSnapshot() freezes hook config at startup. If malicious code modifies .claude/settings.json mid-session, the snapshot prevents the change from taking effect. Only the /hooks command or the file watcher can update the live config.

Policy cascade: enterprise hooks cannot be disabled by users; allowManagedHooksOnly restricts to policy-approved hooks.

Exit code semantics (command hooks)

| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Blocking error (deliberately uncommon, to prevent accidental enforcement) |
| other | Non-blocking warning |
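The mapping is a three-way branch (the HookOutcome type is mine, for illustration):

```typescript
// Command-hook exit-code semantics: only exit code 2 blocks; any other
// non-zero code is a non-blocking warning.
type HookOutcome = 'success' | 'blocking-error' | 'warning';

function interpretExitCode(code: number): HookOutcome {
  if (code === 0) return 'success';
  if (code === 2) return 'blocking-error'; // the only code that blocks
  return 'warning';                        // everything else is non-blocking
}
```

Making the blocking code unusual (2 rather than "any non-zero") means a hook script that merely crashes cannot accidentally veto the agent.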

Skill ↔ hook integration

When a skill is invoked, its frontmatter hooks register as session-scoped. The skill directory becomes CLAUDE_PLUGIN_ROOT for those hook commands. once: true removes the hook after first execution. For sub-agents, Stop hooks auto-convert to SubagentStop to fire at the correct lifecycle point.


14. 🔗 MCP: the universal external-tool protocol

Skills and hooks extend the agent in-process. MCP (Model Context Protocol) is the standard way third parties extend it out-of-process — across servers, vendors, and trust boundaries. If you want a tool ecosystem you don't control, this is the layer that makes it possible.

Eight transports, three deployment shapes

| Shape | Transport | Use |
|---|---|---|
| Local process | stdio (default) | Subprocess; JSON-RPC over stdin/stdout; no auth |
| Remote server | http | Streamable HTTP; POST + optional SSE |
| Remote server | sse | Legacy (pre-2025) |
| Remote server | ws | WebSocket bidirectional |
| Remote server | claudeai-proxy | Routed via Claude.ai infrastructure |
| In-process | sdk | Control messages over stdin/stdout |
| In-process | InProcessTransport | Direct function calls via queueMicrotask() (63 lines) |
| IDE | sse-ide, ws-ide | Runtime-specific |

Recommendation: start with stdio for local tools. Move to http only when you need remote. Use InProcessTransport for tools you control end-to-end — it eliminates subprocess overhead.

Tool wrapping (4 stages)

External MCP tools must merge into the same Tool interface as built-ins. Four transformations:

  1. Name normalization → mcp__{server}__{tool}. Invalid characters become underscores. Must match ^[a-zA-Z0-9_-]{1,64}$.
  2. Description truncation at 2,048 chars. (Real-world: OpenAPI servers were dumping 15–60 KB descriptions.)
  3. Schema passthrough. Pass MCP input schemas straight through; do not transform.
  4. Annotation mapping. readOnlyHint: true → enables concurrent execution. destructiveHint: true → triggers stricter permission checks.

After wrapping, MCP tools are indistinguishable from built-ins at the loop level. The same 14-step execution pipeline runs.
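Stages 1 and 2 are mechanical; a hedged sketch (the function names are mine, the regex and limits come from the text):

```typescript
// Stage 1: namespace the tool as mcp__{server}__{tool}, replace invalid
// characters, and enforce the 64-char name limit.
function wrapMcpToolName(server: string, tool: string): string {
  const name = `mcp__${server}__${tool}`.replace(/[^a-zA-Z0-9_-]/g, '_');
  return name.slice(0, 64); // must match ^[a-zA-Z0-9_-]{1,64}$
}

// Stage 2: cap descriptions so a verbose server can't eat the prompt budget.
function truncateDescription(desc: string, max = 2048): string {
  return desc.length <= max ? desc : desc.slice(0, max);
}
```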

Configuration scopes (7 sources, content-deduplicated)

| Scope | Source | Trust |
|---|---|---|
| local | .mcp.json in project | User approval required |
| user | ~/.claude.json | User-managed |
| project | Project-level | Shared |
| enterprise | Org-managed | Pre-approved |
| managed | Plugin-provided | Auto-discovered |
| claudeai | Web interface | Pre-authorized |
| dynamic | SDK injection | Programmatic |

Servers with matching command/args (or URLs) are deduplicated by content, not by name. Two configs naming the same binary differently still merge.

OAuth (RFC 9728 + RFC 8414)

Discovery chain when a server returns 401:

  1. Probe /.well-known/oauth-protected-resource for authorization-server metadata.
  2. Fall back to RFC 8414 discovery against the MCP server itself.
  3. Use configured authServerMetadataUrl as escape hatch.

Cross-App Access (XAA) enables federated token exchange via identity providers. Real-world spec violations are common β€” normalizeOAuthErrorBody() rewrites Slack's "200 with error body" responses to a proper HTTP 400. Plan for spec drift on day one.

Server lifecycle

  • States: connected, failed, needs-auth (15-min TTL cache), pending, disabled.
  • Spawn batching: local in batches of 3, remote in batches of 20 — protects against file-descriptor exhaustion.
  • Session-expiry detection: Streamable HTTP returns 404 + JSON-RPC code -32001 → reconnect + single retry.

Timeout layers

| Layer | Duration | Why |
|---|---|---|
| Connection | 30 s | Unreachable / slow servers |
| Per-request | 60 s | Fresh AbortSignal per request |
| Tool call | ~27.8 h | Legitimate long-running operations |
| Auth | 30 s | Unreachable OAuth servers |

Trap: if you reuse a single AbortSignal across requests it expires during idle periods. wrapFetchWithTimeout() creates a fresh signal per request. Memorize this.
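A sketch of the fix, with an illustrative wrapper name (the source's wrapFetchWithTimeout wraps fetch specifically; this generalizes the same per-request-signal idea):

```typescript
// Every call gets its own AbortController, so idle time between requests
// can never consume a shared signal's timeout budget.
function wrapWithTimeout<A extends unknown[], R>(
  fn: (signal: AbortSignal, ...args: A) => Promise<R>,
  timeoutMs: number,
) {
  return async (...args: A): Promise<R> => {
    const controller = new AbortController(); // fresh per request
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fn(controller.signal, ...args);
    } finally {
      clearTimeout(timer); // don't leak timers on success or failure
    }
  };
}
```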

Critical security rule

MCP skills never execute inline shell commands. External servers are content-only. Every other extension surface (user skills, project skills) can run shell; MCP cannot. This is the single most important MCP rule and the one you will be tempted to break.

InProcessTransport in 63 lines

Two key mechanics:

  • send() delivers via queueMicrotask() — prevents stack-depth blow-ups on synchronous request/response cycles.
  • close() cascades to the peer transport — no half-open connection states.

If you are wrapping an internal service as an MCP server, this is your reference. Don't subprocess what you can call directly.


15. 🚀 Bootstrap, startup, and rendering performance

The 5-phase pipeline (target: < 300 ms)

| Phase | File | What happens |
|---|---|---|
| 0. Fast-path dispatch | cli.tsx | Inspect args. --version / --help → dynamic-import only that handler, exit. Don't load React, telemetry, MCP. |
| 1. Module-level I/O | main.tsx | Side-effect-fire MDM (security policy) + keychain subprocesses during import evaluation. ~138 ms of module loading runs in parallel with subprocess I/O. |
| 2. Parse and trust | init.ts | Parse args, load config. Enforce a trust-boundary dialog. Before: only safe ops (TLS, themes, telemetry). After: env vars and git commands. |
| 3. Setup | setup.ts | Register everything in parallel: commands, agents, hooks, plugins, MCP. Hook config snapshot frozen here. |
| 4. Launch | replLauncher.ts | Seven entry paths converge: REPL, print, SDK, resume, continue, pipe, headless. All call the same query() loop. |

Other startup techniques

  • API preconnection — fire a HEAD request to the Anthropic API during init. The TCP+TLS handshake (100–200 ms) overlaps with setup. The connection is warm by the time the user submits.
  • Dynamic import for heavy libs — OpenTelemetry, provider SDKs, React for non-REPL paths.
  • 50+ profiling checkpoints sampled at 100% of internal users / 0.5% of external. Without instrumentation you can't tell what to optimize.

Search performance (270K+ paths)

Three layers:

  1. Bitmap pre-filter — assign each path a 26-bit mask of the lowercase letters it contains. Reject a path with one integer comparison: (charBits[i] & needleBitmap) !== needleBitmap. Rejects 10–90% at 4 bytes/entry.
  2. Score-bound rejection — skip paths that can't beat the current top score before expensive scoring.
  3. Async indexing with partial queryability — yield every ~4 ms. Search begins within 5–10 ms of index availability.
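Layer 1 is small enough to sketch in full (function names are mine; the bitmap logic follows the description above):

```typescript
// 26-bit letter mask per path: a path can only match a query if it contains
// every letter of the query, so one integer AND rejects most candidates
// before any expensive fuzzy scoring.
function letterBits(s: string): number {
  let bits = 0;
  for (const ch of s.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97; // 'a' = 0
    if (i >= 0 && i < 26) bits |= 1 << i;
  }
  return bits;
}

function couldMatch(pathBits: number, needleBits: number): boolean {
  return (pathBits & needleBits) === needleBits; // one integer comparison
}
```

The filter is conservative: it never rejects a true match (a path containing all the query's letters always passes), it only prunes guaranteed misses.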

Rendering: patterns that transfer beyond the terminal

Claude Code forks Ink because stock Ink allocates one JS object per cell per frame — at 200×120 that's 24,000 GC'd objects every 16 ms. Whatever you're rendering, the lessons transfer:

  • Double-buffer + atomic write. Two persistent Frame objects; render into the back, swap pointers (no allocation), write the diff in one syscall wrapped in BSU/ESU (Begin/End Synchronized Update). No tearing.
  • Cell-level diffing with damage rectangles. Compute the bounding box of writes; diff only inside it. ~6× reduction in compare work for localized updates.
  • Three interning pools (chars, styles, hyperlinks) → integer IDs everywhere. Style transitions become a single pre-cached string lookup. Pools generationally reset every 5 min.
  • Frame throttling. 60 fps focused, 30 fps blurred (throttle(deferredRender, FRAME_INTERVAL_MS)). Scroll events get a tighter 4 ms schedule.
  • Pack related data. Two Int32 words per cell beats scattered objects — better cache behavior, faster compare, fewer allocations.
  • Lazy expensive work. Syntax highlighting via React Suspense — code shows unstyled first, colors paint moments later.
  • Separate hot paths from React. Direct DOM mutation + microtask scheduling for scroll. React handles the final paint, where it's already efficient.

The thesis: performance is not making operations fast; it is eliminating operations entirely.


16. 📋 The 10 foundational patterns (cheat sheet)

| # | Pattern | Why it matters |
|---|---|---|
| 1 | AsyncGenerator-based loops | Natural backpressure, clean cancellation via .return(), typed terminal states |
| 2 | Speculative tool execution | Run safe read-only tools while the model is still streaming → noticeable latency cut |
| 3 | Concurrent-safe batching | Partition by per-invocation safety; serial isolates side effects |
| 4 | Fork agents for cache sharing | Byte-identical prefixes ⇒ ~95% input-token savings on children |
| 5 | 4-layer context compression | snip → microcompact → collapse → autocompact, in that order |
| 6 | File-based memory + LLM recall | Beats embeddings for negation and intent-aware retrieval; zero infra |
| 7 | Two-phase skill loading | Frontmatter at startup, body on invocation |
| 8 | Sticky latches | Cache-influencing flags become write-once for the session |
| 9 | Slot reservation | 8K default output, 64K on demand — recovers 12–28% of context |
| 10 | Hook config snapshots | Freeze at boot; defense against mid-session injection from a malicious repo |

17. 🗺️ Build-your-own: a 14-step roadmap

A pragmatic order to implement these in. Each step compiles and runs on its own.

  1. Tool interface + factory. Define Tool<I, O, P>, buildTool() with safe defaults, and a ToolResult type. Ship one tool: Read. Test the Zod-based JSON Schema generation.
  2. Query loop v0. Async generator. No tools, no compression, just stream the model and yield messages. Return a Terminal discriminated union.
  3. Tool execution path. Add the 14-step pipeline as one function. Wire the loop to call it on tool_use blocks. Always pair tool_use with a tool_result, even on error.
  4. Permission modes + rules. Implement default, acceptEdits, plan, bypassPermissions. Add the resolution chain. Skip auto (LLM classifier) for now.
  5. Concurrency partition + executor. partitionToolCalls() + a serial/concurrent executor. Add isConcurrencySafe() to every tool. Yield results in submission order.
  6. Hook system v0. Two events: PreToolUse, PostToolUse. Command hooks only (shell process, exit codes). Capture a snapshot at startup.
  7. State split. Mutable singleton STATE for infra (cwd, model, session id). Tiny reactive store for UI (messages, approvals).
  8. Multi-provider client factory. Direct API first. Stub the others. buildFetch wrapper for client-request-id header.
  9. Prompt caching architecture. System-prompt boundary marker. Static prefix (cache scope: global if no MCP). Dynamic suffix per-session. Implement one sticky latch as proof.
  10. Compression v1: snip + microcompact. Skip collapse and autocompact for now. Wire the budget thresholds.
  11. Streaming tool executor. Watch the streaming SSE. Start safe tools when their tool_use is fully parsed. Buffer to preserve submission order.
  12. AgentTool + sub-agent lifecycle. Re-enter query() with isolated context. Implement the cleanup finally block. Skip fork agents.
  13. Memory. File layout, frontmatter contract, two-tier retrieval (index + LLM recall side-query). Four types only.
  14. Skills (two-phase) + slash commands. Frontmatter at startup; body at invocation; $ARGUMENTS substitution. Add EXTRA_DIRS resolution order.

Save for later (don't build until step 14 lands): fork agents, swarm teams, remote tasks, KAIROS continuous mode, auto-mode permission classifier, MCP transport layer, terminal renderer optimization, bitmap search index.


18. ⚠️ Anti-patterns and pitfalls

Loop / control flow

  • ❌ Callbacks or event emitters for the agent loop. You'll re-invent backpressure poorly. Use async function*.
  • ❌ A single error terminal state. You lose information. Encode 10+ specific reasons in a discriminated union.
  • ❌ Stop hooks on error responses. Creates error → hook blocks → retry → error infinite loops. Skip them.
  • ❌ Forgetting to pair tool_use with tool_result on abort. API will reject the next message. Drain queued tools with synthetic results on every cancellation path.

Tools

  • ❌ A constructor literal instead of a factory. Defaults will be unsafe. Always go through buildTool().
  • ❌ Per-tool-type concurrency safety. Bash is sometimes safe, sometimes not. Pass parsed input.
  • ❌ Concatenating built-ins and MCP tools then sorting flat. Cache breakpoint dies. Sort within partition, then concat.
  • ❌ Returning huge raw output. Cap with maxResultSizeChars. Persist to disk + return preview.
  • ❌ Using the SDK's BetaMessageStream. O(n²) JSON re-parsing. Read raw stream events.

Permissions

  • ❌ Scattering if mode === ... checks throughout tool code. Centralize in modes + the resolution chain.
  • ❌ Trusting a partial bash parse. If parseForSecurity() fails, treat the command as unsafe.
  • ❌ Sub-agent default = default mode. It needs a UI to prompt; bg agents have none. Default to bubble (sync) or dontAsk (async).

Caching / API

  • ❌ Runtime conditionals in the static prompt prefix. Each one doubles cache key space. Move below the boundary.
  • ❌ Mid-session feature toggles that change request headers. Use sticky latches.
  • ❌ Reserving 64K output tokens by default. That over-reserves 8–16×. Cap at 8K, escalate on demand.
  • ❌ Regenerating the system prompt for fork children. Feature flags or session date may have moved. Pass parent's bytes.
  • ❌ Filtering tools per child agent in fork mode. Different array → different cache key. Use useExactTools: true and runtime guards.

Memory

  • ❌ Storing what git log can answer. Code patterns, fix recipes, who-changed-what. Useless duplication that goes stale.
  • ❌ Embedding-only retrieval. Misses negation ("do NOT mock the DB"). Use LLM recall over a manifest.
  • ❌ Hard expiration. Annotate with age; let the model decide. Stale memories are still data.
  • ❌ Letting MEMORY.md grow past 200 lines. Truncated silently. Treat the index as a budget.

Multi-agent

  • ❌ Coordinators with the full tool set. They'll do the work themselves. Restrict to Agent, SendMessage, TaskStop.
  • ❌ Workers asked to "based on the research, implement X." They re-derive context, miss specifics, hallucinate paths. Synthesis is the coordinator's job.
  • ❌ Mid-tool-execution message delivery. Race conditions. Queue at tool-round boundaries.
  • ❌ Unbounded teammate state. 36.8 GB / 292 agents was a real production incident. Cap message history.
  • ❌ General-purpose agents that can spawn Agent. Exponential fan-out. Block recursive spawning at the schema level.

Bootstrap / hooks

  • ❌ Loading the world for --version. Fast-path dispatch first, full bootstrap second.
  • ❌ Hook config that updates live mid-session. Lets a malicious repo redefine permissions after trust dialog. Snapshot at startup; update only via explicit user channel.
  • ❌ Treating MCP skills like local skills. They are content-only. Never execute their inline shell commands.

🎯 Closing thought

The deepest principle in the source book is repeated at every layer: push complexity to the boundaries. Permission resolution, protocol translation, state reconciliation, tool I/O — these are the messy edges. Concentrate the mess there. Keep the loop, the tool composition, the memory recall, and the streaming logic clean and exhaustively typed.

If you remember nothing else: most of this system is generators yielding strongly-typed events through a series of small modules, with a few critical caches and a few critical safety doors. Build it in that order.


19. 📖 Glossary

Quick reference for the jargon used throughout this guide.

| Term | Meaning |
|---|---|
| AsyncGenerator | A JS function declared `async function*`. Yields values lazily, pausing at each yield until the consumer calls `.next()`. Provides backpressure and clean cancellation. |
| Backpressure | The producer pauses when the consumer can't keep up. Generators give it for free; event emitters do not. |
| Cache breakpoint | The byte position in the prompt where the prompt cache stops matching. Move volatile content after the breakpoint to maximize hit rate. |
| Concurrency-safe | A tool invocation that can run in parallel with others without observable side effects. Determined per-input, not per-tool-type. |
| Context window | The token budget for a single API call (prompt + output). When you exceed it the API rejects the request. |
| Discriminated union | A type made of variants tagged by a literal field (`{ kind: 'completed' } \| { kind: 'error' }`). |
| Fork agent | A sub-agent that inherits the parent's byte-identical prompt prefix to maximize prompt-cache hits (~95% input-token discount on children 2…N). |
| Frontmatter | The YAML block at the top of a .md file (between two `---` lines). Used for skill/agent/memory metadata. |
| Hook | A user/plugin/policy interceptor at one of 27 lifecycle events. Can block, modify, or inject. |
| MCP | Model Context Protocol — the JSON-RPC standard for connecting external tool servers to an agent. Eight transports. |
| Microcompact | Layer 2 of context compression. Removes tool results by tool_use_id when no longer needed. |
| Prompt cache | Anthropic's server-side cache of prompt prefixes. ~90% discount on cached input tokens. The entire architecture revolves around preserving hits. |
| Reservoir sampling | Algorithm R. Maintain a fixed-size random sample of an unbounded stream. Used here for latency histograms (1,024 entries → accurate p50/p95/p99). |
| Slot reservation | The max_tokens value sent to the API. Default cap 8K, escalate to 64K on truncation (<1% of requests). Reclaims 12–28% of context. |
| Speculative execution | Starting tools while the model is still streaming, before the assistant message completes. Saves hundreds of ms when read-only tools dominate. |
| Sticky latch | A write-once boolean (`null` until first set, then latched for the session) for cache-influencing flags, so request shape never changes mid-session. |
| Sub-agent | A child agent spawned via AgentTool. A new query() generator with isolated message history. Sync (parent waits) or async (background). |
| Synthetic tool result | A fabricated tool_result block emitted on cancellation so the API doesn't see a tool_use without a matching result. |
| Terminal state | The discriminated-union value the agent loop returns (vs. yields). Encodes why execution stopped — 10 distinct reasons. |
| tool_use / tool_result | Anthropic API blocks. Every tool_use in an assistant message must be paired with a tool_result in the next user message. The single most common bug source. |
| Two-phase skill loading | Frontmatter loaded into the system prompt at startup; full body loaded only on invocation. Lets you ship 50+ skills cheaply. |

Sources

  • Repo: https://github.com/alejandrobalderas/claude-code-from-source (raw chapter markdown — primary source)
  • Companion site: https://claude-code-from-source.com
  • Chapters analyzed: 1 (Architecture), 2 (Bootstrap), 3 (State), 4 (API Layer), 5 (Agent Loop), 6 (Tools), 7 (Concurrency), 8 (Sub-Agents), 9 (Fork Agents), 10 (Coordination), 11 (Memory), 12 (Extensibility), 13 (Terminal UI), 15 (MCP), 17 (Performance), 18 (Epilogue).

The source repo is purely educational and contains no source code from Claude Code — only original pseudocode derived from npm source maps. This guide follows the same convention.


If you found this helpful, let me know by leaving a 👍 or a comment — and if you think this post could help someone, feel free to share it! Thank you very much! 😃
