김이더

I Tore Apart the Claude Code Source Code 2/3

More posts at radarlog.kr.


Claude Code's system prompt isn't a single hardcoded string. Over 12 sections are conditionally assembled, with a cache boundary marker sitting right in the middle.

Part 1 covered hidden commands and environment variables. Part 2 looks at how Claude Code thinks internally — how the system prompt gets built, how prompt cache is protected, and how agents are orchestrated from source code.

How the System Prompt Is Assembled

constants/prompts.ts has a getSystemPrompt() function. It returns not a string but a string array. Each element is one section of the system prompt.

return [
  // --- Static content (cacheable) ---
  getSimpleIntroSection(outputStyleConfig),
  getSimpleSystemSection(),
  getSimpleDoingTasksSection(),
  getActionsSection(),
  getUsingYourToolsSection(enabledTools),
  getSimpleToneAndStyleSection(),
  getOutputEfficiencySection(),

  // === BOUNDARY ===
  SYSTEM_PROMPT_DYNAMIC_BOUNDARY,

  // --- Dynamic content (changes per turn) ---
  ...resolvedDynamicSections,
]

Everything above the boundary is identical for all users. "You are Claude Code," "use tools this way," "keep output concise." Below the boundary is user-specific. CLAUDE.md contents, MCP server configs, language settings.
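The pattern boils down to filter-and-join. Here's my reconstruction of the idea, not the actual source — section functions and the empty-string convention are assumptions:

```typescript
// My reconstruction of the pattern, not the actual source: each section
// is a small function, so conditional sections can return '' and get
// dropped before the final join.
type SectionFn = () => string

function assembleSystemPrompt(sections: SectionFn[]): string {
  return sections
    .map((section) => section())
    .filter((text) => text.length > 0) // conditional sections vanish here
    .join('\n\n')
}

// Hypothetical example: a section that only renders when a flag is set.
const hooksEnabled = false
const hooksSection: SectionFn = () =>
  hooksEnabled ? 'Users may configure hooks...' : ''

console.log(assembleSystemPrompt([() => 'You are Claude Code.', hooksSection]))
// prints only "You are Claude Code." since the hooks section is disabled
```

Making each section a function is what allows the conditional assembly — a section can inspect config and opt itself out.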

Why this boundary matters is next.

The Obsession With Not Breaking Prompt Cache

This is the most impressive engineering I found. Anthropic puts extraordinary effort into reducing prompt-cache costs.

The SYSTEM_PROMPT_DYNAMIC_BOUNDARY constant splits the system prompt array into "globally cacheable" and "dynamic content."

export const SYSTEM_PROMPT_DYNAMIC_BOUNDARY =
  '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

Static content above the boundary is cached with scope: 'global'. All users share the same cache. Below is user-specific and can't be shared.

This matters because cache_creation tokens are expensive and cache_read tokens are cheap on the Anthropic API. If the static part changes, cache breaks, and expensive cache_creation tokens are incurred.
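A sketch of what the split looks like in practice — the exact request shape Claude Code builds is my assumption, but `cache_control: { type: 'ephemeral' }` is how the Anthropic Messages API marks cache breakpoints:

```typescript
// Sketch only. Everything before the marker gets a cache breakpoint so
// the prefix can be reused; everything after is sent uncached per user.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

function splitAtBoundary(sections: string[]) {
  const i = sections.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  if (i === -1) return { cacheable: sections, dynamic: [] as string[] }
  return { cacheable: sections.slice(0, i), dynamic: sections.slice(i + 1) }
}

type SystemBlock = {
  type: 'text'
  text: string
  cache_control?: { type: 'ephemeral' }
}

function toSystemBlocks(sections: string[]): SystemBlock[] {
  const { cacheable, dynamic } = splitAtBoundary(sections)
  return [
    // Shared prefix: identical for all users, so the cache write pays off.
    { type: 'text', text: cacheable.join('\n\n'), cache_control: { type: 'ephemeral' } },
    // User-specific tail: CLAUDE.md, MCP configs, language settings.
    { type: 'text', text: dynamic.join('\n\n') },
  ]
}
```

The marker itself never reaches the model — it exists only so the request builder knows where to place the breakpoint.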

There's an entire dedicated module for detecting cache breaks — promptCacheBreakDetection.ts. Every turn, it compares hashes of the system prompt, tool schemas, model, beta headers, and more.

type PreviousState = {
  systemHash: number
  toolsHash: number
  cacheControlHash: number
  perToolHashes: Record<string, number>
  model: string
  fastMode: boolean
  // ...and more
}
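The detection idea is simple even if the real module isn't. A minimal sketch of the comparison (the hash function and field names here are mine, not the source's):

```typescript
// Sketch of cache-break detection: hash each input that feeds the cached
// prefix, compare to last turn, report which field changed.
function fnv1a(s: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0 // keep it an unsigned 32-bit hash
  }
  return h
}

type CacheState = { systemHash: number; toolsHash: number; model: string }

function detectCacheBreak(prev: CacheState | null, next: CacheState): string[] {
  if (!prev) return [] // first turn: nothing to compare against
  const breaks: string[] = []
  if (prev.systemHash !== next.systemHash) breaks.push('system prompt changed')
  if (prev.toolsHash !== next.toolsHash) breaks.push('tool schemas changed')
  if (prev.model !== next.model) breaks.push('model changed')
  return breaks
}
```

Hashing is cheap enough to run every turn, and the per-field breakdown tells you exactly which input busted the cache.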

A source code comment reveals the scale of this concern.

// The dynamic agent list was ~10.2% of fleet cache_creation tokens

The agent list changing dynamically accounted for 10.2% of all cache creation tokens fleet-wide. So they moved the agent list from the system prompt into conversation message attachments. System prompt doesn't change, cache doesn't break.

Same pattern for dates. The system prompt includes today's date, but if midnight rolls over, the date changes and the cache breaks. So the date is memoized at session start, and midnight date changes are handled via tail-end attachment messages that don't bust the cache.

// Memoized for prompt-cache stability
export const getSessionStartDate = memoize(getLocalISODate)
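A zero-argument memoize like the one implied there is tiny — compute once, return the cached value for the rest of the session. This is my sketch, and `getLocalISODate` is a stand-in for whatever the real helper does:

```typescript
// Minimal zero-arg memoize: the wrapped function runs at most once.
function memoize<T>(fn: () => T): () => T {
  let cached: T | undefined
  let has = false
  return () => {
    if (!has) {
      cached = fn()
      has = true
    }
    return cached as T
  }
}

// Stand-in for the real date helper: local date as YYYY-MM-DD.
const getLocalISODate = () => new Date().toISOString().slice(0, 10)

// Frozen at first call, so a midnight rollover can't change the
// system prompt mid-session.
const getSessionStartDate = memoize(getLocalISODate)
```

Every later call returns the session-start date, which is exactly the prompt-cache stability property the comment advertises.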

This level of optimization reminds me of game server hot-path optimization. "10.2% was too much, so we redesigned the architecture." That's how seriously Anthropic takes cost efficiency.

Output Rules Are Different for Anthropic Employees

The getOutputEfficiencySection() function returns completely different prompts based on whether USER_TYPE is ant (Anthropic employee) or not.

Regular users get a short instruction.

Keep your text output brief and direct. Lead with the answer or action,
not the reasoning.

Anthropic employees get detailed writing guidelines. "Assume the person stepped away and lost the thread," "don't use codenames or abbreviations," "use inverted pyramid structure."

Even more surprising — there are numeric length limits in the Ant build only.

// research shows ~1.2% output token reduction vs qualitative "be concise"
'Length limits: keep text between tool calls to ≤25 words.
 Keep final responses to ≤100 words unless the task requires more detail.'

"≤25 words" reduces output tokens by 1.2% compared to just saying "be concise." Data-driven prompt tuning. Testing internally first, likely rolling out to everyone if quality holds up.

Coordinator Mode — AI Directing AI

Set CLAUDE_CODE_COORDINATOR_MODE=1 and Claude Code's behavior changes completely. Instead of writing code directly, Claude becomes an orchestrator that only directs worker agents.

The coordinator's full system prompt is in the source. The key identity statement is clear.

You are a coordinator. Your job is to:
- Help the user achieve their goal
- Direct workers to research, implement and verify code changes
- Synthesize results and communicate with the user

The coordinator only gets 3 tools. Agent (spawn workers), SendMessage (message workers), TaskStop (stop workers). No file read. No file edit. No bash.
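The restriction amounts to an allowlist. The three tool names are from the source; the filtering shape below is my sketch of how such a mode could gate the tool set:

```typescript
// Sketch: in coordinator mode, the tool set collapses to an allowlist.
const COORDINATOR_TOOLS = new Set(['Agent', 'SendMessage', 'TaskStop'])

function getEnabledTools(allTools: string[], coordinatorMode: boolean): string[] {
  if (!coordinatorMode) return allTools
  return allTools.filter((name) => COORDINATOR_TOOLS.has(name))
}

const tools = ['Agent', 'Read', 'Edit', 'Bash', 'SendMessage', 'TaskStop']
console.log(getEnabledTools(tools, true))
// → [ 'Agent', 'SendMessage', 'TaskStop' ] — no Read, no Edit, no Bash
```

Taking the tools away entirely, rather than just instructing the model not to use them, is the stronger guarantee: the coordinator physically cannot write code.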

The workflow follows defined phases. Research → Synthesis → Implementation → Verification. Workers execute each phase. The coordinator synthesizes results.

The most emphasized rule is "Never delegate understanding."

Never write "based on your findings, fix the bug" or
"based on the research, implement it."

The coordinator must understand worker research results, then write specific specs with file paths, line numbers, and exact changes to hand off. "Fix the bug" is forbidden. "Fix the null pointer in src/auth/validate.ts:42, the user field is undefined when Session.expired is true but the token remains cached" is required.

Same principle as directing junior developers. Vague instructions produce vague results.

Fork Subagent — Cloning Itself

The agent system has a fork capability. Regular subagents start with zero context. A fork inherits the parent's full conversation context.

Fork yourself when the intermediate tool output isn't worth keeping
in your context.

Forks are efficient because they share the parent's prompt cache. Same system prompt means cache hits. That's why you shouldn't set a model parameter on forks — a different model can't use the parent's cache.
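The article doesn't quote the Agent tool's actual parameters, so the shape below is purely hypothetical — it just illustrates the model rule in code form:

```typescript
// Hypothetical parameter shape, not the real Agent tool schema.
type AgentSpawn = { fork?: boolean; model?: string; prompt: string }

// The rule: a fork that pins a different model can't reuse the parent's
// prompt cache, since cache entries are keyed per model.
function forkCacheWarnings(spawn: AgentSpawn): string[] {
  const warnings: string[] = []
  if (spawn.fork && spawn.model) {
    warnings.push('fork with explicit model cannot share the parent prompt cache')
  }
  return warnings
}
```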

Two critical rules for forks.

"Don't peek." Don't Read the fork's output file while it's working. The fork's tool call noise entering your context defeats the purpose.

"Don't race." If the fork isn't done, don't guess the results. Don't fabricate fork output in any form.

This is multithreaded programming principles applied to AI agents. Don't read another thread's state before it's done. Race condition. Something I deal with daily in game server code.

Worktree Isolation

Agents can be given isolation: "worktree". This creates a git worktree so the agent works in an isolated repository copy.

Creates a new git worktree inside `.claude/worktrees/`
with a new branch based on HEAD

If the agent makes changes, the worktree and branch persist. No changes means auto-cleanup. At session end, you're asked whether to keep or remove the worktree.

Powerful in practice. You can send an agent to do experimental refactoring without touching your main branch. Like the result? Merge. Don't? Delete.
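In plain git terms, the lifecycle looks roughly like this — the exact commands Claude Code runs are an assumption on my part, but the path follows the article:

```shell
# Roughly what isolation: "worktree" maps to in git terms (sketch).
cd "$(mktemp -d)"
git init -q .
git -c user.email=dev@example.com -c user.name=dev \
  commit -q --allow-empty -m "init"

# New worktree with a new branch based on HEAD, under .claude/worktrees/
git worktree add -b refactor-experiment \
  .claude/worktrees/refactor-experiment HEAD

# The agent works in the isolated copy; the main checkout is untouched.
ls .claude/worktrees/refactor-experiment

# Nothing worth keeping? Clean up the worktree and its branch.
git worktree remove .claude/worktrees/refactor-experiment
git branch -q -D refactor-experiment
```

Keeping the result is just as easy: merge the branch instead of deleting it.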

Hooks — Custom Scripts on Tool Events

The Hooks system binds shell commands to tool events. Before execution, after execution, or on specific events like prompt submission and notifications.

Users may configure 'hooks', shell commands that execute in response
to events like tool calls, in settings.

If a hook fails, Claude treats the failure as user feedback and adjusts its approach. You could build a pipeline that auto-lints on every file edit, runs tests before commits, or alerts when specific code patterns are detected.

Session-end hooks are also supported, with configurable timeout via CLAUDE_CODE_SESSIONEND_HOOKS_TIMEOUT_MS.
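For a concrete picture, a settings fragment for the auto-lint idea might look like this — the event names and matcher/command shape follow Claude Code's documented hooks settings, but treat the details as an approximation:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "npm run lint" }]
      }
    ]
  }
}
```

The matcher is a pattern over tool names, so one entry can cover every file-modifying tool at once.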

Next — The Engineering Details That Blew My Mind

Part 3 covers the wildest stuff. A gacha-style virtual pet called "Buddy," self-updating "Magic Docs," an "Away Summary" that recaps when you return, an "Undercover Mode" for Anthropic employees contributing to open source, and a built-in cron scheduler.
