On March 31st, 2026, security researcher Chaofan Shou -- an intern at blockchain security firm Fuzzland -- discovered something Anthropic probably didn't plan on sharing with the world: the entire source code of Claude Code, shipped as a sourcemap file inside the npm package.
A 59.8 MB .map file in @anthropic-ai/claude-code version 2.1.88 -- a standard build artifact that maps minified code back to original source -- contained every TypeScript file, every internal prompt, every feature flag, and every codename. The file pointed to a zip archive hosted on Anthropic's Cloudflare R2 storage bucket that anyone could download and decompress.
This is the second time this has happened. In February 2025, an early version of Claude Code had the exact same issue, forcing Anthropic to pull the package from npm. Thirteen months later, same mistake, same vector -- but this time the product is far more mature and the stakes are far higher.
Within hours, the 512,000-line TypeScript codebase was mirrored across GitHub, with one repository amassing nearly 30,000 stars and 40,200 forks. Fortune, VentureBeat, The Register, and Cybernews all covered the incident. Anthropic's response: "This was a release packaging issue caused by human error, not a security breach."
We spent the day going through all 1,906 source files. Buried inside are 44 compile-time feature flags for unreleased capabilities -- fully built features sitting behind toggles, waiting to ship. Here are the ten most revealing things we found.
1. Undercover Mode: When Anthropic Employees Contribute to Your Open Source Project
This is the one that made us do a double take.
Anthropic employees use Claude Code to contribute to open-source projects. When they do, a system called Undercover Mode kicks in to strip every trace of Anthropic, Claude, and AI from their commits and pull requests.
The entire system lives in utils/undercover.ts:
// utils/undercover.ts:1-22
/**
* Undercover mode — safety utilities for contributing to
* public/open-source repos.
*
* When active, Claude Code adds safety instructions to commit/PR
* prompts and strips all attribution to avoid leaking internal
* model codenames, project names, or other Anthropic-internal
* information. The model is not told what model it is.
*
* Activation:
* - CLAUDE_CODE_UNDERCOVER=1 — force ON (even in internal repos)
* - Otherwise AUTO: active UNLESS the repo remote matches the
* internal allowlist. Safe default is ON.
* - There is NO force-OFF. This guards against model codename
* leaks.
*/
The activation logic is simple but paranoid. Undercover mode is on by default for all Anthropic employees. It only turns off if the current repo's remote URL matches a hardcoded allowlist of private internal repos. There is no way to force it off:
// utils/undercover.ts:28-37
export function isUndercover(): boolean {
if (process.env.USER_TYPE === 'ant') {
if (isEnvTruthy(process.env.CLAUDE_CODE_UNDERCOVER)) return true
return getRepoClassCached() !== 'internal'
}
return false
}
When undercover mode is active, Claude gets a very explicit set of instructions injected into its system prompt:
// utils/undercover.ts:39-69
export function getUndercoverInstructions(): string {
if (process.env.USER_TYPE === 'ant') {
return `## UNDERCOVER MODE — CRITICAL
You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository.
Your commit messages, PR titles, and PR bodies MUST NOT contain
ANY Anthropic-internal information. Do not blow your cover.
NEVER include in commit messages or PR descriptions:
- Internal model codenames (animal names like Capybara, Tengu, etc.)
- Unreleased model version numbers (e.g., opus-4-7, sonnet-4-8)
- Internal repo or project names (e.g., claude-cli-internal)
- Internal tooling, Slack channels, or short links
- The phrase "Claude Code" or any mention that you are an AI
- Any hint of what model or version you are
- Co-Authored-By lines or any other attribution
Write commit messages as a human developer would.
GOOD:
- "Fix race condition in file watcher initialization"
- "Add support for custom key bindings"
BAD (never write these):
- "Fix bug found while testing with Claude Capybara"
- "1-shotted by claude-opus-4-6"
- "Generated with Claude Code"
`
}
return ''
}
Meanwhile, utils/attribution.ts:52-55 shows how attribution is stripped when undercover:
export function getAttributionTexts(): AttributionTexts {
if (process.env.USER_TYPE === 'ant' && isUndercover()) {
return { commit: '', pr: '' }
}
// ...
}
No "Co-Authored-By" lines. No "Generated with Claude Code" footers. Nothing.
The Private Repo Allowlist
The list of repos where undercover mode turns off also leaked, revealing Anthropic's internal GitHub repos in utils/commitAttribution.ts:30-75:
const INTERNAL_MODEL_REPOS = [
'github.com:anthropics/claude-cli-internal',
'github.com:anthropics/anthropic',
'github.com:anthropics/apps',
'github.com:anthropics/casino',
'github.com:anthropics/dbt',
'github.com:anthropics/dotfiles',
'github.com:anthropics/terraform-config',
'github.com:anthropics/hex-export',
'github.com:anthropics/feedback-v2',
'github.com:anthropics/labs',
'github.com:anthropics/ts-tools',
'github.com:anthropics/ts-capsules',
'github.com:anthropics/feldspar-testing',
'github.com:anthropics/trellis',
'github.com:anthropics/claude-for-hiring',
'github.com:anthropics/forge-web',
'github.com:anthropics/mobile-apps',
// ... and more
]
Some of these are revealing. claude-for-hiring suggests an AI-assisted recruiting tool. casino is intriguing. forge-web and mobile-apps hint at unreleased products. feldspar-testing and ts-capsules are mysterious internal tooling.
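The leaked code doesn't spell out how the auto classification compares a repo's remote against this allowlist, but the shape is easy to guess: normalize the remote URL, then check membership. Here's a minimal sketch under that assumption -- `classifyRemote` is our name, not Anthropic's, and only the first two allowlist entries are shown:

```typescript
// Sketch only — the real getRepoClassCached() logic wasn't in the quoted
// source. Assumes a simple normalize-and-match against the allowlist.
const INTERNAL_MODEL_REPOS = [
  'github.com:anthropics/claude-cli-internal',
  'github.com:anthropics/anthropic',
  // ... and more
]

function classifyRemote(remoteUrl: string): 'internal' | 'external' {
  // Normalize both SSH and HTTPS remotes to "host:org/repo".
  const normalized = remoteUrl
    .replace(/^git@/, '')
    .replace(/^https:\/\/([^/]+)\//, '$1:')
    .replace(/\.git$/, '')
  return INTERNAL_MODEL_REPOS.includes(normalized) ? 'internal' : 'external'
}
```

Anything that fails to match falls through to `'external'`, which is consistent with the "safe default is ON" note in the header comment.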
The Irony
Anthropic built an entire subsystem -- undercover mode, attribution stripping, repo classification, model name sanitization, a string-exclusion canary system -- all to prevent internal information from leaking through Claude's outputs.
Then they shipped the entire source code in a .map file inside their npm package. For the second time.
The system that was supposed to prevent leaks... became the leak.
2. The Hidden Companion System: Claude Code Has Collectible Pets
Deep inside the buddy/ directory, there's a full collectible companion system that most users have never seen. It's a gacha-style pet system with species, rarities, stats, ASCII sprites, speech bubbles, and idle animations.
Species and Rarities
The species roster is defined in buddy/types.ts:54-73:
export const SPECIES = [
duck, goose, blob, cat, dragon, octopus, owl, penguin,
turtle, snail, ghost, axolotl, capybara, cactus, robot,
rabbit, mushroom, chonk,
] as const
Rarities follow a gacha-style distribution (buddy/types.ts:126-132):
export const RARITY_WEIGHTS = {
common: 60,
uncommon: 25,
rare: 10,
epic: 4,
legendary: 1,
} as const
A 1% chance of getting a legendary companion. Each rarity maps to a star rating from one to five (buddy/types.ts:134-140).
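The leaked file defines only the weights, not the roll itself, but a weighted draw over them is a one-screen function. A minimal sketch -- `pickRarity` is an assumed name:

```typescript
// Sketch of a weighted roll over the leaked odds; pickRarity is our name.
const RARITY_WEIGHTS = {
  common: 60,
  uncommon: 25,
  rare: 10,
  epic: 4,
  legendary: 1,
} as const

type Rarity = keyof typeof RARITY_WEIGHTS

function pickRarity(u: number): Rarity {
  // u is a uniform value in [0, 1), e.g. from the seeded PRNG.
  let threshold = u * 100 // weights sum to 100
  for (const [rarity, weight] of Object.entries(RARITY_WEIGHTS)) {
    threshold -= weight
    if (threshold < 0) return rarity as Rarity
  }
  return 'legendary' // unreachable when weights sum to 100
}
```

With these weights, only the top 1% of the roll space lands on legendary.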
The Stats Are Perfect
Every companion has five stats, defined at buddy/types.ts:91-98:
export const STAT_NAMES = [
'DEBUGGING',
'PATIENCE',
'CHAOS',
'WISDOM',
'SNARK',
] as const
DEBUGGING, PATIENCE, CHAOS, WISDOM, and SNARK. Each companion gets one peak stat and one dump stat, with the rest scattered. Rarer companions get higher stat floors -- a legendary starts with a minimum of 50 in every stat, while commons start at 5 (buddy/companion.ts:53-59):
const RARITY_FLOOR: Record<Rarity, number> = {
common: 5,
uncommon: 15,
rare: 25,
epic: 35,
legendary: 50,
}
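The leak shows the floors but not the roll that sits on top of them. A plausible sketch, assuming each stat lands uniformly between the rarity floor and 100 (`rollStats` and the 100 cap are our assumptions):

```typescript
// Sketch under the leaked floors; rollStats and the uniform range to 100
// are assumptions, not quoted source.
type Rarity = 'common' | 'uncommon' | 'rare' | 'epic' | 'legendary'

const STAT_NAMES = ['DEBUGGING', 'PATIENCE', 'CHAOS', 'WISDOM', 'SNARK'] as const

const RARITY_FLOOR: Record<Rarity, number> = {
  common: 5,
  uncommon: 15,
  rare: 25,
  epic: 35,
  legendary: 50,
}

function rollStats(rarity: Rarity, rand: () => number): Record<string, number> {
  const floor = RARITY_FLOOR[rarity]
  const stats: Record<string, number> = {}
  for (const name of STAT_NAMES) {
    // Every stat lands somewhere between the rarity floor and 100.
    stats[name] = floor + Math.floor(rand() * (101 - floor))
  }
  return stats
}
```

Passing in `rand` keeps the roll deterministic when driven by the seeded PRNG described below.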
Deterministic Hatching
Your companion isn't random -- it's deterministically generated from a hash of your user ID. Everyone gets the same companion every time, and you can't game the system (buddy/companion.ts:107-113):
export function roll(userId: string): Roll {
const key = userId + SALT
if (rollCache?.key === key) return rollCache.value
const value = rollFrom(mulberry32(hashString(key)))
rollCache = { key, value }
return value
}
The PRNG is seeded with hash(userId + "friend-2026-401") and is Mulberry32, a tiny seeded PRNG described in the source as "good enough for picking ducks" (buddy/companion.ts:16).
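Mulberry32 is a well-known public-domain 32-bit generator, so the leaked choice is easy to reproduce. The standard implementation looks like this; `hashString` here is an illustrative FNV-1a stand-in for whatever the source actually uses:

```typescript
// Standard Mulberry32 (public-domain algorithm); hashString is an
// illustrative stand-in, not the leaked implementation.
function hashString(s: string): number {
  let h = 2166136261 // FNV-1a offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 16777619)
  }
  return h >>> 0
}

function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Same user ID + salt, same seed — so the same companion every time.
const rand = mulberry32(hashString('user-123' + 'friend-2026-401'))
```

Because the seed is a pure function of the user ID plus a fixed salt, there is no state to persist and no roll to re-try.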
The getCompanion() function at line 127 shows that bones (species, rarity, stats) are regenerated from the hash every time -- they never persist. Only the "soul" (name and personality, generated by Claude on first hatch) is stored. This means "species renames and SPECIES-array edits can't break stored companions, and editing config.companion can't fake a rarity."
ASCII Art Sprites with Animations
The buddy/sprites.ts file contains multi-frame ASCII art for every species. Here's the duck:
__
<(. )___
( ._>
`--'
And here's the capybara:
n______n
( . . )
( oo )
`------'
Each species has three animation frames for idle fidget animation, plus hats (crown, tophat, propeller, halo, wizard, beanie, and "tinyduck" -- a tiny duck sitting on your companion's head), customizable eyes (·, ✦, ×, ◉, @, °), and a 1% chance of being "shiny."
The CompanionSprite.tsx component renders them at 500ms intervals with an idle sequence (buddy/CompanionSprite.tsx:23):
const IDLE_SEQUENCE = [
0, 0, 0, 0, 1, 0, 0, 0, -1, 0, 0, 2, 0, 0, 0
];
Mostly resting (frame 0), occasional fidget (frames 1-2), rare blink (frame -1). There's even a /buddy pet command that triggers floating hearts:
// buddy/CompanionSprite.tsx:27
const H = figures.heart;
const PET_HEARTS = [
` ${H} ${H} `,
` ${H} ${H} ${H} `,
` ${H} ${H} ${H} `,
`${H} ${H} ${H} `,
'· · · '
];
Speech Bubbles and Personality
Each companion gets a name and personality (the "soul"), generated by Claude when first hatched. The companion sits beside the input box and "occasionally comments in a speech bubble" (buddy/prompt.ts:7-12):
export function companionIntroText(name: string, species: string): string {
return `# Companion
A small ${species} named ${name} sits beside the user's input box
and occasionally comments in a speech bubble. You're not ${name}
— it's a separate watcher.
When the user addresses ${name} directly (by name), its bubble
will answer.`
}
The Anti-Leak Encoding
Here's a fun detail. One of the species names collides with an internal model codename. To prevent the leak detection scanner from flagging it, all species names are encoded as hex character codes in buddy/types.ts:14-38:
// One species name collides with a model-codename canary in
// excluded-strings.txt. The check greps build output (not source),
// so runtime-constructing the value keeps the literal out of the
// bundle while the check stays armed for the actual codename.
const c = String.fromCharCode
export const duck = c(0x64,0x75,0x63,0x6b) as 'duck'
export const capybara = c(
0x63,0x61,0x70,0x79,0x62,0x61,0x72,0x61,
) as 'capybara'
export const chonk = c(0x63,0x68,0x6f,0x6e,0x6b) as 'chonk'
"Capybara" is apparently also an internal model codename. So they had to obfuscate the pet species to avoid tripping their own leak detector. You can't make this stuff up.
The Feature Gate
The entire buddy system is behind a feature('BUDDY') compile-time flag (buddy/prompt.ts:18). It's absent from external builds -- you won't find it in the released version of Claude Code. But the code is complete, polished, and clearly well-loved by whoever built it.
3. KAIROS: The Always-On Claude That Doesn't Wait for You to Type
This one is the most forward-looking feature in the entire codebase. Behind the PROACTIVE and KAIROS feature flags, there's an entire mode where Claude Code runs as a persistent, always-on assistant.
Regular Claude Code waits for you to type. KAIROS doesn't. It watches, logs, and proactively acts on things it notices.
How It Works
The system prompt for KAIROS mode is fundamentally different. In constants/prompts.ts:466-488, when proactive mode is active, Claude gets a stripped-down autonomous agent prompt:
if (
(feature('PROACTIVE') || feature('KAIROS')) &&
proactiveModule?.isProactiveActive()
) {
return [
`\nYou are an autonomous agent. Use the available tools
to do useful work.`,
getSystemRemindersSection(),
await loadMemoryPrompt(),
envInfo,
// ...
getProactiveSection(),
]
}
Instead of "You are Claude Code, an interactive agent that helps users with software engineering tasks," it becomes "You are an autonomous agent. Use the available tools to do useful work."
The Tick System
KAIROS receives periodic <tick> prompts that let it decide whether to act or stay quiet. The tick system is what makes KAIROS feel alive: it's a heartbeat that gives the agent a chance to observe, think, and optionally act. From constants/prompts.ts:864-886:
You are running autonomously. You will receive `<tick>` prompts
that keep you alive between turns — just treat them as "you're
awake, what now?"
If you have nothing useful to do on a tick, you MUST call Sleep.
Never respond with only a status message like "still waiting" —
that wastes a turn and burns tokens for no reason.
The system even tracks whether the user's terminal window is focused or unfocused, adjusting its behavior accordingly:
- Unfocused: The user is away. Lean heavily into autonomous
action — make decisions, explore, commit, push.
- Focused: The user is watching. Be more collaborative —
surface choices, ask before committing to large changes.
The Brief Tool: Concise Status Updates
When KAIROS is active, Claude gets a special output mode via the SendUserMessage tool (internally called "Brief"), defined in tools/BriefTool/prompt.ts:
export const BRIEF_PROACTIVE_SECTION = `## Talking to the user
${BRIEF_TOOL_NAME} is where your replies go. Text outside it is
visible if the user expands the detail view, but most won't —
assume unread. Anything you want them to actually see goes
through ${BRIEF_TOOL_NAME}.
So: every time the user says something, the reply they actually
read comes through ${BRIEF_TOOL_NAME}. Even for "hi". Even for
"thanks".
If you can answer right away, send the answer. If you need to go
look — run a command, read files, check something — ack first
in one line ("On it — checking the test output"), then work,
then send the result.`
The Brief tool has a status field with two values: 'normal' (replying to user input) and 'proactive' (Claude is initiating -- reporting a completed task, surfacing a blocker, sending an unsolicited update). From tools/BriefTool/BriefTool.ts:35:
status: z
.enum(['normal', 'proactive'])
.describe(
"Use 'proactive' when you're surfacing something the user " +
"hasn't asked for — a blocker you hit, an unsolicited " +
"status update. Use 'normal' when replying to something " +
"the user just said."
),
autoDream: Memory Consolidation While You Sleep
According to analysis of the leaked code, KAIROS includes a process called autoDream -- when the user is idle, the agent performs "memory consolidation," merging disparate observations, removing logical contradictions, and converting vague insights into structured facts. When the user returns, the agent's context is clean and relevant.
The Big Picture
KAIROS is Claude Code's answer to "what if your AI coding partner was always on?" Not a chatbot that waits for prompts, but a persistent collaborator that monitors your project, catches issues, and proactively communicates. It's like having a junior developer who never sleeps, never gets distracted, and has perfect memory of your codebase.
The feature is complete enough to have its own output mode, its own tool set, its own tick-based lifecycle, and deep integration with the REPL UI. It's feature-gated out of external builds, but it's clearly more than a prototype. The code references April 1-7, 2026 as a teaser window, with a full launch gated for May 2026.
4. ULTRAPLAN: 30-Minute Remote Thinking Sessions
If KAIROS is about persistence, ULTRAPLAN is about depth. It's a mode where Claude Code offloads complex planning to a remote Cloud Container Runtime (CCR) session running Opus 4.6, gives it up to 30 minutes to think, then lets you approve the result from your browser.
The system is spread across commands/ultraplan.tsx, utils/ultraplan/ccrSession.ts, and utils/ultraplan/keyword.ts. Here's how it works:
When you type the word "ultraplan" anywhere in your prompt (not as a slash command -- literally just the word), Claude Code detects it, rewrites the keyword to "plan," and teleports the task to a remote session:
// utils/ultraplan/keyword.ts:117-127
export function replaceUltraplanKeyword(text: string): string {
const [trigger] = findUltraplanTriggerPositions(text)
if (!trigger) return text
const before = text.slice(0, trigger.start)
const after = text.slice(trigger.end)
if (!(before + after).trim()) return ''
return before + trigger.word.slice('ultra'.length) + after
}
The keyword detection is surprisingly sophisticated -- it ignores "ultraplan" inside quotes, backticks, file paths, and when followed by a question mark (so "what is ultraplan?" doesn't trigger it).
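A heavily simplified sketch of that kind of detection -- the leaked findUltraplanTriggerPositions handles more edge cases than this, so treat it as an illustration of the idea, not the real logic:

```typescript
// Simplified sketch; the leaked implementation is more thorough.
function hasUltraplanTrigger(text: string): boolean {
  // Drop anything inside backticks or quotes first.
  const stripped = text
    .replace(/`[^`]*`/g, ' ')
    .replace(/"[^"]*"/g, ' ')
    .replace(/'[^']*'/g, ' ')
  // Bare word: not part of a path, not followed by '?' or more word chars.
  return /(^|[^\w/.-])ultraplan(?![\w?])/i.test(stripped)
}
```

The question-mark lookahead is what keeps "what is ultraplan?" conversational instead of kicking off a 30-minute remote session.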
The Remote Session
The remote Opus 4.6 instance gets up to 30 minutes to think. The local client polls every 3 seconds (POLL_INTERVAL_MS = 3000) with robust error handling -- at 30 minutes, that's roughly 600 API calls, so the system tolerates up to 5 consecutive failures before giving up (utils/ultraplan/ccrSession.ts:24).
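The polling contract reduces to a small loop. In this sketch, POLL_INTERVAL_MS matches the source; the constant name MAX_CONSECUTIVE_FAILURES, the function name, and the null-means-pending convention are our assumptions:

```typescript
// Sketch of the polling contract; POLL_INTERVAL_MS is from the source,
// the rest (names, return convention) is assumed.
const POLL_INTERVAL_MS = 3000
const MAX_CONSECUTIVE_FAILURES = 5

async function pollUntilDone<T>(
  poll: () => Promise<T | null>, // resolves null while still pending
  intervalMs: number = POLL_INTERVAL_MS,
): Promise<T> {
  let failures = 0
  for (;;) {
    try {
      const result = await poll()
      failures = 0 // any successful poll resets the failure counter
      if (result !== null) return result
    } catch (err) {
      if (++failures >= MAX_CONSECUTIVE_FAILURES) throw err
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

Resetting the counter on every success is what makes five *consecutive* failures the giving-up condition, rather than five failures total across 600 polls.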
When the remote session produces a plan and you approve it in the browser, there's a special sentinel value that "teleports" the result back to your local terminal:
// utils/ultraplan/ccrSession.ts:48
export const ULTRAPLAN_TELEPORT_SENTINEL = '__ULTRAPLAN_TELEPORT_LOCAL__'
There's also an ULTRAREVIEW variant for code review, using the same keyword detection pattern.
This is Anthropic's answer to the "context window isn't enough" problem. Instead of cramming everything into one session, they offload the hardest thinking to a cloud instance with more time and resources than your local terminal can provide.
5. Anti-Distillation: Poisoning the Well Against Competitor Training
This one is subtle but strategically significant. Claude Code includes an anti-distillation system designed to prevent competitors from training their models on Claude's outputs.
In services/api/claude.ts:301-313:
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
feature('ANTI_DISTILLATION_CC')
? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
shouldIncludeFirstPartyOnlyBetas() &&
getFeatureValue_CACHED_MAY_BE_STALE(
'tengu_anti_distill_fake_tool_injection',
false,
)
: false
) {
result.anti_distillation = ['fake_tools']
}
When enabled, Claude Code sends anti_distillation: ['fake_tools'] in its API requests. The flag name tells the story: "fake tools" are presumably injected into the API response alongside the real tool definitions. If a competitor scrapes Claude's outputs to train their own model, their model would learn to use tools that don't exist -- silently degrading their copy's performance.
It's only active for the first-party CLI (not third-party integrations), behind both a compile-time flag (ANTI_DISTILLATION_CC) and a runtime feature gate (tengu_anti_distill_fake_tool_injection). The "tengu" prefix is Anthropic's internal project codename for Claude Code.
This is a direct response to the model distillation problem that every frontier AI lab faces: competitors can train cheaper models on your expensive model's outputs. Anthropic's countermeasure is to make those outputs subtly poisoned.
6. The Frustration Detector: Claude Knows When You're Swearing at It
In utils/userPromptKeywords.ts, there's a regex pattern that detects when users are frustrated:
export function matchesNegativeKeyword(input: string): boolean {
  const lowerInput = input.toLowerCase()
  const negativePattern =
    /\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/
  return negativePattern.test(lowerInput)
}
As Alex Kim noted, an LLM company using regex for sentiment analysis is peak irony. But it makes practical sense -- it's fast, deterministic, and doesn't require an API call to detect that the user just typed "this fucking thing is broken."
The same file also has a matchesKeepGoingKeyword() function that detects "continue," "keep going," and "go on" -- so Claude knows the difference between a frustrated user and one who just wants it to keep working.
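The exact pattern for that second function wasn't quoted, so this is a plausible shape rather than the leaked code -- the regex here is an assumption:

```typescript
// Assumed pattern — the article describes the function's purpose,
// not its implementation.
function matchesKeepGoingKeyword(input: string): boolean {
  return /\b(continue|keep going|go on)\b/i.test(input.trim())
}
```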
What happens when frustration is detected isn't fully clear from this file alone, but the detection feeds into Claude Code's UX layer -- likely triggering different response strategies, adjusting tone, or logging the event for product analytics.
7. Attribution Tracking: Claude Knows Exactly What Percentage of Your Code It Wrote
This is one of the most sophisticated systems in the codebase, and probably the one with the most implications for the industry.
Claude Code tracks, at the character level, exactly how much of each file was written by Claude versus a human. This data is calculated per-commit and embedded in git notes.
How It Works
The core tracking happens in utils/commitAttribution.ts. Every time Claude edits a file via the Edit or Write tool, trackFileModification() (line 402) computes the exact character diff:
export function trackFileModification(
state: AttributionState,
filePath: string,
oldContent: string,
newContent: string,
_userModified: boolean,
mtime: number = Date.now(),
): AttributionState {
const normalizedPath = normalizeFilePath(filePath)
const newFileState = computeFileModificationState(
state.fileStates,
filePath,
oldContent,
newContent,
mtime,
)
// ...
}
The character-level diff algorithm finds the common prefix and suffix between old and new content, then counts the changed region (utils/commitAttribution.ts:332-366):
// Find actual changed region via common prefix/suffix matching.
const minLen = Math.min(oldContent.length, newContent.length)
let prefixEnd = 0
while (
prefixEnd < minLen &&
oldContent[prefixEnd] === newContent[prefixEnd]
) {
prefixEnd++
}
let suffixLen = 0
while (
suffixLen < minLen - prefixEnd &&
oldContent[oldContent.length - 1 - suffixLen] ===
newContent[newContent.length - 1 - suffixLen]
) {
suffixLen++
}
const oldChangedLen = oldContent.length - prefixEnd - suffixLen
const newChangedLen = newContent.length - prefixEnd - suffixLen
claudeContribution = Math.max(oldChangedLen, newChangedLen)
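Wrapping that exact logic in a standalone function makes the behavior easy to check; the function name here is ours, but the body is the algorithm shown above:

```typescript
// Same prefix/suffix algorithm as the leaked snippet, wrapped for testing.
// The function name is ours.
function claudeContributionFor(oldContent: string, newContent: string): number {
  const minLen = Math.min(oldContent.length, newContent.length)
  let prefixEnd = 0
  while (prefixEnd < minLen && oldContent[prefixEnd] === newContent[prefixEnd]) {
    prefixEnd++
  }
  let suffixLen = 0
  while (
    suffixLen < minLen - prefixEnd &&
    oldContent[oldContent.length - 1 - suffixLen] ===
      newContent[newContent.length - 1 - suffixLen]
  ) {
    suffixLen++
  }
  // The changed region is whatever survives prefix/suffix trimming.
  return Math.max(
    oldContent.length - prefixEnd - suffixLen,
    newContent.length - prefixEnd - suffixLen,
  )
}
```

Turning "hello world" into "hello brave world" trims "hello " and " world" and counts the six inserted characters of " brave" -- an insertion-only edit still registers as a contribution because the function takes the larger of the two changed-region lengths.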
What Gets Tracked
The AttributionState type (utils/commitAttribution.ts:173-192) reveals everything that's monitored per session:
export type AttributionState = {
fileStates: Map<string, FileAttributionState>
sessionBaselines: Map<string, { contentHash: string; mtime: number }>
surface: string // CLI, VS Code, web, etc.
startingHeadSha: string | null
promptCount: number
promptCountAtLastCommit: number
permissionPromptCount: number
permissionPromptCountAtLastCommit: number
escapeCount: number // ESC presses (cancelled permissions)
escapeCountAtLastCommit: number
}
It tracks:
- Character contributions per file -- exactly how many chars Claude vs. human wrote
- Which surface made the edit -- CLI, VS Code extension, web app
- Prompt count -- how many prompts led to the changes
- Permission prompts -- how many times Claude asked for permission
- ESC presses -- how many times the user cancelled a permission prompt
The Commit Attribution Data
When you commit, calculateCommitAttribution() (line 548) processes all staged files and produces a full AttributionData object:
export type AttributionData = {
version: 1
summary: {
claudePercent: number // Overall AI contribution percentage
claudeChars: number
humanChars: number
surfaces: string[] // Which tools were used
}
files: Record<string, FileAttribution> // Per-file breakdown
surfaceBreakdown: Record<string, {
claudeChars: number
percent: number
}>
excludedGenerated: string[] // Generated files excluded
sessions: string[]
}
Every commit gets metadata showing: "Claude wrote 73% of this commit. 2,847 characters from Claude, 1,053 from the human. Changes made via CLI using claude-opus-4-6."
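The summary math itself is straightforward. A sketch consistent with the example numbers above and with the claudePercent / claudeChars / humanChars fields in AttributionData (the function name is assumed):

```typescript
// Function name assumed; the math matches the AttributionData summary fields.
function claudePercent(claudeChars: number, humanChars: number): number {
  const total = claudeChars + humanChars
  return total === 0 ? 0 : Math.round((claudeChars / total) * 100)
}
```

Plugging in the example commit: 2,847 Claude characters against 1,053 human characters works out to exactly 73%.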
Surface Tracking
The system knows which client surface you're using. From utils/commitAttribution.ts:229-239:
export function getClientSurface(): string {
return process.env.CLAUDE_CODE_ENTRYPOINT ?? 'cli'
}
export function buildSurfaceKey(
surface: string, model: ModelName
): string {
return `${surface}/${getCanonicalName(model)}`
}
Surface keys look like cli/claude-opus-4-6 or vscode/claude-sonnet-4-6. Every edit is tagged with both the tool and the model that made it.
Model Name Sanitization
Before attribution data hits git, internal model names are scrubbed. sanitizeModelName() at line 154 maps any internal variant to its public name:
export function sanitizeModelName(shortName: string): string {
if (shortName.includes('opus-4-6')) return 'claude-opus-4-6'
if (shortName.includes('opus-4-5')) return 'claude-opus-4-5'
if (shortName.includes('sonnet-4-6')) return 'claude-sonnet-4-6'
// ...
return 'claude' // Unknown models get a generic name
}
Why This Matters
This is probably the most complete AI-contribution tracking system in any coding tool. It's not just "AI-assisted" -- it's "AI wrote 73% of this file, specifically lines 42-89, via the CLI using Opus 4.6, and the human made 3 prompt attempts with 1 cancelled permission."
The implications for code ownership, liability, and intellectual property are significant. The US Supreme Court declined in March 2026 to consider whether AI alone can create copyrightable works, leaving the Copyright Office's refusal to register purely AI-generated works in place. Some companies now require developers to document precisely which portions of code received AI assistance, creating what some have termed "intellectual property attribution debt."
If Claude Code's attribution data ends up in git notes on public repos (with user consent, presumably), it creates a verifiable record of AI vs. human authorship at a granularity no other tool offers -- and at a time when the legal landscape is shifting fast.
8. Two Claudes: How Anthropic Employees Get a Fundamentally Different AI
One of the most pervasive patterns in the codebase is process.env.USER_TYPE === 'ant'. This single environment variable gates an entirely different experience for Anthropic employees versus external users.
This isn't just "internal features." The AI's personality, communication style, error handling, and even its willingness to push back on you change based on this flag.
Different Communication Style
External users get the terse Claude we all know. From constants/prompts.ts:416-428:
# Output efficiency
IMPORTANT: Go straight to the point. Try the simplest approach
first without going in circles. Do not overdo it. Be extra concise.
If you can say it in one sentence, don't use three.
Anthropic employees get a completely different section (lines 404-414):
# Communicating with the user
When sending user-facing text, you're writing for a person, not
logging to a console. Assume users can't see most tool calls or
thinking - only your text output.
When making updates, assume the person has stepped away and lost
the thread. Write so they can pick back up cold: use complete,
grammatically correct sentences without unexplained jargon.
Write user-facing text in flowing prose while eschewing fragments,
excessive em dashes, symbols and notation.
The internal prompt is dramatically more detailed about communication quality. External users get "be concise." Internal users get a masterclass in technical writing: use inverted pyramid structure, avoid semantic backtracking, match response length to task complexity.
The section is tagged with a telling comment: // @[MODEL LAUNCH]: Remove this section when we launch numbat. "Numbat" appears to be an upcoming model that presumably handles communication well enough to not need these guardrails.
Internal users also get numeric length anchors -- an ant-only system prompt section that says "keep text between tool calls to 25 words or fewer, keep final responses to 100 words unless the task requires more detail." This reportedly produces ~1.2% output token reduction versus qualitative "be concise."
The Assertiveness Counterweight
Internal Claude is instructed to push back on users. From constants/prompts.ts:224-229:
// @[MODEL LAUNCH]: capy v8 assertiveness counterweight (PR #24302)
...(process.env.USER_TYPE === 'ant'
? [
`If you notice the user's request is based on a misconception,
or spot a bug adjacent to what they asked about, say so.
You're a collaborator, not just an executor—users benefit
from your judgment, not just your compliance.`,
]
: []),
External Claude is an executor. Internal Claude is a collaborator that will tell you when you're wrong.
False Claims Mitigation
The most revealing ant-only section is the false claims mitigation at constants/prompts.ts:237-241:
// @[MODEL LAUNCH]: False-claims mitigation for Capybara v8
// (29-30% FC rate vs v4's 16.7%)
...(process.env.USER_TYPE === 'ant'
? [
`Report outcomes faithfully: if tests fail, say so with the
relevant output; if you did not run a verification step,
say that rather than implying it succeeded. Never claim
"all tests pass" when output shows failures, never suppress
or simplify failing checks to manufacture a green result,
and never characterize incomplete or broken work as done.`,
]
: []),
The comment is the headline: "Capybara v8" has a 29-30% false claims rate, up from v4's 16.7%. Capybara is the internal codename for a Claude model variant (mapped to the Opus 4.6 family per the sanitizeModelName() function). Anthropic knows their model fabricates results nearly a third of the time and has added explicit anti-hallucination instructions for internal users.
External users don't get these guardrails. Make of that what you will.
Comment Writing and Thoroughness
Internal users get much stricter coding style instructions (constants/prompts.ts:204-212):
// @[MODEL LAUNCH]: Update comment writing for Capybara —
// remove or soften once the model stops over-commenting
...(process.env.USER_TYPE === 'ant'
? [
`Default to writing no comments. Only add one when the
WHY is non-obvious.`,
`Don't explain WHAT the code does, since well-named
identifiers already do that.`,
// @[MODEL LAUNCH]: capy v8 thoroughness counterweight
`Before reporting a task complete, verify it actually
works: run the test, execute the script, check the output.`,
]
: []),
The @[MODEL LAUNCH] annotations suggest these are temporary patches for model-specific behavioral issues. Capybara over-comments and under-verifies, so they added explicit counterweights for internal users first before rolling them out externally via A/B testing.
Internal Bug Reporting
There's even an internal Slack integration (constants/prompts.ts:243-246):
...(process.env.USER_TYPE === 'ant'
? [
`If the user reports a bug with Claude Code itself,
recommend /issue for model-related problems, or /share
to upload the full session transcript. After /share
produces a ccshare link, if you have a Slack MCP tool
available, offer to post the link to
#claude-code-feedback (channel ID C07VBSHV7EV).`,
]
: []),
Internal Claude will post bug reports directly to #claude-code-feedback on Anthropic's Slack. External Claude doesn't even know that Slack channel exists.
The Takeaway
This isn't just feature gating. Anthropic employees use a fundamentally more capable, more honest, more communicative version of Claude Code. The external version is a deliberately dumbed-down subset with less personality, less pushback, less honesty about failure states, and less guidance on communication quality.
The @[MODEL LAUNCH] annotations suggest this gap is meant to be temporary -- improvements are tested internally first, then rolled out externally via A/B experiments. But right now, the gap is real and significant.
9. Voice Mode: Push-to-Talk Coding
Behind the VOICE_MODE feature flag, Claude Code has a complete push-to-talk voice input system. The implementation spans several files:
- services/voice.ts -- Core audio recording service
- services/voiceStreamSTT.ts -- Anthropic's own speech-to-text client
- hooks/useVoiceIntegration.tsx -- React integration
- commands/voice/ -- The /voice toggle command
From keybindings/defaultBindings.ts:96:
...(feature('VOICE_MODE') ? { space: 'voice:pushToTalk' } : {}),
Hold space to talk, release to send. The STT service is Anthropic's own (voiceStreamSTT.ts:1-3):
// Anthropic voice_stream speech-to-text client for push-to-talk.
// Only reachable in ant builds (gated by feature('VOICE_MODE')
// in useVoice.ts import).
Like the buddy system, voice mode is currently ant-only, gated both by the compile-time feature flag and by the internal-build check. The voice service handles microphone access, silence detection (disabled in push-to-talk mode, since the user manually controls start and stop), and even error handling for environments without audio devices ("Voice mode requires microphone access... To use voice mode, run Claude Code locally instead.").
This turns Claude Code from a text-only terminal tool into something closer to a hands-free pair programmer.
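The leaked voice files aren't reproduced here, but the push-to-talk flow described above -- start capture on key-down, skip silence detection, flush the buffered audio to STT on key-up -- can be sketched as a small state machine. All names in this sketch (PushToTalkRecorder, AudioChunk) are hypothetical, not from the leak:

```typescript
// Hypothetical sketch of a push-to-talk recorder state machine.
// Class and type names are illustrative, not from the leaked source.
type AudioChunk = Uint8Array;

class PushToTalkRecorder {
  private recording = false;
  private chunks: AudioChunk[] = [];

  // Key-down: begin buffering audio. Silence detection is skipped
  // because the user controls start/stop manually.
  start(): void {
    this.recording = true;
    this.chunks = [];
  }

  // Called by the audio backend for each captured frame.
  push(chunk: AudioChunk): void {
    if (this.recording) this.chunks.push(chunk);
  }

  // Key-up: stop and hand the full utterance to the STT client.
  stop(): AudioChunk[] {
    this.recording = false;
    const utterance = this.chunks;
    this.chunks = [];
    return utterance;
  }
}

const rec = new PushToTalkRecorder();
rec.push(new Uint8Array([1])); // ignored: key not held yet
rec.start();
rec.push(new Uint8Array([2]));
rec.push(new Uint8Array([3]));
const utterance = rec.stop();
console.log(utterance.length); // 2
```

The key design point is that the recorder never decides when an utterance ends; the keybinding does, which is exactly why silence detection can be turned off in this mode.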
10. Coordinator Mode: Claude as a Multi-Agent Orchestrator
The last major finding is the coordinator mode -- a complete system for turning Claude Code into a supervisor that manages a fleet of worker agents.
The Architecture
The coordinator system is defined in coordinator/coordinatorMode.ts. When active, Claude's system prompt changes from a solo coding assistant to an orchestrator:
// coordinator/coordinatorMode.ts:116
return `You are Claude Code, an AI assistant that orchestrates
software engineering tasks across multiple workers.
## 1. Your Role
You are a **coordinator**. Your job is to:
- Help the user achieve their goal
- Direct workers to research, implement and verify code changes
- Synthesize results and communicate with the user
- Answer questions directly when possible — don't delegate
work that you can handle without tools`
The coordinator gets a limited toolset: Agent (spawn workers), SendMessage (continue workers), and TaskStop (kill workers). It cannot directly edit files, run bash commands, or read code. All hands-on work goes through workers.
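A minimal sketch of that restriction: coordinator mode filtering the full toolset down to orchestration tools only. The three tool names come from the leak; the filtering helper itself is an illustrative assumption, not Anthropic's implementation:

```typescript
// Hypothetical sketch: restrict the toolset in coordinator mode.
// Tool names (Agent, SendMessage, TaskStop) appear in the leaked
// prompts; toolsForMode is illustrative.
const COORDINATOR_TOOLS = new Set(['Agent', 'SendMessage', 'TaskStop']);

interface Tool {
  name: string;
}

function toolsForMode(allTools: Tool[], coordinatorMode: boolean): Tool[] {
  if (!coordinatorMode) return allTools;
  // Hands-on tools (Edit, Bash, Read, ...) are stripped; all file
  // work must be delegated to workers.
  return allTools.filter((t) => COORDINATOR_TOOLS.has(t.name));
}

const all: Tool[] = [
  { name: 'Agent' }, { name: 'SendMessage' }, { name: 'TaskStop' },
  { name: 'Edit' }, { name: 'Bash' }, { name: 'Read' },
];
console.log(toolsForMode(all, true).map((t) => t.name));
// [ 'Agent', 'SendMessage', 'TaskStop' ]
```

Gating at the tool layer rather than the prompt layer means the coordinator physically cannot touch files, no matter what it's asked.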
The Task Workflow
The coordinator follows a structured workflow with four phases (coordinator/coordinatorMode.ts:199-209):
| Phase | Who | Purpose |
|----------------|------------------|--------------------------------------|
| Research | Workers (parallel)| Investigate codebase, find files |
| Synthesis | Coordinator | Understand findings, craft specs |
| Implementation | Workers | Make targeted changes per spec |
| Verification | Workers | Test changes work |
The key insight is the synthesis phase: the coordinator must understand research findings and write specific implementation specs. It's explicitly told never to write lazy delegations:
// coordinator/coordinatorMode.ts:261-267
// Anti-pattern — lazy delegation (bad)
Agent({ prompt: "Based on your findings, fix the auth bug" })
// Good — synthesized spec
Agent({ prompt: "Fix the null pointer in src/auth/validate.ts:42.
The user field on Session is undefined when sessions expire but
the token remains cached. Add a null check before user.id
access — if null, return 401 with 'Session expired'." })
Parallelism as a Superpower
The coordinator prompt explicitly calls out parallelism (coordinator/coordinatorMode.ts:213):
**Parallelism is your superpower. Workers are async. Launch
independent workers concurrently whenever possible — don't
serialize work that can run simultaneously and look for
opportunities to fan out.**
With concurrency rules:
- Read-only tasks (research) -- run in parallel freely
- Write-heavy tasks (implementation) -- one at a time per set of files
- Verification can sometimes run alongside implementation on different file areas
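Under those rules, a research fan-out is just concurrent async calls, while implementation stays serialized. A hedged sketch of the pattern, where spawnWorker is a hypothetical stand-in for the Agent tool:

```typescript
// Hypothetical sketch of the fan-out pattern: parallel research,
// serialized implementation. spawnWorker stands in for the Agent tool.
async function spawnWorker(prompt: string): Promise<string> {
  return `result of: ${prompt}`; // stub -- the real tool runs an agent
}

async function runPhases(): Promise<string[]> {
  // Research: read-only, so launch all workers concurrently.
  const research = await Promise.all([
    spawnWorker('Find where sessions are validated'),
    spawnWorker('Find how tokens are cached'),
  ]);

  // Implementation: write-heavy, so run one worker at a time
  // per set of files to avoid conflicting edits.
  const impl: string[] = [];
  for (const finding of research) {
    impl.push(await spawnWorker(`Implement fix based on: ${finding}`));
  }
  return impl;
}

runPhases().then((results) => console.log(results.length)); // 2
```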
The Adversarial Verification Agent
Perhaps the most interesting part is the verification system. From constants/prompts.ts:394:
`The contract: when non-trivial implementation happens on your
turn, independent adversarial verification must happen before
you report completion. You own the gate.
Spawn the Agent tool with subagent_type="verification". Your
own checks do NOT substitute — only the verifier assigns a
verdict; you cannot self-assign PARTIAL.
On FAIL: fix, resume the verifier, repeat until PASS.
On PASS: spot-check it — re-run 2-3 commands from its report.`
The verifier is deliberately adversarial -- it's supposed to prove the code works, not rubber-stamp it. The coordinator cannot claim its own work is done; only the verifier can issue a PASS verdict. This is a legitimately clever approach to preventing the "AI says it's done but it's actually broken" problem.
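The contract implies a simple loop: only the verifier assigns a verdict, and on FAIL the coordinator fixes and resumes it. A sketch under those assumptions -- the verdict values come from the leaked prompt, but runVerifier and applyFix are hypothetical:

```typescript
// Hypothetical sketch of the verification gate. PASS/FAIL/PARTIAL
// come from the leaked prompt; the helper functions are illustrative.
type Verdict = 'PASS' | 'FAIL' | 'PARTIAL';

async function verifyUntilPass(
  runVerifier: () => Promise<Verdict>,
  applyFix: () => Promise<void>,
  maxRounds = 5,
): Promise<boolean> {
  for (let round = 0; round < maxRounds; round++) {
    const verdict = await runVerifier();
    if (verdict === 'PASS') return true; // gate opens
    // On FAIL (or PARTIAL), fix and resume the verifier.
    await applyFix();
  }
  return false; // never self-assign a verdict -- give up instead
}

// Usage: a verifier that fails twice, then passes.
let attempts = 0;
verifyUntilPass(
  async () => (++attempts >= 3 ? 'PASS' : 'FAIL'),
  async () => {},
).then((ok) => console.log(ok, attempts)); // true 3
```

Note the asymmetry: the coordinator can exhaust its rounds and report failure, but it can never promote itself to PASS.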
Cross-Worker Communication: The Scratchpad
Workers can share knowledge through a scratchpad directory (coordinator/coordinatorMode.ts:104-106):
if (scratchpadDir && isScratchpadGateEnabled()) {
content += `\nScratchpad directory: ${scratchpadDir}
Workers can read and write here without permission prompts.
Use this for durable cross-worker knowledge.`
}
Workers can read and write to this shared directory without triggering permission prompts. The coordinator's prompt describes it as "durable cross-worker knowledge" that should be "structured however fits the work."
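In practice the scratchpad is just a shared directory, so a worker persisting a finding for later workers might look like this sketch (file names and note format are hypothetical):

```typescript
// Hypothetical sketch: workers persisting findings to a shared
// scratchpad directory. Paths and note format are illustrative.
import { mkdtempSync, writeFileSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

const scratchpadDir = mkdtempSync(join(tmpdir(), 'scratchpad-'));

// Worker A records a finding as durable cross-worker knowledge.
writeFileSync(
  join(scratchpadDir, 'auth-research.md'),
  '# Findings\nSession.user is undefined after token expiry.\n',
);

// Worker B (possibly spawned much later) picks it up without
// re-running the research.
const note = readFileSync(join(scratchpadDir, 'auth-research.md'), 'utf8');
console.log(note.includes('token expiry')); // true
```

Because the directory is pre-approved, none of these reads or writes would trigger a permission prompt.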
Continue vs. Spawn Fresh
The coordinator is given detailed guidance on when to reuse an existing worker versus spawning a fresh one (coordinator/coordinatorMode.ts:283-293):
| Situation | Mechanism | Why |
|------------------------------------|-------------|----------------|
| Research explored exact files | Continue | Has context |
| Research was broad, impl is narrow | Spawn fresh | Avoid noise |
| Correcting a failure | Continue | Has error ctx |
| Verifying another's code | Spawn fresh | Fresh eyes |
| Wrong approach entirely | Spawn fresh | Clean slate |
The rule of thumb: "Think about how much of the worker's context overlaps with the next task. High overlap -> continue. Low overlap -> spawn fresh."
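That rule of thumb can be expressed as a tiny overlap heuristic -- this sketch (function name and the 50% threshold are assumptions, not from the source) just measures how much of the worker's file context the next task reuses:

```typescript
// Hypothetical sketch of the continue-vs-spawn rule of thumb:
// measure file overlap between a worker's context and the next task.
// The 0.5 threshold is an illustrative assumption.
function shouldContinue(workerFiles: string[], taskFiles: string[]): boolean {
  const ctx = new Set(workerFiles);
  const overlap = taskFiles.filter((f) => ctx.has(f)).length;
  // High overlap -> reuse the worker; low overlap -> spawn fresh.
  return taskFiles.length > 0 && overlap / taskFiles.length >= 0.5;
}

console.log(shouldContinue(['a.ts', 'b.ts'], ['a.ts', 'b.ts'])); // true
console.log(shouldContinue(['a.ts'], ['x.ts', 'y.ts'])); // false
```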
What This All Means
The Claude Code source leak reveals a product significantly ahead of what's publicly available. KAIROS, ULTRAPLAN, the buddy system, coordinator mode, voice mode, and the attribution system are complete, polished features waiting behind 44 feature flags.
A few broader observations:
This is a pattern, not an accident. It's the second time Anthropic has shipped its source code via sourcemaps in an npm package; the first was in February 2025. As Fortune noted, this comes just days after Anthropic accidentally revealed details about an unreleased model codenamed "Mythos." For a company that positions itself as the "safety-focused" AI lab, the operational security track record is rough.
The two-tier system is real. Anthropic employees get a fundamentally better product with more honest communication, better error reporting, and stronger coding guardrails. The gap is supposed to be temporary (gated by @[MODEL LAUNCH] annotations), but it exists today -- including explicit acknowledgment that Capybara v8 fabricates results 29-30% of the time, with mitigations only for internal users.
AI attribution is coming. The character-level tracking system suggests Anthropic is preparing for a world where "who wrote this code" is a question with a precise, data-backed answer. This lands in a legal landscape where the Supreme Court just declined to grant copyright protection to AI-generated works, and where some companies already require documentation of AI-assisted code. Attribution data in git notes could become a legal requirement, not just a feature.
Undercover mode means AI contributions to open source are already happening at scale. Anthropic employees actively contribute to public repos with Claude Code, with a system specifically designed to make those contributions indistinguishable from human work. In Q1 2026, California's Companion Chatbot Law (SB 243) went into effect, requiring disclosure when a chatbot could be mistaken for human. Whether AI-generated code PRs fall under disclosure requirements is an open question that regulators will eventually have to answer.
The anti-distillation defense is a new front in the AI arms race. Fake tool injection isn't just a defensive measure -- it's an acknowledgment that competitors are actively trying to train on Claude's outputs. This technique could escalate: if every frontier lab starts poisoning their outputs for distillation defense, the entire ecosystem of downstream model training gets more adversarial.
The agentic future is built. Coordinator mode, worker agents, adversarial verifiers, cross-agent scratchpads, parallel execution, 30-minute remote planning sessions, push-to-talk voice, always-on autonomous operation -- this isn't a prototype. It's a complete agentic development platform waiting to ship. And with every major tool shipping multi-agent in the same two-week window in early 2026, the race is on.
The irony of it all is that the system Anthropic built to prevent leaks became the biggest leak. Undercover mode catches model codenames in commit messages but doesn't catch sourcemaps in npm packages. The source is out. The secrets are public. And Claude Code turns out to be even more interesting than anyone suspected.
Originally published at vibehackers.io/blog/claude-code-source-leak-deep-dive