Preecha

Posted on Jun 21

What the Claude Code Source Leak Reveals About AI Coding Tool Architecture

TL;DR

Anthropic accidentally shipped a .map file with the Claude Code npm package, exposing the complete readable source code of its CLI tool. The leak revealed anti-distillation mechanisms with fake tool injection, a frustration-detection regex engine, an “undercover mode” that hides AI authorship in open-source commits, client attestation, and an unreleased autonomous agent mode called KAIROS. Here’s what API developers can learn from how AI coding tools work under the hood.

Introduction

On March 31, 2026, security researcher Chaofan Shou discovered that Anthropic shipped a source map file (.map) with the Claude Code npm package.

Source maps link minified production code back to the original, readable source. They are useful for debugging, but they should not be included in production packages unless intentionally published.

In this case, the source map exposed the complete Claude Code source code, including comments, internal codenames, prompt templates, feature flags, and architectural details.

The discovery reached #1 on Hacker News and spread quickly across Reddit, Twitter, and developer forums. Anthropic removed the package, but the code had already been mirrored and analyzed.

Whether you use Claude Code, Cursor, GitHub Copilot, or an API development platform like Apidog, the leak is useful because it shows how modern AI coding tools actually behave: what they send, what they hide, how they enforce client authenticity, and how they prepare for autonomous operation.

This article focuses on the technical findings and what API developers can do with those lessons.

How the source code leaked

Root cause: a Bun build tool bug

Claude Code is built with Bun, a JavaScript runtime that also ships its own bundler.

On March 11, 2026, a bug was filed against Bun (oven-sh/bun#28001) reporting that source maps were being served in production mode even though Bun’s documentation said they should be disabled.

Anthropic’s build pipeline triggered this bug. When the Claude Code npm package was published, the .map file was included in the distributed package.

That meant anyone could inspect the package and read the unminified source.

For example, this kind of workflow would expose package contents:

npm pack @anthropic-ai/claude-code
tar -tf anthropic-ai-claude-code-*.tgz

If a source map is included, developers can often recover readable source from the minified bundle.
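To see why a stray .map file matters: a version 3 source map is a JSON document, and its optional sourcesContent array can embed each original file verbatim next to the path listed in sources. The sketch below pairs the two arrays. The function name extractSources and the sample map are illustrative assumptions, not taken from the leaked package.

```typescript
// Minimal shape of a version 3 source map; only the fields used here.
interface SourceMapV3 {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
}

// Returns original file path -> embedded source text.
// Entries without embedded content (null or missing) are skipped.
function extractSources(rawMap: string): Map<string, string> {
  const parsed = JSON.parse(rawMap) as SourceMapV3;
  const contents = parsed.sourcesContent ?? [];
  const recoveredFiles = new Map<string, string>();
  parsed.sources.forEach((path, index) => {
    const content = contents[index];
    if (typeof content === "string") {
      recoveredFiles.set(path, content);
    }
  });
  return recoveredFiles;
}

// Illustrative input, not real leaked data.
const sampleMap = JSON.stringify({
  version: 3,
  sources: ["src/a.ts", "src/b.ts"],
  sourcesContent: ["export const a = 1;", null],
  mappings: "",
});
const recoveredSample = extractSources(sampleMap);
console.log(Array.from(recoveredSample.keys()));
```

If sourcesContent is absent, the map still reveals file paths and symbol positions, but not the source text itself.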

What was exposed

The leak reportedly included:

Complete TypeScript source across modules
Internal comments explaining design decisions
Feature flags and experimental configurations
System prompt templates and safety mechanisms
Internal codenames for unreleased features
Performance optimization details with specific metrics

This was not a partial leak or a sanitized open-source release. It was production source with internal engineering context.

Anti-distillation: protecting against model theft

Fake tool injection

One of the most discussed findings was Claude Code’s anti-distillation system.

In claude.ts, when the ANTI_DISTILLATION_CC flag is enabled, the client sends this field in API requests:

anti_distillation: ["fake_tools"]

This instructs Anthropic’s server to inject decoy tool definitions into the system prompt.

The goal is to make captured API traffic less useful for competitors trying to train a model on Claude’s tool-use behavior.

A simplified version of the pattern looks like this:

const request = {
  model: "claude",
  messages,
  tools,
  anti_distillation: ["fake_tools"],
};

If a third party records the prompt and tool definitions, the captured data may include fake tools that do not exist. A model trained on that data could learn invalid capabilities.
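The leak covers the client, so the server-side injection step is not visible. As a rough sketch of how decoys could be mixed in, assume a hypothetical injectDecoyTools helper that merges real and fake definitions and sorts them so position gives nothing away. Every name and decoy below is invented for illustration.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
}

// Hypothetical server-side step: mix decoys into the real tool list and
// sort by name so list position does not reveal which entries are real.
// Decoys that collide with a real tool name are dropped.
function injectDecoyTools(
  realTools: ToolDefinition[],
  decoys: ToolDefinition[],
): ToolDefinition[] {
  const realNames = new Set(realTools.map((tool) => tool.name));
  const safeDecoys = decoys.filter((decoy) => !realNames.has(decoy.name));
  return realTools
    .concat(safeDecoys)
    .sort((a, b) => a.name.localeCompare(b.name));
}

const advertisedTools = injectDecoyTools(
  [{ name: "read_file", description: "Read a file from disk" }],
  [
    { name: "archive_workspace", description: "Decoy: no handler exists" },
    { name: "read_file", description: "Decoy colliding with a real tool" },
  ],
);
console.log(advertisedTools.map((tool) => tool.name));
```

Anyone recording the advertised list sees two plausible tools and has no marker telling them which one is executable.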

Connector-text summarization

A second anti-distillation mechanism appeared in betas.ts.

Instead of returning all assistant text between tool calls directly, the server can buffer the text, summarize it, and return a signed summary.

On later turns, the original text can be restored from the signature. But someone passively recording API traffic only sees summaries, not the full reasoning-like connector text.

The pattern is roughly:

assistant text between tool calls
        ↓
server-side summarization
        ↓
summary + cryptographic signature
        ↓
restore original text later when needed

The purpose is to reduce how much useful behavioral data is available to traffic recorders.
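How the server stores and restores the original text is not shown in the leaked client code. One plausible shape, sketched here purely as an assumption, is an HMAC over the original text that doubles as a lookup key. The names seal, restore, and the in-memory originals store are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side store: signature -> original connector text.
const originals = new Map<string, string>();

function signText(text: string, secret: string): string {
  return createHmac("sha256", secret).update(text).digest("hex");
}

// Replace the full text with a summary plus a signature over the original.
function seal(original: string, summary: string, secret: string) {
  const signature = signText(original, secret);
  originals.set(signature, original);
  return { summary, signature };
}

// On a later turn, look the original up and confirm the signature
// still verifies under the current secret before returning it.
function restore(signature: string, secret: string): string | undefined {
  const original = originals.get(signature);
  if (original === undefined) return undefined;
  const expected = Buffer.from(signText(original, secret), "hex");
  const provided = Buffer.from(signature, "hex");
  const valid =
    expected.length === provided.length && timingSafeEqual(expected, provided);
  return valid ? original : undefined;
}
```

The client only ever holds the summary and an opaque signature, which is what a passive recorder would capture as well.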

Practical bypasses

The analysis also identified bypass paths:

A man-in-the-middle proxy could strip the anti_distillation field before requests reach the server.
Setting CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS disables the experimental beta system.
These protections mainly defend against passive traffic recording, not active API usage.

So the takeaway is not “anti-distillation is impossible to bypass.” The takeaway is that vendors are adding protocol-level defenses to make model copying more expensive.

Undercover mode: hiding AI authorship

What the mode does

The undercover.ts file contained one of the most controversial findings.

When Claude Code operates in non-Anthropic repositories, it activates behavior that prevents outputs from including:

Internal codenames such as Capybara or Tengu
Internal Slack channel names
Internal repository names
The phrase Claude Code

The source comment was explicit:

There is NO force-OFF. This guards against model codename leaks.

Why developers should care

The stated goal is to prevent leakage of internal Anthropic names. But the implementation also prevents the tool from identifying itself.

That matters for open-source projects with policies requiring disclosure of AI-generated code.

If an AI coding tool is designed not to mention its involvement, maintainers have a harder time enforcing disclosure policies.

For teams, the practical step is to define disclosure requirements outside the tool:

## AI-generated code policy

Contributors must disclose when code, tests, documentation, or review comments
were materially generated by AI tools.

Disclosure must be included in the pull request description.

Do not rely on the coding assistant to self-identify.

Frustration detection via regex

How it works

The userPromptKeywords.ts file implemented frustration detection using regex pattern matching.

The system scans user prompts for profanity and emotionally charged language to infer when a user may be frustrated with Claude Code’s responses.

A simplified version of that approach looks like this:

const frustrationPatterns = [
  /\bthis is broken\b/i,
  /\bwtf\b/i,
  /\bwhy won't this work\b/i,
];

export function isUserFrustrated(input: string): boolean {
  return frustrationPatterns.some((pattern) => pattern.test(input));
}

Why regex instead of an LLM?

Several developers pointed out the irony: Anthropic builds advanced language models, but uses regex to detect user emotion.

The engineering reason is practical:

Regex is fast.
Regex is cheap.
Regex does not require another model call.
Regex can run on every prompt without adding much latency.

For hot-path product telemetry, this is a common trade-off.

The implementation question for developers is not whether regex is technically impressive. It is whether your team is comfortable with AI coding tools analyzing emotional signals in prompts.

A practical evaluation checklist:

- Does the tool inspect user prompts for telemetry?
- Can telemetry be disabled?
- Is prompt telemetry documented?
- Is data used for product analytics, model training, or both?
- Are enterprise controls available?

Native client attestation

Cryptographic request verification

In system.ts, Claude Code API requests include a placeholder header value:

cch=554eb

Bun’s native HTTP stack, written in Zig, overwrites this placeholder with a computed hash before the request leaves the client.

Anthropic’s servers can then validate that hash to verify that the request came from the legitimate Claude Code binary rather than a fork, wrapper, or proxy.

At a high level:

Claude Code binary
        ↓
request contains placeholder
        ↓
native HTTP layer computes hash
        ↓
server validates attestation
        ↓
request accepted or rejected

Why this matters

This is a client-authenticity mechanism.

It can be used to enforce that only approved clients access a SaaS API. Similar patterns exist in mobile API security, where app attestation helps prevent unauthorized clients from calling backend APIs.

For API developers, the lesson is clear: authentication is not only about users. Sometimes you also need to authenticate the client.

Common API client verification layers include:

API keys
OAuth clients
mTLS
Certificate pinning
Device or app attestation
Signed request headers
Replay protection with timestamps and nonces

A simplified signed-request flow looks like this:

import crypto from "node:crypto";

function signRequest({
  method,
  path,
  body,
  timestamp,
  secret,
}: {
  method: string;
  path: string;
  body: string;
  timestamp: string;
  secret: string;
}) {
  const payload = `${method}\n${path}\n${timestamp}\n${body}`;

  return crypto
    .createHmac("sha256", secret)
    .update(payload)
    .digest("hex");
}

Server-side verification should recompute the signature and reject mismatches.
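A minimal server-side counterpart might look like the sketch below. It rebuilds the same newline-separated payload, compares signatures in constant time, and rejects stale timestamps. The five-minute window, the millisecond timestamp format, and the verifyRequest name are assumptions for illustration, not details from the leak.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface SignedRequest {
  method: string;
  path: string;
  body: string;
  timestamp: string; // assumed format: Unix epoch milliseconds as a string
  signature: string; // hex-encoded HMAC-SHA256
}

const MAX_SKEW_MS = 5 * 60 * 1000; // assumed replay window

function verifyRequest(
  req: SignedRequest,
  secret: string,
  now: number = Date.now(),
): boolean {
  const sentAt = Number(req.timestamp);
  if (!Number.isFinite(sentAt) || Math.abs(now - sentAt) > MAX_SKEW_MS) {
    return false; // malformed or stale timestamp
  }
  const payload = `${req.method}\n${req.path}\n${req.timestamp}\n${req.body}`;
  const expected = createHmac("sha256", secret).update(payload).digest();
  const provided = Buffer.from(req.signature, "hex");
  // Constant-time comparison; lengths must match before timingSafeEqual.
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}

// Usage: build a signature the way the client would, then verify it.
const demoSecret = "demo-secret";
const demoTimestamp = String(Date.now());
const demoSignature = createHmac("sha256", demoSecret)
  .update(`POST\n/v1/items\n${demoTimestamp}\n{"id":1}`)
  .digest("hex");
const demoRequest: SignedRequest = {
  method: "POST",
  path: "/v1/items",
  body: '{"id":1}',
  timestamp: demoTimestamp,
  signature: demoSignature,
};
console.log(verifyRequest(demoRequest, demoSecret));
```

A timestamp window limits replay but does not prevent it inside the window; tracking nonces closes that gap.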

KAIROS: the unreleased autonomous agent mode

What the code showed

References throughout the leaked codebase pointed to an unreleased feature-gated mode called KAIROS.

The discovered scaffolding included:

A /dream skill for “nightly memory distillation”
Daily append-only logging
GitHub webhook subscriptions
Background daemon workers
5-minute cron refresh intervals

What this implies

KAIROS appears to be an always-on coding agent mode.

Instead of waiting for direct user prompts, an agent like this could monitor repositories, react to events, and perform background tasks.

A simplified architecture would look like this:

GitHub webhook
      ↓
agent event queue
      ↓
background worker
      ↓
repository analysis
      ↓
suggested or automated code changes
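The middle of that pipeline, the queue and the worker, can be sketched in a few lines. This is not KAIROS code. It is a generic in-memory illustration, and AgentEventQueue, RepoEvent, and the delivery-ID deduplication are assumptions.

```typescript
interface RepoEvent {
  deliveryId: string; // unique per webhook delivery
  repo: string;
  kind: "push" | "pull_request";
}

type EventHandler = (event: RepoEvent) => Promise<void>;

class AgentEventQueue {
  private pending: RepoEvent[] = [];
  private seen = new Set<string>();

  // Called by the webhook endpoint; redelivered events are dropped.
  enqueue(event: RepoEvent): boolean {
    if (this.seen.has(event.deliveryId)) return false;
    this.seen.add(event.deliveryId);
    this.pending.push(event);
    return true;
  }

  // Called by the background worker; handles events one at a time, in order.
  // A failing handler is logged and does not block later events.
  async drain(handler: EventHandler): Promise<number> {
    let processed = 0;
    let next = this.pending.shift();
    while (next !== undefined) {
      try {
        await handler(next);
        processed++;
      } catch (error) {
        console.error("event failed", next.deliveryId, error);
      }
      next = this.pending.shift();
    }
    return processed;
  }
}
```

A production version would persist the queue and bound the deduplication set, but the shape stays the same: accept quickly, process in the background.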

This matches the broader direction of AI coding tools:

GitHub Copilot Agent Mode
Cursor background processing
Google Agent Smith
Claude Code KAIROS scaffolding

The key implementation concern for API teams is drift.

If an autonomous agent changes endpoint code, what else has to change with it?

OpenAPI specification
Tests
Mock server behavior
SDKs
Documentation
Changelog
Contract tests

If those artifacts live in disconnected tools, autonomous changes can break your API lifecycle.
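One way to catch this kind of drift is to compare the routes your code registers with the operations your OpenAPI document declares. The sketch below assumes you can already list implemented routes as method/path pairs. findDrift and the Route shape are hypothetical helpers, and path templates are compared as plain strings.

```typescript
interface Route {
  method: string; // e.g. "GET"
  path: string; // e.g. "/users/{id}"
}

// Minimal slice of an OpenAPI document: paths -> path item -> operations.
type OpenApiPaths = Record<string, Record<string, unknown>>;

// Path items can also hold keys like "parameters" or "summary",
// so only real HTTP methods count as operations.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

function routeKey(method: string, path: string): string {
  return `${method.toUpperCase()} ${path}`;
}

function findDrift(implemented: Route[], specPaths: OpenApiPaths) {
  const documented = new Set<string>();
  for (const path of Object.keys(specPaths)) {
    for (const method of Object.keys(specPaths[path] ?? {})) {
      if (HTTP_METHODS.indexOf(method.toLowerCase()) !== -1) {
        documented.add(routeKey(method, path));
      }
    }
  }
  const inCode = new Set(implemented.map((r) => routeKey(r.method, r.path)));
  return {
    // Implemented in code but missing from the spec.
    undocumented: Array.from(inCode).filter((key) => !documented.has(key)),
    // Declared in the spec but no longer implemented.
    unimplemented: Array.from(documented).filter((key) => !inCode.has(key)),
  };
}
```

Failing the build when either list is non-empty forces the spec and the code to move together, no matter who authored the change.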

A safer workflow is to enforce contract checks in CI:

name: API Contract Check

on:
  pull_request:

jobs:
  contract:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate OpenAPI spec
        run: npx @redocly/cli lint openapi.yaml

      - name: Run API tests
        run: npm test

Whether a human or AI agent opens the pull request, the same checks should apply.

Performance optimizations exposed

Terminal rendering with game-engine-style techniques

The ink/screen.ts and ink/optimizer.ts files showed that Claude Code uses unusually optimized terminal rendering techniques.

The exposed techniques included:

Int32Array-backed character pools
Memory-efficient screen buffers
Patch optimization during token streaming
About 50x fewer character-width calculations

This explains why Claude Code can feel responsive during long streaming outputs.
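To make the character-pool idea concrete, here is a small sketch of the general technique: intern each distinct character once, store the screen as integer IDs in an Int32Array, and compare frames with an integer scan. It is not the leaked implementation. CharPool, ScreenBuffer, and changedCells are illustrative names.

```typescript
// Interns each distinct character once and returns a small integer ID.
class CharPool {
  private ids = new Map<string, number>();
  private chars: string[] = [];

  intern(char: string): number {
    let id = this.ids.get(char);
    if (id === undefined) {
      id = this.chars.length;
      this.chars.push(char);
      this.ids.set(char, id);
    }
    return id;
  }

  lookup(id: number): string {
    return this.chars[id] ?? " ";
  }
}

// A screen is a flat Int32Array of character IDs, one slot per cell.
class ScreenBuffer {
  readonly width: number;
  readonly cells: Int32Array;
  private pool: CharPool;

  constructor(width: number, height: number, pool: CharPool) {
    this.width = width;
    this.pool = pool;
    this.cells = new Int32Array(width * height).fill(pool.intern(" "));
  }

  set(row: number, col: number, char: string): void {
    this.cells[row * this.width + col] = this.pool.intern(char);
  }
}

// Assumes both frames share one pool and have the same dimensions.
// Returns the flat indexes of cells that differ.
function changedCells(prev: ScreenBuffer, next: ScreenBuffer): number[] {
  const changed: number[] = [];
  for (let i = 0; i < next.cells.length; i++) {
    if (prev.cells[i] !== next.cells[i]) changed.push(i);
  }
  return changed;
}
```

Because cells hold integers, frame comparison allocates no strings, which is the property that matters when output is repainted many times per second.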

For CLI developers, the lesson is that terminal rendering can become a bottleneck. If your tool streams many updates, avoid repainting everything.

A basic inefficient renderer might do this:

function render(output: string) {
  process.stdout.write("\x1b[2J\x1b[0f"); // clear screen
  process.stdout.write(output);
}

A more efficient approach computes and writes only changed regions:

function diffLines(previous: string[], next: string[]) {
  return next
    .map((line, index) => ({ index, line, changed: line !== previous[index] }))
    .filter((entry) => entry.changed);
}

The principle: repaint less, diff more.
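A line diff only helps if the changed rows are then written in place. The sketch below turns patches into standard ANSI sequences: move the cursor to a row, erase that line, write the new text. diffFrames and renderPatches are illustrative names, and this version also blanks rows that disappeared from the new frame.

```typescript
interface LinePatch {
  row: number; // 0-based row in the frame
  text: string;
}

// Compares two frames line by line. Rows missing from the new frame
// are patched to an empty string so stale text gets cleared.
function diffFrames(previous: string[], next: string[]): LinePatch[] {
  const patches: LinePatch[] = [];
  const rows = Math.max(previous.length, next.length);
  for (let row = 0; row < rows; row++) {
    const text = next[row] ?? "";
    if (text !== previous[row]) patches.push({ row, text });
  }
  return patches;
}

// ESC[<row>;1H moves the cursor (terminal rows are 1-based);
// ESC[2K erases the whole line before the new text is written.
function renderPatches(patches: LinePatch[]): string {
  return patches
    .map(({ row, text }) => `\x1b[${row + 1};1H\x1b[2K${text}`)
    .join("");
}

const framePatches = diffFrames(["a", "b", "c"], ["a", "B"]);
process.stdout.write(renderPatches(framePatches));
```

Joining all patches into one string keeps the update to a single write call, which reduces flicker on slow terminals.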

Prompt cache economics

promptCacheBreakDetection.ts tracked 14 distinct cache-break vectors with “sticky latches” that prevent mode toggles from invalidating cached prompts.

Prompt caching matters because cache misses force the provider to reprocess the system prompt and conversation context.

For high-volume AI tools, unnecessary cache invalidation can create major cost and latency problems.
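One reading of a "sticky latch" is a flag that can turn on but never off within a session, so that anything it adds to the prompt stays there and the cached prefix stays stable. The sketch below illustrates that interpretation only. StickyLatch, buildSystemPrompt, and the plan-mode example are hypothetical.

```typescript
// Once set, the latch stays set for the rest of the session.
class StickyLatch {
  private latched = false;

  update(requested: boolean): boolean {
    if (requested) this.latched = true;
    return this.latched;
  }
}

// Hypothetical prompt builder: the optional section stays in the prompt
// after its mode was first enabled, so toggling the mode off later does
// not change the prompt text and therefore does not break the cache.
function buildSystemPrompt(
  base: string,
  modeSection: string,
  latch: StickyLatch,
  modeEnabled: boolean,
): string {
  return latch.update(modeEnabled) ? `${base}\n${modeSection}` : base;
}

const planLatch = new StickyLatch();
const beforeToggle = buildSystemPrompt("BASE", "PLAN MODE RULES", planLatch, false);
const whileEnabled = buildSystemPrompt("BASE", "PLAN MODE RULES", planLatch, true);
const afterToggleOff = buildSystemPrompt("BASE", "PLAN MODE RULES", planLatch, false);
console.log(whileEnabled === afterToggleOff);
```

The trade-off is a slightly longer prompt after the first toggle in exchange for one cache break per session instead of one per toggle.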

If you build AI workflows, track cache-break causes explicitly:

type CacheBreakReason =
  | "system_prompt_changed"
  | "tool_schema_changed"
  | "model_changed"
  | "temperature_changed"
  | "context_reset";

function recordCacheBreak(reason: CacheBreakReason) {
  console.log("prompt_cache_break", { reason });
}

Make cache invalidation observable. Otherwise, cost regressions are hard to debug.

The autocompact failure cascade

A comment in autoCompact.ts described a production issue:

1,279 sessions had 50+ consecutive failures (up to 3,272) in a single session,
wasting ~250K API calls/day globally.

The fix was to cap consecutive autocompact failures:

const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

The lesson is simple: every retry loop needs a limit.

Bad pattern:

while (true) {
  try { await compactContext(); break; } catch { /* swallow the error and retry forever */ }
}

Safer pattern:

const MAX_RETRIES = 3;

for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
  try {
    await compactContext();
    break;
  } catch (error) {
    if (attempt === MAX_RETRIES) throw error;
  }
}

At small scale, retry bugs are annoying. At AI scale, they can burn hundreds of thousands of API calls per day.
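A per-call retry limit does not help if the caller invokes the operation again on every turn. A cap on consecutive failures across calls, which is closer to what the quoted constant suggests, could be sketched like this. ConsecutiveFailureCap and guardedCompact are hypothetical names, not code from the leak.

```typescript
class ConsecutiveFailureCap {
  private failures = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // False once the streak of failures has reached the limit.
  shouldAttempt(): boolean {
    return this.failures < this.limit;
  }

  recordSuccess(): void {
    this.failures = 0; // any success resets the streak
  }

  recordFailure(): void {
    this.failures++;
  }
}

// Wraps an operation so that it is skipped entirely once the cap trips.
async function guardedCompact(
  cap: ConsecutiveFailureCap,
  compact: () => Promise<void>,
): Promise<boolean> {
  if (!cap.shouldAttempt()) return false; // stop spending API calls
  try {
    await compact();
    cap.recordSuccess();
    return true;
  } catch {
    cap.recordFailure();
    return false;
  }
}
```

Keep one cap instance per session so a single broken session cannot generate thousands of failing calls.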

Security hardening details

Bash security: 23 numbered checks

bashSecurity.ts implemented 23 numbered security checks for shell command execution.

The checks included defenses against:

Zsh builtin exploitation
Unicode zero-width space injection
IFS null-byte injection
Additional issues discovered during HackerOne security review

This is more thorough than basic command sanitization.

For AI coding tools, shell execution is one of the highest-risk capabilities. If an assistant can generate and run commands, it can potentially affect:

Source code
Local secrets
Databases
Cloud credentials
CI/CD configuration
Production infrastructure

A minimal shell execution policy should include:

- Deny dangerous commands by default.
- Require confirmation for filesystem changes.
- Require confirmation for network calls.
- Block access to secret files.
- Normalize Unicode before validation.
- Log executed commands.
- Run commands in a sandbox when possible.

A denylist alone is not enough, but it is a starting point:

const blocked = [
  /\brm\s+-rf\s+\//,
  /\bsudo\b/,
  /\bchmod\s+777\b/,
  /\bcurl\b.*\|\s*sh\b/,
];

function isBlockedCommand(command: string) {
  return blocked.some((pattern) => pattern.test(command));
}
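Pattern checks like these run on raw strings, so invisible or look-alike characters can slip past them. A normalization pass before validation, sketched below, is one way to address the "Normalize Unicode before validation" item. normalizeCommand and the exact character set are assumptions and do not reproduce the 23 checks in the leaked file.

```typescript
// Zero-width and invisible characters that can hide inside a command:
// U+200B..U+200D, U+2060 (word joiner), U+FEFF (zero-width no-break space).
const INVISIBLE_CHARS = /[\u200B-\u200D\u2060\uFEFF]/g;

function normalizeCommand(raw: string): string {
  if (raw.indexOf("\0") !== -1) {
    throw new Error("null byte in command");
  }
  // NFKC folds compatibility forms such as fullwidth letters into
  // their plain equivalents before any pattern matching happens.
  return raw.normalize("NFKC").replace(INVISIBLE_CHARS, "");
}

// Usage: validate the normalized form, never the raw input.
const hiddenSudo = "su\u200Bdo ls";
console.log(/\bsudo\b/.test(hiddenSudo)); // raw input evades the pattern
console.log(/\bsudo\b/.test(normalizeCommand(hiddenSudo)));
```

A stricter policy would reject commands containing invisible characters outright instead of stripping them; stripping is shown here only to keep the example short.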

For production-grade tools, combine validation, sandboxing, explicit user approval, and audit logs.

What API developers should do next

1. Audit what your AI coding tools send

Do not treat AI coding assistants as local-only tools unless the vendor explicitly documents that behavior.

Check:

- Are prompts sent to external servers?
- Are files uploaded for context?
- Are tool outputs sent back to the provider?
- Is telemetry enabled?
- Can telemetry be disabled?
- Are enterprise privacy controls available?
- Are generated commits or PRs labeled as AI-assisted?

For sensitive API work, pay special attention to:

.env files
API keys
OAuth client secrets
Internal endpoint URLs
Private OpenAPI specs
Customer data in test fixtures

2. Treat your build toolchain as an attack surface

Anthropic’s source leaked because of a build tool issue. On the same day, the Axios npm package was compromised through npm account hijacking.

Different causes, same lesson: the development supply chain is part of your security boundary.

Practical checks:

# Inspect what your package will publish
npm pack --dry-run

# Search for source maps
find dist -name "*.map"

# Search for environment files
find . -name ".env*" -not -path "./node_modules/*"

# Check package contents
tar -tf your-package-*.tgz

Add CI checks to prevent accidental publication:

name: Package Safety Check

on:
  pull_request:

jobs:
  package-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build package
        run: npm ci && npm run build

      - name: Block source maps in dist
        run: |
          if find dist -name "*.map" | grep .; then
            echo "Source maps found in dist"
            exit 1
          fi
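Checking dist catches most cases, but what actually ships is whatever npm puts in the tarball. A stricter variant inspects the file list from npm pack --dry-run --json. The sketch below assumes the report shape emitted by recent npm versions (an array of objects with name and files[].path). findForbiddenFiles is a hypothetical helper.

```typescript
interface PackedFile {
  path: string;
}

interface PackReport {
  name: string;
  files: PackedFile[];
}

// Source maps anywhere, and .env or .env.* files at any directory level.
const FORBIDDEN_PATTERNS = [/\.map$/, /(^|\/)\.env(\..*)?$/];

// Takes the JSON text printed by `npm pack --dry-run --json` and
// returns the files that should not be published.
function findForbiddenFiles(packJson: string): string[] {
  const reports = JSON.parse(packJson) as PackReport[];
  const offenders: string[] = [];
  for (const report of reports) {
    for (const file of report.files) {
      if (FORBIDDEN_PATTERNS.some((pattern) => pattern.test(file.path))) {
        offenders.push(`${report.name}: ${file.path}`);
      }
    }
  }
  return offenders;
}

// Illustrative report, not output from a real package.
const samplePackJson = JSON.stringify([
  {
    name: "demo",
    files: [
      { path: "dist/cli.js" },
      { path: "dist/cli.js.map" },
      { path: ".env.local" },
      { path: "src/environment.ts" },
    ],
  },
]);
console.log(findForbiddenFiles(samplePackJson));
```

In CI, pipe the command output into a script like this and fail the job when the returned list is non-empty.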

3. Prepare for autonomous agents

AI coding tools are moving toward background operation.

That means teams need workflow controls that do not depend on who made the change.

For API teams, require every pull request to validate:

- OpenAPI schema
- API contract tests
- Backward compatibility
- Generated SDK changes
- Documentation updates
- Mock server behavior

A useful pull request checklist:

## API change checklist

- [ ] OpenAPI spec updated
- [ ] Contract tests updated
- [ ] Mock responses updated
- [ ] Docs updated
- [ ] Breaking changes documented
- [ ] AI-generated changes disclosed, if applicable

4. Decide how much transparency you require

The leak revealed internal behavior that users generally could not inspect beforehand.

That does not automatically mean the tool is unsafe. But it does show the trade-off between proprietary tools and inspectable tools.

When evaluating AI coding tools, ask:

- Can we inspect how the tool handles code and prompts?
- Is there a public security model?
- Are data retention rules documented?
- Can we disable telemetry?
- Can we restrict repository access?
- Can we enforce AI disclosure?
- Can we audit actions taken by the tool?

FAQ

Is Claude Code safe to use after the source leak?

The leak exposed source code, not user data. Anthropic removed the .map file, and the source is no longer distributed with the npm package.

The revealed features, including anti-distillation, frustration detection, undercover mode, and client attestation, are architectural decisions rather than direct evidence of leaked user data.

Whether you are comfortable with those decisions is a separate trust and governance question.

What is undercover mode in Claude Code?

Undercover mode prevents Claude Code from revealing internal Anthropic project names, codenames, and its own identity when operating in non-Anthropic repositories.

It activates automatically and cannot be disabled. The practical effect is that AI-generated contributions may not identify themselves as written with Claude Code.

What are fake tools in Claude Code?

Fake tools are decoy tool definitions injected into the system prompt when anti-distillation is enabled.

They do not represent real capabilities. Their purpose is to poison captured training data so competitors recording API traffic cannot cleanly reproduce Claude’s tool-use behavior.

What is KAIROS in Claude Code?

KAIROS is an unreleased, feature-flagged autonomous agent mode found in the leaked Claude Code source.

The scaffolding included background daemon workers, GitHub webhook subscriptions, daily logging, and a /dream skill for memory distillation.

It suggests Anthropic has been building an always-on coding agent that can monitor repositories and act autonomously.

How did the Claude Code source code leak?

A Bun runtime bug caused source maps to be included in production builds when they should not have been.

Because Claude Code used Bun in its build pipeline, the .map file was shipped with the npm package. Anyone inspecting the package could read the complete unminified source.

Does this leak affect Claude API users?

The leak exposed the Claude Code CLI source, not the Claude API itself.

API keys, user data, and model weights were not part of the reported leak. Claude API users could continue using the API normally.

The anti-distillation mechanisms discussed here are specific to Claude Code’s request pipeline.

Should I worry about frustration detection in AI coding tools?

That depends on your privacy and telemetry requirements.

Claude Code used regex patterns to detect frustration signals such as profanity or emotional language. This is faster and cheaper than running an LLM-based sentiment classifier on every prompt.

The practical step is to check whether your AI tools document prompt telemetry and whether your organization can disable or govern it.

How does this relate to the Axios npm attack on the same day?

Both events occurred on March 31, 2026, but they were unrelated.

The Axios incident was a deliberate supply-chain compromise through npm account hijacking. The Claude Code leak was an accidental build/package issue.

Together, they increased scrutiny of npm package security and the trust developers place in distributed tooling.

Key takeaways

Claude Code’s source leaked because a Bun build issue shipped source maps in the npm package.
Anti-distillation mechanisms used fake tools and summarized connector text to make model copying harder.
Undercover mode prevented Claude Code from revealing internal Anthropic names and its own identity in non-Anthropic repositories.
Frustration detection used regex patterns on user prompts.
KAIROS scaffolding revealed an unreleased autonomous background agent mode.
Client attestation cryptographically verified requests from legitimate Claude Code binaries.
Shell execution security received significant hardening, including 23 numbered checks.
API teams should enforce contract tests, documentation updates, and disclosure policies regardless of whether changes come from humans or AI agents.

AI coding tools are now part of your development and security surface. Treat them like any other privileged tool: inspect what you can, constrain what you cannot, and build API workflows that stay consistent when humans or agents make changes.