On March 31st, Anthropic screwed up again — all 512,000 lines of Claude Code source code leaked. And yes, this is the second time.
My timeline exploded yesterday. Security researcher Chaofan Shou tweeted that Claude Code's source code had leaked through a source map in the npm registry. Within hours, 512K lines of TypeScript were mirrored on GitHub. The claw-code repo hit 50K stars in just two hours. Developers everywhere were tearing apart Anthropic's internals like Christmas morning.
This is the same Anthropic that just raised $30 billion at a $380 billion valuation. Claude Code doubled its revenue in the first three months of 2026, hitting a $2.5 billion annual run rate — 2.5x OpenAI's comparable product. And all of it got stripped naked by a single .map file.
DMCA takedown notices went out, but you know how the internet works — there's no delete button.
I spent a full day reading through the code. No fluff, just the goods — 5 secrets buried in the source, and the architectural truth behind 512K lines of code.
The Leak: A .map File Disaster (Again)
The story is simple and absurd.
When Anthropic published @anthropic-ai/claude-code v2.1.88, the npm package included a 59.8 MB cli.js.map file. This wasn't a normal source map — it contained the full sourcesContent, every single line of original TypeScript intact. Even worse, the map pointed to a zip archive on Anthropic's own Cloudflare R2 storage bucket, directly downloadable.
No hackers. No insider leak. Just a botched .npmignore. 512K lines of code, wide open.
The kicker? This exact same thing happened in February 2025. Same source map, same mistake. Anthropic clearly learned nothing.
Their official response came quick:
"This was a release packaging issue caused by human error, not a security breach."
Good news: no user data leaked. Bad news: the full implementation details of their most valuable AI Agent product are now public knowledge.
Secret #1: Poisoning Competitors with Fake Tool Injection
This one stopped me cold.
In src/services/api/claude.ts:
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}
When this flag is on, Anthropic's server silently injects fake tool definitions into the system prompt.
What's it for? Poisoning competitor training data.
Think about it: if a competitor records Claude's API traffic through a MITM proxy to train their own model, the captured data contains these fake tools. Fine-tune on that data? Congrats — your model learned a bunch of non-existent tool calls. Landmines in the training set.
The design is clever: only activates for first-party CLI sessions, controlled remotely via a GrowthBook feature flag, and only fires against Anthropic's official API endpoint. Sure, bypassing it isn't hard — strip the anti_distillation field in your MITM proxy. But the fact that this exists at all tells you something about the state of AI competition.
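To make the bypass concrete: since the poisoning is an opt-in the CLI sends to the server, a proxy only has to delete that field before the request leaves the machine. The type and function below are my own illustration, not code from the leak:

```typescript
// Hypothetical proxy-side filter; names are illustrative, not from the leak.
// The CLI opts in via an `anti_distillation` field in the request body, so
// removing it means the server never injects the fake tool definitions.
type ApiRequestBody = {
  model?: string
  anti_distillation?: string[]
  [key: string]: unknown
}

function stripAntiDistillation(body: ApiRequestBody): ApiRequestBody {
  // Copy every field except the opt-in marker.
  const rest: ApiRequestBody = { ...body }
  delete rest.anti_distillation
  return rest
}

const cleaned = stripAntiDistillation({
  model: 'claude-x',
  anti_distillation: ['fake_tools'],
})
// cleaned is { model: 'claude-x' }
```

Because the marker is client-supplied, this is a deterrent against lazy scraping, not a hard defense. That fits the feature's apparent goal: raising the cost of distillation, not making it impossible.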
Secret #2: Undercover Mode — AI's Covert Operation
src/utils/undercover.ts — the filename says it all:
## UNDERCOVER MODE — CRITICAL
You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository.
Your commit messages, PR titles, and PR bodies MUST NOT contain
ANY Anthropic-internal information. Do not blow your cover.
NEVER include in commit messages or PR descriptions:
- Internal model codenames (animal names like Capybara, Tengu, etc.)
- The phrase "Claude Code" or any mention that you are an AI
- Co-Authored-By lines or any other attribution
Write commit messages as a human developer would.
Translation: when Anthropic employees use Claude Code to submit PRs to public open-source repos, the system automatically enters undercover mode — hiding all AI traces, dropping Co-Authored-By lines, scrubbing internal codenames, writing commit messages that read like a human wrote them.
There's a comment in the code that really stands out:
"There is NO force-OFF. This guards against model codename leaks."
No off switch. You can force it ON (CLAUDE_CODE_UNDERCOVER=1), but you can never turn it off. As long as you're not in an Anthropic internal repo, undercover mode is always active.
This raises a real question: when AI-generated code gets committed to open-source projects under a human's name, what happens to transparency and attribution? The community debate is already raging.
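For a concrete sense of what "hiding all AI traces" means in practice, here is a minimal sketch of the kind of scrubbing the prompt demands. The function name and codename list are my own illustration, not the leaked implementation:

```typescript
// Illustrative only: codename list and scrub logic are assumptions based on
// the undercover-mode prompt quoted above, not code from the leak.
const INTERNAL_CODENAMES = ['Capybara', 'Tengu']

function scrubCommitMessage(message: string): string {
  return message
    .split('\n')
    // Drop attribution trailers entirely.
    .filter((line) => !/^co-authored-by:/i.test(line.trim()))
    // Redact product mentions and internal codenames.
    .map((line) => {
      let scrubbed = line.replace(/Claude Code/g, '[redacted]')
      for (const name of INTERNAL_CODENAMES) {
        scrubbed = scrubbed.replace(new RegExp(name, 'g'), '[redacted]')
      }
      return scrubbed
    })
    .join('\n')
    .trim()
}
```

Note the asymmetry the "no force-OFF" comment implies: scrubbing can only be additive, because a single leaked codename in a public commit is unrecoverable.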
Secret #3: Regex-Powered Profanity Detection
src/utils/userPromptKeywords.ts — short file, big implications:
export function matchesNegativeKeyword(input: string): boolean {
  const lowerInput = input.toLowerCase()
  const negativePattern =
    /\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/
  return negativePattern.test(lowerInput)
}
Yep, Claude Code is matching your profanity with regex. Say "wtf", "this sucks", or "so frustrating" — it knows you're unhappy.
Why not use the LLM for sentiment analysis? Because regex is free. In a system handling hundreds of thousands of interactions daily, every extra inference call costs real money. Regex runs in nanoseconds. Good enough is good enough.
This "good enough" engineering pragmatism shows up throughout the entire codebase. Not everything needs a large model.
Secret #4: KAIROS — The 24/7 AI Assistant
The source is peppered with references to KAIROS. This is a big unreleased feature that would transform Claude Code from a "you ask, I answer" CLI tool into a full-time, proactive, autonomous agent.
const proactive =
feature('PROACTIVE') || feature('KAIROS')
? require('./commands/proactive.js').default
: null
const assistantCommand = feature('KAIROS')
? require('./commands/assistant/index.js').default
: null
The capability list is staggering:
- Assistant mode — claude assistant [sessionId], persistent sessions that survive disconnects
- GitHub Webhooks — someone comments on your PR? KAIROS responds automatically
- Cron scheduling — scheduled tasks, no babysitting required
- Channel commands — send messages to Claude from chat apps, remote control style
- /dream memory distillation — automatic overnight knowledge consolidation across sessions
if (feature('KAIROS') || feature('KAIROS_DREAM')) {
const { registerDreamSkill } = require('./dream.js')
}
In essence, KAIROS is Anthropic's next-gen AI Agent — it doesn't wait for you to call it. It runs in the background 24/7, responding to signals: GitHub events, cron triggers, message commands.
The codebase has 44 feature flags. Many features are fully built, just waiting to be switched on. KAIROS is probably the biggest one.
Secret #5: A Virtual Pet System Hidden in the Code
Did not see this coming.
Under src/buddy/, there's a complete virtual pet system. Yes, you might actually be able to raise a digital pet in your terminal someday.
export const SPECIES = [
duck, goose, blob, cat, dragon, octopus, owl,
penguin, turtle, snail, ghost, axolotl, capybara,
cactus, robot, rabbit, mushroom, chonk,
] as const
18 species. Ducks, penguins, axolotls, ghosts, mushrooms... and something called chonk. Note that capybara is on the list — that's also Claude 4.6's internal codename.
Each pet has RPG stats:
export const STAT_NAMES = [
'DEBUGGING',
'PATIENCE',
'CHAOS',
'WISDOM',
'SNARK',
] as const
Five rarity tiers (legendary at 1% odds), hat accessories (crown, wizard hat, propeller hat), ASCII sprite animations, the works. Pet attributes are determined by a hash of your user ID — every user gets a unique, deterministically generated companion.
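The deterministic-generation idea is easy to sketch. Assuming a simple integer hash (the leak's actual hash function isn't quoted here, so FNV-1a stands in), pet attributes fall straight out of the user ID:

```typescript
// Illustrative reconstruction: hash function, field names, and the exact
// rarity math are assumptions, not the leaked implementation.
const SPECIES = [
  'duck', 'goose', 'blob', 'cat', 'dragon', 'octopus', 'owl',
  'penguin', 'turtle', 'snail', 'ghost', 'axolotl', 'capybara',
  'cactus', 'robot', 'rabbit', 'mushroom', 'chonk',
] as const

// FNV-1a: a tiny, fast, stable string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

function generateBuddy(userId: string) {
  const h = fnv1a(userId)
  return {
    species: SPECIES[h % SPECIES.length],
    // ~1% odds, matching the article's legendary figure.
    legendary: h % 100 === 0,
  }
}
```

The appeal of this design is that it needs no storage at all: the same user ID always hashes to the same pet, on any machine.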
Fun detail: because the species name capybara collided with the model's internal codename, the species names are assembled from character codes (String.fromCharCode(0x63,0x61,0x70,0x79,0x62,0x61,0x72,0x61) spells "capybara") so build-time text scanning never sees the literal string. The lengths they went to for a virtual capybara.
The Architecture Behind 512K Lines
Alright, Easter eggs aside. For engineers, the architecture is what's actually valuable.
Tech Stack
| Layer | Choice |
|---|---|
| Runtime | Bun (Anthropic acquired it) |
| Language | TypeScript strict mode |
| Terminal UI | React + Ink |
| CLI Parsing | Commander.js |
| Schema | Zod v4 |
| Search | ripgrep |
| Protocols | MCP, LSP |
| Telemetry | OpenTelemetry + gRPC |
| Feature Flags | GrowthBook |
| Auth | OAuth 2.0 / JWT / macOS Keychain |
The Agent Core Loop
Classic ReAct (Reasoning + Acting) loop at its foundation, but with industrial-grade optimizations maxed out:
- Streaming tool execution — tools start preparing while the LLM is still generating, no waiting for complete output
- Parallel tool execution — isConcurrencySafe() checks each tool individually, up to 10 concurrent
- Multi-layer context compression — five-stage pipeline: budget trimming → boundary truncation → incremental cleanup → context folding → summary compression
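A staged pipeline like that compression chain can be sketched as a list of stage functions that only run while the conversation is still over budget. The stage signature and token estimator below are my assumptions, not the leaked code:

```typescript
// Minimal sketch of a staged compression pipeline. Stage functions and the
// rough 4-chars-per-token estimate are placeholders for illustration.
type Message = { role: string; content: string }
type Stage = (msgs: Message[], budget: number) => Message[]

const tokenEstimate = (msgs: Message[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0)

function compress(msgs: Message[], budget: number, stages: Stage[]): Message[] {
  let current = msgs
  for (const stage of stages) {
    // Cheap stages run first; later (lossier) stages only fire if needed.
    if (tokenEstimate(current) <= budget) break
    current = stage(current, budget)
  }
  return current
}

// Example stage standing in for "budget trimming": drop the oldest
// messages until the conversation fits.
const budgetTrim: Stage = (msgs, budget) => {
  const out = [...msgs]
  while (out.length > 1 && tokenEstimate(out) > budget) out.shift()
  return out
}
```

The ordering matters: each successive stage is lossier than the last, so the pipeline stops as soon as any stage gets the context under budget.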
60+ Tool System
Each tool is a self-contained module defining input schema, permission model, and execution logic. Bash, file ops, search, web access, sub-agent spawning, MCP/LSP integration — everything you'd expect.
Tools disabled by feature flags get their code completely stripped by Bun's bundler at build time. The model never even sees their schema.
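The "self-contained module" shape is worth sketching. The interface below is my guess at the pattern, not the leaked definitions; the real code validates schemas with Zod, while this sketch hand-rolls validation to stay dependency-free:

```typescript
// Illustrative tool-module shape: names and fields are assumptions based on
// the article's description (schema + permission model + execution logic).
interface Tool<Input> {
  name: string
  // Per-tool concurrency check, as the article describes for parallel execution.
  isConcurrencySafe: (input: Input) => boolean
  // Schema validation; the real implementation uses Zod.
  validate: (raw: unknown) => Input
  execute: (input: Input) => Promise<string>
}

const echoTool: Tool<{ text: string }> = {
  name: 'Echo',
  isConcurrencySafe: () => true,
  validate: (raw) => {
    const r = raw as { text?: unknown }
    if (typeof r?.text !== 'string') throw new Error('text required')
    return { text: r.text }
  },
  execute: async ({ text }) => text,
}
```

Keeping schema, safety check, and execution in one module is what makes build-time stripping clean: delete the module and every trace of the tool, schema included, disappears from the bundle.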
Permission System: 8 Layers of Defense-in-Depth
This is the most impressive part of the entire architecture. Every tool invocation passes through 8 checkpoints:
- Deny Rules — global blocklist
- Ask Rules — forced confirmation
- Tool self-check — each tool's built-in safety validation
- Permission Mode — bypass/auto/default/plan modes
- Allow Rules — whitelist
- Hooks/Classifier race — competes with user confirmation dialog, first result wins
- Bash-specific guards — 23 checks including Zsh builtin interception, Unicode zero-width space injection, IFS null-byte attacks...
- User final confirmation — human signs off
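Defense-in-depth like this usually reduces to an ordered chain where each layer can allow, deny, or pass to the next. A minimal sketch, with layer logic invented for illustration and the ordering taken from the list above:

```typescript
// Illustrative checkpoint chain; the leaked code's actual types are unknown.
type Verdict = 'allow' | 'deny' | 'pass'
type Checkpoint = (tool: string, input: string) => Verdict

function evaluate(
  checkpoints: Checkpoint[],
  tool: string,
  input: string,
): 'allow' | 'deny' {
  for (const check of checkpoints) {
    const verdict = check(tool, input)
    // First non-pass verdict wins; later layers never see the call.
    if (verdict !== 'pass') return verdict
  }
  // In the real chain the last layer is user confirmation;
  // this sketch denies by default instead.
  return 'deny'
}

// Example layers mirroring "deny rules run before allow rules".
const denyRules: Checkpoint = (tool) => (tool === 'rm -rf' ? 'deny' : 'pass')
const allowRules: Checkpoint = (tool) => (tool === 'ls' ? 'allow' : 'pass')
```

The key property is that deny rules sit ahead of allow rules, so a blocklist hit can never be overridden by a later whitelist match.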
There's more: multi-agent coordination defined via prompts rather than hardcoded logic ("Do not rubber-stamp weak work"), "sticky latches" that prevent cache invalidation, and functions named DANGEROUS_uncachedSystemPromptSection() to warn teammates against calling them carelessly. Too many details to cover here; I'm saving those for future deep-dives.
Three Multi-Agent Modes
- Coordinator — command-only role, worker agents do the actual work
- Fork Subagent — lightweight clone, inherits full context, shares Prompt Cache
- Agent Swarms — team mode, members can be created, deleted, and message each other
What This Leak Means for the Industry
For developers, this is a priceless reference implementation. Before this, learning to build production-grade AI agents meant reading papers, guessing at architectures, and fumbling in the dark. Now a product generating $2.5 billion in annual revenue — 18% of Anthropic's total — has its implementation details fully exposed. Permission design, context compression, multi-agent orchestration — all laid bare.
The open source community has gone wild. claw-code hit 50K stars in two hours and is already rewriting core logic in Rust. More developers are studying the permission model and context compression strategies. Engineering know-how that only top AI companies possessed is now public knowledge. The barrier to building AI agents just dropped overnight.
For Anthropic, this is embarrassing. Same mistake twice — that's not a technical problem, it's a management problem. A $380 billion company that can't manage .npmignore. What do enterprise customers think? DMCA took down the GitHub mirrors, but copies are long gone.
That said, Anthropic's real moat — model capabilities, inference infrastructure, enterprise distribution — can't be stolen by reading source code. The npm package is just the tip of the iceberg.
What's Next
After a full day in this codebase, my head is buzzing with ideas. There's so much to explore in these 512K lines — how to replicate the agent loop, whether the permission system design can be adapted, whether KAIROS's autonomous agent architecture can be simplified into something runnable...
Holes have been dug. I'll be filling them in. Follow along if you're interested.
Disclaimer: This article is a technical analysis based on publicly available information. All source code is the property of Anthropic.
Follow me for more AI engineering deep dives.