The company that literally wrote "Core Views on AI Safety" just accidentally published their own source code to npm.
Twice in five days, Anthropic -- the "safety-first" AI lab -- leaked confidential material through basic infrastructure mistakes. First, a misconfigured CMS exposed draft documents about their unreleased Claude Mythos model. Then, a debug source map file shipped in a public npm package gave the entire internet access to Claude Code's TypeScript source. All 512,000 lines of it. Across 1,900 files.
You can't make this up.
What actually leaked
Leak #1: Claude Mythos (March 26)
A configuration error in Anthropic's content management system made close to 3,000 unpublished assets publicly accessible. Among them: draft docs about a new model tier called Capybara, internally known as Claude Mythos. Anthropic confirmed it's real and called it "a step change" -- dramatically higher scores on coding, reasoning, and cybersecurity benchmarks than Claude Opus 4.6.
The kicker? The leaked docs say Mythos is "currently far ahead of any other AI model in cyber capabilities" but also "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders." A cybersecurity nightmare model, revealed by a cybersecurity failure. Poetic.
Leak #2: Claude Code source (March 31)
Version 2.1.88 of @anthropic-ai/claude-code shipped with a 59.8 MB source map file that pointed to an unprotected zip archive in Anthropic's Cloudflare R2 bucket. Within hours, the full TypeScript source was mirrored everywhere.
And here's the part that should make every npm publisher wince: there was a known Bun bug (issue #28001) filed 20 days earlier, reporting that source maps get served in production builds even when they shouldn't. Twenty days. Nobody caught it.
Anthropic then filed DMCA takedowns against 8,100+ GitHub repos, accidentally nuking thousands of legitimate forks of their own public Claude Code repo in the process. Their head of Claude Code had to publicly apologize for that one too.
What Claude Code's source actually tells us
Forget the drama for a second. The leaked codebase is genuinely interesting if you build AI agents. Here's what it reveals about how a production-grade AI coding agent works under the hood:
It's a harness, not magic. Claude Code is the "agentic harness" that wraps the underlying Claude model. The model itself doesn't read files or run bash commands. The harness gives it tools, manages context, and orchestrates everything. If you've been wondering how these things work -- it's tooling all the way down.
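That harness pattern is simple to sketch. The loop below is an illustrative toy, not Anthropic's actual code: the names (ModelReply, runHarness, read_file) and the scripted fakeModel are all assumptions made up for this example. The point is the shape: the model only emits requests, the harness executes tools and feeds results back.

```typescript
// Minimal sketch of an agentic harness loop. All names here are
// illustrative assumptions, not Claude Code's real API.

type ModelReply =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "final"; text: string };

// Tools the harness exposes. The model never touches the filesystem itself.
const tools: Record<string, (input: string) => string> = {
  read_file: (path) => `<contents of ${path}>`, // stub for illustration
};

// A scripted stand-in for the model: first asks for a file, then answers.
function fakeModel(transcript: string[]): ModelReply {
  if (!transcript.some((m) => m.startsWith("tool_result:"))) {
    return { kind: "tool_call", tool: "read_file", input: "README.md" };
  }
  return { kind: "final", text: "summary based on README.md" };
}

// The harness loop: call the model, run whatever tool it asks for, feed
// the result back into context, repeat until it produces a final answer.
function runHarness(userPrompt: string): string {
  const transcript = [`user:${userPrompt}`];
  for (let step = 0; step < 10; step++) {
    const reply = fakeModel(transcript);
    if (reply.kind === "final") return reply.text;
    const result = tools[reply.tool](reply.input);
    transcript.push(`tool_result:${result}`);
  }
  throw new Error("step limit reached");
}
```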
Three-layer memory. The architecture uses a MEMORY.md file as a lightweight pointer index (~150 characters per line) that's always loaded into context. It doesn't store actual knowledge -- just locations. Topic files get fetched on-demand. Raw transcripts are never fully re-read, just grepped for specific identifiers. Smart approach to the context window problem.
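The pointer-index idea is easy to demonstrate. In this sketch the line format ("topic -> path") and both function names are invented for illustration; the leaked code presumably does something more robust. What matters is that the always-loaded index stores locations, never knowledge, and topic files are resolved on demand.

```typescript
// Sketch of a pointer-index memory. Line format and names are assumptions,
// not Claude Code's actual MEMORY.md format.

type Pointer = { topic: string; path: string };

// Parse short lines like "auth flow -> notes/auth.md" into pointers.
function parseMemoryIndex(memoryMd: string): Pointer[] {
  return memoryMd
    .split("\n")
    .filter((line) => line.includes("->"))
    .map((line) => {
      const [topic, path] = line.split("->").map((s) => s.trim());
      return { topic, path };
    });
}

// Resolve a query to a file path to fetch on demand. The index itself
// never holds the knowledge, only where it lives.
function lookup(index: Pointer[], query: string): string | undefined {
  return index.find((p) => query.toLowerCase().includes(p.topic))?.path;
}

const index = parseMemoryIndex(
  "auth flow -> notes/auth.md\nbuild errors -> notes/build.md"
);
```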
Sub-agent spawning. Claude Code can spin up "sub-agents" for complex tasks. Multi-agent orchestration isn't just a research paper concept -- it's how the tool actually handles things that don't fit in a single context.
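The delegation logic can be sketched in a few lines. Everything here is a toy assumption (the 4-chars-per-token estimate, the budget numbers, the stand-in subAgent), but it shows the core trade: the sub-agent burns a fresh context window, and the parent keeps only a compact summary.

```typescript
// Sketch of sub-agent delegation. All names and numbers are illustrative
// assumptions, not Claude Code's actual heuristics.

type Agent = (task: string, contextBudget: number) => string;

// Crude token estimate: roughly 4 characters per token.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// A stand-in sub-agent that "solves" the task and returns a short summary.
const subAgent: Agent = (task) => `summary(${task.slice(0, 20)}...)`;

// The parent handles a task inline if it fits; otherwise it spawns a
// sub-agent with a fresh context and keeps only the returned summary.
function handleTask(task: string, remainingBudget: number): string {
  if (estimateTokens(task) > remainingBudget) {
    return subAgent(task, 100_000); // fresh context for the sub-agent
  }
  return `inline(${task})`;
}
```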
Unreleased features. The source reveals something called KAIROS -- a mode where Claude Code runs as a persistent background agent that fixes errors and runs tasks without waiting for you. There's also a "dream" mode for background ideation. Neither is publicly available yet.
This is basically a blueprint for building a production AI coding tool. Cursor and every other AI code editor now have a literal reference implementation to study. One open-source project called OpenCode already rewrote the core architecture from scratch and became one of the fastest-growing repos in GitHub history.
The uncomfortable question
Anthropic has 1,500+ employees. They raised $8 billion. They have dedicated security teams. And they still shipped a debug file to npm and left a storage bucket open.
So what does that mean for the rest of us?
If a well-funded AI safety company can't catch a known Bun bug for 20 days, what's happening in the millions of apps being vibe-coded right now by solo devs and small teams? Apps where there is no security review. No staging environment. No one checking whether the npm package contains a 60 MB source map.
This isn't hypothetical. The data backs it up: 45% of AI-generated code introduces security vulnerabilities (Veracode 2026). 35 CVEs in March 2026 alone were traced to AI-generated code, up from 6 in January. And Stanford researchers found that developers using AI assistants write less secure code while being more confident it's secure.
Anthropic's leak was embarrassing. But they recovered in hours. Most teams wouldn't.
What to actually do about it
Look, I'm not saying stop using AI to write code. I use it every day. But the Anthropic incident is a good reminder that "it works" and "it's secure" are completely different statements.
A few things that would have caught this:
- Check what you're publishing. Run npm pack --dry-run before every publish. Look at the file list. If there's a 60 MB source map in there, maybe don't ship it.
- Audit your storage buckets. Publicly accessible by default is a footgun. Treat every new bucket as private until explicitly changed.
- Review AI-generated code like you'd review a junior dev's PR. The code usually works. It's the edge cases, auth flows, and error handling where it falls apart.
- Use a checklist. Boring? Sure. But humans skip the boring stuff, and so does AI.
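The first bullet is also automatable. One approach: parse the file list that npm pack --dry-run --json prints and fail the publish if anything looks like debug output. The function below is a sketch of that filter; the 5 MB threshold and the type shape are my assumptions, not an npm standard.

```typescript
// Sketch of a pre-publish tarball check. Feed it file entries parsed from
// `npm pack --dry-run --json`. Threshold and names are assumptions.

type PackedFile = { path: string; size: number };

// Flag source maps and anything implausibly large for a CLI package.
function suspiciousFiles(files: PackedFile[]): string[] {
  const MAX_BYTES = 5 * 1024 * 1024; // arbitrary 5 MB ceiling
  return files
    .filter((f) => f.path.endsWith(".map") || f.size > MAX_BYTES)
    .map((f) => f.path);
}

const flagged = suspiciousFiles([
  { path: "dist/cli.js", size: 1_200_000 },
  { path: "dist/cli.js.map", size: 59_800_000 }, // the Anthropic scenario
]);
```

Wired into a prepublishOnly script that exits non-zero when flagged is non-empty, this would have stopped the Claude Code leak at publish time.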
I put together a 32-point security checklist specifically for AI-generated code, mapped to OWASP Top 10. It came out of auditing a handful of vibe-coded apps and finding way more vulnerabilities than I expected. Might be useful if you're shipping AI-generated code.
The irony of Anthropic's situation writes itself. But honestly, I'm less interested in dunking on them and more interested in what the rest of us learn from it. Their code leaked because of a packaging mistake and a known bug that nobody prioritized. That's not exotic. That's Tuesday.
The question isn't whether your AI-generated code has vulnerabilities. It does. The question is whether you're catching them before your users do.
How safely are you using AI to code? Take the free Vibe Code Risk Assessment -- 10 questions, 2 minutes, no signup required.