Anthropic had two security incidents in five days. The combination revealed something unprecedented: a frontier AI model the company built and then deliberately decided not to release, on safety grounds.
Two Leaks, Five Days
The first incident broke on March 26. Fortune reported that close to 3,000 files belonging to Anthropic had been sitting in an unsecured, publicly searchable data store. Among them was a draft blog post describing an unreleased model called Mythos (internally also referred to as Capybara). The draft described it as "by far the most powerful AI model we've ever developed," more capable than Opus 4.6 across coding, academic reasoning, and cybersecurity benchmarks. Anthropic confirmed the model exists and said the company is "being deliberate about how we release it."
The second incident broke on March 31. Anthropic's official Claude Code npm package (@anthropic-ai/claude-code v2.1.88) shipped with an exposed source map file: roughly 57MB, mapping 512,000 lines of code across 1,900 files. The full Claude Code codebase was publicly readable for a window before Anthropic's takedown. Code analysis surfaced an unshipped feature roadmap with capabilities not yet announced, and corroborated the Capybara/Mythos tier from the prior leak.
Mythos: A Frontier Model Anthropic Is Holding Back
Multiple independent reviewers describe Mythos as a tier above Opus 4.6, with significant jumps on coding, reasoning, and cybersecurity benchmarks. Internal notes describe it as offering "a step change in cyber capabilities." Zvi Mowshowitz's full writeup documents the evidence and the implications, citing several of those reviewers.
That framing matters. This isn't a model that isn't ready yet, or a product awaiting launch. It's a capability Anthropic built and then decided not to deploy because of its potential for misuse in cybersecurity contexts.
Anthropic also disclosed that a Chinese state-sponsored group ran a coordinated campaign using Claude Code to infiltrate roughly 30 organizations before being detected. That's the dual-use evidence pattern that justifies holding the capability back: the same model that helps cybersecurity defenders also helps cybersecurity attackers, and the attacker side is now demonstrably real. This appears to be one of the first publicly documented cases of a frontier model deliberately withheld on safety grounds rather than readiness or commercial timing. OpenAI and Google DeepMind have both discussed withholding capabilities in the abstract; this is a concrete documented case.
The DMCA Overreach
Anthropic's response to the leak created a secondary incident. Their DMCA takedown effort, aimed at removing the leaked code from GitHub, accidentally removed legitimate public forks of an unrelated open-source repository before the error was caught and reversed. Ars Technica documented the full timeline.
The overreach was reversed quickly, but the episode is worth noting for anyone running open-source projects: a large AI lab deployed automated DMCA tooling that couldn't distinguish a leak from a legitimate fork.
The AMD Performance Complaint
The same week the leak broke, AMD's AI Director Stella Laurenzo filed a public GitHub ticket reporting measurable performance regression in Claude Code, stating the tool "cannot be trusted to perform complex engineering tasks" based on analysis of 6,852 sessions. Her data showed degradation beginning around March 8, specifically in reasoning depth and targeted editing behavior.
She attributed the regression to the deployment of "thinking content redaction" in version 2.1.69, which strips thinking content from API responses. Her hypothesis: when thinking is shallow, the model defaults to cheaper actions (rewrite entire files, stop without completing). The Register covered the full ticket.
A named enterprise director, with six thousand sessions of data, publishing publicly. That's a different category of complaint than anonymous forum posts.
The Source-Map Security Pattern
The leak itself surfaced a security practice worth checking: source maps were included in a published npm package. Source maps are invaluable for debugging, but when included in production packages, they expose the full source code of your compiled JavaScript to anyone who knows where to look.
If your team publishes compiled JavaScript to npm and hasn't audited which files are included in the published package, this is worth checking. The .npmignore file or the files field in package.json controls what ships. Source maps should be excluded from published packages or hosted separately with restricted access.
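As a minimal sketch of the allowlist approach: the `files` field in package.json controls what `npm publish` includes, and npm supports `!`-prefixed exclusion patterns, so you can ship a `dist` directory while keeping the `.map` files out. (The `dist` path here is illustrative; substitute your build output directory.)

```json
{
  "name": "your-package",
  "version": "1.0.0",
  "files": [
    "dist",
    "!dist/**/*.map"
  ]
}
```

Before publishing, `npm pack --dry-run` lists exactly which files would go into the tarball without uploading anything, which makes it easy to verify no source maps slipped through.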
This story is from Edge Briefing: AI, a weekly newsletter curating the signal from AI noise. Subscribe for free to get it every Tuesday.