Yesterday Anthropic made an announcement that would have sounded like marketing hype a year ago. They released a new model, Claude Mythos, but not to you or me. Only to a select group of security researchers.
The reason? It is genuinely too dangerous for general release.
This is not a stunt. The security community has been sounding alarms for weeks, and Anthropic is the first to act on them.
## What Makes Mythos Different
Claude Mythos is not a specialized security tool. It is a general-purpose model, comparable to Claude Opus 4.6 on most tasks. But its ability to find and exploit vulnerabilities is something we have not seen before.
From Anthropic's own technical writeup:
> Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes.
That is not a typo. Four vulnerabilities. Chained autonomously.
They also tested it against Firefox 147's JavaScript engine. Opus 4.6 produced working exploits on 2 out of hundreds of attempts. Mythos? 181 working exploits, plus register control on 29 more.
## The 27-Year-Old Bug
Nicholas Carlini from Anthropic's security team appeared in their announcement video with a striking claim:
> I've found more bugs in the last couple of weeks than I found in the rest of my life combined.
One example: an OpenBSD vulnerability that had existed for 27 years. Send a few packets to any OpenBSD server and crash it. It has since been patched, but the code had been there since the late 1990s.
Linux vulnerabilities too. User-to-root privilege escalation. These are not theoretical.
## The Industry Already Knew Something Changed
This announcement did not come out of nowhere. Open source maintainers have been reporting a shift for months.
Greg Kroah-Hartman, Linux kernel maintainer:
> Months ago, we were getting what we called AI slop: AI-generated security reports that were obviously wrong or low quality. Something happened a month ago, and the world switched. Now we have real reports.
Daniel Stenberg, curl maintainer:
> The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a... plain security report tsunami. Many of them really good.
Thomas Ptacek, security researcher, titled his recent post simply: "Vulnerability Research Is Cooked."
## What Anthropic Is Doing About It
Anthropic's response, Project Glasswing, is not just about restricting access. It includes:
- $100M in usage credits for security partners (AWS, Apple, Microsoft, Google, Linux Foundation)
- $4M in direct donations to open-source security organizations
- Focused work on foundational systems (operating systems, browsers, critical infrastructure)
They are explicitly not releasing Mythos to general users:
> We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale.
## Why This Matters
This is the first time a major AI lab has held back a model primarily because of its offensive security capabilities.
We have heard concerns about AI being "too dangerous" before. Usually it feels like marketing. This time the people sounding the alarm are kernel maintainers and security researchers who have spent decades in this space. They are not easily impressed.
The vulnerability research community has operated on the assumption that finding bugs requires significant expertise and effort. That assumption is breaking. What took weeks of careful analysis can now be automated.
## The Tradeoff
There is a reasonable argument that restricting access slows down legitimate security research. The same capabilities that could find vulnerabilities in critical infrastructure could also help defend it.
Anthropic's approach is to give trusted partners a head start. Let the defenders patch first. Then release more capable models with appropriate safeguards.
Is that the right balance? I am not sure anyone knows yet. But it is a significant departure from the usual "release everything, deal with consequences later" approach.