DEV Community

Aamer Mihaysi

Anthropic Just Admitted Their New Model Is Too Dangerous to Release

Claude Mythos finds vulnerabilities that have existed for 27 years. Anthropic is only giving it to security researchers. This is the first real AI safety inflection point.


Anthropic didn't release a model today.

They announced Project Glasswing instead.

The model, Claude Mythos, exists. It works. But you can't use it.

Only a handful of security researchers get access.

Why? Because it found a 27-year-old bug in OpenBSD that lets an attacker send "a couple pieces of data to any server and crash it." Because it chains four or five vulnerabilities together to build exploits that escape sandboxes. Because it succeeded at exploit development 181 times out of 200 attempts, where the previous best model managed 2.

This is the first time an AI company has said: we built something we can't give you.


The Capability Gap Is Real

Let me put some numbers on this.

Claude Opus 4.6: near-0% success rate at autonomous exploit development.

Claude Mythos: 181 working exploits out of 200 attempts on Firefox JavaScript engine vulnerabilities.

That's not an incremental improvement. That's a phase transition.

Nicholas Carlini from Anthropic puts it bluntly: "I've found more bugs in the last couple of weeks than I found in the rest of my life combined."


What Mythos Actually Does

The technical details are genuinely sobering:

  • Browser exploits: Chained four vulnerabilities, using techniques including a JIT heap spray, to escape both the renderer and OS sandboxes
  • Kernel exploits: Found race conditions and KASLR bypasses for local privilege escalation on Linux
  • Network exploits: Built a 20-gadget ROP chain split across multiple packets for FreeBSD NFS root access
  • Ancient bugs: Found a TCP SACK bug in OpenBSD that existed for 27 years

These aren't theoretical. They're patched. OpenBSD has the errata. Linux has the commits. This is real vulnerability research at a scale and speed humans can't match.


Why This Matters

Two things are happening simultaneously:

First, the capability is here. Not next year, not in five years. Right now. A model exists that can autonomously find and exploit vulnerabilities in the software that runs the internet.

Second, the responsible deployment problem is real. Anthropic isn't hyping this as their next product. They're treating it as a controlled substance.

The Linux kernel maintainers noticed something changed about a month ago. Greg Kroah-Hartman: "Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real."

Daniel Stenberg of curl: "I'm spending hours per day on this now. It's intense."


The Proliferation Question

Anthropic's position is: we're giving the industry time to prepare before these capabilities become widely available.

The uncomfortable truth is that they're right to worry. If Mythos-level capability exists at Anthropic, it will exist elsewhere. Open models are improving. Chinese labs like Z.ai just released GLM-5.1, a 754B-parameter model published under an MIT license on Hugging Face. The gap between frontier proprietary models and open models is now measured in months, not years.


What to Watch

Three signals matter now:

  1. Patch velocity: How quickly major projects can respond to AI-discovered vulnerabilities
  2. Proliferation: When non-Anthropic models achieve similar exploit development success rates
  3. Defensive tooling: Whether AI can be turned around to systematically harden software faster than it can be broken

The defensive question is the important one. Mythos finds bugs. Can another model write code that doesn't have them?


The Takeaway

We've crossed a threshold.

AI companies have talked about safety for years. Anthropic just demonstrated it by withholding a working product that they believe could cause real harm if deployed carelessly.

Whether you trust their judgment or not, this is the moment where AI capability and AI safety stopped being theoretical discussions and started being operational decisions.

The model exists. You can't use it. And that's the story.


This isn't marketing. It's containment. And it's the first real test of whether the AI industry can police itself before someone forces them to.
