Today Anthropic announced Claude Mythos — a model so good at finding security vulnerabilities that they decided not to release it.
Instead, they launched Project Glasswing: a restricted program that gives access only to vetted security researchers and major tech companies. The model has already found vulnerabilities in every major operating system and web browser, including a 27-year-old bug in OpenBSD.
This is not a marketing stunt.
What Makes Mythos Different
Claude Opus 4.6 had a near-zero success rate at autonomous exploit development. On the same test cases, Mythos Preview developed working exploits 181 times out of hundreds of attempts.
Nicholas Carlini from Anthropic said he found more bugs in the last couple of weeks than in his entire career combined. That's not hyperbole — the OpenBSD vulnerability they discovered has been in the codebase for 27 years.
The model can:
- Chain 4-5 vulnerabilities together into sophisticated exploit chains
- Write JIT heap sprays that escape browser AND OS sandboxes
- Find privilege escalation paths that humans missed for decades
Why This Matters for Developers
This is a genuine inflection point. Security professionals are already drowning in AI-generated vulnerability reports:
"Months ago, we were getting AI slop. Something happened a month ago, and the world switched. Now we have real reports." — Greg Kroah-Hartman, Linux kernel maintainer
"I'm spending hours per day on this now. It's intense." — Daniel Stenberg, curl maintainer
The codebases running the internet's infrastructure — Linux, OpenBSD, browsers, servers — are being systematically audited by machines that don't get tired. The 27-year-old OpenBSD bug wasn't obscure — it was in TCP packet handling. Anyone could have found it. No one did. Until now.
The Trade-off
Anthropic is putting $100M in compute credits and $4M in direct donations behind Project Glasswing. Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation.
But the model won't be generally available. If you're not a vetted partner, you won't get access.
Is that the right call?
I think so. The security community has been warning about this for months. The gap between "AI can find bugs" and "AI can chain exploits autonomously" just closed. Anthropic is giving infrastructure maintainers time to harden their systems before the capability proliferates.
Because it will proliferate. Other labs will reach this threshold. The question isn't whether bad actors eventually get this — it's whether critical systems get patched first.
What Comes Next
Two things are happening simultaneously:
1. Frontier models are becoming genuinely dangerous — not in a sci-fi way, but in the boring sense that they can autonomously find and exploit vulnerabilities in production systems.
2. Responsible disclosure is scaling — instead of one researcher finding one bug, we're seeing systematic auditing of entire codebases that have been running the internet for decades.
The maintainers of OpenBSD, Linux, curl, Firefox — they're all getting AI-generated reports now. Some are slop. Many are real.
Project Glasswing is Anthropic acknowledging that releasing a model this capable, without guardrails, would be reckless. It's also them saying they expect safeguards to catch up — eventually.
The model is called Mythos. The name fits. It's powerful, elusive, and only a few will get to use it.
But the bugs it's finding? Those are very real.