Anthropic has an internal model (leaked as “Mythos”) that they are deliberately not shipping to the public. I’ve been thinking about this all day because it’s one of those stories that actually changes how I think about building software, not just another benchmark drop.
Here’s the part that got me: they didn’t train it to hack. They trained it to be world-class at writing code. The hacking came free.
The Spillover Nobody Planned For
This is what I keep coming back to. The team optimized for code generation and code understanding. What fell out of the same checkpoint was a model that can read a codebase, reason about how it’s supposed to behave, and pinpoint exactly where those assumptions break.
That’s hacking. Finding bugs and writing exploits is just code understanding pointed in a slightly different direction.
If you’ve ever wondered why frontier labs are nervous about scaling, this is it. You optimize for capability A, and capability B you never asked for shows up for free. You can’t cleanly separate “good engineer” from “good attacker” at the weights level. That’s something I want everyone reading this to internalize, because it’s going to keep happening.
The Numbers
SWE-bench (real-world bug fixing): Opus 4.6 sits at 80.8%. Mythos hits 93.9%.
Cybersecurity benchmarks (find and exploit vulns): Opus 66.6%. Mythos 83.1%.
These aren’t small bumps. This is a generational jump on a benchmark that translates directly to “can this thing break production systems?”
What It Actually Found
Forget the leaderboard for a second. Here’s what it did in the wild:
- A remotely exploitable bug in OpenBSD that sat there for 27 years
- A bug in FFmpeg (the video stack basically the entire internet runs on) that 5 million automated tests missed, hidden for around 16 years
- Multiple Linux privilege escalation bugs (unprivileged user → root)
- Chained vulnerabilities: it found 3 to 5 small bugs and linked them into a working attack path
The chaining is the part that actually unsettled me. Chaining is what separates a script kiddie from a nation-state operator. The model is doing it on its own.
Why They Didn’t Ship It
The default playbook for a frontier lab is: train it, benchmark it, ship it, charge for it. Anthropic picked a different path, and I think it’s worth paying attention to.
They gave it to the defenders first. It’s called Project Glasswing.
Partners with direct access include AWS, Apple, Google, Microsoft, Nvidia, Cisco, CrowdStrike, JPMorgan, plus 40+ critical infrastructure maintainers. $100M in usage credits. $4M to open source security groups. A 90-day commitment to publish what they learn.
The bet: let the people who maintain the software the internet runs on patch their stuff before this capability becomes a commodity in an open-weights model 12 to 24 months from now. Because it will.
What This Actually Means If You Ship Code
This is the part I care about, because most posts on this story stop at “wow, scary.” Here’s what I think we should actually do with this information:
- Your dependency graph is about to get audited whether you like it or not. Every library you pull from npm, PyPI, or crates.io is sitting in someone’s scan queue right now. Bugs that have been silently shipping for a decade are going to get filed as CVEs over the next year. If your production system can’t absorb a patch within 48 hours of a critical CVE, fix that pipeline before you do anything else this week.
- Security through obscurity is officially dead. If a 27-year-old OpenBSD bug got found, your clever in-house auth logic is not safe just because nobody is looking at it. Assume something will look.
- The “I’ll write the secure version later” excuse is gone. The marginal cost of having an LLM audit your diff before merge is approaching zero. No side project, let alone a production service, should be shipping without a security pass on the changes.
- If you build AI products, this is your warning. Every model you fine-tune for code is also getting better at finding holes in code. Your eval suite needs a “what can this model do that I didn’t ask for” column. Capability spillover is now a thing you have to think about, not a thing for the safety team in some other building.
- This is the story you bring up when someone talks about responsible deployment. Don’t quote the press release. Talk about capability spillover, the defender-first rollout pattern, and the offense-defense asymmetry in security. That’s the senior-engineer conversation to have about responsible AI right now.
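The dependency-audit point is easy to make concrete. Here’s a minimal sketch of what the other side of that scan queue looks like: parse your pinned requirements and ask the public OSV vulnerability database (api.osv.dev) whether each version has known advisories. The `parse_requirements` helper and the exact workflow are my own simplified assumptions; a real pipeline would lean on a tool like `pip-audit` or `osv-scanner` rather than hand-rolling this.

```python
import json
import urllib.request

OSV_URL = "https://api.osv.dev/v1/query"  # public OSV vulnerability API


def parse_requirements(text: str) -> list[tuple[str, str]]:
    """Extract (name, version) pairs from pinned requirements lines."""
    pins = []
    for line in text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins.append((name.strip(), version.strip()))
    return pins


def osv_query(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Ask OSV whether a specific package version has known vulnerabilities."""
    payload = json.dumps({
        "package": {"name": name, "ecosystem": ecosystem},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        OSV_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        # A non-empty "vulns" key in the response means known advisories exist.
        return json.load(resp)
```

Point this at your lockfile and you get, in a few dozen lines, a crude version of what automated auditors are already running against every public package.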
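On the “security pass on every diff” point, the gate can be as small as a pre-merge hook: pull the staged diff, run it past a reviewer, and block the merge on findings. The sketch below is hypothetical; the grep-style `security_flags` heuristic stands in for the LLM review call, whose actual API would depend on your provider.

```python
import subprocess


def staged_diff() -> str:
    """Return the diff staged for commit, i.e. what's about to merge."""
    result = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    )
    return result.stdout


def security_flags(diff: str) -> list[str]:
    """Cheap stand-in for an LLM security review: flag known-risky patterns.

    A real pipeline would send the diff to a model and parse its findings;
    this token list is illustrative, not exhaustive.
    """
    risky = ["eval(", "pickle.loads", "os.system", "verify=False", "shell=True"]
    return [tok for tok in risky if tok in diff]


def gate(diff: str) -> bool:
    """Return True if the diff is clear to merge."""
    return not security_flags(diff)
```

Wire `gate(staged_diff())` into CI or a pre-commit hook and the marginal cost really does approach zero: the review happens on every change, not just the ones someone remembered to ask about.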
The Pattern I’m Watching
This is the first time a major lab has publicly said “we built something too powerful to ship, here is the staged rollout plan.” Whether OpenAI, Google DeepMind, and Meta follow the same pattern when their next coding model crosses this line is the actual question I’m sitting with.
Because the capability isn’t going away. Open-weight models are 12 to 24 months behind the frontier and closing. Whatever Mythos can do today, something you can run on a rented H100, or even a smaller model, will do soon enough.
The defenders got a head start this round. That’s new. If you ship code for a living, the smart move is to use the next year to make sure your systems can actually absorb the patches when they start landing.
That’s what I’m taking from this. Curious what you think.