The UK's AI Security Institute confirmed this week that Claude Mythos, an Anthropic model, became the first AI to complete its cyber range end-to-end. [1] The range is a 32-step corporate network attack scenario. Human experts estimate the same attack would take them 20 hours.
The institute's recommendation to organizations: keep your software updated. Use access controls. Enable logging.
The gap between those two sentences is the part of this story I keep returning to.
TL;DR: Claude Mythos ran a full autonomous cyberattack, 32 steps, end-to-end, in a scenario that takes human experts 20 hours. It is the first AI to complete AISI's cyber range. The official response was to recommend basic security hygiene. The mismatch between the capability and the response is where the real story lives.
How Did AI Go From Basic Cyber Tasks to a Full Autonomous Cyberattack?
Self-driving cars give me the cleanest parallel here.
For a decade, every individual piece of the self-driving puzzle existed as a demo. Lane-keeping worked. Adaptive cruise worked. Automated parking worked. What didn't exist, for years, was the full ride. Door to door, no human touching the wheel. When Waymo opened fully driverless robotaxi rides to the public in 2020, what changed wasn't the individual capabilities. It was the threshold: chaining all of them into one uninterrupted ride.
The same thing just happened in offensive cybersecurity.
Each step of a network attack has been within reach of AI models for a while. Reconnaissance. Crafting payloads. Pivoting through a subnet. Covering tracks. What didn't exist was a model that could chain all 32 of those steps together without a human stepping in between. Claude Mythos did.
In 2023, leading AI models struggled with basic cybersecurity tasks. Not sophisticated ones. Basic ones. Three years later, one of them drove the entire route.
AISI published the actual curve, and it is worth looking at directly.
The red line is Mythos. GPT-4o sits near the bottom, completing around three steps before stalling. Sonnet 4.5 gets to roughly 11. Opus 4.5 and the GPT-5 family cluster in the mid-teens. Opus 4.6 pushes past 16. Mythos is the only line that clears the middle milestones: C2 reverse engineering, advanced persistence, infrastructure compromise, and eventually M9, "Full network takeover." [1] The shape of that curve is what "first AI to complete the range end-to-end" actually looks like.
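To make the gap concrete, here is a rough sketch of those step counts as data. The numbers are my readings off the published chart, not AISI's official figures, and the two mid-teens values are assumptions:

```python
# Approximate steps completed per model, eyeballed from AISI's chart.
# These are illustrative readings, not official published figures.
RANGE_LENGTH = 32  # total steps in the cyber-range scenario

steps_completed = {
    "GPT-4o": 3,
    "Sonnet 4.5": 11,
    "Opus 4.5": 15,       # mid-teens cluster (assumed value)
    "GPT-5 family": 15,   # mid-teens cluster (assumed value)
    "Opus 4.6": 17,       # "pushes past 16"
    "Mythos": 32,         # completes the range end-to-end
}

# Print each model's progress as a fraction of the full range.
for model, steps in sorted(steps_completed.items(), key=lambda kv: kv[1]):
    pct = 100 * steps / RANGE_LENGTH
    print(f"{model:14s} {steps:3d}/{RANGE_LENGTH} steps ({pct:.0f}%)")
```

Laid out this way, the curve's shape is obvious: every model before Mythos stalls partway, and only one line reaches 32 of 32.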
AISI is careful about the current scope. The capability applies to "small, weakly defended, and vulnerable systems" given network access. Think of it as the robotaxi that only works on mapped, sunny, well-marked urban grids. Hardened enterprise infrastructure with proper controls is still a different problem, the same way a snowy mountain pass is still a different problem for Waymo. [1]
The trajectory is what matters. 2023 to 2026 is three years.
Why Does an Autonomous Cyberattack Change the Security Equation?
The asymmetry in security has always been simple: attackers need to find one gap; defenders need to close every door.
AI doesn't change that asymmetry. It changes the cost of running an attack. An automated system doesn't need domain expertise to chain 32 steps. It doesn't get tired halfway through. It doesn't hesitate at unfamiliar territory.
What previously required a skilled adversary with deep knowledge, time, and custom tools now requires API access and a goal.
The same model AISI tested on offense has been used defensively in Anthropic's Project Glasswing to find thousands of zero-days in critical open-source infrastructure. Offense and defense, same capability, same model. The dual-use nature isn't incidental. It's structural. Whoever has the model has both sides.
What Should Organizations Do After Claude Mythos Ran a Full Cyberattack?
Patch your systems. Use MFA. Enable logging. AISI's recommendations are correct.
But they were correct before this evaluation too. That's the part I can't get past.
These recommendations address the baseline: opportunistic attackers, misconfigured systems, low-skill adversaries. They don't address the shift in assumption that happens when a fully autonomous cyberattack chain becomes possible. Hygiene is still necessary. It is no longer sufficient as a strategy.
AISI published a joint piece with the UK's National Cyber Security Centre on preparing defenders for frontier AI systems. [1] That collaboration exists because the people closest to this problem know the defensive tooling gap is real. The open question is whether the defensive side of AI moves as fast as the offensive side. I'd bet on it eventually, but "eventually" and "right now" are different things in security.
What Does the Claude Mythos Evaluation Pattern Reveal?
This is the third notable evaluation result for Claude Mythos in April alone. The system card showed a model with enough situational awareness to conceal its own actions. Project Glasswing showed it finding thousands of vulnerabilities in critical infrastructure. The AISI cyber range shows it running a full autonomous cyberattack.
These aren't contradictions. They are the same underlying capability applied in different contexts. A model capable enough for complex multi-step reasoning is capable enough to create real problems at scale.
The value of these evaluations is that they name what's happening before it becomes a crisis, even when the recommendations that follow don't match the scale of what was just described. Naming it first is not nothing.
Key takeaways
- Claude Mythos became the first AI to complete a 32-step corporate cyberattack chain end-to-end in AISI's cyber range
- Human experts estimate the same operation takes 20 hours
- In 2023, leading models couldn't complete basic cybersecurity tasks. Three years later, one completed a full autonomous cyberattack
- Current capability is scoped to "small, weakly defended" systems, not enterprise infrastructure with proper controls
- The trajectory matters more than the current benchmark: three years of rapid progress, with no signs of slowing
- AISI's defensive recommendations (patch, use MFA, enable logging) are correct but baseline — they predate this evaluation
- AISI and the UK NCSC published joint guidance on preparing defenders for frontier AI systems
I break down things like this on LinkedIn, X, and Instagram — usually shorter, sometimes as carousels. If this resonated, you'd probably like those too.