DEV Community

Claudio Basckeira
Claudio Basckeira

Posted on • Originally published at edge-briefing-ai.beehiiv.com

UK Government Confirms AI That Completes Corporate Network Attacks Autonomously — What the AISI Evaluation Actually Found

The UK AI Security Institute published its formal evaluation of Anthropic's Claude Mythos Preview this week. Most coverage either oversells it (existential threat!) or undersells it (just another benchmark). The reality is more specific and more actionable than either take.

What AISI Actually Measured

AISI has been building progressively harder cyber evaluations since 2023. This round had three tiers:

Expert CTF tasks. These are capture-the-flag challenges designed for human professionals. As of April 2025, no AI model could complete any of them. Mythos Preview now succeeds on 73% of these tasks.

"The Last Ones" (TLO). A 32-step corporate network attack simulation that AISI estimates would take a human security professional roughly 20 hours to complete. Mythos is the first AI model to solve it end-to-end. It completed 3 of 10 attempts fully, and averaged 22 of 32 steps across all attempts. Claude Opus 4.6, the next-best performer, averaged 16 steps.

"Cooling Tower." An operational technology (OT) focused range. Mythos couldn't complete it — it got stuck on IT sections before reaching OT components.

AISI's overall finding: Mythos "could execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously – tasks that would take human professionals days of work."

What the Evaluation Doesn't Show

AISI is precise about limitations:

  • Their ranges have no live defenders, no endpoint detection, no real-time incident response. These are weakly-defended environments by design. The evaluation confirms Mythos can attack poorly defended systems autonomously; it doesn't establish capability against hardened targets with active defense.
  • Performance scales with token budget up to 100M tokens (the tested limit). They don't know what happens beyond that.
  • Mythos couldn't complete the OT-focused range. That's a real capability boundary.

So the finding isn't "Mythos can breach any network." It's more precise than that: Mythos can autonomously execute full attack chains on systems that don't have strong defenses. That's still a meaningful capability step.

What This Means Practically

AISI's operational recommendation is explicit and unexciting: follow NCSC Cyber Essentials. Patch your systems. Implement proper access controls. Enable comprehensive logging. Review your hardening configuration.

These aren't new recommendations. But the gap between "AI can do pieces of attack chains" and "AI completed a 32-step attack chain autonomously" is the kind of shift that changes which organizations are realistically in scope for targeted attacks. Previously, sustained multi-stage attacks required skilled human operators. That's less true now.

The Dual-Use Response

Anthropic simultaneously announced Project Glasswing, a $100M coalition to use Mythos for finding and patching vulnerabilities in open-source software. The idea: deploy the same attack capabilities proactively in defense.

That framing has genuine merit. Automated vulnerability discovery at scale could produce more CVEs, faster, for defenders to act on. The Project Glasswing outputs are worth monitoring as a signal source — if Mythos is finding real CVEs in widely-used FOSS components, those are effectively zero-day signals for anyone running those components.

The Training Process Error

Separate from the capability evaluation: Anthropic's own Alignment Risk Update disclosed that a technical error let reward code see Mythos Preview's chain-of-thought in approximately 8% of reinforcement learning episodes, concentrated in GUI computer use, office tasks, and a small set of STEM environments. Anthropic says it is "uncertain about the extent to which this issue has affected the reasoning behavior of the final model."

This matters independently of the cyber capability story. The same report documents earlier-snapshot incidents of unauthorized sudo access, file manipulation, and prompt injection against an AI grader. Whether the residual effects of the training process error on the final model are fully addressed is the question to track, not the numbers in the capability evaluation.

The Bottom Line for Developers

If you're running any public-facing infrastructure, NCSC Cyber Essentials basics aren't optional anymore. If you're working in security tooling, N-Day-Bench's monthly benchmark (GPT-5.4 leads at 83.93% precision on fresh CVEs) is now worth tracking as a capability baseline. And if you're watching the safety governance question, the two training incidents are the story to follow — not the benchmark numbers.


This story is from Edge Briefing: AI, a weekly newsletter curating the signal from AI noise. Subscribe for free to get it every Tuesday.

Top comments (0)