Frontier AI Can't Hack Corporate Networks? Claude Mythos Just Did It in 20 Hours.


Originally published at news.skila.ai

A 32-step corporate network attack. 20 hours of human red-team work. Completed start-to-finish by an AI. Three times out of ten.

The UK AI Security Institute (AISI) published its independent evaluation of Claude Mythos Preview today. The results are the first independent confirmation of what people inside Anthropic have been quietly terrified about since February.

Claude Mythos is the first frontier model ever to solve 'The Last Ones' (TLO) — AISI's hardest cyber range. On expert-level capture-the-flag challenges that no AI could touch twelve months ago, Mythos now succeeds 73% of the time. Bloomberg is hosting a live Q&A on the findings at 1:30 PM EDT today.

The myth that frontier AI can answer cyber questions but can't execute multi-stage attacks? Busted with independent data.

What The AISI Report Actually Shows

AISI is the UK government's AI safety evaluation body, founded in November 2023. Its cyber capabilities team spent six weeks evaluating Claude Mythos Preview against a battery of offensive security benchmarks.

The headline number: Mythos solved 'The Last Ones' (TLO) — a 32-step corporate network attack range that takes expert human red-teamers 20 hours to complete — three times out of ten attempts. Average completion: 22 of 32 steps per attempt. The next-best model AISI tested, Claude Opus 4.6, averaged only 16 steps and never reached the final objective.
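Three successes in ten attempts is a small sample, so the underlying success rate is only loosely pinned down. As a rough illustration (not a method taken from the AISI report), the sketch below computes a 95% Wilson score interval for 3 of 10 and the step-completion fractions quoted above. It assumes each attempt is an independent trial with the same success probability, which may not hold for the real evaluation; the function name `wilson_interval` is ours.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Figures quoted in the article: 3 full completions in 10 attempts.
low, high = wilson_interval(3, 10)
print(f"Full-completion rate: 30% (95% Wilson interval {low:.0%} to {high:.0%})")

# Average steps completed out of 32, as quoted in the article.
TOTAL_STEPS = 32
mythos_fraction = 22 / TOTAL_STEPS
opus_fraction = 16 / TOTAL_STEPS
print(f"Mythos average progress: {mythos_fraction:.1%}")
print(f"Claude Opus 4.6 average progress: {opus_fraction:.1%}")
```

Under those assumptions the interval runs from roughly 11% to 60%, so "three out of ten" is best read as evidence that full completion is possible rather than as a precise success rate.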

On capture-the-flag challenges rated 'expert difficulty' by AISI's cyber panel, Mythos scored 73%. For reference, the best model in AISI's March 2025 evaluation scored 0% on the same challenges. Twelve months, zero to 73%.

What The Report Carefully Did Not Claim

Read the fine print. AISI's test ranges lack real-world defenses. No Endpoint Detection and Response agents. No active defenders attempting to disrupt the attack. No incident response team reading logs.

Mythos can hack weakly defended systems autonomously. It has not yet demonstrated the ability to breach a hardened enterprise network with a mature security operations center, EDR coverage, and an active blue team.

AISI's own conclusion: 'The speed of improvement is what should concern policymakers, not the current ceiling. A model that went from zero to 73% expert CTF success in twelve months is on a trajectory that makes hardened enterprise breach capability plausible within 18 months.'

Project Glasswing: The Pricing Response

Anthropic gated Mythos behind Project Glasswing on April 3, 2026. Access requires membership in an approved consortium, and usage is priced at $25 per million input tokens and $125 per million output tokens, five times the price of Claude Opus 4.7.

The pricing is not about covering costs. It is economic friction. Anthropic does not want Mythos used for routine penetration testing.
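To make the friction concrete, here is a minimal cost sketch using the per-token rates quoted above. The comparison rates are not quoted anywhere in this article; they are derived by dividing the Mythos rates by five, per the "five times" claim. The workload size (2 million input tokens, 500,000 output tokens) is purely hypothetical, as is the function name `run_cost`.

```python
# Rates quoted in the article, in USD per million tokens.
MYTHOS_INPUT_RATE = 25.0
MYTHOS_OUTPUT_RATE = 125.0

# Assumption: baseline rates derived from the article's "five times" claim,
# not taken from any published price list.
PRICE_MULTIPLE = 5
BASELINE_INPUT_RATE = MYTHOS_INPUT_RATE / PRICE_MULTIPLE
BASELINE_OUTPUT_RATE = MYTHOS_OUTPUT_RATE / PRICE_MULTIPLE


def run_cost(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Return the USD cost of a run given per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
        + (output_tokens / 1_000_000) * output_rate


# Hypothetical workload, chosen only for illustration.
input_tokens, output_tokens = 2_000_000, 500_000

mythos_cost = run_cost(input_tokens, output_tokens,
                       MYTHOS_INPUT_RATE, MYTHOS_OUTPUT_RATE)
baseline_cost = run_cost(input_tokens, output_tokens,
                         BASELINE_INPUT_RATE, BASELINE_OUTPUT_RATE)

print(f"Mythos: ${mythos_cost:.2f}")
print(f"Implied baseline: ${baseline_cost:.2f}")
```

For this hypothetical workload the same job costs $112.50 at the Mythos rates against $22.50 at the implied baseline rates.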

OpenAI Confirmed Their Version The Same Day

OpenAI issued a statement confirming it has a restricted cybersecurity model ready to release through a similar consortium structure. Two frontier labs. Two restricted cyber models. Two sets of pricing designed to keep the models away from unauthorized use.

The cyber-AI arms race is no longer theoretical.

The Bottom Line

A year ago, no AI could autonomously execute a 32-step attack on a simulated corporate network. Today, one completes it in three out of ten attempts.

AISI's trajectory projection suggests that models matching Mythos's current capability will be accessible through standard APIs within 12 to 18 months. Open-source equivalents would appear 18 to 24 months after that.

The myth — that frontier AI can't autonomously hack corporate networks — is dead. What replaces it is a harder question: how do defenders keep up?

Full article with all data, Project Glasswing pricing breakdown, CFR analysis framework, and FAQ: news.skila.ai