Marcus Rowe

Originally published at techsifted.com

Anthropic Built an AI So Good at Hacking That It Won't Release It to the Public (April 2026)

FTC Disclosure: TechSifted uses affiliate links. We may earn a commission if you click and buy — at no extra cost to you. Our editorial opinions are our own.


There's a version of Claude you can't use.

Anthropic built it, evaluated it, and decided the general public shouldn't have access to it — not because it doesn't work, but because it works too well at a specific and dangerous thing.

Claude Mythos is a real model. It exists. It's running right now inside a restricted program called Project Glasswing that limits access to roughly 40 organizations. And this week, German banking authorities publicly confirmed they're assessing the risks it poses to the financial sector.

Here's what's actually happening.


How This Started

In late March 2026, a configuration error at Anthropic exposed internal assets to the public, including details about a model called Mythos that nobody outside the company had heard of. The leaked material positioned it above Claude Opus in capability and described it internally as "a step change."

Anthropic confirmed it on March 26 without sharing details, and on April 7 made the formal announcement: Mythos Preview would be released in limited form through Project Glasswing — a restricted cybersecurity consortium capped at roughly 40 organizations, including Amazon, Apple, Google, Microsoft, NVIDIA, and JPMorgan. Everyone else: no access. No waitlist. No planned general release.

The reason they gave is specific. In internal testing, Mythos demonstrated the ability to:

  • Autonomously identify and exploit zero-day vulnerabilities in real production codebases
  • Reverse-engineer exploits from closed-source software it had never been trained on
  • Chain multiple vulnerabilities together into multi-stage attacks

In testing, it found thousands of zero-day vulnerabilities. Over 99% of them were still unpatched at the time of the announcement.

Let that land for a second. A model that, if released publicly, would effectively be a mass-scale zero-day vulnerability finder accessible to anyone with an API key.

Anthropic's argument for restricted release rather than no release: the same capability that makes it dangerous for attackers makes it powerful for defenders. Project Glasswing partners get up to $100 million in usage credits — but only for defensive security work. Offensive use is explicitly prohibited under the terms.


The European Banking Response

Today, German financial authorities confirmed they're assessing the risks Mythos poses to the banking sector. This is notable, not because German banks are particularly likely to be Claude customers, but because it marks the first time a major regulator has publicly flagged a specific AI model as a systemic risk concern.

The concern isn't that banks will use Mythos to attack each other. The concern is that threat actors will find ways to access the same capabilities — through the 40 partner organizations, through adversarial prompting, through future leaks — and direct those capabilities at financial infrastructure.

This is the part of the AI safety conversation that's been mostly theoretical until now. "What if an AI could autonomously find and exploit critical infrastructure vulnerabilities?" is no longer hypothetical. The answer is: it can. And right now, the only thing standing between that capability and general availability is Anthropic's access control policy.

The UK AI Safety Institute has also published an evaluation of Mythos, confirming the cybersecurity capabilities described in Anthropic's own announcement. Independent verification matters here: it undercuts the objection that Anthropic is overstating capabilities for dramatic effect.


What This Means for Regular Claude Users

Nothing changes for Claude Sonnet 4.6, Claude Opus 4.6, or any of the publicly available models in the Claude family. Mythos is a separate system with separate access controls. If you use Claude for writing, coding, analysis, or any of the standard productivity use cases, nothing about your workflow changes.
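If you're calling the API directly, your integrations keep working as they do today. Here's a minimal sketch of a normal call against a public model using Anthropic's official Python SDK. The model ID and prompt are illustrative, not tied to anything in the announcement, so check Anthropic's current model list for the exact string before copying it:

```python
# pip install anthropic
import anthropic

# The client reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

# Illustrative model ID for a publicly available model. Mythos has no
# public model ID, and this call behaves exactly as it did before the
# Glasswing announcement.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Explain what a zero-day vulnerability is in two sentences."}
    ],
)

# Text responses come back as a list of content blocks.
print(response.content[0].text)
```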

What this does affect is the trajectory of the Claude product line. Anthropic has now demonstrated it will withhold a major model release entirely on safety grounds — even when that model represents a genuine competitive capability leap. That's a different posture than what we've seen from most of the industry.

For context: OpenAI has sometimes delayed releases, but it has consistently moved toward broader access over time. Meta's open-weight Llama releases have explicitly prioritized democratized access over safety restrictions. Anthropic's decision here is a meaningful departure from both approaches.

Whether you think that's responsible or over-cautious depends a lot on how you weigh the risks of misuse against the benefits of open access. Reasonable people disagree on this. But the decision is made, and it's worth understanding what it signals.


The Harder Question

I'll be honest — I have mixed feelings about this story.

On one hand: if a model can autonomously find thousands of zero-day vulnerabilities, and over 99% of them are still unpatched, then restricting access seems like the obvious call. The upside of giving that to everyone is hard to articulate. The downside is obvious.

On the other hand: "we'll decide who gets access to powerful capabilities" is a lot of power to hold. The 40 organizations in Project Glasswing are mostly large corporations and hyperscalers. Small security firms, independent researchers, academic institutions — they're not in the room. The assumption is that Amazon and Google will use Mythos responsibly for defensive work. That assumption deserves scrutiny.

What I'm fairly confident about: the regulatory interest that started in Germany won't stop there. This story isn't finished.


Quick Take

Claude Mythos is real, it's restricted, and the restrictions exist for reasons that are unusually concrete. The cybersecurity capabilities are confirmed by the UK's AI Safety Institute, not just Anthropic's own marketing. European banking regulators are now in the conversation.

This is the first major instance of an AI lab withholding a frontier model specifically because of autonomous offensive capability concerns — not because the model doesn't work, but because it works too well.

Worth paying attention to. The policy questions here are going to get louder before they get quieter.

For more on the Claude model family and what's publicly available, see our Claude AI guides.
