When I reviewed Claude Opus 4.8 two weeks ago, I flagged one sentence in the announcement as the most interesting thing in it. Anthropic said Mythos-class models were coming to all customers in the coming weeks, gated on safety work rather than capability. That was the tell. The gap between what these labs can build and what they choose to ship was loosening.
Yesterday, June 9, the other shoe dropped. Claude Fable 5 shipped, and it is a Mythos-class model. Same release playbook as always. No waitlist, no staged rollout. It landed in the Claude API, on Bedrock, in GitHub Copilot, and on the consumer plans on the same day, with the model ID claude-fable-5 ready to drop into config.
So I spent the day doing what I do with every release. I threw my hardest real tasks at it, dug through the announcement and the third-party benchmarks, and tried to separate what genuinely changed from the launch-day shine. This one is different from the last few. Not because the benchmarks moved a few points, but because Claude Fable 5 is the first time Anthropic has handed the public a model from the tier they previously decided was too capable to release.
Here is what I found.
What Fable 5 and Mythos 5 Actually Are
The naming is doing a lot of work here, so it is worth slowing down. There are two models in this release, and they are the same model.
Claude Fable 5 is the Mythos-class model with safety classifiers turned on. This is the one you and I get. It is available right now through the API, the cloud providers, and the subscription plans.
Claude Mythos 5 is the identical underlying model with certain safeguards removed. It is not generally available. Right now it is restricted to cybersecurity professionals and infrastructure providers through something Anthropic is calling Project Glasswing, with a trusted-access program for biology researchers planned next.
I wrote about Claude Mythos back when it was a locked research preview, the model that scored absurdly high on coding and cyber benchmarks and that Anthropic explicitly chose not to ship. Fable 5 is the answer to the obvious question that post raised: what happens when they finally decide it is safe enough to release? The answer is that they release it with a set of classifiers bolted on, keep the unfiltered version behind a vetting process, and call the two halves by different names.
That split matters more than it looks, and I will come back to it. But first, the part everyone actually wants to know.
The Benchmarks That Matter
Anthropic claims Fable 5 is state-of-the-art on nearly every capability benchmark they tested, and for once the third-party numbers back the marketing instead of softening it. The pattern from the Opus 4.8 release, where the gains were real but incremental, does not hold here. These are step changes.
Here are the numbers worth knowing.
| Benchmark | What it measures | Fable 5 | For comparison |
|---|---|---|---|
| SWE-Bench Pro | Real-world software engineering | 80.3% | Opus 4.8: 69.2%, GPT-5.5: 58.6% |
| FrontierCode | Production-grade code quality | 29.3% | Opus 4.8: 13.4% |
| GDP.pdf | Vision reasoning over documents, no tools | 29.8% | GPT-5.5: 24.9% |
| ExploitBench (Mythos 5) | Cybersecurity, guardrails off | 78.0% | Opus 4.8: 40.0% |
| Core analytics | Complex analytical tasks | First model over 90% | Previous frontier models: under 90% |
The SWE-Bench Pro jump is the one that stopped me. Going from 69% to 80% does not sound like much until you remember what that benchmark is. It is not toy problems. It is real engineering tasks pulled from real repositories, the kind where the model has to understand a codebase, make a change that spans multiple files, and not break anything else. An eleven-point gain at that altitude is the difference between a model that gets most things right and one that gets the hard things right too.
FrontierCode is the other eye-opener. More than doubling Opus 4.8's score on a benchmark designed to test whether code meets production standards, not just whether it runs, lines up with what I felt in actual use. The output reads less like generated code and more like code a careful engineer wrote.
The ExploitBench number belongs to Mythos 5, the unfiltered sibling, which is why it nearly doubles Opus 4.8. That gap is the entire reason the unfiltered version is locked behind Project Glasswing. A model that scores 78% on offensive security tasks is exactly the dual-use capability that makes a lab nervous, and it is worth holding that number in your head when we get to the safety section.
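The relative claims in this section ("eleven-point gain", "more than doubling", "nearly doubles") can be checked against the table with a few lines of arithmetic. This is a minimal sketch that uses only the scores quoted above; the `SCORES` layout and the `compare` helper are names made up for illustration.

```python
# Scores copied from the benchmark table above (percent).
SCORES = {
    "SWE-Bench Pro": {"new": 80.3, "baseline": 69.2},  # Fable 5 vs Opus 4.8
    "FrontierCode": {"new": 29.3, "baseline": 13.4},   # Fable 5 vs Opus 4.8
    "ExploitBench": {"new": 78.0, "baseline": 40.0},   # Mythos 5 vs Opus 4.8
}


def compare(new: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in points, ratio of new score to baseline)."""
    return round(new - baseline, 1), round(new / baseline, 2)


for name, s in SCORES.items():
    gain, ratio = compare(s["new"], s["baseline"])
    print(f"{name}: +{gain} points, {ratio}x the baseline")
```

The absolute gain and the ratio tell different stories: SWE-Bench Pro has the smallest ratio of the three because its baseline was already high, which is why the point gain is the more useful number there.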
What 80% on SWE-Bench Pro Feels Like in Practice
Benchmarks tell you the model is capable. They do not tell you what the capability feels like when you are the one driving. So I gave it the work I actually do.
The first test was a refactor I had been avoiding. A tangled service layer in one of my projects, about a dozen files, with state management that had grown organically and badly over a year. The kind of thing where the agentic coding loop usually drifts. One agent, one file at a time, me re-explaining the convention every few files as context slips.
Fable 5 handled it in a way that felt qualitatively different. It read the whole service layer, identified the actual structural problem rather than just the surface symptoms, and proposed a refactor that I would have been happy to write myself. Not every choice was mine. But the reasoning was sound enough that the disagreements were about taste, not correctness.
The claim Anthropic leans on hardest is sustained reasoning. The line in the announcement is that the longer and more complex the task, the larger Fable 5's lead. Early testers reported that apps which needed a hundred prompts a year ago now one-shot. I cannot fully verify the hundred-prompt claim, but the direction is right. The model holds focus across a long task better than anything I have used. It does not lose the thread halfway through a migration the way every previous model eventually does.
The headline customer story is Stripe, who said Fable 5 compressed months of engineering into days: a 50-million-line Ruby codebase migration that would normally take a team two months was completed in a single day. I cannot test a 50-million-line migration. But having watched it chew through my own multi-file refactor without me babysitting context, I find the shape of that claim plausible in a way I would have rolled my eyes at six months ago.
This is where the self-correction work from Opus 4.8 compounds. Fable 5 inherits the honesty improvements and pairs them with raw capability. It catches its own mistakes more reliably and the mistakes it makes are rarer to begin with.
The Price Doubled, and That Changes the Math
Here is the part that is going to reshape how you use it. Claude Fable 5 costs $10 per million input tokens and $50 per million output tokens. That is double Opus 4.8, which sits at $5 and $25.
For the last several releases, the story was capability going up while price held flat. I made a whole point of it in the Opus 4.8 review, because flat pricing is the quiet engine behind the improving economics of building AI features. Fable 5 breaks that pattern. The price went up because the model is genuinely more expensive to run, and Anthropic is not hiding it.
To be fair, they frame it as a discount. Fable 5 is less than half the price of the old Mythos Preview, so relative to the Mythos tier this is a price cut. But relative to your actual bill, the one you pay today on Opus 4.8, it is a doubling.
So the calculus is no longer "use the best model for everything." It is back to routing.
| Model | Input (per 1M) | Output (per 1M) | Use it for |
|---|---|---|---|
| Opus 4.8 | $5 | $25 | Daily coding, most agent work, anything high-volume |
| Fable 5 | $10 | $50 | The hard tasks where the extra capability pays for itself |
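To see what the doubling means for a real bill, here is a rough cost sketch built from the two rows above. The prices come from the table; the workload numbers (requests per day, tokens per request) are invented for illustration, and the dictionary keys are just labels, not API model IDs.

```python
# USD per one million tokens, taken from the pricing table above.
PRICES = {
    "opus-4.8": {"input": 5.00, "output": 25.00},
    "fable-5": {"input": 10.00, "output": 50.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2,000 requests a day,
# each with 8,000 input tokens and 1,500 output tokens.
REQUESTS_PER_DAY = 2_000
for model in PRICES:
    daily = REQUESTS_PER_DAY * request_cost(model, 8_000, 1_500)
    print(f"{model}: ${daily:,.2f} per day")
```

Because both the input and the output price doubled, the ratio between the two bills is 2x regardless of how your traffic splits between input and output tokens.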
The honest framing is that Fable 5 is not a replacement for your default model. It is a tool for the top of the difficulty curve. The gnarly migration, the architecture decision with real tradeoffs, the debugging session that spans three systems and has resisted every cheaper attempt. For those, paying double is trivial against the time saved. For your everyday loop of small edits and lookups, you are lighting money on fire if you route all of it through Fable 5.
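One way to put that into practice is an escalation pattern: try the cheaper model first and only pay for Fable 5 when the cheap attempts fail your own acceptance check. The sketch below is an assumption-heavy illustration, not a recommended production design. `call_model` and `is_acceptable` are placeholders you would supply (an API wrapper and a test runner or validator), and `claude-opus-4-8` is a hypothetical model ID, since only `claude-fable-5` appears in the announcement.

```python
from typing import Callable, Optional

CHEAP_MODEL = "claude-opus-4-8"  # hypothetical ID, not confirmed anywhere in this post
STRONG_MODEL = "claude-fable-5"  # ID given in the announcement


def solve_with_escalation(
    task: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
    cheap_attempts: int = 2,
) -> tuple[str, Optional[str]]:
    """Try the cheap model first; escalate once if every cheap attempt fails.

    Returns (model_used, answer). The answer is None when the strong
    model's output is also rejected by is_acceptable.
    """
    for _ in range(cheap_attempts):
        answer = call_model(CHEAP_MODEL, task)
        if is_acceptable(answer):
            return CHEAP_MODEL, answer
    answer = call_model(STRONG_MODEL, task)
    return STRONG_MODEL, (answer if is_acceptable(answer) else None)


# Stub standing in for a real API call: only the strong model "passes".
def fake_call(model: str, task: str) -> str:
    return "PASS" if model == STRONG_MODEL else "FAIL"


print(solve_with_escalation("untangle the service layer", fake_call, lambda a: a == "PASS"))
```

The tradeoff is that a hard task now costs the failed cheap attempts plus the Fable 5 call, so this only pays off if most of your traffic is resolved on the first tier.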
If you are mapping out spend across plans and API usage, my Claude pricing survival guide walks through how to think about the tradeoffs, and this release adds a new top tier to that decision. It also makes a strong case for getting serious about token cost management if you have not already, because the cost of being lazy about model selection just doubled.
On the plans side, Anthropic is doing the usual launch promotion. Fable 5 is included at no extra cost on Pro, Max, Team, and Enterprise through June 22, after which usage credits kick in pending capacity. So you have about two weeks to hammer on it for free before the meter starts.
Mythos 5: The Same Brain With the Guardrails Off
The most genuinely novel thing in this release is not Fable 5. It is the decision to ship its unfiltered twin at all, even to a restricted group.
Mythos 5 is Fable 5 with the safety classifiers removed. Same weights, same intelligence, none of the blocking. Anthropic is only giving it to cybersecurity professionals and infrastructure providers through Project Glasswing right now, with a biology-researcher program coming that will lift the bio safeguards while keeping the cyber ones in place.
The reasoning is straightforward once you look at the ExploitBench number. The unfiltered model scores 78% on offensive security work, nearly double Opus 4.8. That is a capability you want defenders to have and attackers not to. Gating it behind a vetted program is Anthropic trying to thread that needle, putting the sharp version in the hands of people who use it to harden systems while keeping it away from everyone else.
For the security testing I am authorized to do, the existence of a model this capable on the defensive side is a real shift. The flip side is the one I keep thinking about. If the only thing standing between the public model and the offensive model is a set of classifiers, then the safety of the whole arrangement rests entirely on how good those classifiers are. Which brings us to the part of this release that should bother you a little.
The Safeguards, and the Part That Should Bother You
Fable 5 ships with three classifier systems. One blocks offensive cybersecurity and exploitation tasks. One blocks dual-use biology and chemistry research. One prevents distillation, the extraction of the model's capabilities into a smaller model.
The implementation is interesting. When a safeguard triggers, Fable 5 does not refuse. It falls back to Opus 4.8 and answers from there. Anthropic says this happens in less than 5% of sessions on average, and that the system is tuned conservatively, so it sometimes blocks benign requests. External red-teaming reportedly found zero successful harmful single-turn requests against 30 public jailbreak techniques, which is a strong result if it holds up.
So far, so reasonable. A model that downgrades instead of refusing is a better user experience than a hard wall, and a transparent classifier that tells you when it fired is fine.
The problem, and Nathan Lambert at Interconnects laid this out sharply, is that not all of the downgrading is transparent. He distinguishes between the disclosed safeguards, cyber and bio and distillation, which notify you when they kick in, and undisclosed modifications around frontier AI research that change the model's behavior without telling you. His line is worth quoting directly: "An AI model that gets less intelligent automatically without notifying me is categorically misaligned AI."
I think he is right to be annoyed, and the reason cuts straight to how I work. If I am using a model for serious engineering and it can quietly become a different, dumber model mid-session without telling me, my eval suite cannot account for it. The model I tested is not reliably the model I am running. Lambert goes further and says he cannot trust Fable 5 for frontier ML development work for exactly this reason, and reads the opacity as more about protecting Anthropic's competitive position than about safety.
Whether or not you buy the competitive-entrenchment read, the practical takeaway for developers is concrete. If you are building on Fable 5, assume a small fraction of your requests may be answered by Opus 4.8 instead, and assume you may not always be told. Build your evals and your output validation to be robust to that, because the model behind the API is not a fixed quantity. This is the first frontier release where I would call uncertainty about which model actually answers a first-class concern rather than a footnote.
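Here is a rough sketch of what "robust to that" could look like in an eval harness. It does two things: estimates the blended pass rate you should expect if some share of traffic is served by the fallback model, and splits logged eval results by whichever model each record reports. Both rest on assumptions. The record format (a dict with `model` and `passed` keys) is hypothetical, and nothing in the announcement confirms that a fallback is reported in response metadata; for the undisclosed case it may not be.

```python
from collections import Counter


def blended_pass_rate(primary: float, fallback: float, fallback_share: float) -> float:
    """Expected pass rate when a share of requests is served by the fallback model."""
    return (1 - fallback_share) * primary + fallback_share * fallback


def pass_rate_by_served_model(records: list[dict]) -> dict[str, float]:
    """Group eval results by the model each record reports and compute pass rates."""
    passed, total = Counter(), Counter()
    for record in records:
        model = record.get("model", "unknown")  # assumed field, see lead-in
        total[model] += 1
        passed[model] += int(record["passed"])
    return {model: passed[model] / total[model] for model in total}


# Illustration only: treats the SWE-Bench Pro scores as per-request pass
# probabilities and the "under 5% of sessions" figure as a per-request share.
print(round(blended_pass_rate(0.803, 0.692, 0.05), 5))

# Hypothetical logged eval records.
logged = [
    {"model": "claude-fable-5", "passed": True},
    {"model": "claude-fable-5", "passed": True},
    {"model": "claude-fable-5", "passed": False},
    {"model": "claude-opus-4-8", "passed": False},  # hypothetical fallback ID
]
print(pass_rate_by_served_model(logged))
```

If the served model is not reported at all, the per-model split degrades to a single bucket, and the only signal left is the aggregate pass rate drifting away from what you measured at evaluation time. That is the case worth setting an alert threshold for.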
One more operational detail: Mythos-class traffic now carries a mandatory 30-day data retention policy. Anthropic says the data is not used for training or non-safety purposes and is deleted after 30 days in most cases, with human access logged. If you work under strict data-handling requirements, read that policy before you route production traffic through Fable 5.
The Science Results Are the Real Story
The coding numbers will get the headlines because that is what most of us buy these models for. But the results that actually made me sit up are in science, and they came from Mythos 5.
On molecular biology, the model generated novel hypotheses that scientists preferred about 80% of the time over Opus-class models. In genomics, it ran a research task largely on its own for over a week, analyzing millions of cells across 138 animal species, and reportedly outperformed a recent Science journal publication despite being a fraction of the size. In drug design, internal protein experts said it accelerated their work by roughly ten times, with nine of fourteen protein targets yielding strong candidates.
I am not a biologist and I cannot evaluate those claims on the merits. But the pattern is the one worth noticing. The thing that separates Fable 5 from the models before it is not that it writes slightly better code. It is that it can sustain genuinely autonomous work over long horizons. A week of unsupervised genomics research is a different category of capability than a clever answer to a single prompt.
That same capability is what powers the coding story. The reason the migrations work is the same reason the genomics works. The model holds the thread. If you have wrestled with agent reliability over long-running tasks, this is the first model where the long-horizon part feels solved enough to lean on rather than babysit.
Should You Switch From Opus 4.8?
Here is how I would think about it depending on where you sit.
If you do daily coding on a Pro or Max plan: Try Fable 5 on your hardest current task in the next two weeks while it is free on the plans. The capability jump is real and you should feel it on genuinely difficult work. But do not make it your default. When the credits kick in after June 22, the doubled price means Opus 4.8 should stay your workhorse and Fable 5 should be the tool you reach for when the cheaper model is struggling.
If you run Opus 4.8 in production via the API: Do not flip the model ID blindly. The price doubling alone means you need to be deliberate about which paths justify it, and the silent-fallback behavior means your outputs are now non-deterministic in a new way. Run your eval suite, then route only the high-value, hard tasks to claude-fable-5 while keeping volume traffic on Opus 4.8. This is a routing decision, not a swap.
If you are on GPT-5.5 or Gemini for primary work: The gaps in coding, vision, and agentic work just widened in Claude's favor, and by more than the Opus 4.8 release did. When I last did a full Claude vs GPT vs Gemini breakdown, the models converged on baseline and diverged on strengths. Fable 5 stretches Claude's lead in its strong areas rather than reshuffling the board. If you have been on the fence about Claude for serious engineering, this is the strongest case yet.
If you do authorized security or scientific research: Look into whether you qualify for Project Glasswing or the upcoming trusted-access programs. The unfiltered Mythos 5 is a meaningfully different tool than the public Fable 5, and for defensive security and research work that is exactly the point.
The Bigger Picture
What strikes me about Claude Fable 5 is not the SWE-Bench number, impressive as it is. It is what the release tells you about where the constraint now sits.
For the last year, the story was capability rising while price held flat, and the binding question was how much smarter the next model would be. Fable 5 flips both halves of that. The price went up, and the binding question is no longer capability. It is safety and trust. Anthropic built a model good enough that they split it in two, shipped the filtered half to everyone, locked the unfiltered half behind a vetting program, and bolted on classifiers that can quietly swap in a weaker model behind your back.
That is a different kind of release. The capability is so far ahead that the interesting decisions are now about governance, access, and disclosure rather than raw benchmarks. The most important sentence in the Opus 4.8 announcement was the one teasing Mythos-class availability. The most important fact about Fable 5 is not how smart it is. It is that "how smart is it" stopped being the hard question.
For the work I do every day, Fable 5 is the most capable tool I have ever pointed at a hard problem, and I will be reaching for it exactly when a problem is hard enough to earn the price. For everything else, Opus 4.8 is still my default. And the part I will be watching most carefully is not the next benchmark. It is whether the model answering my request is actually the model I think it is.