Zohar Babin

Posted on Jun 10

Claude Fable 5: When to Reach for the Frontier — and When Not To

#claude #mythos #fable #ai

Lessons and best practices for routing work between Fable 5 and the cheaper Claude tiers

Anthropic released Claude Fable 5 on June 9, 2026 — the first "Mythos-class" model made generally available, a new capability tier above Opus. It is state-of-the-art on nearly every tested benchmark, and Anthropic's framing is unusually specific: "the longer and more complex the task, the larger Fable 5's lead over our other models."

That sentence is the whole routing strategy.

We audited a working portfolio of ~20 active projects — agentic platforms, SaaS products, SDKs, data pipelines, documentation suites, deployment runbooks, and high-stakes financial analysis — against what Fable 5 actually changes, and a clear pattern emerged: most day-to-day work does not benefit from Fable 5, and the work that does benefits enormously.

If you only take one thing away, take this: Sonnet 4.6 by default. Fable 5 for the long, the hard, and the expensive-to-get-wrong. Haiku 4.5 for the fan-out. Opus 4.8 when a safety classifier would ruin your day. The rest of this article is the reasoning, the sharp edges, and the cost math.

What Fable 5 actually is (and isn't)

Three facts shape every decision below.

1. It's a tier, not a feature set. Fable 5's API surface matches Opus 4.8's almost exactly: same 1M context, same 128K output, same adaptive-thinking-only design, with one new sharp edge covered below. There is no new capability checkbox. What you're buying is how far the model can go unattended. Stripe reported completing, in a day, a codebase-wide migration across 50 million lines of Ruby that would have taken a team over two months. Early testers quoted in Anthropic's announcement describe long-horizon problems "that were out of reach for earlier models."

2. The sticker says 2× Opus. Your bill will say more. $10 per million input tokens, $50 per million output — exactly double Opus 4.8's $5/$25, with no long-context surcharge. But the rate card understates the real multiplier, because a reasoning-heavy model generates more tokens per task: it thinks longer, plans more, and verifies more, and you pay for every one of those thinking tokens even though you never see them by default (more on that below). Simon Willison, a veteran developer (co-creator of Django) whose independent AI reviews are widely read in the field, consumed $110 of tokens in a single ~5.5-hour day of hands-on testing, and one Max-plan subscriber's launch-week Reddit post described a usage meter ticking up "2% a minute." The model is also slower per response. None of this is a scandal — but all of it matters for routing. (Subscription users, note the calendar: Fable 5 is included free on Pro/Max/Team plans only through June 22, 2026; after that it draws prepaid usage credits at API rates.)

3. It ships with safety classifiers — and what happens when one trips depends on where you run it. Fable 5 is the same underlying model as Claude Mythos 5 (restricted to vetted cyber-defense partners and, soon, select biology researchers), made publicly releasable by classifiers covering cybersecurity, biology/chemistry, and distillation attempts. On Anthropic's own surfaces, a tripped classifier falls back to Opus 4.8 automatically and tells you. On the raw API, the request returns stop_reason: "refusal" and your harness decides what happens next: server-side fallback is an opt-in beta, and isn't available on Bedrock, Vertex, or Foundry at all. Anthropic says trips happen in under 5% of sessions, but the classifiers are deliberately tuned conservative, so benign security and biology work can trip them. Plan for this (see "Security research" and "Handle the refusal" below).

When Fable 5 is the right tool

Across a varied real-world portfolio, four workload shapes stood out as clear Fable territory.

1. Long-horizon autonomous engineering

Multi-hour to multi-day agentic sessions: large migrations, cross-cutting refactors, greenfield builds from a finished spec, overnight runs expected to complete without mid-course human correction. This is the headline capability. If your sessions routinely run 4–8 hours with a real CI gauntlet at the end, the model that finishes correctly the first time is cheaper than the model that needs two retries — even at 2× the token price. And higher effort up front often reduces total cost on this class of work, because turn count drops (more on this in the cost section).
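The retry arithmetic can be made concrete with a small sketch. Only the $5/$25 and $10/$50 rates come from the pricing quoted earlier; the session size is a made-up figure, and the sketch assumes every attempt burns the same number of tokens on both models, which real runs will not match exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price per million tokens, in USD."""
    input_per_mtok: float
    output_per_mtok: float


# Rates quoted in this article.
OPUS_4_8 = Rates(input_per_mtok=5.0, output_per_mtok=25.0)
FABLE_5 = Rates(input_per_mtok=10.0, output_per_mtok=50.0)


def run_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one full run at the given rates."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / 1_000_000


def total_cost(rates: Rates, input_tokens: int, output_tokens: int,
               attempts: int) -> float:
    """Cost of a task that needs `attempts` full runs before it passes CI.

    Assumption: every attempt consumes the same number of tokens.
    """
    return attempts * run_cost(rates, input_tokens, output_tokens)


# Hypothetical session size, chosen only to make the arithmetic concrete.
IN_TOK, OUT_TOK = 20_000_000, 2_000_000

fable_first_try = total_cost(FABLE_5, IN_TOK, OUT_TOK, attempts=1)
opus_two_retries = total_cost(OPUS_4_8, IN_TOK, OUT_TOK, attempts=3)
```

Under these assumptions a single retry on the cheaper model is break-even and a second retry tips the balance toward Fable 5. The sketch ignores the extra tokens a reasoning-heavy model generates per task, so treat the break-even point as optimistic.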

A special case worth calling out: greenfield implementation from a completed blueprint. If you've invested in a detailed design doc, handing the entire spec to Fable 5 in a single opening prompt and letting it run at high effort is the single best-fit usage pattern the model has. And note that this category isn't just for "agent teams" — if you have a finished spec and an ambitious build, you're in it whether you run a fleet of CI-integrated agents or one Claude Code session overnight.

2. High-stakes analysis where error is expensive

Multi-document synthesis with real money or legal exposure attached: M&A due diligence, multi-jurisdictional tax positions, regulatory filings, senior-level financial reasoning (Fable 5 currently tops Hebbia's Finance Benchmark, a test of senior-analyst-level reasoning built by the AI document-analysis firm used by major banks and funds). When a subtle reasoning error costs five or six figures, token price is noise. Fable 5 is also a notably stronger thought partner — more willing to push back, kill its own incorrect beliefs, and reason from first principles. That's exactly the temperament you want reviewing a position you're already invested in.

3. Judge and synthesis roles in multi-agent systems

You rarely need the frontier model everywhere in an agent pipeline. The pattern that works: cheap models fan out, the expensive model decides. Domain-specialist agents, extractors, and verifiers run on Sonnet or Haiku; the judge, executive-synthesis, and cross-domain reasoning agents run on the top tier. Quality of the final output lives disproportionately in the synthesis step.
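A minimal sketch of the fan-out-then-judge shape, assuming a hypothetical `call_model(tier, prompt)` callable that you would wire to your own client. The tier labels are plain strings for illustration, not API model IDs, and the prompts are placeholders.

```python
from typing import Callable, Sequence

# Hypothetical tier labels for illustration; they are not API model IDs.
CHEAP_TIER = "haiku-4.5"
JUDGE_TIER = "fable-5"

# (tier, prompt) -> completion text. Injected so the sketch stays SDK-agnostic.
ModelCall = Callable[[str, str], str]


def fan_out_then_judge(documents: Sequence[str], question: str,
                       call_model: ModelCall) -> str:
    """Extract per-document findings on the cheap tier, then synthesize once on the top tier."""
    findings = [
        call_model(CHEAP_TIER, f"Extract facts relevant to: {question}\n\n{doc}")
        for doc in documents
    ]
    numbered = "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(findings))
    return call_model(JUDGE_TIER, f"Question: {question}\n\nFindings:\n{numbered}")
```

The expensive tier is called exactly once regardless of how many documents are fanned out, which is where the cost shape of this pattern comes from.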

4. Vision-heavy document and design work

Fable 5 is the new state of the art on vision: extracting precise numbers from dense scientific figures, interpreting charts and tables nested in PDFs, rebuilding application source from screenshots, and — notably for coders — visually critiquing its own output against design goals. Document-heavy finance, legal, and analytics pipelines that previously needed heavy scaffolding need less of it now.

When a lower model serves you better

This list matters more than the previous one, because it covers most of what most teams do all day.

Runbook execution. If the intelligence lives in the runbook — deploy scripts, version-bump checklists, bundle-upload-verify sequences — the model is just following documented steps. That's Sonnet work, and the verification steps (curl the endpoint, grep for the version string) are Haiku work. Paying $50 per million output tokens to execute a procedure your docs already encode was the most common waste pattern in our portfolio.

Scoped maintenance on documented codebases. Bug fixes, small features, and quirk workarounds in mature projects with good CLAUDE.md files and accumulated memory. One real data point: a three-day plugin-maintenance sprint on an Opus-class model cost roughly $700 in compute. At 2× pricing, the same token volume on Fable would have run ~$1,400 — likely somewhat less in practice, given Fable's lower cache threshold and tendency to finish in fewer turns — with no meaningful quality difference, because the constraints were already written down. Sonnet 4.6 handles this class well.

Templated and pattern-following work. Documentation guides that follow an established template with dozens of worked examples, configuration changes, landing pages built from a skill, i18n updates. The pattern is the intelligence.

Anything latency-sensitive or high-volume. Fable 5 is slow. Classification, extraction, chat-style products, interactive coding sessions, and high-throughput pipelines belong on Sonnet or Haiku. This hasn't changed.

Security research and recon-shaped work — for a different reason. Legitimate penetration-test reporting, OSINT footprinting, security audits, and vulnerability triage pattern-match the offensive-cyber classifier. The work is benign; the classifier is conservative by design — launch-week reports of false positives ranged from code reviews to a question about Hermitian matrices. On a managed surface you still get an answer (from Opus 4.8), but if your harness pins Fable 5 on the raw API with no refusal handling, a tripped classifier stops the run. For security-flavored sessions, either run Opus 4.8 directly or make sure your integration handles the refusal gracefully. The same applies to life-sciences work until Anthropic's trusted-access program for biology opens up. (If "but isn't security exactly what Mythos 5 is for?" just crossed your mind — yes, and the next section is for you.)

And if your team's workload is mostly in this section — maintenance, support, content operations, runbook automation — the takeaway is simple: you can skip the frontier tier entirely for now. Route to Sonnet 4.6 by default and you'll capture nearly all the value at a fraction of the cost.

A useful rule of thumb: ask "would a competent person with my docs in front of them find this task hard, or just laborious?" Laborious goes to Sonnet. Hard goes to Fable — and so does laborious-at-a-scale where staying coherent unattended for hours is the hard part, which is why a 50-million-line migration is Fable territory even though no single edit in it is difficult.

Mythos 5 vs. Fable 5: same model, different gates

The naming invites confusion, so let's clear it up: Mythos 5 and Fable 5 are the same underlying model. Anthropic says so explicitly: "the safeguards are what distinguish the two models... and are why we've given them different names." Mythos 5 has safeguards lifted for its vetted audience — cyber today, bio/chem for the upcoming biology cohort — while Fable 5 keeps all the gates and routes gated queries to Opus 4.8. Same weights, same $10/$50 pricing, same 30-day retention. By Anthropic's own data, more than 95% of Fable sessions involve no fallback at all — and in those sessions, "Fable 5's performance is effectively the same as that of Mythos 5." You are not missing out on a smarter model; you're missing out on the gated domains, and even Mythos partners only get the gates their program unlocks.

So who actually gets the ungated version? Today, almost nobody. Mythos 5 is restricted to Project Glasswing partners — a US-government-coordinated cyber-defense program whose roster reads like an infrastructure who's-who (AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, Palo Alto Networks, plus ~150 more critical-infrastructure organizations). Mythos 5 supersedes Claude Mythos Preview, the April 2026 gated research release that found over ten thousand high- or critical-severity vulnerabilities — at $25/$125 per million tokens, more than double what Mythos 5 costs now. A trusted-access program for biology is opening "in the coming weeks" — note it grants Fable 5 with the bio/chem gates removed (cyber gates intact), not Mythos itself — and a broader systematic application path for cybersecurity organizations is promised but not dated.

This is why our routing table sends security work to Opus 4.8, not Mythos 5. For a security team without Glasswing access — which is to say, nearly every security team — the realistic ladder looks like this:

1. Opus 4.8 is the working default. It's the model Fable's classifier hands your cyber queries to anyway, so pinning it directly just skips the detour and the split-billing.
2. Opus 4.8 + the Cyber Verification Program is the self-serve upgrade. The CVP is a free, application-based program (decisions target two business days) that lifts the dual-use cyber blocks (vulnerability exploitation, offensive tooling for defensive purposes) on Opus models for verified organizations. Prohibited-use blocks (ransomware, mass exfiltration) stay regardless; it isn't offered on Bedrock or Vertex; zero-data-retention orgs aren't currently eligible; and note carefully: it's an Opus program. It does not raise Fable 5's thresholds and it does not grant Mythos access.
3. Fable 5 for everything outside the gated domains: a security company's product engineering, data pipelines, and long-horizon builds are still Fable territory like anyone else's.
4. Mythos 5 only if you're a Glasswing partner or, eventually, through the broader cyber application path Anthropic has promised but not dated. If your organization defends critical infrastructure at scale, that's worth watching for; if not, items 1–3 are the menu.

The summary for the impatient: Mythos 5 isn't the model you're missing; it's the permission slip you're missing, and in the 95%+ of sessions that never trip a classifier, the permission slip changes nothing.

How to use Fable 5 effectively

API configuration: know the sharp edges

Fable 5 inherits Opus 4.7/4.8's request surface, but several details differ in ways that matter:

Adaptive thinking only. thinking: {type: "adaptive"}. Fixed budget_tokens returns a 400, as do temperature, top_p, and top_k.
No explicit disabled. Unique to Fable 5: an explicit thinking: {type: "disabled"} returns a 400 (it's accepted on Opus 4.8). Omit the thinking parameter instead, or run adaptive — either way, the model thinks, and you pay for it (see "The bill" below).
Don't ask it to transcribe its reasoning. Prompting Fable 5 to echo its internal chain of thought as response text can trip the reasoning_extraction classifier and refuse the turn. If you need reasoning visibility, set thinking: {type: "adaptive", display: "summarized"} and read the structured thinking blocks instead.
Set generous max_tokens and stream. At xhigh/max effort, give the model ≥64K output room; stream anything large.
Handle the refusal. Check stop_reason: "refusal" and the structured stop_details (category: cyber, bio, reasoning_extraction, or null — see Anthropic's safeguards guide). On Anthropic's own surfaces the fallback to Opus 4.8 is automatic; on the API, server-side fallback is an opt-in beta (and unavailable on Bedrock, Vertex, and Foundry), so decide deliberately what your harness does when a refusal fires. The billing side of this is covered below.
Data retention: Mythos-class traffic carries a mandatory 30-day retention policy (safety-only, not used for training, with logged human access). On some cloud platforms you must explicitly opt in before the model is invocable. Factor this into compliance review before adopting, not after.
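The request rules and the refusal check above can be encoded as a small client-side lint. This is a sketch built only from the field names and behaviors this article describes (`thinking`, `budget_tokens`, `stop_reason`, `stop_details.category`); it does not call any SDK, and the fallback policy in `next_action` is a hypothetical example, not Anthropic's recommendation.

```python
from typing import Any, Mapping

# Sampling parameters the article says Fable 5 rejects with a 400.
REJECTED_SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def lint_fable_request(params: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems found in a request payload."""
    problems = []
    for key in REJECTED_SAMPLING_KEYS:
        if key in params:
            problems.append(f"remove {key}")
    thinking = params.get("thinking")
    if thinking is not None:
        if thinking.get("type") == "disabled":
            problems.append("omit thinking instead of disabling it")
        if "budget_tokens" in thinking:
            problems.append("remove thinking.budget_tokens")
    return problems


def next_action(response: Mapping[str, Any]) -> str:
    """Decide what the harness does next; this policy is a hypothetical example."""
    if response.get("stop_reason") != "refusal":
        return "continue"
    category = (response.get("stop_details") or {}).get("category")
    if category in ("cyber", "bio"):
        # Domain classifier tripped: rerun the turn on Opus 4.8.
        return "retry_on_opus"
    # reasoning_extraction or an unknown category: a retry would likely
    # hit the same wall, so hand the decision to a person.
    return "surface_to_human"
```

Running the lint before a request is sent turns a 400 at runtime into a failed check in CI, and keeping the refusal policy in one function makes the "decide deliberately" step explicit.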

The bill: what you're actually paying for

Fable 5's pricing has a sticker and a story. The sticker is simple: 2× Opus. The story is everything that multiplies on top of it — and almost all of it is under your control. Five levers, roughly in order of impact:

1. Effort is the biggest dial on the dashboard. output_config: {effort: ...} with low | medium | high | xhigh | max governs all output — thinking, tool calls, and text. In Willison's single-prompt sweep, max produced roughly 7.5× the output tokens of low on an identical prompt, and high actually used fewer tokens than medium on one run. The relationship between effort and total cost isn't monotonic, because higher effort often finishes in fewer turns. So: start at high (the default), sweep on your own evals, and reserve xhigh/max for hard, latency-insensitive problems. Several of the scariest launch-week burn reports turned out to involve max effort, 1M contexts, and multi-agent workflows all at once — three multipliers, none of them mandatory.

2. You pay for thinking you never see. Adaptive thinking can't be turned off, and you're billed for the model's full internal reasoning, not the (by default, empty) thinking text in the response. If your billed output tokens dwarf your visible output, that's not a bug — check usage.output_tokens_details.thinking_tokens to see the reasoning share. Budget for it; don't fight it.

3. Caching is unusually kind to Fable. Cache reads cost 10% of the input price, standard multipliers apply on writes, and Fable 5's minimum cacheable prefix is 2,048 tokens versus 4,096 on Opus 4.6–4.8 — so mid-sized prompts cache on Fable that silently wouldn't on Opus. At $10/MTok input, verify your cache hits (usage.cache_read_input_tokens) before optimizing anything else.

4. Batch anything that can wait. The Batches API supports Fable 5 at the standard 50% discount: $5/$25, which is interactive-Opus pricing for frontier-quality output. Overnight evals, bulk document analysis, scheduled report generation — if nobody is watching the response stream, it belongs in a batch.

5. Classifier trips have a billing story too. A request refused on input costs $0. With server-side fallback enabled, the fallback's input is billed at cache-read rates. But a mid-stream block split-bills: Fable rates for everything generated before the block, Opus rates after (the fallback billing guide covers the full mechanics). One launch-week report described burning 200K Fable-priced tokens on a code review before the classifier handed the session to Opus. For security- and biology-adjacent work, it's cheaper to start on Opus 4.8 than to pay frontier rates for a run the classifier was always going to interrupt.
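Levers 3 and 4 reduce to simple arithmetic. The sketch below is a rough estimator, not a billing reference: the rates, the 10% cache-read multiplier, the 50% batch discount, and the minimum cacheable prefixes are the figures quoted in this article, while the function names and the model labels used as dictionary keys are made up for illustration. Cache-write surcharges are left out, and the batch path ignores caching because the article does not say how the two discounts combine.

```python
FABLE_INPUT_PER_MTOK = 10.0   # USD, from the article
FABLE_OUTPUT_PER_MTOK = 50.0  # USD, from the article
CACHE_READ_MULTIPLIER = 0.10  # cache reads cost 10% of the input price
BATCH_MULTIPLIER = 0.50       # standard Batches API discount

# Minimum cacheable prefix in tokens; the keys are illustrative labels.
MIN_CACHEABLE_PREFIX = {"fable-5": 2048, "opus-4.8": 4096}


def interactive_cost(uncached_input: int, cache_read_input: int,
                     output_tokens: int) -> float:
    """Estimated USD cost of one interactive Fable 5 request.

    `output_tokens` is billed output, which includes thinking tokens.
    Cache-write surcharges are not modeled.
    """
    input_cost = uncached_input * FABLE_INPUT_PER_MTOK
    cached_cost = cache_read_input * FABLE_INPUT_PER_MTOK * CACHE_READ_MULTIPLIER
    output_cost = output_tokens * FABLE_OUTPUT_PER_MTOK
    return (input_cost + cached_cost + output_cost) / 1_000_000


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of the same work through the Batches API, no caching."""
    full_price = (input_tokens * FABLE_INPUT_PER_MTOK
                  + output_tokens * FABLE_OUTPUT_PER_MTOK) / 1_000_000
    return full_price * BATCH_MULTIPLIER


def prefix_is_cacheable(model: str, prefix_tokens: int) -> bool:
    """True if a prompt prefix of this size meets the model's caching minimum."""
    return prefix_tokens >= MIN_CACHEABLE_PREFIX[model]
```

A 3,000-token system prompt illustrates the threshold point: `prefix_is_cacheable("fable-5", 3000)` is true while the same prefix on the Opus entry is not.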

Then the guard rails. max_tokens is the only hard cap (thinking and text combined). Task budgets (beta, minimum 20K tokens) give the model a live countdown for an entire agentic loop so that it can pace itself; they are the right tool for long autonomous runs, but they are advisory, not a wall. One caution: set the budget once and let the server count down; mutating it client-side between turns invalidates your prompt cache. Finally, track spend per project. Usage-analytics tooling that breaks cost down by project and agent (Claude Code's /usage and spend limits, or the Console for API orgs) turns the routing decisions in this article from theory into a weekly habit. You'll likely find, as we did, that a handful of long-horizon projects justify the frontier tier and everything else doesn't.
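If you log raw usage objects yourself, per-project tracking can start as a few lines of aggregation. The sketch assumes each logged record is a dict with a `project` label (a hypothetical field you would add) and a `usage` dict shaped like the fields named in this article; it is not a substitute for the Console or /usage.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def thinking_share(usage: Mapping[str, Any]) -> float:
    """Fraction of billed output tokens spent on thinking (0.0 if no output)."""
    output = usage.get("output_tokens", 0)
    details = usage.get("output_tokens_details") or {}
    thinking = details.get("thinking_tokens", 0)
    return thinking / output if output else 0.0


def output_tokens_by_project(records: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Sum billed output tokens per project label."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["project"]] += record["usage"].get("output_tokens", 0)
    return dict(totals)
```

A thinking share near 1.0 on a runbook-style project is a strong hint that the work belongs on a cheaper tier.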

Prompting: front-load the specification

The highest-leverage practice: give the full task specification in one well-specified opening turn, then let the model run. Fable 5's long-horizon coherence comes from planning against a clear goal. Ambiguous asks drip-fed across many turns waste exactly the capability you're paying double for. Concretely:

State what "done" looks like in checkable terms — not "a good report" but "a CSV with a numeric price column per SKU, validated against the source totals."
Include constraints, non-goals, and the verification method up front.
For managed agent platforms, encode "done" as a gradeable rubric/outcome so the harness iterates and grades automatically.
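The CSV example above is the kind of "done" that a script can check. A minimal sketch, assuming the deliverable uses columns named `sku` and `price` (hypothetical names) and that the source total is known in advance:

```python
import csv
import io


def check_price_csv(csv_text: str, expected_total: float,
                    tolerance: float = 0.005) -> list[str]:
    """Return a list of failures; an empty list means the deliverable passes."""
    failures = []
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    for column in ("sku", "price"):
        if column not in fields:
            failures.append(f"missing column: {column}")
    if failures:
        return failures
    total = 0.0
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            total += float(row["price"])
        except (TypeError, ValueError):
            failures.append(f"line {line_no}: non-numeric price {row['price']!r}")
    if not failures and abs(total - expected_total) > tolerance:
        failures.append(f"total {total:.2f} != source total {expected_total:.2f}")
    return failures
```

Putting a check like this in the opening prompt, or exposing it as a tool, gives the model something to verify against instead of a subjective "good report".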

Two behavioral notes carried over from recent Opus generations still apply: the model follows instructions literally (soften any "CRITICAL: YOU MUST" scaffolding written for older, more reluctant models), and it's conservative about reaching for optional capabilities — if you want it using subagents, file-based memory, or custom tools, say when to use them, in the system prompt and in each tool's own description.

Harness and tooling: the model is half the system

Tier your subagents. Even with Fable 5 in the main loop, search/explore/verify subagents should run on Haiku or Sonnet. On tasks where subagent fan-out dominates spend, this approaches a 10× cost difference with near-identical results. (And know your workflow multipliers: multi-agent team sessions run roughly 7× the tokens of a solo session.)
Give it file-based memory on long tasks. Anthropic's own testing showed persistent memory improved Fable 5's long-game performance roughly three times more than it improved Opus 4.8's — this model is distinctly better at writing and using its own notes. Maintain per-project memory files (one lesson per file, corrections and confirmed approaches alike) and the model compounds.
Invest in CLAUDE.md / project instructions. Documented constraints, build commands, and verification procedures are what let a long autonomous run self-correct instead of drifting. Counterintuitively, the better your docs, the less often you need Fable — and the better Fable performs when you do.
Use skills for repeatable procedures. Anything you do more than twice — deploy flows, report formats, review checklists — belongs in a skill the model loads on demand, not in tokens you re-explain per session.
Verify with fresh-context agents, not self-critique. For audits, reviews, and migrations, structure the work as decompose → execute → adversarially verify → synthesize, with verifiers spawned in clean contexts. Fable 5 at the highest effort already reflects on and validates its own work, but independent verification still beats self-review for anything you'll ship.
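The "one lesson per file" memory layout can be sketched with the standard library alone. The directory layout, file naming, and function names here are assumptions for illustration; the article does not prescribe a format.

```python
import re
from pathlib import Path


def save_lesson(memory_dir: Path, title: str, body: str) -> Path:
    """Write one lesson to its own Markdown file and return the path."""
    # Derive a filesystem-safe filename from the title.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{slug}.md"
    path.write_text(f"# {title}\n\n{body.strip()}\n", encoding="utf-8")
    return path


def load_lessons(memory_dir: Path) -> str:
    """Concatenate all lessons, sorted by filename, for inclusion in a prompt."""
    files = sorted(memory_dir.glob("*.md"))
    return "\n".join(f.read_text(encoding="utf-8") for f in files)
```

Saving a lesson under an existing title overwrites the old file, so a correction replaces the stale note instead of sitting next to it.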

A simple routing default

Workload	Model
Most work (the default): scoped features, maintenance, templated docs, runbook execution, interactive coding	Sonnet 4.6 (escalate to Opus 4.8 when it stalls)
Multi-day autonomous builds, large migrations, high-stakes synthesis, judge/synthesis agents, frontier vision tasks	Fable 5
Subagent fan-out, verification steps, classification, extraction, high-volume pipelines	Haiku 4.5 (Sonnet 4.6 for heavier subagents)
Security- or biology-flavored sessions where a classifier trip would break your harness	Opus 4.8 directly (+ CVP verification for dual-use cyber work — see the Mythos section)
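The table can also be written as a function, which makes the precedence between rows explicit. The precedence order used here (gated domain first, then fan-out, then long or high-stakes work) is an assumption of this sketch, and the returned strings are tier labels, not API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    gated_domain: bool = False  # security- or biology-flavored work
    fan_out: bool = False       # subagent, verification, classification, extraction
    long_horizon: bool = False  # multi-hour or multi-day autonomous run
    high_stakes: bool = False   # an error costs real money or legal exposure


def route(task: Task) -> str:
    """Map a task to a tier label following the routing table."""
    if task.gated_domain:
        return "opus-4.8"
    if task.fan_out:
        return "haiku-4.5"
    if task.long_horizon or task.high_stakes:
        return "fable-5"
    return "sonnet-4.6"
```

A task with no flags set falls through to Sonnet 4.6, which mirrors the article's default.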

The bottom line

Fable 5 doesn't make Sonnet or Opus obsolete — it makes a previously impossible class of work possible, at a price and latency that punish using it for everything else. Think of it as hiring a brilliant specialist who bills by the hour and thinks out loud on the clock: transformative on the right problem, ruinous as a receptionist.

The teams that get the most from it will be the ones with the discipline to route: specs and stakes go up to the frontier; procedures and patterns stay on the efficient tiers; verification runs cheap everywhere. The model rewards exactly the engineering hygiene — written specs, documented constraints, checkable definitions of done, persistent memory — that made your systems better before it existed.

Sources

Anthropic, "Claude Fable 5 and Claude Mythos 5" — official announcement (June 9, 2026): capabilities, classifiers, fallback behavior, pricing, availability, data retention.
Anthropic, Project Glasswing and Real-time cyber safeguards on Claude — Mythos access, partner roster, Mythos Preview pricing, and the Cyber Verification Program's scope and application process.
Anthropic platform documentation — pricing, effort parameter, adaptive thinking, task budgets, prompt caching, and the Fable 5 fallback billing guide (refusal/fallback billing mechanics, batch availability).
AWS News Blog, "Anthropic Claude Fable 5 on AWS" — Bedrock/Claude Platform availability, data-retention opt-in, fallback pricing mechanics.
Simon Willison, "Initial impressions of Claude Fable 5" — independent hands-on review: real-world coding sessions, cost data, effort-level token sweep.
Community launch-week reports — r/ClaudeAI thread (144 comments of burn-rate anecdotes and the emerging plan-with-Fable, execute-with-Opus pattern) and Hacker News discussion. Treated as anecdote, not data.
Our portfolio audit (June 2026, including launch-week usage) — routing analysis across ~20 active projects spanning agentic platforms, SDKs, data pipelines, and analysis workloads.

DEV Community