DEV Community

tokenmixai
tokenmixai

Posted on • Originally published at tokenmix.ai

Claude Fable 5 for Developers: API Changes, Pricing, Migration Notes


Anthropic shipped Claude Fable 5 on June 9, 2026 — its first generally available Mythos-class model, priced at $10 per million input tokens and $50 per million output. That is exactly double Claude Opus 4.8, and the benchmark deltas are real: SWE-Bench Pro 80.3% vs 69.2%, FrontierCode 29.3% vs 13.4%.

But the price is not the migration story. The API behavior is. Fable 5 ships three breaking changes that will silently misbehave in any integration that assumes Opus-era semantics. This post covers what actually changes in your code, what the bill looks like, and where the traps are.

I run model intelligence at TokenMix, where we track pricing and API behavior across 300+ models. Everything below is sourced from Anthropic's launch docs, migration guide, and pricing page — verified June 10, 2026.

The 60-second version

  • Price: $10/$50 per MTok. Every rate is exactly 2× Opus 4.8 — cache reads $1, 5-min cache writes $12.50, 1-hour writes $20, batch $5/$25.
  • Specs: 1M context, 128K max output, no long-context surcharge.
  • Model ID: claude-fable-5 on the Claude API; anthropic.claude-fable-5 on Bedrock; anthropic/claude-fable-5 on OpenRouter.
  • Breaking change 1: Adaptive thinking is always on. thinking: {"type": "disabled"} returns an error.
  • Breaking change 2: Refusals are HTTP 200 responses with stop_reason: "refusal" — not error codes.
  • Breaking change 3: Safety classifiers reroute flagged requests to Opus 4.8 (under 5% of sessions), and rerouted requests bill at Opus rates.
  • No ZDR: 30-day data retention is mandatory. Zero-data-retention accounts don't see the model at all.

Breaking change 1: thinking is no longer optional

On Opus 4.8 you could disable thinking to trade quality for latency. On Fable 5 you cannot — adaptive thinking is permanently on, and the model decides how much to think per request.

Your replacement lever is the effort parameter:

{
  "model": "claude-fable-5",
  "max_tokens": 16000,
  "effort": "high",
  "messages": [...]
}
Enter fullscreen mode Exit fullscreen mode

Five levels: low, medium, high, xhigh, max. Default is high. Anthropic's migration guide is explicit: start at high even for workloads that ran xhigh on Opus 4.8 — Fable 5 reaches further per unit of thinking.

Two gotchas:

  1. max_tokens now caps thinking + response combined. A workload that ran thinking-off on Opus 4.8 inherits always-on thinking here. Output budgets sized for bare responses will truncate. Resize them.
  2. Raw chain-of-thought is never returned. thinking.display defaults to "omitted"; set it to "summarized" if you want readable summaries. In multi-turn conversations, pass thinking blocks back unchanged.

Prefill, manual thinking budgets, and sampling parameters are still rejected with 400 — unchanged from Opus 4.7/4.8, so nothing new breaks there.

Breaking change 2: refusals look like success

This is the integration trap. A refused request returns HTTP 200 with:

{
  "stop_reason": "refusal",
  "stop_details": { "category": "cyber" }
}
Enter fullscreen mode Exit fullscreen mode

stop_details.category is one of "cyber", "bio", "reasoning_extraction", or null. Anything keyed on HTTP status codes treats this as a normal completion and passes a declined response downstream. Check stop_reason on every Fable 5 response.

Billing on refusals:

  • Refused before any output → $0
  • Classifier fires mid-stream → input plus already-streamed output is billed; discard the partial output

Breaking change 3: the Opus 4.8 fallback

Fable 5 is the same underlying model as Claude Mythos 5 (the Glasswing-partners-only variant) with safety classifiers active. When a classifier flags a request — offensive cyber, bioweapon-adjacent biology, or distillation-style extraction patterns — the response is served by Opus 4.8 instead, and bills at Opus rates ($5/$25).

Anthropic reports under 5% of sessions trigger this. The beta fallbacks parameter automates retry server-side, but only on the Claude API and Claude Platform on AWS. On the Batch API, Bedrock, Vertex, and Foundry, retries run client-side via SDK middleware (TypeScript, Python, Go, Java, C#).

One pattern worth flagging from the Claude Code docs: fallback can fire on the first request of a session, before you type anything, because that request carries workspace context — CLAUDE.md content, directory names, git status. A repo full of security tooling can trip the classifier on context alone. claude --safe-mode strips customizations to diagnose it.

And the false-positive reports are already in: the Hacker News launch thread has developers reporting MRI brain-segmentation code and mosquito-malaria research flagged as bio risks. If your domain is health-adjacent, meter your first week.

The pricing table that matters

Rate Fable 5 Opus 4.8 Multiple
Base input $10.00 $5.00 2.0×
5-min cache write $12.50 $6.25 2.0×
1-hour cache write $20.00 $10.00 2.0×
Cache read $1.00 $0.50 2.0×
Output $50.00 $25.00 2.0×
Batch input $5.00 $2.50 2.0×
Batch output $25.00 $12.50 2.0×
Min cacheable prompt 512 tokens 1,024 tokens Fable caches shorter prompts

Three footnotes that change real bills:

  1. No long-context surcharge. Per Anthropic's pricing docs, "a 900k-token request is billed at the same per-token rate as a 9k-token request." Gemini 3.1 Pro doubles its input rate past 200K; Fable 5 doesn't.
  2. Tokenizer. Fable 5 uses the Opus 4.7 tokenizer — roughly 30% (up to 35%) more tokens from the same text vs pre-4.7 models. Comparisons against Opus 4.8 are apples-to-apples; against your old 4.5-era bills, they are not.
  3. No fast mode. Opus 4.8 fast mode costs the same $10/$50 as Fable 5 — the same sticker price buys speed or intelligence, pick one.

Is 2× worth it? The cost-per-solve math

Raw per-attempt cost on a 100K-in / 20K-out agentic task: Fable $2.00, Opus $1.00. Now divide by published pass rates:

Difficulty tier Fable 5 Opus 4.8 GPT-5.5
SWE-Bench Pro tier (routine-hard) $2.49 $1.45 $1.88
FrontierCode tier (frontier-hard) $6.83 $7.46 $19.30

On routine work, Opus 4.8 wins per solved task. On frontier-hard work, Opus fails often enough that retries eat the savings and Fable becomes the cheapest per solve. Route by task difficulty, not by loyalty to a price point.

Field reports from the HN thread cut both ways: several developers report Fable finishing in fewer turns with "more targeted and surgical diffs" — one claims comparable results with about half the tokens, which would put effective cost near Opus parity. Another metered $82.92 in API-equivalent usage in a single day on a Max plan. The variance is the takeaway.

Migration checklist

  1. Swap model ID to claude-fable-5 (or run /claude-api migrate in Claude Code — it automates the parameter changes too).
  2. Remove any thinking: {"type": "disabled"} — it errors now.
  3. Resize max_tokens for thinking + response combined.
  4. Add a stop_reason === "refusal" check; read stop_details.category.
  5. Decide your fallback story: fallbacks param (Claude API / AWS) or SDK middleware (everywhere else).
  6. Audit for ZDR conflicts — Covered Model status means mandatory 30-day retention, no workaround.
  7. Set effort: "high" and only escalate to xhigh/max with eval evidence.

FAQ

Can I disable thinking on Claude Fable 5?

No. Adaptive thinking is permanently on and thinking: {"type": "disabled"} returns an error. Use the effort parameter (low through max) to control thinking depth, and remember max_tokens caps thinking plus response combined.

What does stop_reason: "refusal" mean?

A safety classifier declined the request — it is a successful HTTP 200 response, not an error. stop_details.category names the classifier: "cyber", "bio", "reasoning_extraction", or null. Refusals with no output are free.

Does Claude Fable 5 work in Claude Code?

Yes — /model fable on v2.1.170+. It is never the default, and it is hidden entirely under zero-data-retention accounts. Flagged requests re-run on Opus 4.8 with a transcript notice.

Is Fable 5 on Bedrock and Vertex?

Yes, GA since June 9: anthropic.claude-fable-5 on Bedrock (global. prefix on the global endpoint; the cache minimum stays 1,024 tokens there), claude-fable-5 on Vertex AI and Microsoft Foundry. OpenRouter lists it at pass-through $10/$50. Note the fallbacks parameter is not available on Bedrock/Vertex/Foundry — use SDK middleware.

Should I migrate everything from Opus 4.8?

No. The cost-per-solve math says route the frontier-hard 10-20% of your workload to Fable 5 and keep routine traffic on Opus 4.8 or Sonnet 4.6. Fable loses on routine-task economics, interactive latency, and ZDR compliance.


Full review with benchmark tables, the Mythos 5 / Project Glasswing context, and the monthly-bill math: Claude Fable 5 Review 2026: Pricing, Benchmarks, vs Opus 4.8

Top comments (0)