tokenmixai

Posted on Jun 10 • Originally published at tokenmix.ai

Claude Fable 5 for Developers: API Changes, Pricing, Migration Notes

#ai #anthropic #claude #api

Anthropic shipped Claude Fable 5 on June 9, 2026 — its first generally available Mythos-class model, priced at $10 per million input tokens and $50 per million output. That is exactly double Claude Opus 4.8, and the benchmark deltas are real: SWE-Bench Pro 80.3% vs 69.2%, FrontierCode 29.3% vs 13.4%.

But the price is not the migration story. The API behavior is. Fable 5 ships three breaking changes that will silently misbehave in any integration that assumes Opus-era semantics. This post covers what actually changes in your code, what the bill looks like, and where the traps are.

I run model intelligence at TokenMix, where we track pricing and API behavior across 300+ models. Everything below is sourced from Anthropic's launch docs, migration guide, and pricing page — verified June 10, 2026.

The 60-second version

Price: $10/$50 per MTok. Every rate is exactly 2× Opus 4.8 — cache reads $1, 5-min cache writes $12.50, 1-hour writes $20, batch $5/$25.
Specs: 1M context, 128K max output, no long-context surcharge.
Model ID: claude-fable-5 on the Claude API; anthropic.claude-fable-5 on Bedrock; anthropic/claude-fable-5 on OpenRouter.
Breaking change 1: Adaptive thinking is always on. thinking: {"type": "disabled"} returns an error.
Breaking change 2: Refusals are HTTP 200 responses with stop_reason: "refusal" — not error codes.
Breaking change 3: Safety classifiers reroute flagged requests to Opus 4.8 (under 5% of sessions), and rerouted requests bill at Opus rates.
No ZDR: 30-day data retention is mandatory. Zero-data-retention accounts don't see the model at all.

Breaking change 1: thinking is no longer optional

On Opus 4.8 you could disable thinking to trade quality for latency. On Fable 5 you cannot — adaptive thinking is permanently on, and the model decides how much to think per request.

Your replacement lever is the effort parameter:

{
  "model": "claude-fable-5",
  "max_tokens": 16000,
  "effort": "high",
  "messages": [...]
}

Five levels: low, medium, high, xhigh, max. Default is high. Anthropic's migration guide is explicit: start at high even for workloads that ran xhigh on Opus 4.8 — Fable 5 reaches further per unit of thinking.

Two gotchas:

max_tokens now caps thinking + response combined. A workload that ran thinking-off on Opus 4.8 inherits always-on thinking here. Output budgets sized for bare responses will truncate. Resize them.
Raw chain-of-thought is never returned. thinking.display defaults to "omitted"; set it to "summarized" if you want readable summaries. In multi-turn conversations, pass thinking blocks back unchanged.

Prefill, manual thinking budgets, and sampling parameters are still rejected with 400 — unchanged from Opus 4.7/4.8, so nothing new breaks there.

Breaking change 2: refusals look like success

This is the integration trap. A refused request returns HTTP 200 with:

{
  "stop_reason": "refusal",
  "stop_details": { "category": "cyber" }
}

stop_details.category is one of "cyber", "bio", "reasoning_extraction", or null. Anything keyed on HTTP status codes treats this as a normal completion and passes a declined response downstream. Check stop_reason on every Fable 5 response.

Billing on refusals:

Refused before any output → $0
Classifier fires mid-stream → input plus already-streamed output is billed; discard the partial output

Breaking change 3: the Opus 4.8 fallback

Fable 5 is the same underlying model as Claude Mythos 5 (the Glasswing-partners-only variant) with safety classifiers active. When a classifier flags a request — offensive cyber, bioweapon-adjacent biology, or distillation-style extraction patterns — the response is served by Opus 4.8 instead, and bills at Opus rates ($5/$25).

Anthropic reports under 5% of sessions trigger this. The beta fallbacks parameter automates retry server-side, but only on the Claude API and Claude Platform on AWS. On the Batch API, Bedrock, Vertex, and Foundry, retries run client-side via SDK middleware (TypeScript, Python, Go, Java, C#).

One pattern worth flagging from the Claude Code docs: fallback can fire on the first request of a session, before you type anything, because that request carries workspace context — CLAUDE.md content, directory names, git status. A repo full of security tooling can trip the classifier on context alone. claude --safe-mode strips customizations to diagnose it.

And the false-positive reports are already in: the Hacker News launch thread has developers reporting MRI brain-segmentation code and mosquito-malaria research flagged as bio risks. If your domain is health-adjacent, meter your first week.

The pricing table that matters

Rate	Fable 5	Opus 4.8	Multiple
Base input	$10.00	$5.00	2.0×
5-min cache write	$12.50	$6.25	2.0×
1-hour cache write	$20.00	$10.00	2.0×
Cache read	$1.00	$0.50	2.0×
Output	$50.00	$25.00	2.0×
Batch input	$5.00	$2.50	2.0×
Batch output	$25.00	$12.50	2.0×
Min cacheable prompt	512 tokens	1,024 tokens	Fable caches shorter prompts

Three footnotes that change real bills:

No long-context surcharge. Per Anthropic's pricing docs, "a 900k-token request is billed at the same per-token rate as a 9k-token request." Gemini 3.1 Pro doubles its input rate past 200K; Fable 5 doesn't.
Tokenizer. Fable 5 uses the Opus 4.7 tokenizer — roughly 30% (up to 35%) more tokens from the same text vs pre-4.7 models. Comparisons against Opus 4.8 are apples-to-apples; against your old 4.5-era bills, they are not.
No fast mode. Opus 4.8 fast mode costs the same $10/$50 as Fable 5 — the same sticker price buys speed or intelligence, pick one.

Is 2× worth it? The cost-per-solve math

Raw per-attempt cost on a 100K-in / 20K-out agentic task: Fable $2.00, Opus $1.00. Now divide by published pass rates:

Difficulty tier	Fable 5	Opus 4.8	GPT-5.5
SWE-Bench Pro tier (routine-hard)	$2.49	$1.45	$1.88
FrontierCode tier (frontier-hard)	$6.83	$7.46	$19.30

On routine work, Opus 4.8 wins per solved task. On frontier-hard work, Opus fails often enough that retries eat the savings and Fable becomes the cheapest per solve. Route by task difficulty, not by loyalty to a price point.

Field reports from the HN thread cut both ways: several developers report Fable finishing in fewer turns with "more targeted and surgical diffs" — one claims comparable results with about half the tokens, which would put effective cost near Opus parity. Another metered $82.92 in API-equivalent usage in a single day on a Max plan. The variance is the takeaway.

Migration checklist

Swap model ID to claude-fable-5 (or run /claude-api migrate in Claude Code — it automates the parameter changes too).
Remove any thinking: {"type": "disabled"} — it errors now.
Resize max_tokens for thinking + response combined.
Add a stop_reason === "refusal" check; read stop_details.category.
Decide your fallback story: fallbacks param (Claude API / AWS) or SDK middleware (everywhere else).
Audit for ZDR conflicts — Covered Model status means mandatory 30-day retention, no workaround.
Set effort: "high" and only escalate to xhigh/max with eval evidence.

FAQ

Can I disable thinking on Claude Fable 5?

No. Adaptive thinking is permanently on and thinking: {"type": "disabled"} returns an error. Use the effort parameter (low through max) to control thinking depth, and remember max_tokens caps thinking plus response combined.

What does `stop_reason: "refusal"` mean?

A safety classifier declined the request — it is a successful HTTP 200 response, not an error. stop_details.category names the classifier: "cyber", "bio", "reasoning_extraction", or null. Refusals with no output are free.

Does Claude Fable 5 work in Claude Code?

Yes — /model fable on v2.1.170+. It is never the default, and it is hidden entirely under zero-data-retention accounts. Flagged requests re-run on Opus 4.8 with a transcript notice.

Is Fable 5 on Bedrock and Vertex?

Yes, GA since June 9: anthropic.claude-fable-5 on Bedrock (global. prefix on the global endpoint; the cache minimum stays 1,024 tokens there), claude-fable-5 on Vertex AI and Microsoft Foundry. OpenRouter lists it at pass-through $10/$50. Note the fallbacks parameter is not available on Bedrock/Vertex/Foundry — use SDK middleware.

Should I migrate everything from Opus 4.8?

No. The cost-per-solve math says route the frontier-hard 10-20% of your workload to Fable 5 and keep routine traffic on Opus 4.8 or Sonnet 4.6. Fable loses on routine-task economics, interactive latency, and ZDR compliance.

Full review with benchmark tables, the Mythos 5 / Project Glasswing context, and the monthly-bill math: Claude Fable 5 Review 2026: Pricing, Benchmarks, vs Opus 4.8

Top comments (1)

Resoluciones • Jun 18

An AI safety refusal returning as a successful HTTP 200 is peak tech gaslighting. My integration is going to treat a polite 'I cannot fulfill this request' as a valid production output and pass it straight downstream to the client. When you couple that with the 5% chance of being silently rerouted to Opus 4.8 because your context looks suspicious, doing this inside a standard browser interface is pure financial roulette. Shifting to a native desktop workspace where you can handle middleware validation locally on disk and force strict token gates before hitting the endpoint isn't just a cleaner setup anymore—it's essential insurance against your own stack.