Behavior described as of June 2026 — Anthropic tunes these classifiers, so details may change. Sources at the end.
Short answer: Claude Fable 5 runs safety classifiers (mainly offensive-cybersecurity and biology). When one flags a request, Claude Code re-runs it on Opus and the session stays on Opus. The classifier reads everything the model reads — including your auto-loaded CLAUDE.md — so a session can fall back on its first message, before you type anything. For legitimate work the most durable fix is workspace hygiene: keep auto-loaded docs neutral and architecture-only, and move mechanism-heavy domain text into files that aren't auto-loaded.
What this is — and isn't. This is about removing false positives for legitimate work. We hit it ourselves: our own project docs had accumulated combat metaphors and security-flavored wording as a plain writing reflex ("attack the problem", "kill the stale process"), a classifier read that standing language as signal, and the honest fix was cleaning our own language. If your work genuinely is offensive security or biology, the fallback is expected, documented routing — rewording is not a way around it.
The symptom
- You select Fable 5 (/model fable), and mid-session, or on the first message, a notice appears and responses start coming from Opus 4.8.
- The switch is sticky: the picker stays on Opus for the rest of the conversation.
- Switching back with /model fable often re-triggers, because the flagged content is still in context.
This matches public reports — GitHub issues #66670, #66916, #67246 describe benign sessions (startup code review, grant applications, normal engineering) being switched.
Why it happens
Per Anthropic's help article and the Claude Code docs:
- Fable 5 ships automated safety checks for a few categories — offensive cybersecurity and biology/life-science are the two that matter for most developers.
- The checks review "everything the model reads, not just your latest message — including memory, content from connectors, web search results, and files."
- The docs say it directly: "Fallback can trigger on the first request of a session … because the first request carries workspace context such as your CLAUDE.md content and git status."
- The checks are "intentionally broad" (Anthropic's wording) — which is why false positives on legitimate work happen.
The part most teams miss: the trigger is often not what you do but how your standing docs talk. Engineering writing drifts toward combat metaphor — attack, kill, hit the target, defend the perimeter — and mechanism shorthand from security/biology ("immune layer", "honeypot", "payload"). Each instance is innocent; accumulated across a CLAUDE.md that ships with every session's first message, it reads like signal.
Recovering when it happens
- Fastest: start a fresh session and re-select Fable. A flagged conversation keeps its flagged context; fighting it costs more than starting clean.
- Diagnose: run claude --safe-mode. It disables customizations (CLAUDE.md, skills, MCP servers, hooks). If fallback stops, your trigger is in those files. (Git status and directory names are still included.)
- Take control of the switch: run /config and turn off "switch models when a message is flagged". A flag then pauses the session and offers a choice: switch to Opus, or edit the prompt and retry on Fable.
- /model fable switches back any time, with the re-trigger caveat above.
The settings.json gotcha: sessions silently starting on the wrong model
A related but different failure: Claude Code keeps starting on your tier's default (Opus/Sonnet) even though you saved Fable. Since v2.1.153, /model writes your choice into the model field of ~/.claude/settings.json — so check what actually landed there:
python -c "import json,pathlib;print(repr(json.load(open(pathlib.Path.home()/'.claude'/'settings.json'))['model']))"
If the value is anything other than a clean id/alias (stray terminal-escape characters, or a suffix on an id that doesn't support it; the [1m] 1M-context suffix is documented for opus/sonnet), Claude Code may not recognize it and may quietly fall through to the default. We hit a corrupted value of exactly this shape. The fix is one line, by hand:
{ "model": "claude-fable-5" }
Then verify at the next session start (the active model shows in /status).
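If you would rather have a script do the inspection than eyeball the one-liner's output, a minimal sketch follows. It is not how Claude Code validates the field: the function names, the "clean value" pattern, and the rule that only ids mentioning opus or sonnet accept the [1m] suffix are all assumptions made for illustration.

```python
import json
import pathlib
import re

# Assumption: a clean value is a plain id or alias (letters, digits, ".", "_", "-"),
# optionally followed by one bracketed suffix such as "[1m]".
CLEAN_VALUE = re.compile(r"[A-Za-z0-9._-]+(\[[A-Za-z0-9]+\])?")


def model_value_problems(value):
    """Return human-readable problems in a saved model value (empty list if none)."""
    if not isinstance(value, str):
        return [f"expected a string, got {type(value).__name__}"]
    problems = []
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in value):
        problems.append("contains control characters (possibly terminal escapes)")
    elif not CLEAN_VALUE.fullmatch(value):
        problems.append("contains characters outside a plain id or alias")
    # Assumption: only ids that mention opus or sonnet accept the [1m] suffix.
    if value.endswith("[1m]") and not re.search(r"opus|sonnet", value):
        problems.append("has a [1m] suffix on an id that may not support it")
    return problems


def check_settings(path=None):
    """Load settings.json and report problems with its model field."""
    path = pathlib.Path(path) if path else pathlib.Path.home() / ".claude" / "settings.json"
    settings = json.loads(path.read_text(encoding="utf-8"))
    if "model" not in settings:
        return ["no model field saved"]
    return model_value_problems(settings["model"])
```

Calling print(check_settings()) reads the default location; pass a path to check a copy instead. The sketch only reads the file and never writes to it.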
Prevention: workspace hygiene
Four practices that removed our false positives, in order of leverage:
- Keep auto-loaded docs architecture-only. CLAUDE.md should carry file layout, names, commands, status, not domain mechanism. If your project legitimately touches a sensitive-sounding domain (health data, defensive security, lab-adjacent tooling), move the mechanism narrative into a separate doc that is not auto-loaded (e.g. DOMAIN.md) and reference it by name.
- Result-language over mechanism-language. Say what the system produces, not how the sensitive part works. The honest content is unchanged; the framing names outcomes.
- Neutral verbs over combat metaphor. Approach the problem, stop the process, reach the audience. Reads better to humans too: the metaphors were never load-bearing.
- Run a checker before it bites. We generalized the script we used on our own workspace into a small, dependency-free checker. It scans your CLAUDE.md files for flag-prone standing language (protecting code spans, filenames, identifiers), sanity-checks your settings.json model id, and prints suggestions. It never modifies your files:
curl -O https://tagmac.dev/tools/claudemd-hygiene-check.py
python claudemd-hygiene-check.py
Python 3 stdlib only, exit code 1 on findings (slots into CI or a pre-commit hook).
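To make the idea of scanning for standing language while protecting code concrete, here is a minimal sketch of that kind of scan. It is not the checker linked above: the word list is hypothetical (taken from the examples in this article), the function name is made up, and it only skips fenced blocks and backtick spans, not filenames or identifiers.

```python
import re

# Hypothetical word list, taken from the examples in this article.
FLAG_PRONE = [
    "attack", "kill", "hit the target", "defend the perimeter",
    "immune layer", "honeypot", "payload",
]

# Inline code spans are blanked out before matching so commands stay untouched.
CODE_SPAN = re.compile(r"`[^`\n]*`")
# \b keeps "skill" from matching "kill"; \w* picks up endings such as "payloads".
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, FLAG_PRONE)) + r")\w*", re.IGNORECASE
)


def scan_markdown(text):
    """Return (line_number, matched_text) pairs for flag-prone wording outside code."""
    findings = []
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        # Toggle on fence lines and skip everything inside a fenced block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        visible = CODE_SPAN.sub(lambda m: " " * len(m.group()), line)
        for match in PATTERN.finditer(visible):
            findings.append((number, match.group()))
    return findings
```

Feeding it the contents of a CLAUDE.md returns line numbers to review by hand. A wrapper that exits non-zero when the list is non-empty would fit the same CI or pre-commit use described above.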
Sources
- Anthropic Help Center: Why Claude switched models in your conversation with Fable 5
- Claude Code docs: Model configuration (automatic fallback, --safe-mode, /config toggle, [1m] suffix, /model persistence)
- Public reports: claude-code#66670 · #66916 · #67246
Originally published on tagmac.dev. We run every practice here on our own operations first.