If your Claude Code sessions have been producing more errors, skipping files, or fabricating APIs that don't exist — you're not imagining it.
Over the past two weeks, developers across GitHub, X, and YouTube have reported a measurable decline in Opus 4.6's coding quality. Independent benchmarks now confirm it: the model's hallucination rate has nearly doubled.
This post covers the evidence, the root cause, and the exact settings to fix it.
## The Data

### BridgeBench Hallucination Benchmark
BridgeBench measures how often AI models fabricate claims when analyzing code: 30 tasks, 175 questions, all verified against ground truth.
Opus 4.6's trajectory:
- Previous: #2 with 83.3% accuracy (~17% fabrication)
- Current: #10 with 68.3% accuracy (33% fabrication)
One in three responses now contains fabricated information.
Current leaderboard (April 14, 2026):
| Model | Accuracy | Fabrication Rate | Rank |
|---|---|---|---|
| Grok 4.20 Reasoning | 91.8% | 10.0% | #1 |
| GPT-5.4 | 86.1% | 16.7% | #2 |
| Claude Opus 4.5 | 72.3% | 27.9% | #6 |
| Claude Sonnet 4.6 | 72.4% | 28.9% | #7 |
| Claude Opus 4.6 | 68.3% | 33.0% | #10 |
Notable: Sonnet 4.6 (smaller, cheaper) outperforms Opus 4.6 on accuracy.
### Developer Testing
@om_patel5 ran the same prompt on Opus 4.6 and 4.5:
- 4.6: failed 5 consecutive windows
- 4.5: passed every time
His tweet got 682K views and 1,118 bookmarks. He now runs this as a "quantization canary" before every session.
### 6,852-Session Analysis
An AMD executive analyzed 6,852 Claude Code sessions and measured a 67% drop in reasoning depth compared to pre-February behavior.
## Root Cause: Two Default Changes
Anthropic made two changes in early 2026:
**1. Effort level default: high → medium** (March 3, 2026)
The model now "conserves thinking" by default. Complex problems that need deep reasoning get classified as "simple enough" and receive shallow analysis.
**2. Adaptive thinking introduced** (February 9, 2026)
The model dynamically allocates reasoning tokens per turn. Under medium effort, some turns receive zero reasoning tokens — the model answers without thinking at all.
These two changes compound: the model skips thinking precisely when you need it most.
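The compounding effect is easiest to see with a toy allocator. This is purely illustrative, not Anthropic's actual logic: the effort thresholds and token budgets below are invented for the sketch.

```python
# Toy model of adaptive reasoning allocation. NOT Anthropic's real logic:
# the thresholds and the 4096-token ceiling are invented for illustration.
def reasoning_budget(difficulty: float, effort: str) -> int:
    """Return a hypothetical reasoning-token budget for one turn.

    difficulty: the model's own estimate of task difficulty in [0, 1].
    effort: the configured effort level.
    """
    skip_below = {"low": 0.8, "medium": 0.5, "high": 0.2, "max": 0.0}
    if difficulty < skip_below[effort]:
        return 0  # turn is answered with zero reasoning tokens
    return int(4096 * difficulty)  # harder turns get a larger budget

# A hard task the classifier slightly underrates (difficulty 0.45):
# under "medium" it gets no thinking at all; under "max" it always thinks.
```

The failure mode lives in that `return 0` branch: misclassify a hard problem as easy under medium effort and the answer is produced with no reasoning whatsoever, which is exactly the zero-reasoning-turn behavior described above.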
## The Fix

### Quick fix (per session)

```
/effort max
```
### Permanent fix (environment variables)

```bash
export CLAUDE_CODE_EFFORT_LEVEL=max
export CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
```

Add these to your `.bashrc` or `.zshrc`.
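To confirm a fresh shell actually picked up the overrides, a quick check helps. The variable names are the ones above; this only inspects the environment, it does not verify that your Claude Code version honors them.

```python
import os

def effort_overrides() -> dict:
    """Report the two override variables described above (None = unset)."""
    names = ("CLAUDE_CODE_EFFORT_LEVEL", "CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING")
    return {name: os.environ.get(name) for name in names}

if __name__ == "__main__":
    for name, value in effort_overrides().items():
        print(f"{name} = {value if value is not None else '<unset>'}")
```

If either line prints `<unset>` in a new terminal, the exports did not make it into your shell's startup file.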
### Nuclear option: switch to Opus 4.5

Set the model to `claude-opus-4-5-20251101`. Slower and more expensive, but consistently reliable.
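If you drive Claude through the API rather than the CLI, pinning the model is just a request parameter. A minimal sketch: the request shape follows the Anthropic Messages API, the `build_request` helper is hypothetical glue code, and only the model id comes from the text above.

```python
# Build a Messages-API-style request pinned to Opus 4.5. Pass the resulting
# dict to your client of choice, e.g. client.messages.create(**build_request(...)).
PINNED_MODEL = "claude-opus-4-5-20251101"

def build_request(prompt: str, model: str = PINNED_MODEL,
                  max_tokens: int = 1024) -> dict:
    """Assemble request parameters with the model explicitly pinned."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Pinning an exact dated model id, rather than an alias, is what protects you from silent default changes like the ones described above.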
## Quick Reference
| Problem | Fix | Command |
|---|---|---|
| Session feels dumb | Max effort | `/effort max` |
| Resets every session | Env var | `CLAUDE_CODE_EFFORT_LEVEL=max` |
| Zero-reasoning turns | Disable adaptive | `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` |
| Still unreliable | Use Opus 4.5 | Model: `claude-opus-4-5-20251101` |
## Model Switching in Production
If you're calling Claude via API in production, switching models means changing endpoints, auth, and billing for each provider. A unified API gateway like EvoLink lets you swap between 30+ models by changing one parameter. The Smart Router (`evolink/auto`) can automatically route deep-reasoning tasks to more reliable models.
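The "one parameter" claim looks like this in practice. This is a sketch assuming an OpenAI-compatible chat endpoint: the gateway URL below is a placeholder, and only the `evolink/auto` model id is taken from the text.

```python
import json

GATEWAY_URL = "https://example-gateway/v1/chat/completions"  # placeholder URL

def gateway_payload(prompt: str, model: str = "evolink/auto") -> str:
    """Serialize a chat request; swapping providers is just a new model string."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Route the same prompt to a specific model instead of the Smart Router:
# gateway_payload("review this diff", model="claude-opus-4-5-20251101")
```

Because every model sits behind the same endpoint and request shape, a fallback from a degraded model to a pinned one is a one-line change rather than a new integration.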
## Sources
- BridgeBench Hallucination Benchmark — bridgebench.ai/hallucination
- @om_patel5 on X (Apr 10, 2026, 682K+ views)
- GitHub Issue #42796 — github.com/anthropics/claude-code/issues/42796
- Digit.in — AMD executive's 6,852-session analysis
- pasqualepillitteri.it — effort/adaptive thinking configuration guide