Originally published at devtoolpicks.com
When Claude Opus 4.7 launched on April 16, Anthropic positioned it as their most capable generally available model. SWE-bench numbers improved. Agentic persistence got better. The marketing was clean.
Three weeks later, a different story is everywhere. Reddit threads titled "Opus 4.7 is a genuine regression and I'm tired of pretending it isn't" are getting widespread agreement. An article in The New Stack from April 19 used the term "AI shrinkflation." On X, "Developers Call Anthropic's Claude Opus 4.7 Unusable for Coding" started trending across developer feeds.
This isn't one bad day on r/ClaudeCode. The complaints are specific, repeatable, and serious enough that developers are downgrading their Claude Code workflows back to Opus 4.6. Here's what's actually happening, why it's happening, and how to switch back if you're hitting the same wall.
What Developers Are Actually Reporting
The complaints cluster around three patterns. Each one is consistent across Reddit, Hacker News, and X.
Argues with users to the point of hallucination. This is the most-cited complaint. Developers report that Opus 4.7 pushes back on corrections instead of executing them. You point out a bug, and the model defends its original code. You ask it to make a change, and it explains why your change is wrong. On simple tasks where 4.6 just did the thing, 4.7 hedges, debates, and sometimes invents reasons not to comply. One developer summed it up: "I want it to write code. It wants to discuss whether the code should exist."
Proof work spirals on complex reasoning. PhD students and researchers describe 4.7 cycling through "oh wait, that doesn't work, let me try again" five times in a single response on theoretical math and physics work that 4.6 handled cleanly a month ago. The model gets stuck in self-correction loops that produce verbose output without resolution.
Drops in long-context retrieval. On the NYT Connections extended benchmark (a reasoning test not designed around Anthropic's models), Opus 4.7 scored 41.0%. Opus 4.6 scored 94.7%. That's a 54-point drop on a benchmark Anthropic didn't tune for. The pattern shows up in real work too: 4.7 forgets earlier instructions, contradicts itself across long sessions, and loses track of multi-file context that 4.6 handled.
The benchmarks Anthropic published show improvement. The benchmarks users run on their own work show regression. Both are real. They're measuring different things.
What Actually Changed
Three technical changes shipped with Opus 4.7. The first two are documented. The third is inferred.
New tokenizer that costs 12 to 18 percent more on English workloads. Anthropic redesigned the tokenizer to improve multilingual handling. For non-Latin scripts, this is a 20-35% efficiency gain. For English (which is what most of you are paying for), token counts went up. The price per million tokens didn't change, but you're paying for more tokens per task. Effectively, this is a 12-18% price increase wearing a feature label.
budget_tokens parameter now returns a 400 error. If your code uses Anthropic's API with thinking={"type": "enabled", "budget_tokens": N}, that breaks on Opus 4.7. The migration guide explains the fix, but most developers don't read migration guides. The result: production code that worked on 4.6 throws errors on 4.7 until manually patched.
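If you want your code to survive the transition without a grep across the codebase, one pattern is to centralize the thinking payload in a single helper. This is an illustrative sketch, not Anthropic's documented migration: the `budget_tokens` shape for 4.6 matches the call above, but the `effort` replacement for 4.7 is an assumption drawn from what users report works, so confirm it against the migration guide before relying on it.

```python
def thinking_config(model: str, budget_tokens: int = 8000) -> dict:
    """Build a `thinking` payload for the given Opus version.

    Hypothetical helper; the 4.7 branch is an assumption based on user
    reports that an `effort` level replaces the old token budget.
    """
    if model.startswith("claude-opus-4-6"):
        # 4.6 and earlier: explicit token budget.
        return {"type": "enabled", "budget_tokens": budget_tokens}
    # 4.7: `budget_tokens` now returns a 400 error; pass an effort level instead.
    return {"type": "enabled", "effort": "standard"}
```

With the payload built in one place, switching models (in either direction) is a one-line change instead of a hunt through every call site.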
Likely quantization for capacity. This is the part Anthropic hasn't said publicly. The most plausible explanation for the quality regression on identical tasks is that 4.7 is running at lower precision than 4.6 was at launch. Industry analysts including The New Stack have called this "AI shrinkflation": same nominal model, less actual compute per query, in service of meeting demand. OpenClaw usage has surged in 2026. Anthropic needs to serve more queries per GPU. Quantization and aggressive system prompt compression are the standard ways to do that, and the loss of fidelity is the price.
This part is speculation, but it's consistent with the evidence: prompts that produced clean output at launch now produce worse output, with nothing changed on the user's side. That only happens when something changed in how the model is served.
Why This Matters for Indie Hackers
If you're using Claude Code daily for a SaaS or side project, three things follow from this.
Your token bill is higher than it looks. Even if your daily prompt is identical, you're paying 12-18% more on Opus 4.7 due to the tokenizer change. If you've been hitting subscription limits faster since mid-April, this is part of why. Combined with the runaway agent billing risks we covered yesterday, the cost picture for Claude Code in 2026 is harder to predict than it was in 2025.
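To put rough numbers on that first point, here's a back-of-the-envelope sketch. The 12-18% range comes from the tokenizer change described above; the $15/MTok price and 50M-token volume are placeholder figures, not Anthropic's actual rates or your actual usage.

```python
def tokenizer_surcharge(monthly_tokens: int, price_per_mtok: float, inflation: float) -> float:
    """Extra dollars per month when token counts rise by `inflation`
    while the per-token price stays fixed."""
    return monthly_tokens * inflation * price_per_mtok / 1_000_000

# 50M tokens/month at a placeholder $15/MTok, with 15% token inflation:
extra = tokenizer_surcharge(50_000_000, 15.0, 0.15)  # → $112.50/month extra
```

Same prompts, same price sheet, a triple-digit line item appearing out of nowhere.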
Your existing prompts may produce worse output. If you have system prompts and CLAUDE.md files tuned for 4.6, they may need adjustment for 4.7. The model interprets instructions more conservatively and pushes back more often. Prompts that worked cleanly six weeks ago may now produce hedged, verbose, or argumentative responses.
The "always use the newest model" pattern is broken. For two years, the implicit rule was: when a new Claude model ships, switch immediately. That rule is no longer reliable. Opus 4.7 is better at some things and demonstrably worse at others. Your specific workflow determines which side of the line you fall on.
This isn't unique to Anthropic. The Claude Code quality drop earlier this year was a similar story: a model in production behaving differently than it did at launch. Anthropic has a recurring issue with mid-cycle drift, and Opus 4.7 looks like a more visible version of the same pattern.
How to Switch Back to Opus 4.6
The good news: Opus 4.6 is still available. Anthropic typically maintains older models for several months after a new release. You can switch back today.
In Claude Code: Use the --model flag with the explicit version:
```shell
claude --model claude-opus-4-6
```
Or set it permanently in ~/.claude/settings.json:
```json
{
  "model": "claude-opus-4-6"
}
```
In the Anthropic API: Specify claude-opus-4-6 in your model parameter. If you're using the latest SDK, this should just work without other changes.
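As a minimal sketch, here's what pinning the version looks like in a request body. Actually sending it requires the anthropic SDK and an API key, omitted here, and the model ID should be confirmed against Anthropic's current model list.

```python
# Pin the version explicitly instead of relying on a "latest"-style alias,
# so a future default bump can't silently move you onto 4.7.
request_body = {
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Fix the failing test in utils.py"}],
}
```

The point is the explicit version string: an alias that resolves to "newest Opus" is exactly the pattern this whole episode argues against.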
In third-party tools: Most Claude-capable interfaces (Cursor, Continue, etc.) let you choose the model in settings. Look for "Claude Opus 4.6" specifically. If only "Claude Opus" is shown, you're probably being routed to 4.7 by default.
Check your billing. After switching, watch your token consumption for the next few days. You should see token counts drop 12-18% on identical workloads. If they don't, your client may still be using the new tokenizer behind the scenes.
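One way to sanity-check the switch: the API's per-response usage report includes an input token count, so record the count for an identical prompt on each model and compare. The numbers below are illustrative, not measurements.

```python
def tokenizer_drop(tokens_on_47: int, tokens_on_46: int) -> float:
    """Fractional reduction in token count after switching back to 4.6
    (positive means 4.6 is cheaper on the same workload)."""
    return (tokens_on_47 - tokens_on_46) / tokens_on_47

# Illustrative counts for one identical English workload; a result in the
# 0.12-0.18 range is consistent with the tokenizer figures above.
drop = tokenizer_drop(11_500, 10_000)
```

If the fraction comes back near zero, your client is likely still counting (or billing) with the new tokenizer somewhere in the pipeline.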
Anthropic's Response So Far
As of May 5, Anthropic has not publicly addressed the regression complaints in a coordinated way. Boris Cherny, head of Claude Code, posted a thread on April 17 about how to "get the most out of Opus 4.7" with six tips for prompting. The thread didn't mention the tokenizer change, the breaking API parameter, or the user complaints already accumulating.
A Reddit thread titled "Anthropic: Can you adjust and not deprecate Opus 4.6 as per your usual schedule?" is gaining traction. Developers are explicitly asking Anthropic to keep 4.6 available longer than the typical deprecation window. No official response yet.
This silence is part of the frustration. When the Claude Code quality issue surfaced earlier, Anthropic eventually published a post-mortem. With Opus 4.7, three weeks in, there's no equivalent acknowledgment. That may change. Or it may not.
The Bigger Question: Should You Switch AI Coding Tools?
If Opus 4.7 is unreliable for your workflow and 4.6 might be deprecated, the long-term question becomes: stay on Anthropic, or move to a competitor?
The honest answer depends on what you're building. Claude is still strong for code generation, agentic loops, and complex refactors when 4.6 is available. But the model lock-in concern is real. If Anthropic deprecates 4.6 and 4.7 stays in its current state, you're stuck.
Codex with the new /goal command is the most direct alternative for terminal-based agentic coding. GPT-5.4 is stronger on web research and source synthesis (the exact areas where 4.7 regressed). Cursor and Windsurf both let you swap models at the editor level if you want flexibility without committing to one provider. For a full breakdown, see Codex vs Claude Code.
For most indie hackers right now, the smart move is keeping multiple options ready. Use 4.6 in Claude Code while you can. Have a Codex setup tested for when you can't. Don't bet your stack on one model behaving the same way next month as it does today.
FAQ
Is Opus 4.7 actually worse, or is this just user complaint bias?
Both. The improvements and the regressions are real and measurable. Anthropic's own benchmarks show improvement on coding (SWE-bench is up). User benchmarks (NYT Connections, custom regression tests) show drops on reasoning. The two coexist because they measure different things: Anthropic optimized for what it tested, and users are testing what Anthropic didn't.
How long will Opus 4.6 stay available?
Anthropic typically supports older models for several months after a new release. Claude 3 Opus, Sonnet 3.5, and Haiku 3 all had multi-month deprecation windows. There's no announced deprecation date for 4.6 yet, but expect a notice in the next 1-3 months. Plan accordingly.
Did the tokenizer really make API costs go up?
Yes, in practice. The price per token didn't change, but English workloads use 12-18% more tokens on the new tokenizer. If your monthly Anthropic bill went up since April 16 with no change in usage, this is the most likely cause. Multilingual users see the opposite effect.
Is this fixable in prompting, or is the model itself broken?
Some users have reduced the regression by adding explicit "execute, do not debate" framing in system prompts and using effort: "standard" for routine tasks. This helps with the arguing-back behavior. It doesn't fix the long-context retrieval drop or the proof-work spiraling on complex reasoning, which appear to be model-level changes.
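As a sketch of that framing, here's the kind of system prompt users describe. The wording is illustrative, not a verified fix; tune it to your own workflow.

```python
# Illustrative "execute, do not debate" system prompt of the kind users
# report reduces 4.7's argumentative behavior.
EXECUTE_FIRST = (
    "Execute the requested change directly. Do not argue for the current "
    "implementation or debate whether the change should be made. If you "
    "think the request is a mistake, make the change anyway and flag your "
    "concern in one sentence at the end."
)
```

It trades away some pushback you might actually want, which is why it helps with the arguing but can't recover lost retrieval or reasoning.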
Will Anthropic acknowledge this publicly?
Unknown. Their pattern has been to address issues via Boris Cherny's prompt-engineering threads rather than direct admissions. If user pressure continues, an official post-mortem is possible. Watch the Anthropic blog and Cherny's X account for any update.