Originally published at devtoolpicks.com
If Claude Code felt slower, more forgetful, or less capable over the last six weeks, you were not imagining it. On April 23, 2026, Anthropic published an engineering postmortem confirming that three separate product-layer changes shipped between March and April had degraded Claude Code's performance for a significant portion of users.
The fixes are live as of April 20 in version 2.1.116. Usage limits were reset for all subscribers on April 23. The model weights themselves were never changed. Here is what actually happened.
The three things that broke
1. Reasoning effort dropped from high to medium (March 4)
When Anthropic shipped Claude Opus 4.6 to Claude Code in February, they set the default reasoning effort to high. High reasoning means the model thinks longer before responding. Longer thinking generally produces better output, but it also causes the Claude Code UI to appear frozen while the model works through a problem.
User feedback came in quickly: sessions felt like they were hanging. The interface looked stuck. Token usage spiked unexpectedly.
On March 4, Anthropic changed the default reasoning effort from high to medium. The intention was reasonable: reduce latency, make the UI feel more responsive. The result was a noticeable drop in output quality on complex tasks, particularly multi-step coding work. Users reported that Claude Code was choosing the simplest fix rather than the correct one, reasoning less deeply, and producing shallower analysis.
Anthropic reverted this change on April 7. Reasoning effort is back to high.
2. A caching bug made Claude forget mid-session (March 26)
Claude Code caches input tokens for up to an hour to make sequential API calls faster and cheaper. On March 26, Anthropic shipped a change meant to clear old thinking blocks from sessions that had been idle for more than an hour. The idea was sensible: if you came back to a session after an hour away, the old thinking was no longer relevant and would just waste context.
The bug: instead of clearing old thinking once when a session resumed after an idle period, the code cleared it on every single turn for the rest of the session. The practical effect was that Claude would start each turn having forgotten everything it had reasoned through in the previous turn. It looked like memory loss. Users saw repetitive responses, Claude re-asking questions it had already answered, and errors in tasks that required holding context across multiple steps.
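The clear-once-vs-clear-every-turn failure mode is easy to reproduce in miniature. The sketch below is a hypothetical reconstruction, not Anthropic's actual code (Claude Code's session internals are not public, and every name here is invented): an idle check whose timestamp is never refreshed stays true forever, so the one-time cleanup runs on every turn.

```typescript
// Hypothetical reconstruction of the idle-session bug. All names are
// invented; this only illustrates the failure mode described above.

interface Turn {
  thinking: string | null; // prior reasoning carried between turns
  text: string;
}

const IDLE_LIMIT_MS = 60 * 60 * 1000; // one hour

class Session {
  history: Turn[] = [];
  constructor(private lastActive: number) {}

  // Buggy version: lastActive is never refreshed, so once the session
  // has crossed the idle limit, the condition is true on EVERY turn and
  // earlier thinking is wiped again and again.
  turnBuggy(turn: Turn, now: number): void {
    if (now - this.lastActive > IDLE_LIMIT_MS) {
      this.history = this.history.map((t) => ({ ...t, thinking: null }));
      // BUG: missing `this.lastActive = now;`
    }
    this.history.push(turn);
  }

  // Fixed version: refresh the timestamp, so stale thinking is cleared
  // exactly once per idle period and later turns keep their reasoning.
  turnFixed(turn: Turn, now: number): void {
    if (now - this.lastActive > IDLE_LIMIT_MS) {
      this.history = this.history.map((t) => ({ ...t, thinking: null }));
    }
    this.lastActive = now;
    this.history.push(turn);
  }

  retainedThinking(): number {
    return this.history.filter((t) => t.thinking !== null).length;
  }
}
```

In the buggy path, only the most recent turn's thinking ever survives, which matches the reported symptom: each turn starts as if the previous reasoning never happened.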
Stella Laurenzo, a Senior Director in AMD's AI group, published a detailed audit of 6,852 Claude Code session files and 234,000 tool calls that documented the regression. She found that the average number of files Claude read before making an edit had dropped from 6.6 to 2.0. The data was hard to dispute.
Anthropic fixed the caching bug on April 10 in version 2.1.101.
3. A verbosity-limiting system prompt hurt coding quality (April 16)
On April 16, alongside the Opus 4.7 release, Anthropic added a line to Claude Code's system prompt to reduce how much Claude wrote between tool calls. The exact instruction was: "Length limits: keep text between tool calls to 25 words or less. Keep final responses to 100 words or less unless the task requires more detail."
The goal was to make Claude less verbose. Claude Code had a reputation for generating long explanations when users just wanted code. The change tested fine in internal evaluations across multiple weeks. No regressions appeared.
Then Anthropic ran a wider set of ablation tests, which remove individual system prompt lines to measure each line's specific effect. The verbosity instruction caused a 3% drop in coding performance for both Opus 4.6 and Opus 4.7. Combined with the other changes already in effect, the aggregate impact was significant.
Anthropic reverted the system prompt on April 20 in version 2.1.116.
Why it was so hard to find
Each of the three changes affected a different slice of users on a different timeline. The reasoning effort change hit heavy users of complex tasks. The caching bug hit anyone who left sessions idle for over an hour before returning. The verbosity prompt hit a broader set of coding tasks but with a smaller effect size.
When you combine three separate regressions affecting different users in different ways at different times, the overall signal looks like general, inconsistent degradation rather than a specific bug. It is hard to isolate. Claude Code lead Boris Cherny described it as "probably the most complex investigation we've had" and noted that the caching bug slipped through human code review, automated tests, end-to-end tests, and internal dogfooding. A separate internal experiment had suppressed the bug specifically in the CLI sessions Anthropic engineers were using daily.
What Anthropic is doing differently
The postmortem includes a set of process changes:
More staff will use the exact public build of Claude Code rather than internal variants, so bugs that internal builds would otherwise mask, like the caching bug, are more likely to be caught before release.
Every system prompt change will now run through a broader set of evaluations and ablations before shipping. The verbosity prompt change looked safe on the original eval suite. It only showed the 3% drop when the suite was expanded.
Model-specific changes will be strictly gated to their intended model targets. One of the issues was that changes shipped alongside Opus 4.7 also affected Opus 4.6 and Sonnet 4.6.
A new @ClaudeDevs account on X will be used to explain product decisions and the reasoning behind them before they ship, rather than after the community discovers them.
Usage limits for all subscribers were reset on April 23 to compensate for token waste and degraded performance during the affected period.
What this means for indie hackers using Claude Code
The short version: Claude Code should be back to the quality level it was at before March 4. Version 2.1.116 is the clean build. If you are not on that version, update now via npm update -g @anthropic-ai/claude-code.
The longer version: this incident landed at the same time as the Pro plan pricing scare we covered yesterday. A pricing change that alarmed subscribers, followed by a quality regression that frustrated users, has created a trust gap that will take a few weeks to close. Anthropic acknowledges this directly in the postmortem.
Two practical things worth knowing:
Check your reasoning effort setting. If you manually switched reasoning effort to medium while the UI felt frozen, switch it back to high. The default is high again, but a manual setting made during that window overrides the default.
Stale sessions should behave correctly again. If you regularly leave Claude Code sessions idle for an hour or longer before returning, the caching bug was hitting you hard. That specific fix shipped on April 10. Sessions resumed after an idle period should no longer start from a blank reasoning state.
For the full background on the Pro plan situation and what it signaled about where Anthropic is heading with pricing, the Claude Code Pro plan breakdown from yesterday covers that separately. And if you are evaluating whether to stick with Claude Code or try GPT-5.5 through Codex now that GPT-5.5 launched this week, the GPT-5.5 vs Claude Code comparison from this morning is the right place to start.
FAQ
Is Claude Code fixed now?
Yes. All three issues were resolved by April 20 in version 2.1.116. Usage limits were also reset for all subscribers on April 23. If you are on version 2.1.116 or later, you are on the clean build.
Were the model weights changed?
No. Anthropic confirmed that the API and underlying model weights were not affected at any point. The regressions came from the harness surrounding the model: the default reasoning effort setting, a caching bug in the session management layer, and a system prompt instruction. These are product-layer changes, not model-level ones.
Which versions were affected?
The reasoning effort change affected Sonnet 4.6 and Opus 4.6 from March 4. The caching bug affected Sonnet 4.6 and Opus 4.6 from March 26. The verbosity system prompt affected Sonnet 4.6, Opus 4.6, and Opus 4.7 from April 16. All three were resolved by April 20.
Why did it take so long to find?
Three separate issues affected different users on different timelines, making the overall signal look like general inconsistency rather than a specific bug. The caching bug in particular was suppressed in the internal CLI sessions Anthropic engineers used, which meant it passed code review, unit tests, and end-to-end tests without being caught.
Will this happen again?
Anthropic has committed to broader evaluation suites, more staff on public builds, and stricter gating of model-specific changes. Those are reasonable safeguards. Whether they are sufficient depends on how well they are implemented and maintained.
The bottom line
Three changes. Six weeks. One postmortem. Anthropic found the bugs, fixed them, and published a detailed technical explanation of what happened and why. That level of transparency is not standard in this industry and is worth acknowledging.
The process changes they have committed to are the right ones. The reset of usage limits is a fair response to subscribers who lost tokens to the caching bug. The @ClaudeDevs account, if used well, should reduce the gap between internal decisions and public understanding.
Claude Code is worth using again if you stepped back during the regression. Update to 2.1.116 and give it a real task. The reasoning depth should be back where it was before March.