The original issue is on GitHub #42796, and coverage is at The Register.
On the afternoon of April 21, 2026, Anthropic quietly removed Claude Code from the Pro plan.
The "includes Claude Code" line disappeared from the pricing page. The support docs that read "Using Claude Code with your Pro or Max plan" became "Using Claude Code with your Max plan." A few hours later, everything was rolled back. But this wasn't a one-off mistake.
The adaptive thinking rollout in February. The lowered default effort level in March. The AMD director's public analysis on April 2. The Pro removal experiment on April 21. These four points look like separate incidents, but they connect into one line.
Anthropic can't make the economics of the long-running agent era work, and they're trying several fixes at once.
Here's how those attempts connect, and how to read this signal if you're someone introducing AI coding tools into a team.
What Exactly Happened on April 21
Ed Zitron caught it first. On Anthropic's pricing page, the Pro tier's "includes Claude Code" checkmark had turned into a red X. The support docs were updated too. The Claude Code product page still mentioned Pro, but the core billing page had clearly pivoted to "Max only."
When The Register started reporting, Anthropic's Head of Growth Amol Avasare posted an explanation on X. "We're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected."
The part that came after is more interesting.
"When we launched Max a year ago, it didn't include Claude Code, Cowork didn't exist, and agents that run for hours weren't a thing. Max was designed for heavy chat usage, that's it."
"Since then, we bundled Claude Code into Max and it took off after Opus 4. Cowork landed. Long-running async agents are now everyday workflows. The way people actually use a Claude subscription has changed fundamentally."
Put "it's a 2% test" next to this and it stops sounding like a small experiment. It sounds closer to an admission that the current plan structure can't carry current usage patterns.
And two weeks before this, another event had already been building the same case.
Two Weeks Earlier: An AMD Director Showed Up With Telemetry
On April 2, Stella Laurenzo, director of AMD's AI group, filed issue #42796 against the Claude Code repo. The title: "Claude Code is unusable for complex engineering tasks with the Feb updates."
Laurenzo didn't write a vibes post about Claude feeling dumber. She brought a quantitative analysis of her team's past three months: 6,852 Claude Code sessions, 234,760 tool calls, 17,871 thinking blocks. A former Google OpenXLA lead, now heading AI at a $200B+ semiconductor company, doesn't file a public GitHub issue on a hunch.
Three numbers matter. Thinking depth dropped 67% on average starting late February. The reads-per-edit ratio — how many files the model reads before editing one — fell from 6.6 to 2.0, a 70% reduction. And a stop-hook script built to catch "dodging responsibility, premature stopping, and permission-seeking" behavior never fired before March 8, then fired 173 times in the 17 days after.
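A stop-hook of the kind Laurenzo describes is straightforward to sketch. The version below is a hedged guess in the spirit of her script, not her actual implementation: the phrase list and the idea of scanning the model's final message are my assumptions.

```python
# Hedged sketch of a stop-hook check for "dodging, premature stopping,
# permission-seeking" behavior. Phrase list is illustrative, not exhaustive.
import re

SUSPECT_PATTERNS = [
    r"\bI('ll| will) stop here\b",     # premature stopping
    r"\bwould you like me to\b",       # permission-seeking
    r"\bis beyond the scope\b",        # dodging responsibility
    r"\byou may want to\b",            # pushing work back to the user
]

def flags_premature_stop(final_message: str) -> list[str]:
    """Return every suspect pattern found in the model's final message."""
    return [p for p in SUSPECT_PATTERNS
            if re.search(p, final_message, re.IGNORECASE)]
```

Wired into a stop hook that exits nonzero on a match, each firing becomes a countable event, and counting is exactly what turned Laurenzo's hunch into the 0-before-March-8, 173-after number.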
Laurenzo's own words sharpen the picture.
"When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one. These are exactly the symptoms observed."
From a game programmer's angle, this data hurts in a specific way. A reads-per-edit of 6.6 is the signature of a workflow that goes "read headers, trace dependencies, grep for usages, read tests, then modify." On a complex codebase — imagine UE5 C++ with its web of headers, cpp files, USTRUCTs, and TMaps — having that number drop to 2.0 effectively means "patch and pray."
Laurenzo's team eventually moved to another provider. The line she left behind is the one that matters.
"Six months ago, Claude was unique in its reasoning quality and execution capabilities. Now, other competitors need to be very seriously considered and evaluated."
Anthropic's Rebuttal, and a Half-Admission
Boris Cherny from the Claude Code team showed up in the issue. His response mixes pushback and concession, and both halves are worth separating.
The pushback: the redact-thinking header shipped in March is a UI-only change. The actual reasoning still happens under the hood. It doesn't affect the thinking budget or the underlying reasoning mechanism. What Laurenzo measured is the length of redacted thinking signatures, so what she's seeing could be a loss of external observability rather than a real drop in reasoning.
The concession: two substantive changes did ship. On February 9, Opus 4.6 launched alongside adaptive thinking — instead of a fixed budget, the model now decides how much to think per turn. On March 3, the default effort level dropped from High to Medium (85 out of 100). Boris framed this as "a sweet spot on the intelligence-latency curve."
When users started sharing actual session transcripts, Boris moved further. He acknowledged that adaptive thinking appears to under-allocate reasoning on specific turns. The fixes he offered: /effort high or /effort max in the session, CLAUDE_CODE_EFFORT_LEVEL=max as an environment variable, and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 to force a fixed budget.
Here's the compressed version. The model itself didn't get dumber, but the defaults quietly got lower, and users have to turn them back up manually. It's structurally the same as a car company lowering your engine's output and telling you to press the gas harder.
This framing is what connects directly to the April 21 Pro removal.
The Line: Quiet Degradation → Explicit Removal Experiment
View the Laurenzo issue and the Pro removal separately and they each look like annoying individual mishaps. View them together and the pattern appears.
Feb 9: Opus 4.6 + adaptive thinking ship
Mar 3: Default effort drops from High to Medium
Early Mar: Thinking-content redaction fully rolled out
Apr 2: Laurenzo files issue #42796
Apr 6: Boris's official response (UI-only claim + admits the default drop)
Apr 21: Claude Code removed from Pro (rolled back ~12 hours later)
What this timeline shows is clear. Anthropic is trying to push down costs through two different mechanisms. One is quiet degradation — making the same price do less thinking. The other is explicit removal — pushing the same feature up into a more expensive tier.
Quiet degradation works until it gets caught. When it does, you can explain it as "that's a UI change, just flip your effort setting." But when someone like Laurenzo shows up with telemetry, that line stops working.
Explicit removal is the stronger card. A structural change cuts off future usage at the source. The downside is that it's visible the moment it lands. The instant a red X shows up on the Pro page, X and Hacker News and Reddit light up simultaneously. That's exactly what happened on April 21, and Anthropic backed out within half a day.
Running both cards at once isn't unusual — it's pretty standard price experimentation. The question is the order. The quiet card went first and didn't fully take, so the explicit card came next. Read Avasare's line one more time: "our current plans weren't built for this." That's not a 2% test sentence. That's a structural overhaul sentence.
Agent Economics: Why the Netflix Model Breaks
To understand why this is happening, you have to go one layer deeper. An AI subscription isn't Netflix.
Traditional SaaS like Netflix has near-zero marginal cost per user. One more signup rewatching House of Cards is a small bandwidth cost, not a new content production cost. In that model, power users are assets. They drive word-of-mouth, they lower churn, they validate the bundle.
Agent services are the opposite. Every time a user runs an agent, GPU time actually burns. A Sonnet response is a few cents, Opus is more, and a long-running agent making 200 tool calls can burn through several dollars a day per user. That math runs hot.
In this structure, power users aren't assets — they're liabilities. The more they use, the more the company loses on them. A user running Claude Code 8 hours a day on a $20 Pro plan consumes far more compute than their subscription pays for. Normally that gets subsidized by lighter users' subscriptions. But once long-running agent workflows become routine, the ratio of "light users" shrinks. The subsidy breaks.
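The arithmetic behind that broken subsidy can be made concrete. Every dollar figure below is an illustrative assumption, not a disclosed Anthropic cost:

```python
# Back-of-envelope cross-subsidy math with assumed, illustrative figures.
PRICE = 20.0             # Pro subscription, $/month
HEAVY_COST = 4.0 * 30    # agent-heavy user burning ~$4/day of compute
LIGHT_COST = 2.0         # chat-only user, $/month of compute

loss_per_heavy = HEAVY_COST - PRICE    # $100/month lost on each heavy user
margin_per_light = PRICE - LIGHT_COST  # $18/month earned on each light user

# Light users needed to cover one heavy user:
ratio = loss_per_heavy / margin_per_light
print(f"one heavy user needs ~{ratio:.1f} light users")  # ~5.6

# Once agents are routine and, say, 20% of users go heavy, the subsidy
# fails: 0.20 * 100 = $20 of losses vs 0.80 * 18 = $14.40 of margin.
```

Swap in your own team's token spend and the same three lines tell you which side of that line you sit on.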
This isn't Anthropic's problem alone. Sam Altman said last year that even the $200 ChatGPT Pro plan runs at a loss because of usage. OpenAI, Cursor, Replit — they're all hitting the same wall. Cursor moved to credits. Replit moved to effort-based billing. Google Gemini introduced hard caps. The whole industry is migrating to usage-based pricing at roughly the same time.
Anthropic's options, in broad strokes, are three. Split plans (a Pro Plus tier at $40–$50 between Pro and Max). Shrink what existing plans include (the Pro removal experiment). Or lower the model's defaults (the effort drop). Right now they're running the latter two in parallel while watching community reaction to see if they can move on the first.
From a Game Programmer's Angle: How to Read This Signal
If you're pushing AI coding tool adoption inside a company, this episode changes a few operational decisions.
The first is bringing measurement back in-house. The real lesson Laurenzo left isn't the 67% number itself — it's that she had the infrastructure to produce it. 6,852 session logs were sitting under ~/.claude/projects/, and she could parse the JSONL, correlate the length of thinking block signatures with actual content length (Pearson r=0.971), and run the analysis. Without that, the whole thing would have ended at "Claude feels off lately."
If your team is on Claude Code, it's worth collecting session logs somewhere and tracking at least the read-to-edit ratio. Anthropic has not officially committed to exposing thinking token counts in API responses. When vendors don't give you the metric, you have to build it.
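A minimal version of that in-house metric fits in one function. The JSONL schema assumed below (a "message" object with "content" blocks of type "tool_use", tool names like "Read" and "Edit") is my guess at the log format under ~/.claude/projects/ and may need adjusting against your actual files:

```python
# Sketch: compute a reads-per-edit ratio from Claude Code session logs.
# Field names and tool-name sets are assumptions about the log schema.
import json
from pathlib import Path

READ_TOOLS = {"Read", "Grep", "Glob"}          # investigation
EDIT_TOOLS = {"Edit", "MultiEdit", "Write"}    # modification

def reads_per_edit(log_dir: Path) -> float:
    """Ratio of read-like to edit-like tool calls across all sessions."""
    reads = edits = 0
    for path in log_dir.glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            message = record.get("message")
            if not isinstance(message, dict):
                continue
            for block in message.get("content") or []:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    reads += block.get("name") in READ_TOOLS
                    edits += block.get("name") in EDIT_TOOLS
    return reads / edits if edits else float("inf")
```

Run it weekly and chart the result; Laurenzo's team watched this number fall from 6.6 to 2.0, and without a baseline you'd never see the slope.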
The second is avoiding single-provider lock-in. This isn't "I hate Claude" — it's risk hedging. A team using only Claude Code right now has its dev process exposed to whatever pricing experiment Anthropic runs next. Codex is catching up fast. Local model options — DeepSeek, Qwen Coder — have meaningfully closed the gap for coding workloads. Keep Claude as primary, but keep a backup provider your team can actually run.
The third is pinning effort and budget settings explicitly. Now that adaptive thinking is the default, anything resembling complex engineering work should have /effort high or CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 baked into the team standard. Defaults can drop again, and the drop may not come with a loud announcement.
The fourth is distrusting "unlimited" marketing. The real lesson here: contracts with explicit numeric limits are safer than ones without. "Unlimited on the Pro plan" isn't a promise, it's copy. It gets redefined the moment usage patterns shift. A Max 20x hard cap of "N Opus calls per day" is, long-term, the more defensible contract.
What We Gained and What We Lost
One gain. Anthropic now knows where the community's line is. Rolling back in 12 hours means the reaction overshot their model. The next experiment will be built with this data, and it'll be smoother — probably as a Pro Plus tier.
One loss. A layer of trust. Read Laurenzo's line one more time.
"Six months ago, Claude was unique in its reasoning quality and execution capabilities. Now, other competitors need to be very seriously considered and evaluated."
This is a competitive take and a contract take at the same time. The moment you depend on a tool for your dev process, that vendor's pricing experiments become your team's productivity risk. Whether that's an acceptable risk has to be re-calculated every time an event like this hits.
April 21's 12 hours were Anthropic testing the range of what they can move. They were also us re-measuring how far we can afford to depend on them.
Both numbers are worth keeping in mind.
"Our current plans weren't built for this." — That's not the end of the pricing experiment. That's the start.