Owen

Posted on • Originally published at ofox.ai

Why Claude Max Users Are Leaving in May 2026: A Data-Driven Look at the Throttling Backlash

TL;DR

Between March 23 and May 6, 2026, Claude Max subscribers saw their usage limits consumed dramatically faster than documented. Five-hour sessions ended in 19 minutes, two cache bugs inflated token bills 10–20×, and Claude Code v2.1.100 burned approximately 40% more tokens than v2.1.98 for identical workloads. Anthropic acknowledged the issues on March 26, shipped a partial reversal on May 6, and left weekly caps unchanged. This analysis examines the backlash without rendering judgment on the strategy.

The core issue: Claude Max's spring 2026 experience involved simultaneous changes across limits, model performance, tokenization, and client behavior—making it impossible for users to identify what actually changed.

What Actually Changed Between March 23 and May 6

The throttling incident began on March 23, when Max 20x users hit daily limits in 19 minutes rather than the documented five hours. Initial investigation revealed four concurrent changes:

  1. Intentional peak-hour throttling during 05:00–11:00 PT and 13:00–19:00 GMT, confirmed by Anthropic on March 26

  2. Two prompt-caching bugs silently inflating token bills 10–20×, tracked in claude-code issue #41930, with source-code analysis pointing to "the attestation/anti-distillation pipeline as the proximate cause"

  3. Expiration of the 2× off-peak usage promotion on March 28, which had quietly subsidized heavy nighttime usage

  4. Claude Code v2.1.100+ token inflation—source-code analysis comparing v2.1.98 versus v2.1.100 measured 978 fewer bytes sent but 20,196 more tokens billed for identical workloads, representing roughly 40% client-side regression
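
For context, the roughly-40% figure can be reverse-engineered from the token delta reported above. The per-workload baseline in this sketch is an assumption, not a published number; only the +20,196-token delta and the ~40% figure come from the source-code analysis:

```python
# Reverse-engineering the ~40% regression: a +20,196-token delta is ~40%
# of a ~50k-token baseline. The baseline is assumed; only the delta and
# the quoted percentage come from the article.
EXTRA_TOKENS = 20_196
ASSUMED_BASELINE = 50_500  # tokens billed per workload under v2.1.98 (assumed)

regression = EXTRA_TOKENS / ASSUMED_BASELINE
print(f"{regression:.0%}")  # ~40%
```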

None of these changes appeared on status pages, blog announcements, or email communications. Anthropic's March 31 acknowledgment simply stated "limits are being consumed far faster than expected" in scattered Reddit comments and engineer social media posts. This communication vacuum, rather than the changes themselves, triggered the backlash.

Why This Hit Harder Than the November 2025 Limit Cut

The March incident proved more damaging because users couldn't determine whether they'd hit quotas, bugs, or downgrades. November 2025's uniform, predictable limit reduction meant usage halved consistently; the math worked. March presented a different problem: the same five prompts might consume 21% of a session's usage on one attempt, then 100% on the next, with no audit trail indicating which counter was wrong.

The effects compounded multiplicatively rather than additively. A 10× cache-bug inflation stacked on a 40% client-side regression and a 50% peak-hour session reduction could end sessions an order of magnitude faster than documentation indicated. This transcended quota management: it became a trust problem.
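
The stacking can be made concrete with back-of-envelope arithmetic. The factors below are taken from the figures quoted in this article; treating them as independent multipliers on a single session is an illustrative simplification:

```python
# Back-of-envelope: how independent multipliers stack on session burn rate.
# All figures are illustrative, drawn from numbers quoted in the article.
CLIENT_REGRESSION = 1.4   # Claude Code v2.1.100 vs v2.1.98 (~40% more tokens)
PEAK_HOUR_CUT     = 2.0   # a 50% peak-hour session reduction doubles burn rate
CACHE_BUG         = 10.0  # low end of the reported 10-20x cache-bug inflation

def effective_session_minutes(documented_hours: float, *factors: float) -> float:
    """Minutes a session actually lasts once every multiplier is applied."""
    combined = 1.0
    for f in factors:
        combined *= f
    return documented_hours * 60 / combined

# Throttling plus client regression alone: ~107 minutes instead of 300
print(effective_session_minutes(5, CLIENT_REGRESSION, PEAK_HOUR_CUT))
# Worst case, with the cache bug hitting every request: ~10.7 minutes
print(effective_session_minutes(5, CLIENT_REGRESSION, PEAK_HOUR_CUT, CACHE_BUG))
```

Moderate factors that would each be survivable alone multiply into a session that is effectively unusable, which is why users experienced the change as a cliff rather than a trim.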

Community response scaled accordingly. The r/ClaudeAI thread "20x max usage gone in 19 minutes" accumulated 330+ comments within 24 hours. The r/ClaudeCode thread "Claude Code Limits Were Silently Reduced and It's MUCH Worse" reached 360+ comments over six days. Parallel r/Anthropic discussions questioning model degradation became inseparable from throttling conversations. GitHub tracked issues across claude-code #38335 (bug reports), #41930 (canonical cross-reference), and #54714 (late-April Max 20x daily-limit tightening after supposed resolution).

What Anthropic Was Actually Optimizing For

Anthropic addressed a capacity constraint, not a billing issue—this distinction reframes the backlash significantly.

Inference capacity for frontier models has been Anthropic's binding growth constraint since late 2025. The Max 20x tier concentrates technical heavy users running agents 24/7, exactly the workload that converts "five hours of capacity per user per session" into tragedy-of-the-commons dynamics. Peak-hour throttling is the obvious lever: cap 95th-percentile consumption so median users retain a functional product.

The May 6 announcement suggests the company decided this lever's shape was the problem. Two changes shipped simultaneously: the 5-hour limit doubled for Pro and Max accounts, and peak-hour reductions disappeared for both tiers. The concurrent SpaceX compute-deal announcement is the supply-side answer, suggesting throttling served as a temporary stopgap until capacity expanded. Notably, weekly caps remained unchanged, indicating comfort with heavy users exhausting limits mid-week, just not mid-day.

The unresolved question concerns cache bugs. Capacity constraints explain throttling; they don't explain 10–20× billing inflation or 40% client-side regression in v2.1.100. These suggest release-train problems: rapid shipping atop tokenizer/attestation rewrites with insufficient per-request token-count telemetry. Anthropic's April 23 quality-regression post-mortem hints toward this without explicit connection.

Opus 4.6 Versus 4.7: The Comparison Nobody Wants to Name

This section deliberately avoids declaring winners because the community disagrees on what comparison should even measure. Well-sourced facts include:

  • Opus 4.7 ships with a new tokenizer producing approximately 35% more tokens for identical input text, making any "Max plan is unchanged" claim hollow for 4.7 users—weekly caps shrink proportionally despite identical cap numbers

  • Opus 4.6 was silently removed from Claude Desktop Code tab model picker following 4.7 release, filed as claude-code issue #49689 and extensively discussed on Hacker News

  • A Reddit post "Opus 4.7 is not an upgrade but a serious regression" reached approximately 2,300 upvotes within 48 hours, with primary complaints centered on predictability rather than raw quality—4.7 felt "more confidently wrong" requiring excessive re-prompting

  • Opus 4.6 scheduled for deprecation June 15, 2026, compressing decision windows
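
The tokenizer point above is simple arithmetic: a cap denominated in tokens shrinks in real terms when the same text costs more tokens. A minimal sketch, assuming the ~35% inflation figure applies uniformly:

```python
# If the 4.7 tokenizer emits ~35% more tokens for identical input text,
# a fixed token-denominated weekly cap covers less real work even though
# the cap number never changes. The 1.35 factor is the article's estimate.
TOKENIZER_INFLATION = 1.35

def effective_cap_fraction(inflation: float) -> float:
    """Fraction of the old cap's real work that fits under the same number."""
    return 1 / inflation

print(f"{effective_cap_fraction(TOKENIZER_INFLATION):.0%}")  # ~74%
```

In other words, an unchanged weekly cap quietly became roughly three-quarters of its former self for 4.7 users.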

Contrasting reports come from engineers who find 4.7 meaningfully stronger on long-horizon agentic tasks and characterize the regression complaints as sampling bias from users whose prompts were tuned to 4.6's quirks. GitHub Changelog coverage and Anthropic's own 4.7 page lead with agentic improvements, and independent testing has benchmarked them consistently.

The real dynamic: version questions and throttling questions became entangled. If usage doubled during week one of 4.7 solely because of tokenization changes, it became impossible to separate "I exhausted Max" from "Anthropic changed the agreement." That entanglement, more than either individual fact, created the sense among Max users that spring 2026 was a downgrade.

What "Leaving Max" Actually Meant in Practice

Most users didn't cancel—they routed around limits. Three dominant patterns emerged from GitHub and Reddit discussions:

  1. Pin the client. The widely-cited April workaround pinned Claude Code to v2.1.98—the immediately preceding release—via command-line tools, accepting that new features wouldn't ship until v2.1.100+ regression resolution

  2. Hybrid routing. Run Claude Code with Claude on critical paths and cheaper backends (DeepSeek V4 Pro, Kimi 2.6, Gemini 3.1 Pro) for the 60–80% of calls not requiring flagship reasoning

  3. Switch the backend. More aggressive path: replace Claude entirely per-session. Users running 100M-token tests over four weeks documented real cost comparisons
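
The hybrid-routing pattern (option 2) can be sketched as a simple dispatch rule. Everything here is a placeholder: the model identifiers and the `needs_flagship` heuristic are hypothetical, not a real gateway API:

```python
# Hybrid routing sketch: send only reasoning-heavy, critical-path calls to
# the flagship model; route the remaining 60-80% to a cheaper backend.
# Model identifiers and the routing rule are hypothetical placeholders.
FLAGSHIP = "claude-opus"       # placeholder flagship identifier
CHEAP    = "cheaper-backend"   # e.g. a DeepSeek/Kimi/Gemini-class model

def pick_backend(prompt: str, critical_path: bool) -> str:
    """Route long or critical-path prompts to the flagship, the rest cheap."""
    needs_flagship = critical_path or len(prompt) > 4000
    return FLAGSHIP if needs_flagship else CHEAP

print(pick_backend("refactor the auth module across 40 files", critical_path=True))
print(pick_backend("rename this variable", critical_path=False))
```

Real setups typically put this rule inside an API gateway rather than application code, but the economics are the same: flagship tokens only where flagship reasoning is required.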

Cancellations occurred—the r/ClaudeCode "burned $6000" thread represents the most-cited heavy-user departure. However, the modal Max 20x response involved hybridization rather than exit. This distinction matters for interpreting May 6's reversal: Anthropic appears to have moved before cancellation curves actually bent.

So Is Max Worth It After May 6?

The answer depends on usage patterns. May 6's changes—doubled 5-hour limits, removed peak-hour reductions—directly benefit users whose work clusters into intense multi-hour business-hours sessions. These users likely regain working plans.

For sustained-usage patterns—extended agentic runs, batched evaluations, codebase-wide refactors—the binding constraint remains the weekly cap, which stayed unchanged. Users still hit walls, just later in the week. For this profile, API paths through unified gateways plus selective Opus calls cost less than $200/month Max and were already cheaper before throttling began.
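
Whether the API path beats the flat plan reduces to a break-even calculation. The per-million-token price below is a placeholder, not a quoted Anthropic rate; substitute real blended pricing before drawing conclusions:

```python
# Break-even sketch: at what monthly token volume does a $200 flat plan
# beat pay-per-token API pricing? The per-MTok price is a placeholder.
MAX_MONTHLY_USD = 200.0
API_USD_PER_MTOK = 15.0  # assumed blended input+output price (placeholder)

def break_even_mtok(flat_usd: float, per_mtok: float) -> float:
    """Millions of tokens per month at which the flat plan starts winning."""
    return flat_usd / per_mtok

print(break_even_mtok(MAX_MONTHLY_USD, API_USD_PER_MTOK))  # ~13.3 MTok/month
```

Below the break-even volume, pay-per-token wins outright; above it, the flat plan wins only if weekly caps actually let you spend the tokens, which is exactly the constraint that stayed in place.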

For users who run Max for the integrated Claude Code experience: the May 6 reversal probably restores functionality. For users who run Max because the per-token economics worked: recalculate. Between the new tokenizer, the v2.1.100 regression, and unchanged weekly caps, the math may have shifted even where pricing didn't.

Anthropic shipped four breaking changes in six weeks, communicated none of them through status pages, and reversed only the most visible one after a 2,300-upvote Reddit post and 700+ aggregated GitHub-issue comments. That pattern matters more than whether $200/month is the right price.
