Earlier this week, Anthropic ran a quiet test: a small slice (~2%) of new Pro plan subscribers found that Claude Code wasn't included with their $20/month subscription. The pricing page was updated to reflect this. It made some noise on Reddit and X, Anthropic walked it back, and the page was reverted.
But the incident highlights something real: the economics of hosted AI are strained.
What Actually Happened
Anthropic's head of growth clarified on social media that the test affected about 2% of new prosumer signups. The reasoning was straightforward: usage patterns have changed dramatically. Users have moved from brief chat sessions to "nearly always-on, multi-agent workflows" that consume vastly more tokens. The current plans weren't built for this.
To be clear: this wasn't a crisis. It was a business experiment that got rolled back quickly. But it was also a signal — one that shouldn't be surprising if you've been paying attention to how compute-heavy AI tools have become.
The Trend Is Clear
Claude Code isn't unique here. OpenAI has introduced peak-hour caps. Anthropic has added tighter limits during high-traffic periods. Gemini, ChatGPT, and others have all introduced various forms of rate limiting as agentic workflows (long-running, multi-step tasks) have taken off.
This isn't malice — it's math. Running a model that can handle complex, hours-long agentic tasks requires significant GPU compute. At $20/month, there's a real gap between what heavy users consume and what the subscription covers.
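The gap is easy to see with back-of-envelope arithmetic. All figures below are illustrative assumptions for the sake of the calculation, not Anthropic's actual rates or any real user's consumption:

```python
# Back-of-envelope: what heavy agentic usage would cost at API-style rates.
# Every number here is an assumed, illustrative value.

TOKENS_PER_SESSION = 500_000      # assumed tokens consumed by one long agentic task
SESSIONS_PER_DAY = 4              # assumed "nearly always-on" usage pattern
COST_PER_MILLION_TOKENS = 10.00   # assumed blended $/1M tokens at API rates

daily_tokens = TOKENS_PER_SESSION * SESSIONS_PER_DAY
monthly_cost = daily_tokens / 1_000_000 * COST_PER_MILLION_TOKENS * 30

print(f"${monthly_cost:.2f}/month in API-equivalent compute")  # → $600.00/month
```

Under these made-up but plausible numbers, a single always-on user consumes an order of magnitude more compute than a $20/month plan prices in, which is the squeeze the post describes.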
Enter Local Models
This is where running AI locally becomes genuinely compelling, not just theoretically interesting.
Tools like Ollama, LM Studio, and Locally Uncensored let you run capable language models on your own hardware. No subscription. No per-token billing. No rate limits. No plan changes.
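As a concrete sketch of what "no per-token billing" looks like in practice, here is a minimal call to a locally hosted model through Ollama's HTTP API. It assumes Ollama is installed and serving on its default port, and that a coding model such as `qwen2.5-coder` has already been pulled; both the model name and the running server are assumptions, not requirements of any particular setup:

```python
# Minimal sketch: one non-streaming generation request to a local Ollama server.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a single non-streaming generation call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the model's response text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server with the model pulled):
#   print(generate("qwen2.5-coder", "Write a one-line Python hello world."))
```

Nothing in that loop meters tokens or enforces a plan tier; the only limit is your hardware's throughput.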
The tradeoff is real: you need decent hardware (a modern Mac with unified memory, a gaming PC with a good GPU, or a dedicated home server), and the experience differs from hosted APIs. But for developers who rely on agentic workflows — the exact users feeling the squeeze from providers — the local path is increasingly viable.
Recent open-weight models from Mistral, Qwen, and the Llama family are genuinely capable for coding tasks. They're not matching the frontier models on every benchmark, but for the majority of real-world dev work, the gap has shrunk considerably.
Is This a Sales Pitch?
Not really — and I want to be clear about that. If Anthropic's pricing works for you and you don't hit limits, there's no urgent reason to change. Their models are excellent.
But if you've been on the receiving end of a rate limit mid-flow, or if you're watching your usage climb and wondering what happens next, it's worth knowing that the local option exists and has gotten significantly easier to set up over the past year.
The local ecosystem isn't for everyone. But for developers who have built automated workflows around AI, the very users Anthropic was quietly trying to ration, it might be worth an afternoon of experimentation.
What do you think — is the local-first approach realistic for your use case, or are you all-in on hosted APIs? I'd genuinely like to know what you're running into.