TITLE: Why I Migrated My AI Assistant from Claude to Qwen and Gemma in One Morning
On Friday, April 3 at 7:47 PM, an email from Anthropic landed in my inbox:
"Starting April 4 at 12pm PT / 8pm BST, you'll no longer be able to use your Claude subscription limits for third-party harnesses including OpenClaw."
I had less than 17 hours to decide what to do.
It wasn’t like my assistant would just die—Nyx (my personal AI running on OpenClaw) could still use Claude, but now it needed “extra usage”: separate pay-as-you-go charges on top of my subscription. My Anthropic Max plan ($100–$200/month) no longer covered third-party tool usage.
So I did what I’d been putting off: I built a smarter, diversified model stack.
The email that changed everything
Anthropic’s official message explained three things:
1. The change: Starting April 4, Pro and Max subscriptions no longer cover usage in external tools like OpenClaw. They still work—but now require “extra usage” (separate pay-per-token billing).
2. The exception: Subscriptions do still cover Anthropic’s own products: Claude Code and Claude Cowork.
3. The sweeteners: Anthropic offered a one-time credit equal to one month of your subscription (redeemable by April 17), plus up to 30% discounts if you pre-buy “extra usage” bundles.
According to Boris Cherny (Head of Claude Code), the technical reason is that third-party tools aren't optimized for Claude's prompt caching, so they incur disproportionate compute costs. Anthropic's own tools reuse already-processed context and are far more efficient.
This wasn't a surprise move—around the same time, Anthropic had also begun imposing 5-hour session limits on the top 7% of heaviest users. The writing was on the wall.
My real-world setup
Nyx is my personal AI assistant, running on my own VPS. It handles content, automations, analytics, publishing calendars, and dozens of tasks daily. Up until Friday, it used Claude Sonnet 4.6 as the default model, covered under my Max subscription.
After the change, sticking with the status quo meant paying API tokens on top of my subscription. Community estimates suggest an active agent can burn $50–$200/month in API costs alone. That’s unacceptable when you’re already paying $100+ just to subscribe.
The obvious alternative? OpenRouter.
How I migrated in one morning
Step 1: Audit what’s actually available
First, I looked at what I could really use—not what sounds good on paper:
- Anthropic (direct token) → Sonnet, Opus, Haiku — available, but now pay-as-you-go
- Google Antigravity (OAuth) → Gemini 3.1 Pro High, Gemini 3 Flash — ruled out due to recurring timeouts we’d experienced months ago
- OpenRouter (API key) → Dozens of models from multiple providers, pay-per-use
- Groq (token) → Fast models, but many outdated model IDs
Step 2: Check real rankings on LM Arena
I didn’t trust slick marketing benchmarks. I went straight to OpenLM.ai’s LM Arena, which aggregates millions of blind human votes between models.
Here’s what mattered for my use case:
| Model | Arena Score | Open Source | Cost / 1M tokens |
|---|---|---|---|
| Gemini 3.1 Pro High | ~1505 🏆 | No | OAuth free* |
| Claude Sonnet 4.6 | ~1460 | No | $3 in / $15 out |
| Gemma 4 31B | 1450 | ✅ Apache 2.0 | $0.14 |
| Qwen3 235B 2507 | 1418 | ✅ Apache 2.0 | $0.07 |
| DeepSeek V3 0324 | 1377 | ✅ MIT | $0.20 |
* Gemini is free via OAuth—but we’d already had timeout issues in production.
Step 3: Test real latency—not paper benchmarks
Here’s where things got wild. I used OpenClaw to spin up subagents for each model and measured cold-start response times:
| Model | Real Latency |
|---|---|
| DeepSeek V3 0324 | 257ms ✅ |
| Llama 4 Maverick | 346ms ✅ |
| Mistral Small 3.1 | 460ms ✅ |
| Qwen3 235B | 638ms ✅ |
| Gemma 4 31B | 6.2 seconds ❌ |
Gemma 4 had the best open-source Arena score—but 6 seconds of cold latency makes interactive conversation impossible. It got demoted to batch tasks only (SEO analysis, bulk processing).
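The measurement itself doesn't need anything fancy: time one cold request per model and compare. Here's a minimal sketch of the approach, assuming a `call_model` callable that sends a single short prompt to whatever backend you're testing (the stand-in backend below is purely illustrative):

```python
import time

def measure_latency(call_model, model_id: str, prompt: str = "ping") -> float:
    """Time a single cold-start request and return latency in milliseconds."""
    start = time.perf_counter()
    call_model(model_id, prompt)  # any callable that performs one request
    return (time.perf_counter() - start) * 1000

# Stand-in backend that simulates a model taking ~50 ms to respond:
def fake_backend(model_id: str, prompt: str) -> str:
    time.sleep(0.05)
    return "pong"

latency_ms = measure_latency(fake_backend, "qwen/qwen3-235b")
```

In practice you'd run this a few times per model and keep the first (cold) measurement, since warm requests benefit from connection reuse.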
Step 4: The final stack
After testing and tuning, here’s the setup I landed on:
Main model: Qwen3 235B A22B 2507 via OpenRouter
- 638ms latency
- Arena score 1418 (on par with Claude Sonnet 4.5)
- $0.07 per million tokens — ~42x cheaper than Sonnet
- 262k token context
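The "~42x cheaper" figure is just arithmetic on the input-token prices from the tables above (output-token prices differ, so the real blended ratio depends on your usage mix):

```python
# Input-token prices in $ per 1M tokens, as quoted in the rankings table.
SONNET_INPUT = 3.00   # Claude Sonnet 4.6
QWEN_INPUT = 0.07     # Qwen3 235B via OpenRouter

ratio = SONNET_INPUT / QWEN_INPUT  # ≈ 42.9
```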
By agent specialization:
| Agent | Model | Why |
|---|---|---|
| Main (Nyx) | Qwen3 235B | Best quality/cost/latency balance |
| Content & courses | Qwen3 235B | Strong reasoning, fluent Spanish |
| SEO & batch analysis | Gemma 4 31B | Top open-source score; latency acceptable for non-interactive batch work |
| n8n & code | DeepSeek V3 | Fastest (257ms), coding powerhouse |
| Social comments | Mistral Small 3.1 | Super fast, $0.03/M — perfect for short replies |
| API exploration | Llama 4 Maverick | 1M token context — great for huge docs |
| Session compaction | Mistral Small 3.1 | Cheap context summarization |
| On-demand high quality | Claude (Sonnet/Opus/Haiku) | Optional for peak quality — not default anymore |
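Conceptually, the specialization above is just a routing table: each agent role maps to a default model, with the main model as fallback. A minimal sketch (the role names and model IDs here are illustrative—check your provider's model list for the exact identifiers):

```python
# Map each agent role to its default model ID (IDs are illustrative).
AGENT_MODELS = {
    "main": "qwen/qwen3-235b-a22b-2507",
    "content": "qwen/qwen3-235b-a22b-2507",
    "seo_batch": "google/gemma-4-31b",          # batch only: high latency
    "code": "deepseek/deepseek-chat-v3-0324",   # fastest, strong at code
    "social": "mistralai/mistral-small-3.1",    # cheap short replies
    "docs": "meta-llama/llama-4-maverick",      # 1M-token context
    "compaction": "mistralai/mistral-small-3.1",
}

def model_for(agent: str) -> str:
    """Return the model for an agent role, falling back to the main model."""
    return AGENT_MODELS.get(agent, AGENT_MODELS["main"])
```

The fallback matters: a new agent role you haven't configured yet should degrade to your best general-purpose model, not crash.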
What I learned
1. Relying on a single provider is operational suicide.
That email hit at 8 PM Friday. By Saturday noon, the rules had changed. Less than 17 hours' notice for a change with global impact. If my infrastructure hadn't been ready, Nyx would've just… stopped. No drama. No warning. Just silence.
2. Open-source models are now seriously competitive.
Gemma 4 31B (Apache 2.0) scores 1450 on Arena—beating proprietary models from just 6 months ago. Qwen3 235B is neck-and-neck with Claude Sonnet at less than 1/10th the cost. This isn’t last year’s game anymore.
3. A diversified stack is cheaper and more robust.
Using the right model for the job cuts costs and improves results. The most expensive model isn’t the best for everything. Mistral Small for 3-sentence social comments works just as well as Sonnet—and costs 500x less.
4. Latency matters as much as the score.
On paper, Gemma 4 (1450) vs Qwen3 (1418) looks like a close race. But 6 seconds vs 638ms? That’s the difference between a usable assistant and a doorstop in interactive mode.
5. This move from Anthropic was predictable.
When a company gives you “unlimited” access for $20–$200/month and you’re using it to run a full-time autonomous agent, the economics eventually blow up. Anthropic chose to protect margins for direct users instead of subsidizing third-party power users. From their angle? Totally reasonable.
What you should do if this affects you
If you run an AI assistant, automation agent, or any tool using Claude via subscription + third-party tools, you’ve got three options:
- Enable “extra usage” on your Anthropic account — easiest, but adds variable costs on top of your fixed subscription. Use their free credit (valid until April 17).
- Migrate to OpenRouter — access to dozens of models, pay-per-actual-use. No monthly fees. Risk: you have to pick the right model per task.
- Go direct API with Anthropic — drop the subscription, pay per token. More predictable for variable usage, but pricier at high volume.
I went with #2, backed up by #1 when I need top-tier quality.
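If you pick option #2, the switch is mostly a base-URL change: OpenRouter exposes an OpenAI-compatible chat completions endpoint. Here's a minimal sketch using only the standard library (the model ID is illustrative; the request only fires if you've set an API key):

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, user_msg: str) -> dict:
    # OpenRouter uses the OpenAI-style chat completions schema.
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }

payload = build_request("qwen/qwen3-235b-a22b-2507", "Hello from Nyx")

api_key = os.environ.get("OPENROUTER_API_KEY")
if api_key:  # only send when a key is actually configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the schema matches OpenAI's, most existing client code needs only the URL, key, and model ID changed.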
If you want to dive into how to build this kind of stack yourself, join my community of founders at Cágala, Aprende, Repite — we're running exactly these kinds of experiments daily.
This post was written by Qwen3 235B — the same model that replaced Claude as Nyx’s default. The full migration happened Saturday morning. Ironically, it was the perfect excuse to build the stack I should’ve had months ago.
📝 Originally published in Spanish at cristiantala.com