The Exodus
It's April 2026 and Claude Code developers are in crisis:
- Max plan users ($100-200/mo) hitting daily limits by afternoon
- Anthropic admitted tokens drain "way faster than expected"
- OpenAI Codex launched at $20/mo with no limits
- OpenClaw hit 346K stars — but has a CVSS 8.8 RCE vulnerability
Developers are leaving. But they don't have to.
The Real Problem
Claude Code burns tokens on everything:
- Reading a file: ~2K tokens
- Searching code: ~5K tokens
- Each agent subprocess: ~50K tokens
- A complex refactoring session: 500K+ tokens
Most of these are routine operations that don't need Opus 4.6's reasoning power.
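A quick back-of-envelope calculation shows how fast this compounds. The per-operation figures come from the list above; the session mix (50 reads, 20 searches, 5 subagents) is an illustrative assumption, not measured data:

```python
# Rough per-operation token costs from the list above.
COST = {"read_file": 2_000, "search": 5_000, "subagent": 50_000}

def session_tokens(reads: int, searches: int, subagents: int) -> int:
    """Estimate total tokens for one session from operation counts."""
    return (reads * COST["read_file"]
            + searches * COST["search"]
            + subagents * COST["subagent"])

# A hypothetical medium refactor: 50 file reads, 20 searches, 5 subagents.
print(session_tokens(50, 20, 5))  # 450000
```

Nearly half a million tokens, and the bulk of it is mechanical file-shuffling rather than decisions that need a frontier model.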
The Solution: Local Delegation
helix-agents v0.9.0 is an MCP server that keeps you on Claude while cutting token usage by 60-80%.
```
Claude Code (Opus 4.6) — makes decisions
        ↓ delegates via MCP
helix-agents (local, $0)
├── gemma4:31b — research, vision, tools
├── Qdrant memory — persistent across sessions
└── Computer Use — browser automation
```
Opus decides what to do. Local models do the work.
gemma4: Released Yesterday, Default Today
Google DeepMind released gemma4 on April 2nd. helix-agents adopted it as the default model on Day 1 — the fastest adoption of any MCP tool:
- AIME 89.2% — math reasoning rivaling closed models
- LiveCodeBench 80% — strong code generation
- 256K context — handle massive codebases
- Vision + Function Calling — multimodal agent capabilities
- Apache 2.0 — fully open, no restrictions
- Runs on 20GB VRAM — accessible hardware requirements
Windows Computer Use
Claude Code's Computer Use is macOS only. helix-agents brings it to Windows via Playwright + helix-pilot integration — making it the only MCP tool offering Computer Use on Windows today.
Multi-Provider Architecture
helix-agents isn't just about gemma4. It's a unified MCP runtime supporting three providers:
| Provider | Use Case | Examples |
|---|---|---|
| ollama | Local LLM (free) | gemma4:31b, qwen3.5:122b, deckard-uncensored |
| codex | Repo-scale coding | Codex CLI integration, sandboxed execution |
| openai-compatible | Hosted APIs | GPT, Mistral, Groq |
All 11 MCP tools (think, agent_task, fork_task, computer_use, etc.) work identically across all providers. Switch with one command:
```python
providers(action="use", provider="codex")   # Switch to Codex
providers(action="use", provider="ollama")  # Back to local
providers(action="use_auto")                # Auto-select
```
This means:
- Routine tasks → Ollama ($0)
- Repo-scale coding → Codex
- High quality but not Opus → OpenAI-compatible
Claude Code + helix-agents = optimal model at optimal cost for every task.
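A routing policy like the one above can be sketched in a few lines. This is an illustrative stand-in, not helix-agents source: the task-kind names and the `pick_provider` function are my assumptions; only the three provider names come from the article:

```python
# Hypothetical auto-select policy mirroring the routing above.
ROUTES = {
    "routine": "ollama",                  # free local models
    "repo_scale": "codex",                # sandboxed repo-wide edits
    "high_quality": "openai-compatible",  # hosted APIs below Opus tier
}

def pick_provider(task_kind: str) -> str:
    """Return the provider for a task kind, defaulting to local."""
    return ROUTES.get(task_kind, "ollama")

print(pick_provider("repo_scale"))  # codex
print(pick_provider("summarize"))   # ollama (unknown kinds stay local)
```

Defaulting unknown task kinds to the free local provider keeps the worst-case cost at $0 rather than at a hosted-API bill.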
The multi-provider runtime has been stable since v0.4.0 — zero breaking changes through v0.9.0.
Why Not Just Switch to Codex?
| | Claude Code + helix-agents | Codex | OpenClaw |
|---|---|---|---|
| Cost | $100 + $0 local | $20 | Free |
| Quality | Opus 4.6 decisions | GPT-5.3 | Varies |
| Security | Local, no cloud | OpenAI cloud | CVE-2026-25253, 12% malicious skills |
| Token limit | Effectively 5-10x more | Unlimited | N/A |
| Ecosystem | Claude Code native | Separate tool | Separate tool |
| Computer Use | Windows + macOS | No | No |
The key insight: you don't need to abandon Claude's quality to solve the cost problem.
What's in v0.9.0
Built by analyzing Claude Code's actual source architecture:
- Fork-style context — subagents inherit parent context
- gemma4:31b default — vision + reasoning + function calling
- 280 tests passing — production-ready
- Computer Use — browser/desktop automation (Windows!)
- Qdrant shared memory — persistent vector search
- JSONL tracing — full observability
- OOM auto-fallback — gemma4 → gemma3 → gemma3:4b
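The OOM auto-fallback can be pictured as a chain that steps down to smaller models until one fits. This is a sketch under assumptions: the real server presumably catches an out-of-memory signal from the inference runtime, which a custom exception stands in for here, and `run_with_fallback`/`fake_runner` are hypothetical names:

```python
# Fallback chain from the feature list above, largest model first.
FALLBACK_CHAIN = ["gemma4:31b", "gemma3", "gemma3:4b"]

class OutOfMemory(RuntimeError):
    """Stand-in for the runtime's out-of-VRAM error."""

def run_with_fallback(prompt, runner, chain=FALLBACK_CHAIN):
    """Try each model in order until one fits in VRAM."""
    last_err = None
    for model in chain:
        try:
            return model, runner(model, prompt)
        except OutOfMemory as err:
            last_err = err  # too big for this GPU; try the next model down
    raise last_err

# Simulated runner: pretend the 31B model does not fit on this machine.
def fake_runner(model, prompt):
    if model == "gemma4:31b":
        raise OutOfMemory(model)
    return f"{model} answered: {prompt!r}"

model, out = run_with_fallback("summarize diff", fake_runner)
print(model)  # gemma3
```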
Real Savings
| Task | Opus tokens | With helix-agents | Saved |
|---|---|---|---|
| Explore 50 files | 100K | 2K | 98% |
| Code review 500 lines | 30K | 1K | 97% |
| Multi-step research | 200K | 3K | 98% |
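The "Saved" column is just the relative reduction, `(Opus tokens − delegated tokens) / Opus tokens`, rounded to the nearest percent. Checking it against the table's own figures:

```python
def saved_pct(opus: int, delegated: int) -> int:
    """Percentage of Opus tokens avoided by delegating locally."""
    return round(100 * (opus - delegated) / opus)

print(saved_pct(100_000, 2_000))  # 98 — explore 50 files
print(saved_pct(30_000, 1_000))   # 97 — code review 500 lines
print(saved_pct(200_000, 3_000))  # 98 — multi-step research
```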
Quick Start (2 minutes)
```bash
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
ollama pull gemma4:31b
uv run python server.py
```
Add to ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```
For Anthropic
This isn't an anti-Claude tool. It's a retention tool:
- Users stay on Claude instead of switching to Codex
- Max plan subscriptions continue
- Token pressure decreases naturally
- Users get a better experience and stay loyal
The best response to "Claude Code is too expensive" isn't "switch to Codex." It's "make Claude Code more efficient."
GitHub: tsunamayo7/helix-agent
Built during the 2026 token crisis. Because the best code assistant shouldn't come with a timer.
Top comments (1)
The token burn problem you're describing is real, and it gets worse when you factor in that frontier pricing has been flat for 9 consecutive weeks while the spread between frontier and budget models sits at 7.1x. For agentic workflows, where tokens compound across every loop, the model selection decision matters enormously. Most teams are paying frontier prices for tasks that a well-chosen budget model handles just as well.