The Exodus
It's April 2026 and Claude Code developers are in crisis:
- Max plan users ($100-200/mo) hitting daily limits by afternoon
- Anthropic admitted tokens drain "way faster than expected"
- OpenAI Codex launched at $20/mo with no limits
- OpenClaw hit 346K stars — but has a CVSS 8.8 RCE vulnerability
Developers are leaving. But they don't have to.
The Real Problem
Claude Code burns tokens on everything:
- Reading a file: ~2K tokens
- Searching code: ~5K tokens
- Each agent subprocess: ~50K tokens
- A complex refactoring session: 500K+ tokens
Most of these are routine operations that don't need Opus 4.6's reasoning power.
The Solution: Local Delegation
helix-agents v0.9.0 is an MCP server that keeps you on Claude while cutting token usage by 60-80%.
Claude Code (Opus 4.6) — makes decisions
↓ delegates via MCP
helix-agents (local, $0)
├── gemma4:31b — research, vision, tools
├── Qdrant memory — persistent across sessions
└── Computer Use — browser automation
Opus decides what to do. Local models do the work.
gemma4: Released Yesterday, Default Today
Google DeepMind released gemma4 on April 2nd. helix-agents adopted it as the default model on Day 1 — the fastest adoption of any MCP tool:
- AIME 89.2% — math reasoning rivaling closed models
- LiveCodeBench 80% — strong code generation
- 256K context — handle massive codebases
- Vision + Function Calling — multimodal agent capabilities
- Apache 2.0 — fully open, no restrictions
- Runs on 20GB VRAM — accessible hardware requirements
Windows Computer Use
Claude Code's Computer Use is macOS only. helix-agents brings it to Windows via Playwright + helix-pilot integration — making it the only MCP tool offering Computer Use on Windows today.
Multi-Provider Architecture
helix-agents isn't just about gemma4. It's a unified MCP runtime supporting three providers:
| Provider | Use Case | Examples |
|---|---|---|
ollama |
Local LLM (free) | gemma4:31b, qwen3.5:122b, deckard-uncensored |
codex |
Repo-scale coding | Codex CLI integration, sandboxed execution |
openai-compatible |
Hosted APIs | GPT, Mistral, Groq |
All 11 MCP tools (think, agent_task, fork_task, computer_use, etc.) work identically across all providers. Switch with one command:
providers(action="use", provider="codex") # Switch to Codex
providers(action="use", provider="ollama") # Back to local
providers(action="use_auto") # Auto-select
This means:
- Routine tasks → Ollama ($0)
- Repo-scale coding → Codex
- High quality but not Opus → OpenAI-compatible
Claude Code + helix-agents = optimal model at optimal cost for every task.
The multi-provider runtime has been stable since v0.4.0 — zero breaking changes through v0.9.0.
Why Not Just Switch to Codex?
| Claude Code + helix-agents | Codex | OpenClaw | |
|---|---|---|---|
| Cost | $100 + $0 local | $20 | Free |
| Quality | Opus 4.6 decisions | GPT-5.3 | Varies |
| Security | Local, no cloud | OpenAI cloud | CVE-2026-25253, 12% malicious skills |
| Token limit | Effectively 5-10x more | Unlimited | N/A |
| Ecosystem | Claude Code native | Separate tool | Separate tool |
| Computer Use | Windows + macOS | No | No |
The key insight: you don't need to abandon Claude's quality to solve the cost problem.
What's in v0.9.0
Built by analyzing Claude Code's actual source architecture:
- Fork-style context — subagents inherit parent context
- gemma4:31b default — vision + reasoning + function calling
- 280 tests passing — production-ready
- Computer Use — browser/desktop automation (Windows!)
- Qdrant shared memory — persistent vector search
- JSONL tracing — full observability
- OOM auto-fallback — gemma4 → gemma3 → gemma3:4b
Real Savings
| Task | Opus tokens | With helix-agents | Saved |
|---|---|---|---|
| Explore 50 files | 100K | 2K | 98% |
| Code review 500 lines | 30K | 1K | 97% |
| Multi-step research | 200K | 3K | 98% |
Quick Start (2 minutes)
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
ollama pull gemma4:31b
uv run python server.py
Add to ~/.claude/settings.json:
{
"mcpServers": {
"helix-agents": {
"command": "uv",
"args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
}
}
}
For Anthropic
This isn't an anti-Claude tool. It's a retention tool:
- Users stay on Claude instead of switching to Codex
- Max plan subscriptions continue
- Token pressure decreases naturally
- Users get a better experience and stay loyal
The best response to "Claude Code is too expensive" isn't "switch to Codex." It's "make Claude Code more efficient."
GitHub: tsunamayo7/helix-agent
Built during the 2026 token crisis. Because the best code assistant shouldn't come with a timer.
Top comments (0)