DEV Community

Tsunamayo

Claude Code Token Crisis: Why I Built a Local Agent Instead of Switching to Codex

The Exodus

It's April 2026 and Claude Code developers are in crisis:

  • Max plan users ($100-200/mo) hitting daily limits by afternoon
  • Anthropic admitted tokens drain "way faster than expected"
  • OpenAI Codex launched at $20/mo with no limits
  • OpenClaw hit 346K stars — but has a CVSS 8.8 RCE vulnerability

Developers are leaving. But they don't have to.

The Real Problem

Claude Code burns tokens on everything:

  • Reading a file: ~2K tokens
  • Searching code: ~5K tokens
  • Each agent subprocess: ~50K tokens
  • A complex refactoring session: 500K+ tokens

Most of these are routine operations that don't need Opus 4.6's reasoning power.
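The imbalance above suggests a simple routing rule: send routine operations to a cheap local model and keep Opus only for decisions. A minimal sketch of that idea (the operation names and token figures are illustrative, taken from the list above — this is not helix-agents' actual router):

```python
# Hypothetical cost-based router: routine ops go to a local model,
# everything else stays with Opus. Token costs mirror the figures above.

ROUTINE_OPS = {"read_file", "search_code", "summarize", "research"}

# Rough per-operation token cost when Opus handles it directly.
OPUS_COST = {
    "read_file": 2_000,
    "search_code": 5_000,
    "agent_subprocess": 50_000,
}

def route(op: str) -> str:
    """Return which tier should execute the operation."""
    return "local" if op in ROUTINE_OPS else "opus"

def opus_tokens_saved(ops: list) -> int:
    """Tokens Opus avoids spending when routine ops are delegated."""
    return sum(OPUS_COST.get(op, 0) for op in ops if route(op) == "local")

print(route("read_file"))                               # local
print(opus_tokens_saved(["read_file", "search_code"]))  # 7000
```

Even this naive rule captures most of the savings, because the expensive decisions are rare and the cheap routine calls are constant.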

The Solution: Local Delegation

helix-agents v0.9.0 is an MCP server that keeps you on Claude while cutting token usage by 60-80%.

```
Claude Code (Opus 4.6) — makes decisions
  ↓ delegates via MCP
helix-agents (local, $0)
  ├── gemma4:31b — research, vision, tools
  ├── Qdrant memory — persistent across sessions
  └── Computer Use — browser automation
```

Opus decides what to do. Local models do the work.

gemma4: Released Yesterday, Default Today

Google DeepMind released gemma4 on April 2nd. helix-agents adopted it as the default model on Day 1 — the fastest adoption of any MCP tool:

  • AIME 89.2% — math reasoning rivaling closed models
  • LiveCodeBench 80% — strong code generation
  • 256K context — handle massive codebases
  • Vision + Function Calling — multimodal agent capabilities
  • Apache 2.0 — fully open, no restrictions
  • Runs on 20GB VRAM — accessible hardware requirements

Windows Computer Use

Claude Code's Computer Use is macOS only. helix-agents brings it to Windows via Playwright + helix-pilot integration — making it the only MCP tool offering Computer Use on Windows today.

Multi-Provider Architecture

helix-agents isn't just about gemma4. It's a unified MCP runtime supporting three providers:

| Provider | Use Case | Examples |
| --- | --- | --- |
| `ollama` | Local LLM (free) | gemma4:31b, qwen3.5:122b, deckard-uncensored |
| `codex` | Repo-scale coding | Codex CLI integration, sandboxed execution |
| `openai-compatible` | Hosted APIs | GPT, Mistral, Groq |

All 11 MCP tools (think, agent_task, fork_task, computer_use, etc.) work identically across all providers. Switch with one command:

```python
providers(action="use", provider="codex")     # Switch to Codex
providers(action="use", provider="ollama")    # Back to local
providers(action="use_auto")                  # Auto-select
```

This means:

  • Routine tasks → Ollama ($0)
  • Repo-scale coding → Codex
  • High quality but not Opus → OpenAI-compatible

Claude Code + helix-agents = optimal model at optimal cost for every task.

The multi-provider runtime has been stable since v0.4.0 — zero breaking changes through v0.9.0.
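The `providers()` calls above can be modeled as a small registry with an active provider and a cost-aware auto-select. The sketch below is an assumption about how such a registry could work — the provider names match the table, but the internals (costs, the `scope` heuristic) are invented for illustration, not helix-agents' actual code:

```python
# Illustrative provider registry mirroring the providers() tool.
# Costs and the scope-based auto-select heuristic are assumptions.

PROVIDERS = {
    "ollama": {"cost": 0.0, "scope": "routine"},
    "codex": {"cost": 20.0, "scope": "repo"},
    "openai-compatible": {"cost": 10.0, "scope": "hosted"},
}

_active = "ollama"

def providers(action: str, provider: str = None, task: str = "routine") -> str:
    """Switch the active provider, or auto-select one by task scope."""
    global _active
    if action == "use":
        if provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {provider}")
        _active = provider
    elif action == "use_auto":
        # Cheapest provider whose scope matches the task.
        matches = [p for p, cfg in PROVIDERS.items() if cfg["scope"] == task]
        _active = min(matches or ["ollama"], key=lambda p: PROVIDERS[p]["cost"])
    return _active

providers(action="use", provider="codex")   # -> "codex"
providers(action="use_auto")                # routine task -> "ollama"
```

The key design property this models: the tool surface stays identical across providers, so switching backends never changes how Claude calls the 11 MCP tools.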

Why Not Just Switch to Codex?

| | Claude Code + helix-agents | Codex | OpenClaw |
| --- | --- | --- | --- |
| Cost | $100 + $0 local | $20 | Free |
| Quality | Opus 4.6 decisions | GPT-5.3 | Varies |
| Security | Local, no cloud | OpenAI cloud | CVE-2026-25253, 12% malicious skills |
| Token limit | Effectively 5-10x more | Unlimited | N/A |
| Ecosystem | Claude Code native | Separate tool | Separate tool |
| Computer Use | Windows + macOS | No | No |

The key insight: you don't need to abandon Claude's quality to solve the cost problem.

What's in v0.9.0

Built by analyzing Claude Code's actual source architecture:

  • Fork-style context — subagents inherit parent context
  • gemma4:31b default — vision + reasoning + function calling
  • 280 tests passing — production-ready
  • Computer Use — browser/desktop automation (Windows!)
  • Qdrant shared memory — persistent vector search
  • JSONL tracing — full observability
  • OOM auto-fallback — gemma4 → gemma3 → gemma3:4b
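The OOM auto-fallback in the list above can be sketched as a simple chain: try each model largest-first and drop down when the runtime runs out of VRAM. A hypothetical sketch (`run_model` stands in for the real inference call; the actual tool presumably inspects Ollama's error responses rather than catching `MemoryError`):

```python
# Hypothetical OOM fallback chain: gemma4:31b -> gemma3 -> gemma3:4b.
# `run_model` is an injected inference function that may raise MemoryError.

FALLBACK_CHAIN = ["gemma4:31b", "gemma3", "gemma3:4b"]

def generate_with_fallback(prompt, run_model):
    """Try models largest-first; return (model_used, output)."""
    last_err = None
    for model in FALLBACK_CHAIN:
        try:
            return model, run_model(model, prompt)
        except MemoryError as err:  # VRAM exhausted: fall back to a smaller model
            last_err = err
    raise RuntimeError("all fallback models failed") from last_err

# Fake runner for demonstration: pretend only the smallest model fits.
def fake_run(model, prompt):
    if model != "gemma3:4b":
        raise MemoryError(f"{model} does not fit")
    return f"ok from {model}"

print(generate_with_fallback("hi", fake_run))  # ('gemma3:4b', 'ok from gemma3:4b')
```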

Real Savings

| Task | Opus tokens | With helix-agents | Saved |
| --- | --- | --- | --- |
| Explore 50 files | 100K | 2K | 98% |
| Code review 500 lines | 30K | 1K | 97% |
| Multi-step research | 200K | 3K | 98% |

Quick Start (2 minutes)

```shell
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
ollama pull gemma4:31b
uv run python server.py
```

Add to ~/.claude/settings.json:

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

For Anthropic

This isn't an anti-Claude tool. It's a retention tool:

  • Users stay on Claude instead of switching to Codex
  • Max plan subscriptions continue
  • Token pressure decreases naturally
  • Users get a better experience and stay loyal

The best response to "Claude Code is too expensive" isn't "switch to Codex." It's "make Claude Code more efficient."

GitHub: tsunamayo7/helix-agent


Built during the 2026 token crisis. Because the best code assistant shouldn't come with a timer.

Top comments (1)

Steriani Karamanlis

The token burn problem you're describing is real, and it gets worse when you factor in that frontier pricing has been flat for 9 consecutive weeks while the spread between frontier and budget models sits at 7.1x. For agentic workflows, where tokens compound across every loop, the model selection decision matters enormously. Most teams are paying frontier prices for tasks that a well-chosen budget model handles just as well.
