DEV Community

Tsunamayo
Claude Code Token Crisis: Why I Built a Local Agent Instead of Switching to Codex

The Exodus

It's April 2026 and Claude Code developers are in crisis:

  • Max plan users ($100-200/mo) hitting daily limits by afternoon
  • Anthropic admitted tokens drain "way faster than expected"
  • OpenAI Codex launched at $20/mo with no limits
  • OpenClaw hit 346K stars — but has a CVSS 8.8 RCE vulnerability

Developers are leaving. But they don't have to.

The Real Problem

Claude Code burns tokens on everything:

  • Reading a file: ~2K tokens
  • Searching code: ~5K tokens
  • Each agent subprocess: ~50K tokens
  • A complex refactoring session: 500K+ tokens

Most of these are routine operations that don't need Opus 4.6's reasoning power.

The Solution: Local Delegation

helix-agents v0.9.0 is an MCP server that keeps you on Claude while cutting token usage by 60-80%.

```
Claude Code (Opus 4.6) — makes decisions
  ↓ delegates via MCP
helix-agents (local, $0)
  ├── gemma4:31b — research, vision, tools
  ├── Qdrant memory — persistent across sessions
  └── Computer Use — browser automation
```

Opus decides what to do. Local models do the work.

gemma4: Released Yesterday, Default Today

Google DeepMind released gemma4 on April 2nd. helix-agents adopted it as the default model on Day 1 — the fastest adoption of any MCP tool:

  • AIME 89.2% — math reasoning rivaling closed models
  • LiveCodeBench 80% — strong code generation
  • 256K context — handle massive codebases
  • Vision + Function Calling — multimodal agent capabilities
  • Apache 2.0 — fully open, no restrictions
  • Runs on 20GB VRAM — accessible hardware requirements

Windows Computer Use

Claude Code's Computer Use is macOS only. helix-agents brings it to Windows via Playwright + helix-pilot integration — making it the only MCP tool offering Computer Use on Windows today.

Multi-Provider Architecture

helix-agents isn't just about gemma4. It's a unified MCP runtime supporting three providers:

| Provider | Use Case | Examples |
|---|---|---|
| ollama | Local LLM (free) | gemma4:31b, qwen3.5:122b, deckard-uncensored |
| codex | Repo-scale coding | Codex CLI integration, sandboxed execution |
| openai-compatible | Hosted APIs | GPT, Mistral, Groq |

All 11 MCP tools (think, agent_task, fork_task, computer_use, etc.) work identically across all providers. Switch with one command:

```
providers(action="use", provider="codex")     # Switch to Codex
providers(action="use", provider="ollama")    # Back to local
providers(action="use_auto")                  # Auto-select
```

This means:

  • Routine tasks → Ollama ($0)
  • Repo-scale coding → Codex
  • High quality but not Opus → OpenAI-compatible

Claude Code + helix-agents = optimal model at optimal cost for every task.

The multi-provider runtime has been stable since v0.4.0 — zero breaking changes through v0.9.0.

Why Not Just Switch to Codex?

| | Claude Code + helix-agents | Codex | OpenClaw |
|---|---|---|---|
| Cost | $100 + $0 local | $20 | Free |
| Quality | Opus 4.6 decisions | GPT-5.3 | Varies |
| Security | Local, no cloud | OpenAI cloud | CVE-2026-25253, 12% malicious skills |
| Token limit | Effectively 5-10x more | Unlimited | N/A |
| Ecosystem | Claude Code native | Separate tool | Separate tool |
| Computer Use | Windows + macOS | No | No |

The key insight: you don't need to abandon Claude's quality to solve the cost problem.

What's in v0.9.0

Built by analyzing Claude Code's actual source architecture:

  • Fork-style context — subagents inherit parent context
  • gemma4:31b default — vision + reasoning + function calling
  • 280 tests passing — production-ready
  • Computer Use — browser/desktop automation (Windows!)
  • Qdrant shared memory — persistent vector search
  • JSONL tracing — full observability
  • OOM auto-fallback — gemma4 → gemma3 → gemma3:4b

Real Savings

| Task | Opus tokens | With helix-agents | Saved |
|---|---|---|---|
| Explore 50 files | 100K | 2K | 98% |
| Code review 500 lines | 30K | 1K | 97% |
| Multi-step research | 200K | 3K | 98% |

Quick Start (2 minutes)

```
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
ollama pull gemma4:31b
uv run python server.py
```

Add to ~/.claude/settings.json:

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

For Anthropic

This isn't an anti-Claude tool. It's a retention tool:

  • Users stay on Claude instead of switching to Codex
  • Max plan subscriptions continue
  • Token pressure decreases naturally
  • Users get a better experience and stay loyal

The best response to "Claude Code is too expensive" isn't "switch to Codex." It's "make Claude Code more efficient."

GitHub: tsunamayo7/helix-agent


Built during the 2026 token crisis. Because the best code assistant shouldn't come with a timer.
