DEV Community

Tsunamayo

Stop Burning API Tokens: Auto-Route Claude Code Tasks to Local Ollama Models

If you're a heavy Claude Code user, you've felt the API token burn. Every log analysis, every code review, every "summarize this file" eats your quota.

What if Claude Code could delegate routine tasks to your local Ollama models — automatically?

Introducing helix-agent

helix-agent is an MCP server that extends Claude Code with your local Ollama models. It automatically selects the best model for each task from whatever you have installed.

No API keys. No cloud. No config files. Just works.

The Architecture

```
User -> Claude Code -> helix-agent -> Local LLM (draft)
                                          |
                                    Claude reviews & enhances
                                          |
                                    High-quality final answer
```
  • Local LLM handles the heavy lifting (zero token cost)
  • Claude adds its superior reasoning (minimal tokens)
  • You always get Claude-quality output
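The delegation pattern above can be sketched in a few lines of Python. This is a hypothetical illustration, not helix-agent's actual code: `local_draft` stands in for an Ollama call and `claude_review` for Claude's refinement pass.

```python
# Hypothetical sketch of the draft-then-review flow. The local model
# produces a free draft; Claude spends only review-sized token counts.

def local_draft(task: str) -> str:
    """Stand-in for a call to a local Ollama model."""
    return f"[draft answer for: {task}]"

def claude_review(task: str, draft: str) -> str:
    """Stand-in for Claude reviewing and enhancing the draft."""
    return f"[reviewed and enhanced: {draft}]"

def delegate(task: str) -> str:
    draft = local_draft(task)          # heavy lifting, zero API tokens
    return claude_review(task, draft)  # minimal tokens for quality pass

print(delegate("summarize access.log"))
```

The design point is that the expensive model never sees the raw workload, only a compact draft to critique.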

Why Not Just Use Ollama Directly?

| Feature | helix-agent | PAL MCP | OllamaClaude |
| --- | --- | --- | --- |
| Context overhead | <5% | ~50% | ~2% |
| Auto model selection | Yes | Yes | Fallback only |
| Local benchmarks | Yes | No | No |
| Vision support | Yes | Model-dependent | No |
| Zero-config | Yes | No | Partial |

v0.3.0: Local Benchmark Engine

The latest release adds hardware-specific benchmarks: eight automated tests run on your actual GPU, covering code generation, reasoning, instruction following, Japanese, and speed.

Results are cached and directly influence routing priority.

Model Override

You can lock routing to a specific model anytime.

Setup (2 Minutes)

```
ollama pull gemma3
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
```

Register the server in ~/.claude/settings.json and you're done.
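The entry will look roughly like this (the command, args, and install path below are placeholders; check the repo's README for the exact values):

```json
{
  "mcpServers": {
    "helix-agent": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "helix-agent"]
    }
  }
}
```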

82 tests passing. MIT license. Python 3.12+.

GitHub: tsunamayo7/helix-agent

Feedback welcome!
