DEV Community

Tsunamayo

Stop Burning API Tokens: Auto-Route Claude Code Tasks to Local Ollama Models

If you're a heavy Claude Code user, you've felt the API token burn. Every log analysis, every code review, every "summarize this file" eats your quota.

What if Claude Code could delegate routine tasks to your local Ollama models — automatically?

Introducing helix-agent

helix-agent is an MCP server that extends Claude Code with your local Ollama models. It automatically selects the best model for each task from whatever you have installed.

No API keys. No cloud. No config files. Just works.

The Architecture

```
User -> Claude Code -> helix-agent -> Local LLM (draft)
                                          |
                                    Claude reviews & enhances
                                          |
                                    High-quality final answer
```
  • Local LLM handles the heavy lifting (zero token cost)
  • Claude adds its superior reasoning (minimal tokens)
  • You always get Claude-quality output
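The delegation pattern above can be sketched in a few lines of Python. This is a hypothetical illustration, not helix-agent's actual code: `local_draft` stands in for an Ollama call and `claude_review` for Claude's refinement pass.

```python
# Hypothetical sketch of the draft-then-review flow. The local model
# produces a free draft; Claude spends only review-sized token counts.

def local_draft(task: str) -> str:
    """Stand-in for a call to a local Ollama model."""
    return f"[draft answer for: {task}]"

def claude_review(task: str, draft: str) -> str:
    """Stand-in for Claude reviewing and enhancing the draft."""
    return f"[reviewed and enhanced: {draft}]"

def delegate(task: str) -> str:
    draft = local_draft(task)          # heavy lifting, zero API tokens
    return claude_review(task, draft)  # minimal tokens for quality pass

print(delegate("summarize access.log"))
```

The design point is that the expensive model never sees the raw workload, only a compact draft to critique.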

Why Not Just Use Ollama Directly?

| Feature | helix-agent | PAL MCP | OllamaClaude |
| --- | --- | --- | --- |
| Context overhead | <5% | ~50% | ~2% |
| Auto model selection | Yes | Yes | Fallback only |
| Local benchmarks | Yes | No | No |
| Vision support | Yes | Model-dependent | No |
| Zero-config | Yes | No | Partial |

v0.3.0: Local Benchmark Engine

The latest release adds hardware-specific benchmarks: eight automated tests run on your actual GPU, covering code generation, reasoning, instruction following, Japanese, and speed.

Results are cached and directly influence routing priority.

Model Override

You can lock routing to a specific model anytime.

Setup (2 Minutes)

```
ollama pull gemma3
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
```

Register the server in ~/.claude/settings.json and you're done.
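The entry will look roughly like this (the command, args, and install path below are placeholders; check the repo's README for the exact values):

```json
{
  "mcpServers": {
    "helix-agent": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "helix-agent"]
    }
  }
}
```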

82 tests passing. MIT license. Python 3.12+.

GitHub: tsunamayo7/helix-agent

Feedback welcome!
