If you're a heavy Claude Code user, you've felt the API token burn. Every log analysis, every code review, every "summarize this file" eats your quota.
What if Claude Code could delegate routine tasks to your local Ollama models — automatically?
## Introducing helix-agent
helix-agent is an MCP server that extends Claude Code with your local Ollama models. It automatically selects the best model for each task from whatever you have installed.
No API keys. No cloud. No config files. Just works.
## The Architecture

```
User -> Claude Code -> helix-agent -> Local LLM (draft)
                                          |
                               Claude reviews & enhances
                                          |
                               High-quality final answer
```
- Local LLM handles the heavy lifting (zero token cost)
- Claude adds its superior reasoning (minimal tokens)
- You always get Claude-quality output
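At its core, the draft step is just a local HTTP call. Here is a minimal sketch of what a delegation could look like under the hood, using Ollama's standard REST API on `localhost:11434` (the function names and prompt are illustrative, not helix-agent's actual code):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_draft_request(prompt: str, model: str = "gemma3") -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def draft_with_ollama(prompt: str, model: str = "gemma3") -> str:
    """Ask a local Ollama model for a draft answer (zero API-token cost)."""
    data = json.dumps(build_draft_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Claude then only spends tokens reviewing and polishing the returned draft, which is where the context-overhead savings come from.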
## Why Not Just Use Ollama Directly?
| Feature | helix-agent | PAL MCP | OllamaClaude |
|---|---|---|---|
| Context overhead | <5% | ~50% | ~2% |
| Auto model selection | Yes | Yes | Fallback only |
| Local benchmarks | Yes | No | No |
| Vision support | Yes | Model-dependent | No |
| Zero-config | Yes | No | Partial |
## v0.3.0: Local Benchmark Engine
The latest release adds hardware-specific benchmarks: eight automated tests run on your actual GPU, covering code generation, reasoning, instruction following, Japanese, and speed.
Results are cached and directly influence routing priority.
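Conceptually, cached benchmark scores can drive routing like this (a hypothetical sketch with made-up scores; helix-agent's actual cache format and weighting are its own):

```python
# Hypothetical per-model benchmark scores (0-1), as they might sit in a cache.
BENCH_CACHE = {
    "gemma3":        {"code": 0.81, "reasoning": 0.74, "speed": 0.90},
    "qwen2.5-coder": {"code": 0.88, "reasoning": 0.70, "speed": 0.75},
}

def pick_model(task: str, cache: dict = BENCH_CACHE) -> str:
    """Route to the installed model with the best cached score for this task."""
    return max(cache, key=lambda m: cache[m].get(task, 0.0))
```

The point is that routing reflects *your* hardware and installed models, not a generic leaderboard.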
## Model Override
You can lock routing to a specific model anytime.
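The override pattern itself is simple: a pinned model always beats auto-selection. A tiny sketch (the env-var name here is purely illustrative, not helix-agent's real mechanism):

```python
import os

def resolve_model(auto_choice: str) -> str:
    """An explicit override pins routing; otherwise keep the auto-selected model."""
    return os.environ.get("HELIX_MODEL_OVERRIDE", auto_choice)
```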
## Setup (2 Minutes)
```bash
ollama pull gemma3
git clone https://github.com/tsunamayo7/helix-agent.git
cd helix-agent && uv sync
```
Add the helix-agent server entry to `~/.claude/settings.json` and you're done.
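For reference, an MCP server entry generally looks something like this (the exact keys, command, and path are assumptions — check the repo's README for the canonical snippet):

```json
{
  "mcpServers": {
    "helix-agent": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "helix-agent"]
    }
  }
}
```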
82 tests passing. MIT license. Python 3.12+.
GitHub: tsunamayo7/helix-agent
Feedback welcome!