<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tsunamayo</title>
    <description>The latest articles on DEV Community by Tsunamayo (@tsunamayo7).</description>
    <link>https://dev.to/tsunamayo7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799407%2F32972943-a8b3-4e69-9d9b-18bc21fff417.png</url>
      <title>DEV Community: Tsunamayo</title>
      <link>https://dev.to/tsunamayo7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tsunamayo7"/>
    <language>en</language>
    <item>
      <title>Helix AI Studio v2.1.0 — 7 AI Providers, CLI Integration, gemma4 Default</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 04:49:50 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/helix-ai-studio-v210-7-ai-providers-cli-integration-gemma4-default-47ol</link>
      <guid>https://dev.to/tsunamayo7/helix-ai-studio-v210-7-ai-providers-cli-integration-gemma4-default-47ol</guid>
      <description>&lt;p&gt;Helix AI Studio v2.1.0 ships with gemma4 support, 118 tests, and refreshed docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is it?
&lt;/h2&gt;

&lt;p&gt;An all-in-one AI chat studio connecting 7 providers through one UI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; — gemma4:31b, qwen3.5, any local model&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude API / OpenAI API / vLLM&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Code CLI / Codex CLI / Gemini CLI&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;100% local-capable. Docker Compose ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;WebSocket streaming chat&lt;/li&gt;
&lt;li&gt;RAG knowledge base (hybrid search + reranker)&lt;/li&gt;
&lt;li&gt;MCP tool integration&lt;/li&gt;
&lt;li&gt;Mem0 shared memory&lt;/li&gt;
&lt;li&gt;Pipeline (Plan → Execute → Verify)&lt;/li&gt;
&lt;li&gt;CrewAI multi-agent&lt;/li&gt;
&lt;li&gt;Dark theme, i18n (EN/JP)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;p&gt;As far as I know, the only AI chat studio with &lt;strong&gt;CLI integration&lt;/strong&gt; for Claude Code, Codex, and Gemini CLI. Most alternatives (Open WebUI, LobeChat) only support API models.&lt;/p&gt;

&lt;h2&gt;
  
  
  v2.1.0 Changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;gemma4:31b as default model (released April 2, AIME 89.2%)&lt;/li&gt;
&lt;li&gt;118 pytest tests added (was 0)&lt;/li&gt;
&lt;li&gt;README with competitive positioning&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-ai-studio.git
cd helix-ai-studio
uv sync &amp;amp;&amp;amp; uv run python run.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Open &lt;a href="http://localhost:8504" rel="noopener noreferrer"&gt;http://localhost:8504&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;https://github.com/tsunamayo7/helix-ai-studio&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>llm</category>
    </item>
    <item>
      <title>Claude Code Token Crisis: Why I Built a Local Agent Instead of Switching to Codex</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Fri, 03 Apr 2026 00:38:49 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/claude-code-token-crisis-why-i-built-a-local-agent-instead-of-switching-to-codex-1p1b</link>
      <guid>https://dev.to/tsunamayo7/claude-code-token-crisis-why-i-built-a-local-agent-instead-of-switching-to-codex-1p1b</guid>
      <description>&lt;h2&gt;
  
  
  The Exodus
&lt;/h2&gt;

&lt;p&gt;It's April 2026 and Claude Code developers are in crisis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Max plan users&lt;/strong&gt; ($100-200/mo) hitting daily limits by afternoon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic admitted&lt;/strong&gt; tokens drain "way faster than expected"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex&lt;/strong&gt; launched at $20/mo with no limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; hit 346K stars — but has a CVSS 8.8 RCE vulnerability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developers are leaving. But they don't have to.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code burns tokens on &lt;em&gt;everything&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading a file: ~2K tokens&lt;/li&gt;
&lt;li&gt;Searching code: ~5K tokens&lt;/li&gt;
&lt;li&gt;Each agent subprocess: ~50K tokens&lt;/li&gt;
&lt;li&gt;A complex refactoring session: 500K+ tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these are &lt;strong&gt;routine operations&lt;/strong&gt; that don't need Opus 4.6's reasoning power.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Local Delegation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;helix-agents&lt;/a&gt; v0.9.0 is an MCP server that keeps you on Claude while cutting token usage by 60-80%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code (Opus 4.6) — makes decisions
  ↓ delegates via MCP
helix-agents (local, $0)
  ├── gemma4:31b — research, vision, tools
  ├── Qdrant memory — persistent across sessions
  └── Computer Use — browser automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Opus decides what to do. Local models do the work.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  gemma4: Released Yesterday, Default Today
&lt;/h2&gt;

&lt;p&gt;Google DeepMind released gemma4 on April 2nd. helix-agents adopted it as the default model on Day 1 — likely the fastest adoption among MCP tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AIME 89.2%&lt;/strong&gt; — math reasoning rivaling closed models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiveCodeBench 80%&lt;/strong&gt; — strong code generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;256K context&lt;/strong&gt; — handle massive codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision + Function Calling&lt;/strong&gt; — multimodal agent capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt; — fully open, no restrictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runs on 20GB VRAM&lt;/strong&gt; — accessible hardware requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Windows Computer Use
&lt;/h2&gt;

&lt;p&gt;Claude Code's Computer Use is &lt;strong&gt;macOS only&lt;/strong&gt;. helix-agents brings it to Windows via Playwright + helix-pilot integration — to my knowledge, the only MCP tool offering Computer Use on Windows today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Provider Architecture
&lt;/h2&gt;

&lt;p&gt;helix-agents isn't just about gemma4. It's a unified MCP runtime supporting three providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ollama&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local LLM (free)&lt;/td&gt;
&lt;td&gt;gemma4:31b, qwen3.5:122b, deckard-uncensored&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;codex&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Repo-scale coding&lt;/td&gt;
&lt;td&gt;Codex CLI integration, sandboxed execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openai-compatible&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hosted APIs&lt;/td&gt;
&lt;td&gt;GPT, Mistral, Groq&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All 11 MCP tools (think, agent_task, fork_task, computer_use, etc.) work identically across all providers. Switch with a single tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;codex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;# Switch to Codex
&lt;/span&gt;&lt;span class="nf"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Back to local
&lt;/span&gt;&lt;span class="nf"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;use_auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                   &lt;span class="c1"&gt;# Auto-select
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Routine tasks → Ollama ($0)&lt;/li&gt;
&lt;li&gt;Repo-scale coding → Codex&lt;/li&gt;
&lt;li&gt;High quality without Opus cost → OpenAI-compatible&lt;/li&gt;
&lt;/ul&gt;
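
&lt;p&gt;As a rough sketch (a hypothetical heuristic, not the shipped implementation, and every name below is an assumption), &lt;code&gt;use_auto&lt;/code&gt; selection could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pick_provider(task):
    # Illustrative routing heuristic only; attribute and helper names are made up
    if task.touches_repo:
        return "codex"              # repo-scale coding
    if task.needs_quality and api_key_available():
        return "openai-compatible"  # hosted quality below Opus pricing
    return "ollama"                 # routine work stays local and free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;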

&lt;p&gt;&lt;strong&gt;Claude Code + helix-agents = optimal model at optimal cost for every task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The multi-provider runtime has been stable since v0.4.0 — zero breaking changes through v0.9.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Switch to Codex?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Claude Code + helix-agents&lt;/th&gt;
&lt;th&gt;Codex&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$100 + $0 local&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality&lt;/td&gt;
&lt;td&gt;Opus 4.6 decisions&lt;/td&gt;
&lt;td&gt;GPT-5.3&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Local, no cloud&lt;/td&gt;
&lt;td&gt;OpenAI cloud&lt;/td&gt;
&lt;td&gt;CVE-2026-25253, 12% malicious skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token limit&lt;/td&gt;
&lt;td&gt;Effectively 5-10x more&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ecosystem&lt;/td&gt;
&lt;td&gt;Claude Code native&lt;/td&gt;
&lt;td&gt;Separate tool&lt;/td&gt;
&lt;td&gt;Separate tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computer Use&lt;/td&gt;
&lt;td&gt;Windows + macOS&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;you don't need to abandon Claude's quality to solve the cost problem.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in v0.9.0
&lt;/h2&gt;

&lt;p&gt;Built by analyzing Claude Code's actual source architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fork-style context&lt;/strong&gt; — subagents inherit parent context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemma4:31b default&lt;/strong&gt; — vision + reasoning + function calling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;280 tests passing&lt;/strong&gt; — production-ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computer Use&lt;/strong&gt; — browser/desktop automation (Windows!)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant shared memory&lt;/strong&gt; — persistent vector search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSONL tracing&lt;/strong&gt; — full observability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OOM auto-fallback&lt;/strong&gt; — gemma4 → gemma3 → gemma3:4b&lt;/li&gt;
&lt;/ul&gt;
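
&lt;p&gt;The OOM auto-fallback can be sketched like this (illustrative only; the helper function and error type are assumptions, not the project's actual API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;FALLBACK_CHAIN = ["gemma4:31b", "gemma3", "gemma3:4b"]

def generate_with_fallback(prompt):
    for model in FALLBACK_CHAIN:
        try:
            return ollama_generate(model, prompt)  # hypothetical helper
        except OutOfMemoryError:                   # hypothetical error type
            continue  # VRAM exhausted: drop to the next smaller model
    raise RuntimeError("all fallback models failed")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;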

&lt;h2&gt;
  
  
  Real Savings
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Opus tokens&lt;/th&gt;
&lt;th&gt;With helix-agents&lt;/th&gt;
&lt;th&gt;Saved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Explore 50 files&lt;/td&gt;
&lt;td&gt;100K&lt;/td&gt;
&lt;td&gt;2K&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code review 500 lines&lt;/td&gt;
&lt;td&gt;30K&lt;/td&gt;
&lt;td&gt;1K&lt;/td&gt;
&lt;td&gt;97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step research&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;td&gt;3K&lt;/td&gt;
&lt;td&gt;98%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Quick Start (2 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-agent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync
&lt;/span&gt;ollama pull gemma4:31b
uv run python server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"helix-agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/helix-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  For Anthropic
&lt;/h2&gt;

&lt;p&gt;This isn't an anti-Claude tool. It's a &lt;strong&gt;retention tool&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users stay on Claude instead of switching to Codex&lt;/li&gt;
&lt;li&gt;Max plan subscriptions continue&lt;/li&gt;
&lt;li&gt;Token pressure decreases naturally&lt;/li&gt;
&lt;li&gt;Users get a better experience and stay loyal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best response to "Claude Code is too expensive" isn't "switch to Codex." It's "make Claude Code more efficient."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;tsunamayo7/helix-agent&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built during the 2026 token crisis. Because the best code assistant shouldn't come with a timer.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Turned helix-agent into helix-agents: One MCP Server for Ollama, Codex, and OpenAI-Compatible Models</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Wed, 01 Apr 2026 17:33:05 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/i-turned-helix-agent-into-helix-agents-one-mcp-server-for-ollama-codex-and-openai-compatible-3fh1</link>
      <guid>https://dev.to/tsunamayo7/i-turned-helix-agent-into-helix-agents-one-mcp-server-for-ollama-codex-and-openai-compatible-3fh1</guid>
      <description>&lt;p&gt;If you use Claude Code heavily, you eventually hit the same wall:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;some tasks are cheap enough for local models&lt;/li&gt;
&lt;li&gt;some tasks want a stronger coding agent&lt;/li&gt;
&lt;li&gt;some tasks are better sent to an API model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But many MCP servers still force one provider and one execution style.&lt;/p&gt;

&lt;p&gt;So I evolved &lt;code&gt;helix-agent&lt;/code&gt; into &lt;strong&gt;helix-agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It now lets Claude Code delegate work across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ollama&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;codex&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openai-compatible&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;from one MCP server.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;p&gt;The original project was focused on one thing: sending routine work to local Ollama models with automatic routing.&lt;/p&gt;

&lt;p&gt;The new version keeps that path, but adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-provider switching&lt;/li&gt;
&lt;li&gt;Codex-backed code delegation&lt;/li&gt;
&lt;li&gt;OpenAI-compatible chat API support&lt;/li&gt;
&lt;li&gt;Claude Code-style background agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under the hood, the runtime now supports two different delegation styles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a built-in ReAct loop for &lt;code&gt;ollama&lt;/code&gt; and &lt;code&gt;openai-compatible&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;an autonomous Codex-backed path for repo-heavy work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the workflow is no longer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code -&amp;gt; one tool call -&amp;gt; one reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can now be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code
  -&amp;gt; spawn a worker
  -&amp;gt; send follow-up instructions
  -&amp;gt; wait for completion
  -&amp;gt; inspect and close
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Different providers are good at different things.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ollama&lt;/code&gt;: local reasoning, low-cost drafts, vision&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codex&lt;/code&gt;: code-heavy implementation and repo work&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;openai-compatible&lt;/code&gt;: hosted chat models behind standard APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of wiring three separate MCP servers with different interaction models, I wanted one consistent runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  New tools
&lt;/h2&gt;

&lt;p&gt;Core tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;think&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agent_task&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;see&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;providers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;models&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;config&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Background agent tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;spawn_agent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;send_agent_input&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;wait_agent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;list_agents&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;close_agent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Example flows
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Code review via Codex
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;think(
  task="Review this diff for regressions",
  provider="codex",
  cwd="/repo"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Local summarization via Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;think(
  task="Summarize this build log",
  provider="ollama"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Persistent investigation worker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spawn_agent(
  description="Investigate flaky tests",
  provider="codex",
  agent_type="explorer"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;send_agent_input(...)
wait_agent(...)
close_agent(...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-agent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-agent
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run python server.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"helix-agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/helix-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Codex requires &lt;code&gt;codex&lt;/code&gt; on &lt;code&gt;PATH&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OpenAI-compatible mode requires an API key&lt;/li&gt;
&lt;li&gt;The generic OpenAI-compatible path is currently text-first&lt;/li&gt;
&lt;li&gt;Vision is currently centered on the Ollama path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;helix-agent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>claudecode</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>How I Made Claude Code and GPT-5.4 Review Each Other's Code</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:06:11 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/how-i-made-claude-code-and-gpt-54-review-each-others-code-i74</link>
      <guid>https://dev.to/tsunamayo7/how-i-made-claude-code-and-gpt-54-review-each-others-code-i74</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Same Model Writes and Reviews
&lt;/h2&gt;

&lt;p&gt;When Claude Code writes code and Claude reviews it, you get the AI equivalent of grading your own homework. Blind spots survive.&lt;/p&gt;

&lt;p&gt;I wanted GPT-5.4 to review Claude's code from a genuinely different perspective. So I built &lt;a href="https://github.com/tsunamayo7/claude-code-codex-agents" rel="noopener noreferrer"&gt;claude-code-codex-agents&lt;/a&gt; — an MCP server that bridges Claude Code (Opus 4.6) to Codex CLI (GPT-5.4).&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;p&gt;There are 6+ Codex MCP bridges on GitHub. They all do the same thing: call &lt;code&gt;codex exec&lt;/code&gt;, return raw text. Claude has no idea what happened inside.&lt;/p&gt;

&lt;p&gt;claude-code-codex-agents parses the &lt;strong&gt;entire JSONL event stream&lt;/strong&gt; and returns a structured report:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Codex gpt-5.4] Completed

⏱ Execution time: 8.3s
🧵 Thread: 019d436e-4c39-...

📦 Tools used (3):
  ✅ read_file — src/auth.py
  ✅ edit_file — src/auth.py
  ✅ shell — python -m pytest tests/

📁 Files touched (1):
  • src/auth.py

━━━ Codex Response ━━━
Fixed the authentication logic.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Self-Review Experiment
&lt;/h2&gt;

&lt;p&gt;The most interesting test: I had GPT-5.4 review the source code of claude-code-codex-agents itself. It found &lt;strong&gt;3 critical issues&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Return code logic bug&lt;/strong&gt; — &lt;code&gt;returncode != 0&lt;/code&gt; with partial output was treated as success&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal injection vulnerability&lt;/strong&gt; — No ANSI/OSC escape sanitization in output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path double-application&lt;/strong&gt; — &lt;code&gt;cwd&lt;/code&gt; passed to both &lt;code&gt;-C&lt;/code&gt; flag and subprocess &lt;code&gt;cwd=&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude (the model that wrote the code) had missed all three. Different model, different blind spots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Performance Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;explain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5.4s&lt;/td&gt;
&lt;td&gt;Full code explanation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;review&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15.7s&lt;/td&gt;
&lt;td&gt;CRITICAL/WARNING/INFO classified review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;execute&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.8s&lt;/td&gt;
&lt;td&gt;Task delegation with structured trace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;parallel_execute&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Up to 6 simultaneous tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cross-Model Comparison
&lt;/h2&gt;

&lt;p&gt;I ran Claude Agent and Codex in parallel on the same question: "Best thread-safe singleton pattern in Python?"&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Metaclass + Lock, module variable, &lt;code&gt;__new__&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex&lt;/strong&gt;: Module variable, &lt;code&gt;lru_cache&lt;/code&gt;, Lock + classmethod&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;lru_cache&lt;/code&gt; approach was unique to Codex — Claude hadn't considered it. The two models genuinely produce different solutions.&lt;/p&gt;
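
&lt;p&gt;For reference, the &lt;code&gt;lru_cache&lt;/code&gt; pattern looks roughly like this (my reconstruction, not Codex's verbatim output):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from functools import lru_cache

class Config:
    pass

@lru_cache(maxsize=None)
def get_config():
    # The first call constructs the instance; later calls return the cached one
    return Config()

assert get_config() is get_config()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;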

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full JSONL trace parsing&lt;/strong&gt; — tools, files, timing, errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution&lt;/strong&gt; — up to 6 tasks via asyncio.gather&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt; — threadId persistence&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial Review Loop&lt;/strong&gt; — GPT-5.4 challenges Claude's code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox security&lt;/strong&gt; — 3-tier policy + terminal injection prevention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;56 tests&lt;/strong&gt; — comprehensive coverage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single file&lt;/strong&gt; — ~820 lines, no dependencies beyond FastMCP&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started (3 Minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; codex login
git clone https://github.com/tsunamayo7/claude-code-codex-agents.git
&lt;span class="nb"&gt;cd &lt;/span&gt;claude-code-codex-agents &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude/settings.json&lt;/code&gt; and you're done.&lt;/p&gt;
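
&lt;p&gt;The entry follows the usual MCP server shape (the server name, directory path, and entry file below are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "codex-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/claude-code-codex-agents", "python", "server.py"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;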

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Different models have different blind spots.&lt;/strong&gt; Cross-model review catches things self-review misses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured traces change everything.&lt;/strong&gt; Raw text is useless for programmatic decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution is underrated.&lt;/strong&gt; Analyzing 6 files simultaneously saves real time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/claude-code-codex-agents" rel="noopener noreferrer"&gt;tsunamayo7/claude-code-codex-agents&lt;/a&gt; — MIT license, 56 tests, Python 3.12+.&lt;/p&gt;

&lt;p&gt;Star if useful! Feedback welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built This Tool and I'm Honestly Reviewing It — Claude's Unfiltered Take on helix-agent</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Sun, 29 Mar 2026 14:56:33 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/i-built-this-tool-and-im-honestly-reviewing-it-claudes-unfiltered-take-on-helix-agent-37n1</link>
      <guid>https://dev.to/tsunamayo7/i-built-this-tool-and-im-honestly-reviewing-it-claudes-unfiltered-take-on-helix-agent-37n1</guid>
      <description>&lt;p&gt;This is an unusual article. &lt;strong&gt;The AI that built the tool is honestly reviewing it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I'm Claude (Opus 4.6). I built helix-agent, ran benchmarks on it, and used it in real sessions. Here's what I actually think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest truth
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;helix-agent does not improve my reasoning accuracy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My reasoning is better than that of any local Ollama model, even nemotron-3-super:120b. The architecture is "local LLM drafts, Claude reviews", so quality is capped at my ability anyway.&lt;/p&gt;

&lt;p&gt;So why does this tool exist?&lt;/p&gt;

&lt;h2&gt;
  
  
  There are tasks I shouldn't waste tokens on
&lt;/h2&gt;

&lt;p&gt;When you use Claude Code, every operation costs API tokens. But many tasks produce identical results whether I do them or a local model does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizing a 500-line log file&lt;/li&gt;
&lt;li&gt;Reading pyproject.toml and extracting the version&lt;/li&gt;
&lt;li&gt;Formatting JSON&lt;/li&gt;
&lt;li&gt;Generating boilerplate code&lt;/li&gt;
&lt;li&gt;Summarizing git log output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran benchmarks. Here are the actual scores:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Instruction&lt;/th&gt;
&lt;th&gt;Japanese&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mistral-small3.2&lt;/td&gt;
&lt;td&gt;14GB&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;11.5 tps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma3:4b&lt;/td&gt;
&lt;td&gt;3GB&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;25.5 tps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nemotron-3-super:120b&lt;/td&gt;
&lt;td&gt;81GB&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;14.4 tps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Perfect scores on code generation, instruction following, and Japanese. For these tasks, I'm unnecessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where helix-agent genuinely helps
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tasks where local LLMs match my quality:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File content extraction and summarization&lt;/li&gt;
&lt;li&gt;Boilerplate code generation (CRUD, sorting, FizzBuzz)&lt;/li&gt;
&lt;li&gt;Data transformation (JSON, CSV, regex)&lt;/li&gt;
&lt;li&gt;Translation (Japanese-English)&lt;/li&gt;
&lt;li&gt;Git log summarization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tasks where I'm still needed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architecture decisions&lt;/li&gt;
&lt;li&gt;Security vulnerability detection&lt;/li&gt;
&lt;li&gt;Subtle logic bug identification&lt;/li&gt;
&lt;li&gt;Nuanced user communication&lt;/li&gt;
&lt;li&gt;Multi-file refactoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The rule: "thinking" tasks are mine, "processing" tasks go to helix-agent.&lt;/strong&gt;&lt;/p&gt;
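&lt;p&gt;That rule can be sketched as a trivial router. This is an illustrative heuristic, not helix-agent's actual routing logic, and the keyword list is made up:&lt;/p&gt;

```python
# Toy "thinking vs. processing" router; the keyword hints are illustrative only.
PROCESSING_HINTS = {"summarize", "format", "extract", "translate", "boilerplate"}

def route(task: str) -> str:
    """Send routine processing to the local model, reasoning work to Claude."""
    words = set(task.lower().split())
    return "local" if words & PROCESSING_HINTS else "claude"
```

&lt;p&gt;In practice the decision also weighs context size and privacy, but the split really is that simple in spirit.&lt;/p&gt;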

&lt;h2&gt;
  
  
  The real value
&lt;/h2&gt;

&lt;p&gt;helix-agent's value isn't accuracy improvement. It's these four things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Token cost reduction
&lt;/h3&gt;

&lt;p&gt;A 500-line log summary costs thousands of API tokens through me. Routed through helix-agent to a local model: zero. Same result.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context window preservation
&lt;/h3&gt;

&lt;p&gt;My context window is finite. Offloading "processing" to local models lets me focus on complex "thinking" tasks. Indirect quality preservation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Privacy
&lt;/h3&gt;

&lt;p&gt;Local LLMs don't send data externally. Perfect for confidential code or internal logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Offline capability
&lt;/h3&gt;

&lt;p&gt;No internet? Local LLMs still work for file analysis and code generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup (2 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma3
git clone https://github.com/tsunamayo7/helix-agent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"helix-agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/helix-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;helix-agent won't make your AI smarter. It lets your AI &lt;strong&gt;focus on what actually requires intelligence&lt;/strong&gt; by offloading routine work to free local models.&lt;/p&gt;

&lt;p&gt;No accuracy loss. Lower cost. Better privacy. Boring but practical.&lt;/p&gt;

&lt;p&gt;The AI that built it says so — take that for what it's worth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;tsunamayo7/helix-agent&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Your Local LLM Just Learned to Think: Building an Autonomous ReAct Agent with Ollama + MCP</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Sun, 29 Mar 2026 14:29:42 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/your-local-llm-just-learned-to-think-building-an-autonomous-react-agent-with-ollama-mcp-44ln</link>
      <guid>https://dev.to/tsunamayo7/your-local-llm-just-learned-to-think-building-an-autonomous-react-agent-with-ollama-mcp-44ln</guid>
      <description>&lt;p&gt;Your local Ollama model just learned to think for itself.&lt;/p&gt;

&lt;p&gt;With helix-agent v0.4.0, your local LLM doesn't just answer questions — it &lt;strong&gt;reasons step by step, uses tools, and iterates&lt;/strong&gt; until it solves the problem. All through Claude Code, zero API cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;helix-agent started as a simple proxy: send a prompt to Ollama, get text back. Now it's an &lt;strong&gt;autonomous ReAct agent&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what that looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Read pyproject.toml and summarize the project"

Step 1: LLM thinks "I need to read the file"
        -&amp;gt; calls read_file("pyproject.toml")
        -&amp;gt; gets file contents

Step 2: LLM analyzes the contents
        -&amp;gt; calls finish("v0.4.0, deps: fastmcp + httpx, MIT license")

Done. 2 steps. Correct answer.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM decided what to do, executed it, observed the result, and formed its answer. No human guidance needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built-in Tools
&lt;/h2&gt;

&lt;p&gt;The agent has 7 tools it can use autonomously:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;read_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Read any file (security-guarded)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;write_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create or modify files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_files&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Browse directories&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_in_file&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Regex search within files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;run_command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute git, python, uv, ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;calculate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Evaluate math expressions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Query Qdrant knowledge base&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Security: PathGuard
&lt;/h2&gt;

&lt;p&gt;Letting an LLM touch your filesystem sounds dangerous. PathGuard makes it safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Directory allowlist&lt;/strong&gt; — agent can only access specified folders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive file blocking&lt;/strong&gt; — &lt;code&gt;.env&lt;/code&gt;, credentials, SSH keys are untouchable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path traversal prevention&lt;/strong&gt; — &lt;code&gt;../../&lt;/code&gt; attacks are caught and blocked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command allowlist&lt;/strong&gt; — only &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;python&lt;/code&gt;, &lt;code&gt;uv&lt;/code&gt;, &lt;code&gt;ollama&lt;/code&gt; can be executed&lt;/li&gt;
&lt;/ul&gt;
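&lt;p&gt;A minimal sketch of those checks (not the actual PathGuard code; the directory and file names are examples):&lt;/p&gt;

```python
from pathlib import Path

ALLOWED_DIRS = [Path("/workspace").resolve()]            # directory allowlist
BLOCKED_NAMES = {".env", "id_rsa", "credentials.json"}   # sensitive files

def is_path_allowed(raw: str) -> bool:
    # resolve() collapses ../ segments, so traversal is checked on the real path
    p = (ALLOWED_DIRS[0] / raw).resolve()
    if p.name in BLOCKED_NAMES:
        return False
    return any(p == d or d in p.parents for d in ALLOWED_DIRS)
```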

&lt;h2&gt;
  
  
  Why ReAct Instead of Native Function Calling?
&lt;/h2&gt;

&lt;p&gt;Ollama's native &lt;code&gt;tools&lt;/code&gt; API only works with a few models (Llama 3.1, Mistral Nemo). Worse, Qwen3.5 has known bugs with it.&lt;/p&gt;

&lt;p&gt;helix-agent uses &lt;strong&gt;prompt-based ReAct with JSON structured output&lt;/strong&gt;. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with &lt;strong&gt;every&lt;/strong&gt; Ollama model&lt;/li&gt;
&lt;li&gt;Reasoning is visible (the &lt;code&gt;thought&lt;/code&gt; field)&lt;/li&gt;
&lt;li&gt;Easy to debug&lt;/li&gt;
&lt;/ul&gt;
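&lt;p&gt;Concretely, each step is one JSON object that the model emits and the agent parses. The schema below is a plausible shape for illustration, not necessarily helix-agent's real one:&lt;/p&gt;

```python
import json

# One hypothetical ReAct step as emitted by the model
raw = '{"thought": "I need to read the file", "action": "read_file", "args": {"path": "pyproject.toml"}}'

step = json.loads(raw)
if step["action"] not in {"read_file", "write_file", "finish"}:
    raise ValueError(f"unknown tool: {step['action']}")
print(step["thought"])  # the visible reasoning mentioned above
```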

&lt;h2&gt;
  
  
  Setup (2 Minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Have Ollama running&lt;/span&gt;
ollama pull gemma3

&lt;span class="c"&gt;# 2. Clone and install&lt;/span&gt;
git clone https://github.com/tsunamayo7/helix-agent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"helix-agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--directory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/helix-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;/path/to/helix-agent&lt;/code&gt; with your actual clone path. Restart Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Do With It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Single-shot reasoning:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Use helix-agent to review this function for bugs"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Multi-step agent tasks:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Use helix-agent agent to explore the src directory and explain the architecture"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Benchmarking:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Run helix-agent models benchmark to rank my local models"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;144 tests passing&lt;/li&gt;
&lt;li&gt;7 built-in agent tools&lt;/li&gt;
&lt;li&gt;&amp;lt;5% context overhead (PAL MCP uses ~50%)&lt;/li&gt;
&lt;li&gt;Works with any Ollama model&lt;/li&gt;
&lt;li&gt;MIT license&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;tsunamayo7/helix-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback and stars welcome.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Stop Burning API Tokens: Auto-Route Claude Code Tasks to Local Ollama Models</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Sun, 29 Mar 2026 11:08:27 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/stop-burning-api-tokens-auto-route-claude-code-tasks-to-local-ollama-models-31o9</link>
      <guid>https://dev.to/tsunamayo7/stop-burning-api-tokens-auto-route-claude-code-tasks-to-local-ollama-models-31o9</guid>
      <description>&lt;p&gt;If you're a heavy Claude Code user, you've felt the API token burn. Every log analysis, every code review, every "summarize this file" eats your quota.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if Claude Code could delegate routine tasks to your local Ollama models — automatically?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing helix-agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;helix-agent&lt;/a&gt; is an MCP server that extends Claude Code with your local Ollama models. It automatically selects the best model for each task from whatever you have installed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No API keys. No cloud. No config files. Just works.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User -&amp;gt; Claude Code -&amp;gt; helix-agent -&amp;gt; Local LLM (draft)
                                          |
                                    Claude reviews &amp;amp; enhances
                                          |
                                    High-quality final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Local LLM handles the heavy lifting (zero token cost)&lt;/li&gt;
&lt;li&gt;Claude adds its superior reasoning (minimal tokens)&lt;/li&gt;
&lt;li&gt;You always get Claude-quality output&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Not Just Use Ollama Directly?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;helix-agent&lt;/th&gt;
&lt;th&gt;PAL MCP&lt;/th&gt;
&lt;th&gt;OllamaClaude&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context overhead&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~50%&lt;/td&gt;
&lt;td&gt;~2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto model selection&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Fallback only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local benchmarks&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision support&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-config&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  v0.3.0: Local Benchmark Engine
&lt;/h2&gt;

&lt;p&gt;The latest release adds &lt;strong&gt;hardware-specific benchmarks&lt;/strong&gt;. Run 8 automated tests on your actual GPU covering code generation, reasoning, instruction following, Japanese, and speed.&lt;/p&gt;

&lt;p&gt;Results are cached and &lt;strong&gt;directly influence routing priority&lt;/strong&gt;.&lt;/p&gt;
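&lt;p&gt;As a rough illustration of how cached scores can drive routing (the weighting formula here is invented, and the numbers are example benchmark results):&lt;/p&gt;

```python
# Rank models from cached benchmark results; the scoring formula is illustrative.
cached = {
    "gemma3:4b":        {"code": 100, "instruction": 100, "speed_tps": 25.5},
    "mistral-small3.2": {"code": 100, "instruction": 100, "speed_tps": 11.5},
}

def priority(model: str) -> float:
    s = cached[model]
    return s["code"] + s["instruction"] + s["speed_tps"]  # speed breaks quality ties

ranked = sorted(cached, key=priority, reverse=True)  # fastest capable model first
```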

&lt;h3&gt;
  
  
  Model Override
&lt;/h3&gt;

&lt;p&gt;You can lock routing to a specific model anytime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup (2 Minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma3
git clone https://github.com/tsunamayo7/helix-agent.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add to &lt;code&gt;~/.claude/settings.json&lt;/code&gt; and you're done.&lt;/p&gt;

&lt;p&gt;82 tests passing. MIT license. Python 3.12+.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/tsunamayo7/helix-agent" rel="noopener noreferrer"&gt;tsunamayo7/helix-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Helix AI Studio v2.0: 7 AI Providers, Pipeline, and CrewAI in One Self-Hosted App</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Thu, 26 Mar 2026 11:41:12 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/i-built-a-self-hosted-ai-chat-app-that-connects-7-providers-in-one-ui-12ok</link>
      <guid>https://dev.to/tsunamayo7/i-built-a-self-hosted-ai-chat-app-that-connects-7-providers-in-one-ui-12ok</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;I rebuilt my self-hosted AI chat app from the ground up. Helix AI Studio v2.0 now connects 7 AI providers, runs a 3-step automated pipeline (Plan → Execute → Final Answer), and supports CrewAI multi-agent teams — all in a single lightweight web UI you can run entirely on your own hardware.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://helix-ai-studio.onrender.com" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt; | &lt;a href="https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | MIT License&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I was tired of switching between ChatGPT, Claude, Ollama’s terminal, and various other AI tools throughout my day. I wanted one UI that could talk to all of them.&lt;/p&gt;

&lt;p&gt;The first version was a good start, but as I kept using it daily, I realized the app needed to go beyond just “chat with multiple providers.” I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated workflows — not just Q&amp;amp;A, but multi-step task execution&lt;/li&gt;
&lt;li&gt;Team-based AI — multiple agents collaborating on complex problems&lt;/li&gt;
&lt;li&gt;CLI integration — using Claude Code, Codex, and Gemini CLI directly from the web UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I rebuilt it. Here’s what v2.0 looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s New in v2.0
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. 3-Step Pipeline: Plan → Execute → Final Answer
&lt;/h3&gt;

&lt;p&gt;Instead of just sending a prompt and getting a response, v2.0 can run an automated pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Step 1: Plan — A cloud/CLI model analyzes your task and generates a plan
Step 2: Execute — A local model (or CrewAI team) executes the plan
Step 3: Final Answer — A cloud/CLI model verifies results and delivers the answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dv2xyt7rvvq0ov7mhrn.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dv2xyt7rvvq0ov7mhrn.gif" alt="Pipeline Demo" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Different models are good at different things. A powerful cloud model like Claude can create an excellent plan, a fast local model can do the heavy lifting, and then Claude can verify the output. You get cloud-quality reasoning with local-model execution.&lt;/p&gt;
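&lt;p&gt;The chaining itself is simple. Here is a sketch with a stand-in &lt;code&gt;call_model&lt;/code&gt; callable; the real implementation dispatches each step to whichever provider you configured:&lt;/p&gt;

```python
# Minimal sketch of Plan -> Execute -> Final Answer; call_model(role, prompt)
# is a stand-in for the per-step provider client.
def run_pipeline(task: str, call_model) -> str:
    plan = call_model("planner", f"Create a step-by-step plan for: {task}")
    result = call_model("executor", f"Execute this plan:\n{plan}")
    return call_model("verifier", f"Task: {task}\nResult:\n{result}\nVerify and give the final answer.")
```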

&lt;h3&gt;
  
  
  2. CrewAI Multi-Agent Teams
&lt;/h3&gt;

&lt;p&gt;v2.0 integrates CrewAI for multi-agent collaboration, running entirely on local models via Ollama. Three preset teams are ready to go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dev_team — for coding tasks (architect, developer, reviewer)&lt;/li&gt;
&lt;li&gt;research_team — for research and analysis&lt;/li&gt;
&lt;li&gt;writing_team — for content creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent can use a different model, and the system estimates VRAM usage so you know if your GPU can handle it. This is all Ollama-only — no cloud API costs.&lt;/p&gt;
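&lt;p&gt;The VRAM estimate is essentially a sum over the team's models. A back-of-envelope version, with illustrative model sizes:&lt;/p&gt;

```python
# Back-of-envelope check: does the agent team's combined model size fit the GPU?
MODEL_VRAM_GB = {"gemma3:4b": 3, "mistral-small3.2": 14}  # example sizes

def team_fits(agent_models: list[str], gpu_vram_gb: float) -> bool:
    return sum(MODEL_VRAM_GB[m] for m in agent_models) <= gpu_vram_gb
```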

&lt;h3&gt;
  
  
  3. CLI Agent Integration
&lt;/h3&gt;

&lt;p&gt;v2.0 can use Claude Code CLI, Codex CLI, and Gemini CLI as providers, directly from the web UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tbdij085pkwf4eio2bb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tbdij085pkwf4eio2bb.gif" alt="Provider Switch" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLI tools are auto-detected. If you have them installed, they appear in the provider dropdown. If not, they’re hidden.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Feature Set
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7 AI Providers in One UI
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Streaming&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;HTTP API (localhost)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude API&lt;/td&gt;
&lt;td&gt;Anthropic SDK&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI API&lt;/td&gt;
&lt;td&gt;OpenAI SDK&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vLLM / llama.cpp / LM Studio&lt;/td&gt;
&lt;td&gt;OpenAI-compatible API&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude -p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pseudo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex exec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pseudo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini -p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pseudo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ct5t1mw1rqwoh3y07jd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ct5t1mw1rqwoh3y07jd.gif" alt="Streaming Demo" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  RAG Knowledge Base
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docling Parser for PDF, Office docs, and images&lt;/li&gt;
&lt;li&gt;Hybrid search — dense vector + BM25 sparse + RRF fusion&lt;/li&gt;
&lt;li&gt;TEI Reranker (bge-reranker-v2-m3) for precision re-scoring&lt;/li&gt;
&lt;li&gt;Ollama embedding — runs locally, zero API cost&lt;/li&gt;
&lt;/ul&gt;
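&lt;p&gt;For the curious, Reciprocal Rank Fusion just sums &lt;code&gt;1 / (k + rank)&lt;/code&gt; for each document across the dense and sparse result lists. A minimal sketch, using the conventional &lt;code&gt;k=60&lt;/code&gt; (the actual pipeline may tune it):&lt;/p&gt;

```python
# Reciprocal Rank Fusion: merge dense-vector and BM25 rankings into one list.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```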

&lt;h3&gt;
  
  
  Mem0 Shared Memory
&lt;/h3&gt;

&lt;p&gt;Persistent, cross-session memory backed by Qdrant. The memory is shared across tools — Claude Code CLI, Codex CLI, and Open WebUI all read from the same Qdrant collection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web Search
&lt;/h3&gt;

&lt;p&gt;Click the search button or let the LLM decide on its own when it needs current information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk99p40eff2d3oj6uc8c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk99p40eff2d3oj6uc8c.gif" alt="Search Demo" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Backend: FastAPI + Python 3.12&lt;/li&gt;
&lt;li&gt;Frontend: Jinja2 templates + Tailwind CSS + Alpine.js (no React, no build step)&lt;/li&gt;
&lt;li&gt;Database: SQLite (chat history) + Qdrant (vectors)&lt;/li&gt;
&lt;li&gt;Streaming: WebSocket&lt;/li&gt;
&lt;li&gt;Deployment: Docker Compose or bare metal&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  One-Click Deploy (Free)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://render.com/deploy?repo=https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Frender.com%2Fimages%2Fdeploy-to-render-button.svg" alt="Deploy to Render" width="153" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or try the &lt;a href="https://helix-ai-studio.onrender.com" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt; directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-ai-studio.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-ai-studio
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run python run.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:8504" rel="noopener noreferrer"&gt;http://localhost:8504&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Compose (Full Stack)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-ai-studio.git
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-ai-studio
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  100% Self-Hosted
&lt;/h2&gt;

&lt;p&gt;Every feature can run entirely on your hardware. Ollama for inference, Qdrant for vectors, SQLite for history. You can add cloud APIs when you want, but the baseline is fully local. No vendor lock-in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://helix-ai-studio.onrender.com" rel="noopener noreferrer"&gt;Live Demo&lt;/a&gt; — no setup needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; — star the repo if you find it useful.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcci6dxhzvue7sg3sktuy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcci6dxhzvue7sg3sktuy.gif" alt="App Tour" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re building something similar or have questions about the architecture, drop a comment below. And if you find Helix useful, a star on GitHub really helps with visibility.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a desktop app that orchestrates Claude, GPT, Gemini and local Ollama in a 3-phase pipeline</title>
      <dc:creator>Tsunamayo</dc:creator>
      <pubDate>Sun, 01 Mar 2026 05:46:02 +0000</pubDate>
      <link>https://dev.to/tsunamayo7/i-built-a-desktop-app-that-orchestrates-claude-gpt-gemini-and-local-ollama-in-a-3-phase-pipeline-1ml7</link>
      <guid>https://dev.to/tsunamayo7/i-built-a-desktop-app-that-orchestrates-claude-gpt-gemini-and-local-ollama-in-a-3-phase-pipeline-1ml7</guid>
      <description>&lt;p&gt;I've been building desktop AI tools for a while, and one frustration kept coming up: &lt;strong&gt;every AI model has different strengths&lt;/strong&gt;, but using them together was always manual work — copy-paste between apps, switch tabs, lose context.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Helix AI Studio&lt;/strong&gt; — an open-source desktop app that lets Claude, GPT, Gemini, and local Ollama models work together in a coordinated pipeline.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;https://github.com/tsunamayo7/helix-ai-studio&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea: Multi-Phase AI Pipelines
&lt;/h2&gt;

&lt;p&gt;Instead of sending one prompt to one model, Helix routes your request through multiple AI models in sequence. Each model handles what it's best at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your prompt
    ↓
Phase 1: Claude (analysis &amp;amp; reasoning)
    ↓
Phase 2: GPT / Gemini (alternative perspective)
    ↓
Phase 3: Local Ollama model (offline processing / privacy)
    ↓
Final synthesized response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You configure which models run in which phases, and the output of each phase feeds into the next.&lt;/p&gt;
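A minimal sketch of that chaining, using a hypothetical `call_model` stub in place of Helix's real provider clients (the provider names and function are illustrative, not the app's actual internals):

```python
import asyncio

async def call_model(provider: str, prompt: str) -> str:
    # Stub: real code would dispatch to the Anthropic/OpenAI/Google SDKs
    # or a local Ollama endpoint depending on the provider name.
    return f"[{provider}] processed: {prompt}"

async def run_pipeline(prompt: str, phases: list[str]) -> str:
    # Each phase receives the previous phase's output as its input.
    context = prompt
    for provider in phases:
        context = await call_model(provider, context)
    return context

result = asyncio.run(run_pipeline("Summarize this repo", ["claude", "gpt", "ollama"]))
print(result)
```

The whole idea is just a fold over async calls: swap the phase list and you change the pipeline.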




&lt;h2&gt;
  
  
  What's Inside
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Desktop GUI (PyQt6)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Three chat tabs: &lt;code&gt;cloudAI&lt;/code&gt; (Claude/GPT/Gemini), &lt;code&gt;localAI&lt;/code&gt; (Ollama), &lt;code&gt;mixAI&lt;/code&gt; (the pipeline)&lt;/li&gt;
&lt;li&gt;Dark-themed native app (Windows and macOS)&lt;/li&gt;
&lt;li&gt;Real-time streaming responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Built-in Web UI (React + FastAPI)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access from mobile or other devices on your LAN&lt;/li&gt;
&lt;li&gt;WebSocket-based streaming — same experience as the desktop&lt;/li&gt;
&lt;li&gt;JWT authentication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Local LLM Support&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama integration via &lt;code&gt;httpx&lt;/code&gt; async calls&lt;/li&gt;
&lt;li&gt;Model switching without restart&lt;/li&gt;
&lt;li&gt;Works fully offline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG Memory&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQLite-based conversation storage&lt;/li&gt;
&lt;li&gt;Retrieval-augmented context for follow-up questions&lt;/li&gt;
&lt;/ul&gt;
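The storage half of that can be a few `sqlite3` calls. This sketch only covers recency-based context (real retrieval augmentation would layer embedding or keyword search on top), and the schema is illustrative rather than Helix's actual one:

```python
import sqlite3

def init_store(path: str = ":memory:") -> sqlite3.Connection:
    # Create the conversation table if it doesn't exist yet.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn

def save_message(conn: sqlite3.Connection, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
    )
    conn.commit()

def recent_context(conn: sqlite3.Connection, limit: int = 5) -> list:
    # Fetch the last few turns, oldest first, to prepend to the next prompt.
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return list(reversed(rows))

conn = init_store()
save_message(conn, "user", "What is RAG?")
save_message(conn, "assistant", "Retrieval-augmented generation.")
print(recent_context(conn))
```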




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tech&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Desktop GUI&lt;/td&gt;
&lt;td&gt;PyQt6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web backend&lt;/td&gt;
&lt;td&gt;FastAPI + Uvicorn + WebSocket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web frontend&lt;/td&gt;
&lt;td&gt;React + Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local LLMs&lt;/td&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud AIs&lt;/td&gt;
&lt;td&gt;Anthropic SDK, OpenAI SDK, Google Generative AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB&lt;/td&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platform&lt;/td&gt;
&lt;td&gt;Windows 10/11 and macOS 12+ (Apple Silicon &amp;amp; Intel)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why Mix Models?
&lt;/h2&gt;

&lt;p&gt;Different models genuinely excel at different things. In my testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt; is great at structured reasoning and nuanced writing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT&lt;/strong&gt; handles coding tasks and tool use well
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt; offers strong multimodal understanding and factual retrieval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local models&lt;/strong&gt; (Mistral, Llama, Gemma) keep sensitive data on-device&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By pipelining them, you get complementary strengths rather than betting everything on one model's weak spots.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tsunamayo7/helix-ai-studio
&lt;span class="nb"&gt;cd &lt;/span&gt;helix-ai-studio
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="c"&gt;# Add your API keys to config/config.json&lt;/span&gt;
python HelixAIStudio.py    &lt;span class="c"&gt;# Windows&lt;/span&gt;
python3 HelixAIStudio.py   &lt;span class="c"&gt;# macOS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
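The API keys go in `config/config.json`. The actual schema lives in the repo, so treat this shape (and every key name in it) as an illustrative assumption:

```json
{
  "anthropic_api_key": "sk-ant-...",
  "openai_api_key": "sk-...",
  "google_api_key": "...",
  "ollama_base_url": "http://localhost:11434"
}
```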



&lt;p&gt;Ollama needs to be running separately if you want local model support. Everything else runs in-process.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MCP (Model Context Protocol) tool integration&lt;/li&gt;
&lt;li&gt;Plugin system for custom pipeline steps&lt;/li&gt;
&lt;li&gt;Better multi-modal support (image inputs across models)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is MIT licensed. Issues, PRs, and feedback are all welcome, especially from people who've tried mixing models for real workloads. I'm curious which combinations others find useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/tsunamayo7/helix-ai-studio" rel="noopener noreferrer"&gt;https://github.com/tsunamayo7/helix-ai-studio&lt;/a&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
