DEV Community

Tsunamayo

I built a desktop app that orchestrates Claude, GPT, Gemini and local Ollama in a 3-phase pipeline

I've been building desktop AI tools for a while, and one frustration kept coming up: every AI model has different strengths, but using them together was always manual work — copy-paste between apps, switch tabs, lose context.

So I built Helix AI Studio — an open-source desktop app that lets Claude, GPT, Gemini, and local Ollama models work together in a coordinated pipeline.

GitHub: https://github.com/tsunamayo7/helix-ai-studio


The Core Idea: Multi-Phase AI Pipelines

Instead of sending one prompt to one model, Helix routes your request through multiple AI models in sequence. Each model handles what it's best at:

Your prompt
    ↓
Phase 1: Claude (analysis & reasoning)
    ↓
Phase 2: GPT / Gemini (alternative perspective)
    ↓
Phase 3: Local Ollama model (offline processing / privacy)
    ↓
Final synthesized response

You configure which models run in which phases, and the output of each phase feeds into the next.
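The phase-chaining idea can be sketched in a few lines. This is illustrative, not Helix's actual code: the phase functions here are stand-ins for real model calls, and `run_pipeline` is a hypothetical name.

```python
# Minimal sketch of a multi-phase pipeline: each phase's output
# becomes the next phase's input (function names are illustrative).

def run_pipeline(prompt, phases):
    """Run `prompt` through a list of phase functions in order."""
    context = prompt
    for phase in phases:
        context = phase(context)  # feed each phase's output forward
    return context

# Stub "models" standing in for Claude / GPT / a local Ollama model.
analyze    = lambda text: f"[analysis] {text}"
critique   = lambda text: f"[critique] {text}"
synthesize = lambda text: f"[synthesis] {text}"

result = run_pipeline("Explain CRDTs", [analyze, critique, synthesize])
print(result)  # "[synthesis] [critique] [analysis] Explain CRDTs"
```

In the real app, each phase would be an async API call with its own prompt template, but the data flow is exactly this: a fold over the configured phase list.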


What's Inside

Desktop GUI (PyQt6)

  • Three chat tabs: cloudAI (Claude/GPT/Gemini), localAI (Ollama), mixAI (the pipeline)
  • Dark-themed native app (Windows and macOS)
  • Real-time streaming responses

Built-in Web UI (React + FastAPI)

  • Access from mobile or other devices on your LAN
  • WebSocket-based streaming — same experience as the desktop
  • JWT authentication

Local LLM Support

  • Ollama integration via httpx async calls
  • Model switching without restart
  • Works fully offline

RAG Memory

  • SQLite-based conversation storage
  • Retrieval-augmented context for follow-up questions
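A toy version of SQLite-backed memory with retrieval fits in a few lines. The schema and the naive substring matching below are illustrative only; a real setup would rank with embeddings or SQLite FTS5:

```python
# Tiny sketch of conversation memory: store messages in SQLite,
# pull back the most recent ones relevant to a follow-up question.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT)"
)

def remember(role: str, content: str) -> None:
    con.execute(
        "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
    )

def recall(query: str, limit: int = 3) -> list[str]:
    # Naive retrieval: substring match, newest first.
    rows = con.execute(
        "SELECT content FROM messages WHERE content LIKE ? "
        "ORDER BY id DESC LIMIT ?",
        (f"%{query}%", limit),
    )
    return [r[0] for r in rows]

remember("user", "My Ollama model is mistral")
remember("assistant", "Noted: mistral is your local model")
print(recall("mistral"))
```

The retrieved snippets get prepended to the prompt for follow-up questions, which is all "retrieval-augmented context" means at its core.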

Tech Stack

| Layer | Tech |
| --- | --- |
| Desktop GUI | PyQt6 |
| Web backend | FastAPI + Uvicorn + WebSocket |
| Web frontend | React + Tailwind CSS |
| Local LLMs | Ollama |
| Cloud AIs | Anthropic SDK, OpenAI SDK, Google Generative AI |
| DB | SQLite |
| Platform | Windows 10/11 and macOS 12+ (Apple Silicon & Intel) |

Why Mix Models?

Different models genuinely excel at different things. In my testing:

  • Claude is great at structured reasoning and nuanced writing
  • GPT handles coding tasks and tool use well
  • Gemini is strong at multimodal tasks and factual retrieval
  • Local models (Mistral, Llama, Gemma) keep sensitive data on-device

By pipelining them, you get complementary strengths rather than betting everything on one model's weak spots.


Getting Started

git clone https://github.com/tsunamayo7/helix-ai-studio
cd helix-ai-studio
pip install -r requirements.txt
# Add your API keys to config/config.json
python HelixAIStudio.py    # Windows
python3 HelixAIStudio.py   # macOS

Ollama needs to be running separately if you want local model support. Everything else runs in-process.


What's Next

  • MCP (Model Context Protocol) tool integration
  • Plugin system for custom pipeline steps
  • Better multi-modal support (image inputs across models)

The project is MIT licensed. Issues, PRs, and feedback all welcome — especially from people who've tried mixing models for real workloads. Curious what combinations others find useful.

GitHub: https://github.com/tsunamayo7/helix-ai-studio

Top comments (1)

Matthew Hou

The multi-phase pipeline approach is the right direction. Single-model-does-everything is hitting a wall for anything beyond simple tasks.

One question: how do you handle disagreements between models? If Phase 1 (Claude) produces an analysis that Phase 2 (GPT) fundamentally contradicts, does the pipeline have a resolution strategy, or does the final phase just work with whatever it receives?

In my experience, the orchestration layer is where most of the engineering effort ends up — not in the model calls themselves. Routing, error handling, context compression between phases, knowing when to retry vs skip. The model calls are the easy part. Everything around them is where it gets interesting.