Over the past few months, I've been juggling Ollama for local stuff, Claude for tricky reasoning, and OpenAI when I need vision or specific models. Having three or four chat tabs open at all times drove me a little crazy, so I figured — why not just build one app that talks to all of them?
That's how Helix AI Studio happened.
The itch I was trying to scratch
My daily workflow looked something like this:
- Open Ollama's web UI for quick local queries
- Switch to Claude's console for complex coding tasks
- Fire up another tab for OpenAI when I needed GPT-4
- Completely forget which conversation had the context I needed
I wanted something dead simple — one dark-themed UI, one chat window, and a dropdown to switch models on the fly. No accounts, no cloud dependency (unless I want cloud), runs on my machine.
What it actually does
It turned into more than just a multi-provider chat. Here's what ended up in there:
7 providers, one dropdown — Ollama, Claude API, OpenAI API, OpenAI-compatible servers (vLLM, llama.cpp, LM Studio), plus Claude Code, Codex CLI, and Gemini CLI. CLI tools are auto-detected; if they're not installed, they don't show up. No clutter.
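That auto-detection can be as simple as probing `PATH` for each binary. A minimal sketch of the idea (the function name `detect_cli_providers` and the injectable `which` parameter are my own illustration, not Helix's actual code):

```python
import shutil
from typing import Callable, Optional

def detect_cli_providers(
    candidates: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return only the CLI tools actually present on PATH."""
    return [tool for tool in candidates if which(tool) is not None]

# Tools that aren't installed simply never make it into the dropdown.
available = detect_cli_providers(["claude", "codex", "gemini"])
```

Making the lookup injectable also keeps this trivially testable without installing anything.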
RAG knowledge base — Drag and drop your documents, they get embedded locally with Ollama, stored in Qdrant, and automatically injected into chat context. No API cost for embeddings.
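The flow is chunk → embed → store → retrieve. Here's a dependency-free sketch of that shape; in the real app the embedding call goes to Ollama and the store is Qdrant, so `TinyVectorStore` is a simplified stand-in of my own, not Helix's code:

```python
import math

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap between neighbors."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyVectorStore:
    """Stand-in for Qdrant: keep (vector, payload) pairs, rank by cosine."""
    def __init__(self):
        self.points = []

    def upsert(self, vector, payload):
        self.points.append((vector, payload))

    def search(self, query_vector, top_k=3):
        ranked = sorted(self.points, key=lambda p: cosine(p[0], query_vector), reverse=True)
        return [payload for _, payload in ranked[:top_k]]
```

The retrieved chunks then get prepended to the chat context before the prompt goes to whichever model you picked.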
Mem0 shared memory — This was a fun one. Memories persist across sessions and are shared between tools. So if I tell something to Helix, Claude Code picks it up too.
MCP tool integration — Connect any MCP-compatible server. I use it with a filesystem tool so the AI can actually read my project files.
3-step pipeline — Plan with a cloud model, execute locally, verify with a different model. Mixing providers per step turned out to be surprisingly useful.
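The plan → execute → verify flow is really just three provider calls chained together. A minimal sketch, with each provider modeled as a plain text-in/text-out callable (the function and parameter names here are illustrative, not Helix's actual API):

```python
from typing import Callable

Provider = Callable[[str], str]  # prompt in, completion out

def run_pipeline(task: str, planner: Provider, executor: Provider, verifier: Provider) -> dict:
    """Plan with one model, execute with another, verify with a third."""
    plan = planner(f"Break this task into concrete steps:\n{task}")
    result = executor(f"Follow this plan and produce the result:\n{plan}")
    verdict = verifier(f"Task: {task}\nResult: {result}\nDoes the result satisfy the task?")
    return {"plan": plan, "result": result, "verdict": verdict}
```

In practice the planner might be Claude, the executor a local Ollama model, and the verifier GPT-4; any mix works because each step only needs a function from prompt to text.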
CrewAI multi-agent — Ollama-only, VRAM-managed. Preset teams for dev, research, and writing tasks.
The stack
Nothing fancy:
- Backend: FastAPI + Python 3.12
- Frontend: Jinja2 + Tailwind CSS (CDN) + Alpine.js (CDN)
- Database: SQLite for app data, Qdrant for vectors
No build step, no npm, no webpack. Just `uv run python run.py` and you're in.
The frontend is intentionally old-school. Jinja2 templates, Tailwind from CDN, Alpine.js for reactivity. It's lightweight and I don't have to fight a bundler to change a button color.
Running it
Fastest way to try it:
git clone https://github.com/tsunamayo7/helix-ai-studio.git
cd helix-ai-studio
uv sync
uv run python run.py
# http://localhost:8504
Or Docker Compose if you want everything (Ollama + Qdrant + Mem0) spun up together:
docker compose up -d
# http://localhost:8502
There's also a live demo on Render if you just want to poke around the UI. You'll need to bring your own API key for the cloud providers.
What I learned building this
A few things that surprised me:
- WebSocket streaming is finicky but worth it. The UX difference between streaming tokens and waiting for a full response is night and day.
- Qdrant is impressive for a local vector DB. RAG worked way better than I expected once I got the chunking right.
- CLI tools as providers is underrated. Claude Code's `-p` flag basically gives you a powerful AI through a pipe. Wrapping that in a web UI was easier than expected.
- Japanese/English i18n from day one was a good call. I'm based in Japan, so bilingual support wasn't optional for me. Turns out a lot of other people appreciated it too.
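Wrapping a CLI as a provider boils down to one subprocess call. A sketch, assuming the tool prints its answer to stdout the way Claude Code's print mode (`claude -p "prompt"`) does; `cmd` is injectable so any tool can slot in:

```python
import subprocess

def ask_cli(prompt: str, cmd: tuple[str, ...] = ("claude", "-p")) -> str:
    """Pipe a prompt through a CLI tool and capture its stdout."""
    proc = subprocess.run([*cmd, prompt], capture_output=True, text=True, timeout=120)
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or f"{cmd[0]} failed")
    return proc.stdout.strip()
```

Token-by-token streaming takes a `Popen` with incremental reads instead, but for one-shot answers this is all there is to it.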
What's next
I'm still actively developing this. Some things on my radar:
- Custom CrewAI team definitions
- More embedding model options
- Plugin system for community extensions
If you're tired of switching between AI chat tabs, give it a shot. It's MIT licensed, runs 100% locally if you want, and doesn't phone home.
GitHub: github.com/tsunamayo7/helix-ai-studio
Live Demo: helix-ai-studio.onrender.com
Happy to answer any questions or take feedback!