I used to have one AI assistant. It did everything — coded, wrote docs, answered questions, monitored my inbox.
It was fine. But "fine" isn't the same as "good." One model trying to be a generalist meant it was mediocre at everything. Context bloat. Conflicting instructions. The coding advice was too cautious. The writing was too robotic. The inbox monitoring missed nuance because the model was busy trying to remember my entire codebase.
So I split it into three. Each with a different personality, different model, different job. And it actually works better.
The Problem with One Agent to Rule Them All
When you have one AI doing everything, you run into three problems fast:
1. Context pollution. The coding instructions leak into the writing tone. The writing style bleeds into the code suggestions.
2. Wrong tool for the job. A 70B parameter model is overkill for "check my calendar." A 7B model is underpowered for "refactor this 500-line function."
3. No specialization. My coding agent doesn't need to know my grocery list. My writing agent doesn't need to know my API keys. But when there's only one context window, everything is in there.
The Three-Agent Setup
I now run three distinct agents, each with its own model, personality, and scope:
| Agent | Model | Personality | Job |
|---|---|---|---|
| Celebi | Qwen 3.5 9B (Mac Mini) | Generalist, casual, resourceful | Orchestration, daily checks, notifications, routing |
| ProgrammierMinna | Qwen 3 Coder 30B (RTX 3060) | Precise, technical, no fluff | Code generation, debugging, refactoring, PR review |
| DocMinna | Granite 3.2 8B (Mac Mini) | Formal, structured, thorough | Documentation, technical writing, READMEs, specs |
Why Different Models?
Celebi runs on the Mac Mini (M4) because it's always on, low power, and handles simple tasks instantly. Qwen 3.5 9B is perfect for "check my email, summarize it, tell me if it's urgent."
ProgrammierMinna runs on the RTX 3060 because coding tasks need a bigger model. Qwen 3 Coder 30B actually understands large codebases, suggests proper refactors, and catches edge cases the 9B misses. Response time is 10-15 seconds — fine for code, too slow for "what's the weather."
DocMinna also runs on the Mac Mini with Granite 3.2 8B. It's smaller because documentation doesn't need frontier reasoning. It just needs to be structured, consistent, and technically accurate. The smaller model is faster and cheaper.
How They Talk to Each Other
This was the hard part. I didn't want three separate chat windows. I wanted one interface (Telegram) where I message Celebi, and Celebi delegates to the right specialist.
Here's how it works:
User (Telegram): "Refactor the auth module in project X"
→ Celebi receives message
→ Classifies: "coding task, complex"
→ Routes to ProgrammierMinna
→ ProgrammierMinna generates refactored code
→ Returns to Celebi
→ Celebi formats response and sends back to user
The user never talks directly to ProgrammierMinna or DocMinna. Celebi is the router. This means:
- The user has one interface
- Each specialist gets only relevant context
- Results are combined and formatted consistently
- If a task is simple, Celebi handles it directly (no delegation overhead)
What "Personality" Actually Means in Practice
I don't mean "quirky chatbot with a backstory." I mean three things:
1. Different System Prompts
Celebi: "You're a resourceful assistant. Be concise. Don't ask clarifying questions unless critical. Default to action."
ProgrammierMinna: "You're a senior software engineer. Write clean, maintainable code. Add error handling. Consider edge cases. Explain your reasoning briefly."
DocMinna: "You're a technical writer. Structure docs with clear headings. Include code examples. Write for an intermediate developer. Be thorough but not verbose."
These aren't decorations — they fundamentally change the output. The same request to all three produces completely different results.
2. Different Context Scopes
Celebi sees my calendar, emails, weather, and general notes. It knows I'm in Turkey, that I have a meeting at 3 PM, that it's hot outside.
ProgrammierMinna sees my Git repos, code patterns, and project structure. It knows I prefer Go over Python for CLI tools, that I use specific naming conventions, that I hate nested callbacks.
DocMinna sees my documentation templates, style guides, and existing docs. It knows I write in Markdown, that I include a "Quick Start" section, that I don't use emojis in technical docs.
Each agent's context is filtered. Celebi doesn't get the Git repos. ProgrammierMinna doesn't get my grocery list. This alone cut my token usage by ~40%.
3. Different Tone and Format
Ask all three to "explain Docker":
Celebi: "Docker packages apps into containers so they run the same everywhere. Think of it as a shipping container for software — standardized, portable, isolated. Need help with a specific setup?"
ProgrammierMinna: "Docker uses OS-level virtualization to package applications with their dependencies. Key concepts: images (read-only templates), containers (runtime instances), and Dockerfiles (build instructions). For multi-container apps, use Docker Compose. Here's a minimal example..."
DocMinna: "Docker is a platform for developing, shipping, and running applications in containers. This guide covers installation, core concepts (images, containers, volumes), and best practices for production deployments..."
Same facts. Completely different delivery. And that's the point — you pick the right voice for the situation.
The Routing Logic
Celebi decides who handles what. The rules are simple but effective:
| Input Signal | Route To | Example |
|---|---|---|
| Contains code snippets, "refactor," "debug," "function" | ProgrammierMinna | "Fix this Go error" |
| Contains "document," "README," "spec," "guide" | DocMinna | "Write API docs for this endpoint" |
| General question, scheduling, notification | Celebi (self) | "What's on my calendar?" |
| Mixed task (code + docs) | Both, combined | "Build a tool and document it" |
The routing is a lightweight classifier — just a few-shot prompt to Qwen 3.5 9B. It gets it right ~95% of the time. The 5% that are wrong? I correct it, and the model learns from the feedback (stored in memory files).
What Actually Improved
Code quality: ProgrammierMinna suggests better abstractions because it doesn't have to also remember my dentist appointment. Cleaner context = better reasoning.
Documentation speed: DocMinna writes docs in 30 seconds that used to take me 20 minutes. And they're consistent with my existing style.
Response time: Simple queries stay on the Mac Mini (instant). Complex ones go to the GPU (acceptable delay). No more "one size fits none" latency.
Token costs: Splitting context means each agent sees only what it needs. My monthly API bill dropped from ~$45 to ~$15 because 80% of tasks stay local.
Less context-switching for me: I say what I want in Telegram. The system figures out who should handle it. I don't think about "which model should I use for this."
The Downsides (Being Honest)
Setup complexity: Three agents means three configurations, three model endpoints, three context files to manage. It's not "install and go."
Routing mistakes: Sometimes Celebi sends a coding task to DocMinna, and I get a beautifully written document instead of working code. I fix the routing rule, and it improves.
Cross-agent memory gaps: ProgrammierMinna doesn't know that DocMinna just wrote the API spec. If I'm building a tool and documenting it simultaneously, I have to manually sync context.
Hardware footprint: Three models loaded means more RAM and VRAM usage. On my setup (Mac Mini + RTX 3060), it's manageable. On a single machine with 8GB RAM, you'd struggle.
Is This Overkill for You?
Probably, if you're just using ChatGPT for occasional questions.
But if you:
- Use AI daily for multiple distinct tasks
- Have a local GPU or powerful machine
- Find yourself rewriting AI output because the tone is wrong
- Want specialized quality without paying for frontier models constantly
...then splitting into personalities is worth trying. You don't need three agents on day one. Start with two: one for general tasks, one for your most common specialized task (usually coding or writing).
The Setup in 30 Minutes
-
Install Ollama on your machines:
curl -fsSL https://ollama.com/install.sh | sh -
Pull models:
ollama pull qwen3.5:9b(general) +qwen3-coder:30b(coding) - Create system prompts — one file per agent with its personality
- Build a router — a 20-line script that classifies input and sends to the right endpoint
- Add a frontend — Telegram bot, CLI, or web UI
The router is the only custom code you need. Everything else is off-the-shelf.
What's Next
I'm experimenting with two additions:
Memory sharing — A shared context file that all agents can read (but not write) for cross-cutting concerns like "current project" or "my tech stack."
Agent spawning — When a task is genuinely new, Celebi spawns a temporary agent with a custom prompt, runs the task, then discards it. No permanent bloat.
The goal isn't to build AGI. It's to build a team of specialists that costs less than one generalist and produces better work.
Want This Architecture?
We build custom multi-agent systems tailored to your workflow:
- 🤖 Multi-agent orchestration with personality routing
- 📝 Specialized documentation agents
- 💻 Code-focused AI assistants with project context
- 🔔 Unified notification layer (Telegram, Slack, etc.)
→ Custom AI Agent Setup on Fiverr
→ Follow the build process on Telegram
I write about running AI locally, building weird automation, and occasionally making money from side projects. If this was useful, feel free to follow.
Top comments (0)