Sam Hartley

Posted on Jun 19

I Gave Each of My AI Agents a Personality — Here's Why My Workflow Actually Improved

#agents #ai #architecture #automation

I used to have one AI assistant. It did everything — coded, wrote docs, answered questions, monitored my inbox.

It was fine. But "fine" isn't the same as "good." One model trying to be a generalist meant it was mediocre at everything. Context bloat. Conflicting instructions. The coding advice was too cautious. The writing was too robotic. The inbox monitoring missed nuance because the model was busy trying to remember my entire codebase.

So I split it into three. Each with a different personality, different model, different job. And it actually works better.

The Problem with One Agent to Rule Them All

When you have one AI doing everything, you run into three problems fast:

1. Context pollution. The coding instructions leak into the writing tone. The writing style bleeds into the code suggestions.

2. Wrong tool for the job. A 70B parameter model is overkill for "check my calendar." A 7B model is underpowered for "refactor this 500-line function."

3. No specialization. My coding agent doesn't need to know my grocery list. My writing agent doesn't need to know my API keys. But when there's only one context window, everything is in there.

The Three-Agent Setup

I now run three distinct agents, each with its own model, personality, and scope:

Agent	Model	Personality	Job
Celebi	Qwen 3.5 9B (Mac Mini)	Generalist, casual, resourceful	Orchestration, daily checks, notifications, routing
ProgrammierMinna	Qwen 3 Coder 30B (RTX 3060)	Precise, technical, no fluff	Code generation, debugging, refactoring, PR review
DocMinna	Granite 3.2 8B (Mac Mini)	Formal, structured, thorough	Documentation, technical writing, READMEs, specs

Why Different Models?

Celebi runs on the Mac Mini (M4) because it's always on, low power, and handles simple tasks instantly. Qwen 3.5 9B is perfect for "check my email, summarize it, tell me if it's urgent."

ProgrammierMinna runs on the RTX 3060 because coding tasks need a bigger model. Qwen 3 Coder 30B actually understands large codebases, suggests proper refactors, and catches edge cases the 9B misses. Response time is 10-15 seconds — fine for code, too slow for "what's the weather."

DocMinna also runs on the Mac Mini with Granite 3.2 8B. It's smaller because documentation doesn't need frontier reasoning. It just needs to be structured, consistent, and technically accurate. The smaller model is faster and cheaper.

How They Talk to Each Other

This was the hard part. I didn't want three separate chat windows. I wanted one interface (Telegram) where I message Celebi, and Celebi delegates to the right specialist.

Here's how it works:

User (Telegram): "Refactor the auth module in project X"
  → Celebi receives message
  → Classifies: "coding task, complex"
  → Routes to ProgrammierMinna
  → ProgrammierMinna generates refactored code
  → Returns to Celebi
  → Celebi formats response and sends back to user

The user never talks directly to ProgrammierMinna or DocMinna. Celebi is the router. This means:

The user has one interface
Each specialist gets only relevant context
Results are combined and formatted consistently
If a task is simple, Celebi handles it directly (no delegation overhead)

What "Personality" Actually Means in Practice

I don't mean "quirky chatbot with a backstory." I mean three things:

1. Different System Prompts

Celebi: "You're a resourceful assistant. Be concise. Don't ask clarifying questions unless critical. Default to action."

ProgrammierMinna: "You're a senior software engineer. Write clean, maintainable code. Add error handling. Consider edge cases. Explain your reasoning briefly."

DocMinna: "You're a technical writer. Structure docs with clear headings. Include code examples. Write for an intermediate developer. Be thorough but not verbose."

These aren't decorations — they fundamentally change the output. The same request to all three produces completely different results.

2. Different Context Scopes

Celebi sees my calendar, emails, weather, and general notes. It knows I'm in Turkey, that I have a meeting at 3 PM, that it's hot outside.

ProgrammierMinna sees my Git repos, code patterns, and project structure. It knows I prefer Go over Python for CLI tools, that I use specific naming conventions, that I hate nested callbacks.

DocMinna sees my documentation templates, style guides, and existing docs. It knows I write in Markdown, that I include a "Quick Start" section, that I don't use emojis in technical docs.

Each agent's context is filtered. Celebi doesn't get the Git repos. ProgrammierMinna doesn't get my grocery list. This alone cut my token usage by ~40%.

3. Different Tone and Format

Ask all three to "explain Docker":

Celebi: "Docker packages apps into containers so they run the same everywhere. Think of it as a shipping container for software — standardized, portable, isolated. Need help with a specific setup?"
ProgrammierMinna: "Docker uses OS-level virtualization to package applications with their dependencies. Key concepts: images (read-only templates), containers (runtime instances), and Dockerfiles (build instructions). For multi-container apps, use Docker Compose. Here's a minimal example..."
DocMinna: "Docker is a platform for developing, shipping, and running applications in containers. This guide covers installation, core concepts (images, containers, volumes), and best practices for production deployments..."

Same facts. Completely different delivery. And that's the point — you pick the right voice for the situation.

The Routing Logic

Celebi decides who handles what. The rules are simple but effective:

Input Signal	Route To	Example
Contains code snippets, "refactor," "debug," "function"	ProgrammierMinna	"Fix this Go error"
Contains "document," "README," "spec," "guide"	DocMinna	"Write API docs for this endpoint"
General question, scheduling, notification	Celebi (self)	"What's on my calendar?"
Mixed task (code + docs)	Both, combined	"Build a tool and document it"

The routing is a lightweight classifier — just a few-shot prompt to Qwen 3.5 9B. It gets it right ~95% of the time. The 5% that are wrong? I correct it, and the model learns from the feedback (stored in memory files).

What Actually Improved

Code quality: ProgrammierMinna suggests better abstractions because it doesn't have to also remember my dentist appointment. Cleaner context = better reasoning.

Documentation speed: DocMinna writes docs in 30 seconds that used to take me 20 minutes. And they're consistent with my existing style.

Response time: Simple queries stay on the Mac Mini (instant). Complex ones go to the GPU (acceptable delay). No more "one size fits none" latency.

Token costs: Splitting context means each agent sees only what it needs. My monthly API bill dropped from ~$45 to ~$15 because 80% of tasks stay local.

Less context-switching for me: I say what I want in Telegram. The system figures out who should handle it. I don't think about "which model should I use for this."

The Downsides (Being Honest)

Setup complexity: Three agents means three configurations, three model endpoints, three context files to manage. It's not "install and go."

Routing mistakes: Sometimes Celebi sends a coding task to DocMinna, and I get a beautifully written document instead of working code. I fix the routing rule, and it improves.

Cross-agent memory gaps: ProgrammierMinna doesn't know that DocMinna just wrote the API spec. If I'm building a tool and documenting it simultaneously, I have to manually sync context.

Hardware footprint: Three models loaded means more RAM and VRAM usage. On my setup (Mac Mini + RTX 3060), it's manageable. On a single machine with 8GB RAM, you'd struggle.

Is This Overkill for You?

Probably, if you're just using ChatGPT for occasional questions.

But if you:

Use AI daily for multiple distinct tasks
Have a local GPU or powerful machine
Find yourself rewriting AI output because the tone is wrong
Want specialized quality without paying for frontier models constantly

...then splitting into personalities is worth trying. You don't need three agents on day one. Start with two: one for general tasks, one for your most common specialized task (usually coding or writing).

The Setup in 30 Minutes

Install Ollama on your machines: curl -fsSL https://ollama.com/install.sh | sh
Pull models: ollama pull qwen3.5:9b (general) + qwen3-coder:30b (coding)
Create system prompts — one file per agent with its personality
Build a router — a 20-line script that classifies input and sends to the right endpoint
Add a frontend — Telegram bot, CLI, or web UI

The router is the only custom code you need. Everything else is off-the-shelf.

What's Next

I'm experimenting with two additions:

Memory sharing — A shared context file that all agents can read (but not write) for cross-cutting concerns like "current project" or "my tech stack."
Agent spawning — When a task is genuinely new, Celebi spawns a temporary agent with a custom prompt, runs the task, then discards it. No permanent bloat.

The goal isn't to build AGI. It's to build a team of specialists that costs less than one generalist and produces better work.

Want This Architecture?

We build custom multi-agent systems tailored to your workflow:

🤖 Multi-agent orchestration with personality routing
📝 Specialized documentation agents
💻 Code-focused AI assistants with project context
🔔 Unified notification layer (Telegram, Slack, etc.)

→ Custom AI Agent Setup on Fiverr
→ Follow the build process on Telegram

I write about running AI locally, building weird automation, and occasionally making money from side projects. If this was useful, feel free to follow.

ai #agents #automation #architecture #productivity #ollama #localllm

DEV Community