Greeves89
I built a self-hosted multi-agent AI platform – here's what I learned

Most AI agent platforms give you one assistant with access to everything. Your files, your APIs, your calendar – all in one process, no isolation, no guardrails. That bothered me enough to build something different.

I'm an AI consultant and I've been running my own agentic setup for a while. This week I made it public: AI-Employee, a self-hosted platform for running teams of specialized AI agents.

The core idea

Instead of one mega-agent, you run a team. Each member has a specific role – Legal Assistant, Tax Advisor, DevOps Engineer, Marketing Manager. Each runs in its own Docker container. They have their own memory, their own workspace, their own tools, and their own rules.

They can collaborate. They can hold meetings. They can ask you for approval before doing something sensitive. And they do all of this on your hardware, with your data never leaving your server.

Architecture overview

Browser / Mobile
       |
  Caddy / Traefik (TLS)
       |
  Orchestrator (FastAPI + Docker SDK + WebSocket)
  /         |           \             \
Redis    Postgres    Embedding      Agent Pool
PubSub   pgvector    (bge-m3)      (Docker)
                                       |
                              Each agent container:
                              - Claude Code CLI runtime
                              - Own workspace + memory
                              - MCP server connections
                              - Optional Telegram bot

Stack: FastAPI + Next.js 14 + PostgreSQL + pgvector + Redis + Claude Code CLI.

What makes it interesting

True container isolation

Each agent = its own Docker container. Not a shared process with a prompt that says "please don't touch the other agent's files." Actual isolation. The orchestrator manages the lifecycle via the Docker SDK – agents start on demand, go idle after a configurable timeout, and restart on incoming work.
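To make the on-demand startup concrete, here's a minimal sketch of how an orchestrator might assemble the container config it hands to the Docker SDK. The image name, label key, and volume path are my assumptions for illustration, not the project's actual values.

```python
def agent_run_kwargs(agent_id: str, idle_timeout_s: int = 900) -> dict:
    """Build the kwargs an orchestrator might pass to client.containers.run().

    All names here (image, label, volume path, network) are hypothetical.
    """
    return {
        "image": "ai-employee/agent:latest",                    # hypothetical image
        "name": f"agent-{agent_id}",
        "detach": True,                                         # don't block the orchestrator
        "labels": {"ai-employee.idle-timeout": str(idle_timeout_s)},
        "volumes": {f"/srv/agents/{agent_id}": {"bind": "/workspace", "mode": "rw"}},
        "network": "ai-employee",
    }

# With the real docker SDK (docker-py), starting an agent would look like:
#   import docker
#   client = docker.from_env()
#   container = client.containers.run(**agent_run_kwargs("legal-assistant"))
#   ...and later, on idle timeout: container.stop()
```

Keeping the config as plain data makes the lifecycle logic testable without a running Docker daemon.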

Approval rules in plain language

Define governance rules like:

  • "Ask before spending more than €50"
  • "Confirm before emailing external clients"
  • "Never delete files without approval"

Agents enforce these automatically at runtime via the request_approval MCP tool. You approve or reject via Telegram inline button. No code needed.
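One way plain-language rules like these could map to runtime checks is a simple rule table the agent consults before acting. The rule schema and action fields below are illustrative assumptions, not the platform's actual API.

```python
# Hypothetical rule table: each rule pairs its plain-language description
# with a predicate over a proposed action.
RULES = [
    {"description": "Ask before spending more than €50",
     "match": lambda a: a.get("type") == "payment" and a.get("amount_eur", 0) > 50},
    {"description": "Confirm before emailing external clients",
     "match": lambda a: a.get("type") == "email" and a.get("external", False)},
    {"description": "Never delete files without approval",
     "match": lambda a: a.get("type") == "file_delete"},
]

def needs_approval(action: dict):
    """Return the matching rule's description (to show the human), or None.

    In the real system, a match would trigger the request_approval MCP tool
    and block until the Telegram inline button is pressed.
    """
    for rule in RULES:
        if rule["match"](action):
            return rule["description"]
    return None
```

The returned description doubles as the approval prompt, so the human sees exactly which governance rule fired.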

Proactive mode

Agents don't just wait to be asked. They wake up on a schedule, check their task queue, and execute work autonomously. Your morning briefing is ready when you wake up.
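The core of a wake-up cycle like this is just "which queued tasks are due now?" A minimal sketch, with a hypothetical task shape (`run_at` timestamp per task):

```python
from datetime import datetime

def due_tasks(queue: list, now: datetime) -> list:
    """Return tasks whose scheduled time has passed.

    This is the check a proactive wake-up cycle might run against the
    agent's task queue before executing work autonomously.
    """
    return [t for t in queue if t["run_at"] <= now]

# e.g. a "morning briefing" task scheduled for 06:30 becomes due before
# the 07:00 wake-up, so it's executed before you check your phone.
```

In practice the wake-up itself would come from a scheduler (cron, APScheduler, or the orchestrator's own timer), with this filter deciding what actually runs.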

Meeting rooms

Put 3–4 agents in a room with a topic and they debate, challenge each other's reasoning, and reach a decision. Useful for architecture reviews, legal-vs-marketing tradeoffs, or anything where you want a second (or third) opinion before acting.
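A debate protocol like this can be as simple as round-robin turn-taking over a fixed number of rounds. This is a hypothetical sketch of the turn order only; the platform's actual protocol may interleave rebuttals differently.

```python
def debate_turns(agents: list, rounds: int = 2):
    """Yield (round, agent) pairs in round-robin order.

    Each agent speaks once per round; a final round could be reserved
    for converging on a decision.
    """
    for r in range(1, rounds + 1):
        for agent in agents:
            yield r, agent
```

Keeping the turn order deterministic makes meeting transcripts reproducible and easy to audit afterwards.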

Self-improvement loop

After every task, agents reflect on what worked. You rate completed tasks via Telegram (⭐1–5). Bad ratings feed into a background ImprovementEngine that periodically analyzes performance, classifies agents (excellent/good/improving/declining), and sends you a notification when status changes.
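A classification step like this can reduce to comparing recent ratings against the long-run average. The thresholds and window size below are illustrative assumptions, not the ImprovementEngine's actual tuning:

```python
def classify(ratings: list) -> str:
    """Classify an agent from its 1-5 star ratings, oldest first.

    Thresholds (4.5 / 3.5) and the 3-task recency window are
    hypothetical values for illustration.
    """
    if not ratings:
        return "unrated"
    avg = sum(ratings) / len(ratings)
    recent = sum(ratings[-3:]) / len(ratings[-3:])
    if avg >= 4.5:
        return "excellent"
    if avg >= 3.5:
        return "good"
    # below "good": trend decides between improving and declining
    return "improving" if recent > avg else "declining"
```

The status-change notification then fires whenever `classify` returns something different from the stored status.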

Local embeddings

Uses BAAI/bge-m3 – 1024-dim, multilingual, runs entirely on your server. No OpenAI embedding API calls, no per-token costs, no data leaving your infrastructure. The shared knowledge base uses Obsidian-style [[backlinks]] and #tags and is readable/writable by all agents.
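With embeddings stored in pgvector, semantic recall is a nearest-neighbor SQL query. A sketch of the query builder, assuming a `memories` table with `content` and `embedding vector(1024)` columns (my naming, not necessarily the project's); `<=>` is pgvector's cosine-distance operator:

```python
def knn_query(table: str = "memories", k: int = 5) -> str:
    """Build a pgvector cosine-distance k-NN query.

    The query vector is passed as a bound parameter (%(query_vec)s)
    at execution time, e.g. via psycopg.
    """
    return (
        f"SELECT content, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} ORDER BY distance LIMIT {k}"
    )

# The 1024-dim query vector would come from the local model, e.g. with
# the FlagEmbedding package:
#   from FlagEmbedding import BGEM3FlagModel
#   model = BGEM3FlagModel("BAAI/bge-m3")
#   vec = model.encode(["what did we decide about invoicing?"])["dense_vecs"][0]
```

Everything here runs on-box: the model for encoding, Postgres for the search.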

MCP-native

Ships with 5 built-in MCP servers:

  • mcp-memory – semantic long-term memory per agent
  • mcp-knowledge – shared knowledge base
  • mcp-notifications – Telegram messages, approval requests
  • mcp-orchestrator – spawn agents, create tasks, team communication
  • mcp-skills – reusable capability modules

Any third-party MCP server plugs in out of the box.

What I learned building this

Container lifecycle is tricky. Managing "start on demand, stop when idle" sounds simple until you're dealing with mid-task interruptions, race conditions on the task queue, and WebSocket reconnects. The solution was a proper state machine in the orchestrator with Redis pub/sub for events.
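The essence of that state machine is a table of legal transitions that the orchestrator refuses to violate, no matter what order Redis events arrive in. A minimal sketch with illustrative state names:

```python
# Hypothetical lifecycle states and allowed transitions for an agent
# container; the real orchestrator's states may differ.
TRANSITIONS = {
    "stopped":  {"starting"},
    "starting": {"running", "stopped"},    # startup can fail
    "running":  {"idle", "stopping"},
    "idle":     {"running", "stopping"},   # new work, or idle timeout
    "stopping": {"stopped"},
}

class AgentLifecycle:
    def __init__(self):
        self.state = "stopped"

    def transition(self, new_state: str) -> None:
        """Apply a transition, rejecting anything the table doesn't allow.

        Out-of-order events (e.g. a stale 'stop' arriving mid-start)
        surface as errors instead of silently corrupting state.
        """
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
```

Race conditions on the task queue then become explicit: an event that doesn't fit the current state is rejected and can be retried or dropped deliberately.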

Local embeddings are worth it. The initial setup of bge-m3 adds complexity, but never having to worry about embedding API costs or rate limits is completely worth it. Multilingual support is a bonus.

Approval workflows change agent behavior. Once agents know they'll be asked before sensitive actions, the whole dynamic changes. Agents become more conservative and explicit about what they're about to do, which is exactly what you want.

Self-improvement is hard to measure. The ImprovementEngine is the most experimental part. I'm still tuning what "good" looks like per agent type.

What's next

  • Better credential management (vault integrations)
  • Stronger sandbox isolation per agent
  • More pre-built integrations (Paperless-ngx, *arr stack, etc.)

Try it

git clone https://github.com/greeves89/AI-Employee.git
cd AI-Employee
cp .env.community.example .env
# Add your Claude token or Anthropic API key to .env
docker compose -f docker-compose.community.yml up -d
open http://localhost:3000

License: Fair-Code – free for internal use. Commercial license only if you resell as SaaS.

👉 github.com/greeves89/AI-Employee

Would love feedback – especially from folks who run self-hosted infrastructure and know where the pain points actually are.

(Screenshot: AI-Employee Dashboard)
