
Jörg Fuchs

Posted on • Originally published at ai-engineering.at

I Built a Production 4-Agent AI Stack on Local Hardware — Here's What I Learned

After months of iteration, I'm running a fully local AI agent system — GDPR-compliant by design, no cloud APIs, under €50/month running cost.

The Stack

Hardware:

  • 3x nodes (Docker Swarm): management, monitoring, databases
  • 1x GPU server: RTX 3090 for LLM inference
  • 1x dev machine: RTX 4070
  • Total hardware: ~€2,400 (used)

Software:

  • Ollama — Mistral 7B, Llama 3.1, Codestral (local LLM inference)
  • Neo4j — Knowledge graphs for structured memory
  • ChromaDB — Vector store for RAG
  • Mattermost — Self-hosted agent communication
  • n8n — Workflow automation (the glue)
  • Prometheus + Grafana — Full monitoring stack
  • Uptime Kuma — Health checks
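
Everything in this list ultimately funnels prompts into Ollama, and its local HTTP API is just a JSON POST. A minimal Python sketch against the standard `/api/generate` endpoint (the payload fields are Ollama's documented contract; the helper names are mine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

def generate(model: str, prompt: str) -> str:
    """Send the prompt to Ollama and return the model's full response text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because it's plain HTTP on localhost, every agent can share one inference server without any SDK dependency.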

4 Agents, Different Specializations

The agents communicate via Mattermost channels:

  • Jim01 — Infrastructure orchestrator
  • Lisa01 — Content quality and compliance
  • John01 — Frontend builder
  • Echo_log — Memory management (Neo4j knowledge graph)

Each agent has its own persona, memory, and tool access.
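
The Mattermost side of this is just incoming webhooks. A sketch in Python (the webhook URL is invented; the `username`/`channel`/`text` fields are Mattermost's standard incoming-webhook payload):

```python
import json
import urllib.request

# Hypothetical webhook URL; Mattermost generates one per incoming webhook.
WEBHOOK_URL = "https://mattermost.example.local/hooks/agent-bus"

def agent_message(agent: str, channel: str, text: str) -> bytes:
    """Serialize a Mattermost incoming-webhook payload for one agent."""
    return json.dumps({"username": agent, "channel": channel, "text": text}).encode()

def post(agent: str, channel: str, text: str) -> None:
    """Post an agent's message into a shared Mattermost channel."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=agent_message(agent, channel, text),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Posting under the agent's name keeps the channel history readable: you can scroll back and see which agent said what.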

Key Learnings

1. Docker Swarm > Kubernetes (for small teams)

Seriously. If you're running 3-5 nodes, Swarm just works: no etcd cluster, no complex networking. One `docker stack deploy` and you're done.

2. HippoRAG with Neo4j beats pure vector search

The combination of knowledge graphs + Personalized PageRank gives much better results for multi-hop reasoning than ChromaDB alone.
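
HippoRAG's core trick is running Personalized PageRank over the knowledge graph, seeded with the entities the query mentions, so relevance propagates along edges instead of stopping at embedding similarity. A toy pure-Python sketch of that scoring idea (the graph, node names, and implementation are mine for illustration, not HippoRAG's code):

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """Power iteration with random jumps biased toward query-linked seed nodes.

    graph: dict mapping node -> list of outgoing neighbour nodes
    seeds: entities mentioned by the query; restarts return mass to these
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # Dangling node: send its mass back to the seed set.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
            else:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
        rank = nxt
    return rank

# Toy graph: entity nodes as in a Neo4j knowledge graph, edges = relations.
g = {
    "gdpr": ["eu_ai_act"],
    "eu_ai_act": ["fines"],
    "fines": [],
    "ollama": ["mistral"],
    "mistral": [],
}
scores = personalized_pagerank(g, seeds={"gdpr"})
```

Note how `fines`, two hops from the seed, still accumulates score while the unrelated `ollama`/`mistral` component stays at zero: that's the multi-hop behaviour pure vector search struggles with.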

3. Disk space will kill you before anything else

Ollama models, Neo4j databases, Docker images: monitor your disk. Disk exhaustion was my #1 production incident.
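
Even without Prometheus, a tiny standard-library watcher catches this before it becomes an incident. A sketch (the mount points are hypothetical examples; point them at wherever your models, graphs, and images actually live):

```python
import shutil

def disk_usage_percent(path: str = "/") -> float:
    """Return used-space percentage for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_paths(paths, threshold: float = 85.0):
    """Return the paths whose filesystems are at or above the alert threshold."""
    return [p for p in paths if disk_usage_percent(p) >= threshold]

# Hypothetical hot spots for model weights, graph data, and Docker images.
HOT_SPOTS = ["/opt/ollama/models", "/var/lib/neo4j", "/var/lib/docker"]
```

Wire `check_paths` into a cron job or an n8n workflow and alert on anything it returns; in my experience 85% is a sane default, since large model pulls can eat the last 15% fast.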

4. Agent personas need careful tuning

Without clear boundaries, agents get confused about their role. Explicit persona files with rules work better than general instructions.
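
In practice that means a structured persona file rendered into the system prompt, with the rules enumerated one by one. A minimal sketch with an invented schema (the keys and the example Jim01 rules are illustrative):

```python
import json

def load_persona(path: str) -> dict:
    """Load one agent's persona file (name, role, hard rules)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def system_prompt(persona: dict) -> str:
    """Render explicit boundaries into the system prompt.

    The point is that rules are enumerated, not implied by a
    general description of the agent.
    """
    rules = "\n".join(f"- {r}" for r in persona["rules"])
    return (
        f"You are {persona['name']}, the {persona['role']}.\n"
        f"You must follow these rules:\n{rules}\n"
        f"If a request falls outside your role, hand it off instead of answering."
    )

# Illustrative persona; real files would live alongside each agent's config.
jim = {
    "name": "Jim01",
    "role": "infrastructure orchestrator",
    "rules": [
        "Never modify application code",
        "Only act on monitoring alerts or explicit requests",
    ],
}
```

The explicit hand-off rule at the end is what stops role bleed: an agent that can't answer shouldn't improvise.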

5. n8n is the underrated MVP

Webhooks, API orchestration, error handling, notifications — n8n connects everything. 28 workflows running in production.
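
Most of those workflows start from a webhook node. Roughly what the agent side looks like when it fires an event into n8n (the URL path and payload schema are invented for illustration; n8n's webhook node accepts any JSON body you define):

```python
import json
import urllib.request

# Hypothetical n8n webhook path; each workflow's Webhook node defines its own.
N8N_WEBHOOK = "http://n8n.internal:5678/webhook/agent-events"

def event_payload(source: str, kind: str, detail: dict) -> dict:
    """Structure an event so downstream n8n nodes can branch on `kind`."""
    return {"source": source, "kind": kind, "detail": detail}

def notify(source: str, kind: str, detail: dict) -> None:
    """Fire-and-forget POST into the n8n workflow."""
    data = json.dumps(event_payload(source, kind, detail)).encode()
    req = urllib.request.Request(
        N8N_WEBHOOK, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```

Keeping a stable envelope (`source`, `kind`, `detail`) means one workflow can route disk alerts, deploy results, and agent errors without per-event plumbing.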

Running Cost

~€47/month electricity. That's it. No API bills, no cloud subscriptions.

Why Local?

The EU AI Act becomes fully enforceable August 2026. Fines up to €35M or 7% of global revenue. If you're sending data to OpenAI/Anthropic APIs from the EU, compliance gets complex.

Running everything locally means GDPR-compliant by design. No data leaves your network.

The Playbook

I wrote everything up as a detailed playbook: 8 chapters, ~70 pages, all docker-compose files and code examples included.

Check it out: ai-engineering.at

Questions welcome — happy to discuss the architecture!


Built with Ollama, Docker Swarm, Neo4j, n8n, and a lot of late nights.
