
Jörg Fuchs

Posted on • Originally published at ai-engineering.at

I Built a Production 4-Agent AI Stack on Local Hardware — Here's What I Learned

After months of iteration, I'm running a fully local AI agent system — GDPR-compliant by design, no cloud APIs, under €50/month running cost.

The Stack

Hardware:

  • 3x nodes (Docker Swarm): management, monitoring, databases
  • 1x GPU server: RTX 3090 for LLM inference
  • 1x dev machine: RTX 4070
  • Total hardware: ~€2,400 (used)

Software:

  • Ollama — Mistral 7B, Llama 3.1, Codestral (local LLM inference)
  • Neo4j — Knowledge graphs for structured memory
  • ChromaDB — Vector store for RAG
  • Mattermost — Self-hosted agent communication
  • n8n — Workflow automation (the glue)
  • Prometheus + Grafana — Full monitoring stack
  • Uptime Kuma — Health checks
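
Everything in this list ultimately funnels prompts into Ollama, and its local HTTP API is just a JSON POST. A minimal Python sketch against the standard `/api/generate` endpoint (the payload fields are Ollama's documented contract; the helper names are mine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )

def generate(model: str, prompt: str) -> str:
    """Send the prompt to Ollama and return the model's full response text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because it's plain HTTP on localhost, every agent can share one inference server without any SDK dependency.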

4 Agents, Different Specializations

The agents communicate via Mattermost channels:

  • Jim01 — Infrastructure orchestrator
  • Lisa01 — Content quality and compliance
  • John01 — Frontend builder
  • Echo_log — Memory management (Neo4j knowledge graph)

Each agent has its own persona, memory, and tool access.
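
The Mattermost side of this is just incoming webhooks. A sketch in Python (the webhook URL is invented; the `username`/`channel`/`text` fields are Mattermost's standard incoming-webhook payload):

```python
import json
import urllib.request

# Hypothetical webhook URL; Mattermost generates one per incoming webhook.
WEBHOOK_URL = "https://mattermost.example.local/hooks/agent-bus"

def agent_message(agent: str, channel: str, text: str) -> bytes:
    """Serialize a Mattermost incoming-webhook payload for one agent."""
    return json.dumps({"username": agent, "channel": channel, "text": text}).encode()

def post(agent: str, channel: str, text: str) -> None:
    """Post an agent's message into a shared Mattermost channel."""
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=agent_message(agent, channel, text),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Posting under the agent's name keeps the channel history readable: you can scroll back and see which agent said what.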

Key Learnings

1. Docker Swarm > Kubernetes (for small teams)

Seriously. If you're running 3-5 nodes, Swarm just works: no etcd cluster, no complex networking. One `docker stack deploy` and you're done.

2. HippoRAG with Neo4j beats pure vector search

The combination of knowledge graphs + Personalized PageRank gives much better results for multi-hop reasoning than ChromaDB alone.
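
HippoRAG's core trick is running Personalized PageRank over the knowledge graph, seeded with the entities the query mentions, so relevance propagates along edges instead of stopping at embedding similarity. A toy pure-Python sketch of that scoring idea (the graph, node names, and implementation are mine for illustration, not HippoRAG's code):

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=50):
    """Power iteration with random jumps biased toward query-linked seed nodes.

    graph: dict mapping node -> list of outgoing neighbour nodes
    seeds: entities mentioned by the query; restarts return mass to these
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # Dangling node: send its mass back to the seed set.
                for m in nodes:
                    nxt[m] += damping * rank[n] * restart[m]
            else:
                share = damping * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
        rank = nxt
    return rank

# Toy graph: entity nodes as in a Neo4j knowledge graph, edges = relations.
g = {
    "gdpr": ["eu_ai_act"],
    "eu_ai_act": ["fines"],
    "fines": [],
    "ollama": ["mistral"],
    "mistral": [],
}
scores = personalized_pagerank(g, seeds={"gdpr"})
```

Note how `fines`, two hops from the seed, still accumulates score while the unrelated `ollama`/`mistral` component stays at zero: that's the multi-hop behaviour pure vector search struggles with.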

3. Disk space will kill you before anything else

Ollama models, Neo4j databases, Docker images: monitor your disk. Disk exhaustion was my #1 production incident.
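
Even without Prometheus, a tiny standard-library watcher catches this before it becomes an incident. A sketch (the mount points are hypothetical examples; point them at wherever your models, graphs, and images actually live):

```python
import shutil

def disk_usage_percent(path: str = "/") -> float:
    """Return used-space percentage for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_paths(paths, threshold: float = 85.0):
    """Return the paths whose filesystems are at or above the alert threshold."""
    return [p for p in paths if disk_usage_percent(p) >= threshold]

# Hypothetical hot spots for model weights, graph data, and Docker images.
HOT_SPOTS = ["/opt/ollama/models", "/var/lib/neo4j", "/var/lib/docker"]
```

Wire `check_paths` into a cron job or an n8n workflow and alert on anything it returns; in my experience 85% is a sane default, since large model pulls can eat the last 15% fast.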

4. Agent personas need careful tuning

Without clear boundaries, agents get confused about their role. Explicit persona files with rules work better than general instructions.
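
In practice that means a structured persona file rendered into the system prompt, with the rules enumerated one by one. A minimal sketch with an invented schema (the keys and the example Jim01 rules are illustrative):

```python
import json

def load_persona(path: str) -> dict:
    """Load one agent's persona file (name, role, hard rules)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def system_prompt(persona: dict) -> str:
    """Render explicit boundaries into the system prompt.

    The point is that rules are enumerated, not implied by a
    general description of the agent.
    """
    rules = "\n".join(f"- {r}" for r in persona["rules"])
    return (
        f"You are {persona['name']}, the {persona['role']}.\n"
        f"You must follow these rules:\n{rules}\n"
        f"If a request falls outside your role, hand it off instead of answering."
    )

# Illustrative persona; real files would live alongside each agent's config.
jim = {
    "name": "Jim01",
    "role": "infrastructure orchestrator",
    "rules": [
        "Never modify application code",
        "Only act on monitoring alerts or explicit requests",
    ],
}
```

The explicit hand-off rule at the end is what stops role bleed: an agent that can't answer shouldn't improvise.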

5. n8n is the underrated MVP

Webhooks, API orchestration, error handling, notifications — n8n connects everything. 28 workflows running in production.
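
Most of those workflows start from a webhook node. Roughly what the agent side looks like when it fires an event into n8n (the URL path and payload schema are invented for illustration; n8n's webhook node accepts any JSON body you define):

```python
import json
import urllib.request

# Hypothetical n8n webhook path; each workflow's Webhook node defines its own.
N8N_WEBHOOK = "http://n8n.internal:5678/webhook/agent-events"

def event_payload(source: str, kind: str, detail: dict) -> dict:
    """Structure an event so downstream n8n nodes can branch on `kind`."""
    return {"source": source, "kind": kind, "detail": detail}

def notify(source: str, kind: str, detail: dict) -> None:
    """Fire-and-forget POST into the n8n workflow."""
    data = json.dumps(event_payload(source, kind, detail)).encode()
    req = urllib.request.Request(
        N8N_WEBHOOK, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```

Keeping a stable envelope (`source`, `kind`, `detail`) means one workflow can route disk alerts, deploy results, and agent errors without per-event plumbing.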

Running Cost

~€47/month electricity. That's it. No API bills, no cloud subscriptions.

Why Local?

The EU AI Act becomes fully enforceable August 2026. Fines up to €35M or 7% of global revenue. If you're sending data to OpenAI/Anthropic APIs from the EU, compliance gets complex.

Running everything locally means GDPR-compliant by design. No data leaves your network.

The Playbook

I wrote everything up as a detailed playbook: 8 chapters, ~70 pages, all docker-compose files and code examples included.

Check it out: ai-engineering.at

Questions welcome — happy to discuss the architecture!


Built with Ollama, Docker Swarm, Neo4j, n8n, and a lot of late nights.
