TL;DR
Stop running your AI brain on someone else's servers.
Here's the exact stack I run on my homelab — in the order that actually makes sense to deploy it.
Why self-hosted AI in 2026?
The models crossed a threshold. qwen2.5:32b running locally on a decent machine matches or beats GPT-3.5 on most developer tasks in my testing. It's free, private, offline, and you own every token.
Self-hosting your AI stack isn't a nerd flex anymore. It's good engineering hygiene. You wouldn't run prod on someone else's laptop. Why run your reasoning on their servers?
The Stack (in order)
1. Traefik — everything behind HTTPS first
Before anything else gets internet-exposed, Traefik goes in. Automatic TLS, reverse proxy, single entrypoint.
docker run -d --name traefik \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v3.0 \
  --providers.docker=true \
  --entrypoints.web.address=:80 \
  --entrypoints.websecure.address=:443
Don't skip this step. Everything else sits behind it.
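Services register themselves with Traefik through container labels, so you never touch the proxy config again. A minimal sketch in compose form (the hostname, router name, and the `le` certificate resolver are placeholders you'd configure yourself):

```yaml
# Illustrative labels for putting one service behind Traefik.
# Assumes a certresolver named "le" is defined in Traefik's static config.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.webui.rule=Host(`chat.example.lan`)"
      - "traefik.http.routers.webui.entrypoints=websecure"
      - "traefik.http.routers.webui.tls.certresolver=le"
      - "traefik.http.services.webui.loadbalancer.server.port=8080"
```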
2. Ollama — your local LLM engine
curl -fsSL https://ollama.ai/install.sh | sh
ollama run qwen2.5:32b
Swap model names freely: gemma3, mistral, phi4, llama3.2. All free. No API key.
Minimum viable hardware: 16GB RAM for 7B models, 32GB+ for 32B. Apple Silicon M-series handles this well.
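Ollama also exposes a local REST API on port 11434, which is what the rest of this stack talks to. A minimal stdlib sketch of one non-streaming call (model name and prompt are just examples):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    # stream=False makes Ollama return one JSON object instead of NDJSON chunks
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The generated text lives in the "response" field
        return json.loads(resp.read())["response"]
```

With Ollama running, `ask("qwen2.5:7b", "hello")` returns the model's reply as a plain string.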
3. Open WebUI — the interface
ChatGPT-style interface that connects directly to Ollama. Supports multiple models, conversation history, document upload.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
4. n8n — the automation brain
This is where local AI stops being a toy and becomes a workflow tool. n8n connects your LLM to everything: email, webhooks, APIs, databases, smart home.
docker run -d -p 5678:5678 \
-v n8n_data:/home/node/.n8n \
n8nio/n8n
One workflow that changed my setup: email arrives → n8n sends it to Ollama → Ollama categorizes and drafts a reply → I review. Zero cloud, full privacy.
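The categorization step boils down to a structured prompt and a strict parse. A hedged sketch of that logic (the category names and prompt wording are my own, not from any built-in n8n node):

```python
import json

CATEGORIES = ["urgent", "newsletter", "invoice", "personal", "other"]

def build_email_prompt(subject: str, body: str) -> str:
    # Ask the model for strict JSON so the workflow can branch on the result
    return (
        f"Categorize this email as one of {CATEGORIES} and draft a short reply.\n"
        'Answer with JSON only: {"category": "...", "draft": "..."}\n\n'
        f"Subject: {subject}\n\n{body}"
    )

def parse_reply(raw: str) -> dict:
    # Models sometimes wrap JSON in code fences; strip them before parsing
    cleaned = (
        raw.strip()
        .removeprefix("```json")
        .removeprefix("```")
        .removesuffix("```")
    )
    result = json.loads(cleaned)
    if result.get("category") not in CATEGORIES:
        result["category"] = "other"  # fall back rather than break the flow
    return result
```

Forcing JSON output and validating the category is what keeps the workflow deterministic even when the model gets creative.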
5. LiteLLM — unified proxy
Once you have multiple models, LiteLLM gives you one OpenAI-compatible endpoint. Your apps stop caring which backend they hit.
model_list:
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-heavy
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434
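Because LiteLLM speaks the OpenAI chat-completions format, any client can hit it with the stdlib alone. A minimal sketch against the config above (the length-based routing heuristic is my own illustration, and port 4000 is LiteLLM's default proxy port):

```python
import json
import urllib.request

LITELLM_URL = "http://localhost:4000/v1/chat/completions"

def pick_model(prompt: str) -> str:
    # Crude illustrative routing: long prompts go to the bigger model
    return "local-heavy" if len(prompt) > 500 else "local-fast"

def chat_request(prompt: str) -> bytes:
    # Standard OpenAI-compatible chat-completions body
    return json.dumps({
        "model": pick_model(prompt),
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def chat(prompt: str) -> str:
    req = urllib.request.Request(
        LITELLM_URL,
        data=chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swap the backend later (llama.cpp, vLLM, even a paid API) and this client code doesn't change at all.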
The point
The local LLM alone is not the value. Connecting it to your workflow is.
Anyone can run ollama run llama3.2 and ask it questions. The interesting part is when your homelab starts doing things autonomously — reading your emails, monitoring your services, briefing you every morning — with no data leaving your network.
That's the stack that gets you there.
SIGNAL covers AI tools, automation and homelab — what actually works, tested on real hardware. No hype.