SIGNAL
Homelab AI stack 2026 — what to run and in what order

TL;DR

Stop running your AI brain on someone else's servers.

Here's the exact stack I run on my homelab — in the order that actually makes sense to deploy it.


Why self-hosted AI in 2026?

The models crossed a threshold. qwen2.5:32b running locally on a decent machine beats GPT-3.5 on most developer tasks. It's free, private, offline, and you own every token.

Self-hosting your AI stack isn't a nerd flex anymore. It's good engineering hygiene. You wouldn't run prod on someone else's laptop. Why run your reasoning on their servers?


The Stack (in order)

1. Traefik — everything behind HTTPS first

Before anything else gets internet-exposed, Traefik goes in. Automatic TLS, reverse proxy, single entrypoint.

```shell
docker run -d \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v3.0 \
  --providers.docker \
  --entrypoints.web.address=:80 \
  --entrypoints.websecure.address=:443
```

Don't skip this step. Everything else sits behind it.
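Services join the proxy through Docker labels. A minimal sketch, assuming a compose setup with Traefik's docker provider enabled — the hostname `ai.example.lan` and router name are placeholders for your own:

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.webui.rule=Host(`ai.example.lan`)"
      - "traefik.http.routers.webui.entrypoints=websecure"
      - "traefik.http.routers.webui.tls=true"
      - "traefik.http.services.webui.loadbalancer.server.port=8080"
```

Traefik picks these up from the Docker socket at runtime — no restart of the proxy needed when you add a service.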

2. Ollama — your local LLM engine

```shell
curl -fsSL https://ollama.ai/install.sh | sh
ollama run qwen2.5:32b
```

Swap model names freely: gemma3, mistral, phi4, llama3.2. All free. No API key.

Minimum viable hardware: 16GB RAM for 7B models, 32GB+ for 32B. Apple Silicon M-series handles this well.
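Ollama isn't just a CLI — it serves a REST API on port 11434, which is what the rest of this stack talks to. A stdlib-only sketch of calling `/api/generate`, assuming Ollama is running locally with the model already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "qwen2.5:7b") -> dict:
    """The JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "qwen2.5:7b") -> str:
    """Send a prompt to the local Ollama server and return its reply.

    Assumes Ollama is running on its default port and the model is pulled.
    """
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

`"stream": False` returns one complete JSON object instead of a stream of chunks, which keeps scripting simple.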

3. Open WebUI — the interface

ChatGPT-style interface that connects directly to Ollama. Supports multiple models, conversation history, document upload.

```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

4. n8n — the automation brain

This is where local AI stops being a toy and becomes a workflow tool. n8n connects your LLM to everything: email, webhooks, APIs, databases, smart home.

```shell
docker run -d -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  n8nio/n8n
```

One workflow that changed my setup: email arrives → n8n sends it to Ollama → Ollama categorizes and drafts a reply → I review. Zero cloud, full privacy.
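The triage step in that workflow is just a prompt against Ollama's local API. A sketch of what n8n sends under the hood — the category labels here are hypothetical, pick whatever fits your inbox:

```python
import json
import urllib.request

# Hypothetical labels — use whatever categories fit your inbox.
CATEGORIES = ["urgent", "action-needed", "newsletter", "ignore"]

def triage_prompt(subject: str, body: str) -> str:
    """Prompt asking the model to categorize an email and draft a reply."""
    return (
        f"Categorize this email as one of: {', '.join(CATEGORIES)}. "
        "Then draft a short reply.\n\n"
        f"Subject: {subject}\n\n{body}"
    )

def triage(subject: str, body: str, model: str = "qwen2.5:32b") -> str:
    """Run the triage prompt through a local Ollama instance (must be running)."""
    data = json.dumps({
        "model": model,
        "prompt": triage_prompt(subject, body),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

In n8n this is an IMAP trigger node feeding an HTTP Request node; the Python is just to show there's no magic in the middle.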

5. LiteLLM — unified proxy

Once you have multiple models, LiteLLM gives you one OpenAI-compatible endpoint. Your apps stop caring which backend they hit.

```yaml
model_list:
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-heavy
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434
```
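With the proxy running, any OpenAI-style client works against it. A stdlib sketch, assuming LiteLLM is serving on its default port 4000 with no auth key configured:

```python
import json
import urllib.request

def chat_body(message: str, model: str = "local-fast") -> dict:
    """OpenAI-compatible chat payload; model names come from the LiteLLM config."""
    return {"model": model, "messages": [{"role": "user", "content": message}]}

def chat(message: str, model: str = "local-fast",
         base: str = "http://localhost:4000") -> str:
    """Call the LiteLLM proxy's /v1/chat/completions endpoint.

    Assumes the proxy is running locally with no API key required.
    """
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(chat_body(message, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swapping `local-fast` for `local-heavy` is a one-string change — the app never learns which backend answered.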

The point

The local LLM alone is not the value. Connecting it to your workflow is.

Anyone can run ollama run llama3.2 and ask it questions. The interesting part is when your homelab starts doing things autonomously — reading your emails, monitoring your services, briefing you every morning — with no data leaving your network.

That's the stack that gets you there.


SIGNAL covers AI tools, automation and homelab — what actually works, tested on real hardware. No hype.
