The Homelab AI Stack in 2026: What Self-Hosters Are Actually Running
SIGNAL — Weekly intelligence for builders
Spend five minutes on r/selfhosted and you'll notice: the conversations have changed.
Two years ago everyone asked "what should I run?" Now they're sharing sophisticated stacks that rival small business infrastructure. The self-hosting AI movement has matured. Here's what's actually worth deploying in 2026.
The Core Stack (What Stayed)
Ollama — Local LLM Runtime
Ollama won. It beat LocalAI on simplicity, beat llama.cpp on UX, and the model library makes pulling new models trivial.
# Install
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the best value model for 16GB RAM
ollama pull qwen2.5:14b
# Or for 24GB+ (M4 Mac mini, high-RAM PC)
ollama pull qwen2.5:32b
# Test immediately
ollama run qwen2.5:14b "Explain what makes a good Docker Compose file"
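The CLI is only one interface. Ollama also serves a local REST API on port 11434, which is how the rest of the stack talks to it. A minimal stdlib-only sketch, assuming the qwen2.5:14b model pulled above:

```python
# Minimal sketch of calling Ollama's local REST API (stdlib only).
# Endpoint and payload shape follow Ollama's /api/generate API;
# the model name assumes you pulled qwen2.5:14b as above.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="qwen2.5:14b"):
    # stream=False returns one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="qwen2.5:14b"):
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("Explain what makes a good Docker Compose file")
# (requires the Ollama server to be running)
```

Every script in the rest of this article that "talks to Ollama" is doing some version of this call.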
Hardware reality check:
- 8GB RAM → 7B models, basic tasks
- 16GB RAM → 14B models, solid capability
- 24GB RAM (M4 Mac mini sweet spot) → 32B models, near GPT-4 quality
- 32GB+ → 70B models, excellent for everything
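The tiers above follow from simple arithmetic: at roughly 4-bit quantization, a model needs on the order of 0.6 GB per billion parameters, plus fixed overhead for the KV cache and runtime. A quick sketch, where both constants are ballpark assumptions rather than measured figures:

```python
# Ballpark RAM estimate behind the tiers above. Assumes ~Q4 quantization
# (~0.6 GB per billion parameters) plus ~2 GB of overhead for KV cache
# and runtime; both constants are rough assumptions, not measurements.
def est_ram_gb(params_b, gb_per_b_params=0.6, overhead_gb=2.0):
    return params_b * gb_per_b_params + overhead_gb

for size in (7, 14, 32, 70):
    print(f"{size}B model -> ~{est_ram_gb(size):.0f} GB")
```

Context length pushes the overhead up, so treat these as floors, not targets.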
Open WebUI — The Interface
Deploys in 2 minutes, gives you a ChatGPT-equivalent interface locally:
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
volumes:
  open-webui:
n8n — Automation Brain
For connecting AI to everything else. Self-hosted, no per-workflow limits, full control.
The killer use case in 2026: n8n + Ollama = private AI automations that cost $0/month to run.
My actual running workflows:
- Gmail → Ollama triage → priority flag → Telegram alert
- RSS feeds → Ollama summary → daily digest at 7am
- Server logs → Ollama anomaly check → alert if weird
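That last workflow's logic is simple enough to sketch outside n8n. In practice these are separate nodes; the prompt wording and the ALERT convention below are illustrative assumptions, not the exact configuration:

```python
# Sketch of the "server logs -> Ollama anomaly check -> alert" logic.
# In n8n these live in separate nodes; the prompt and the ALERT
# convention are assumptions for illustration.
def build_anomaly_prompt(log_lines, keep=50):
    logs = "\n".join(log_lines[-keep:])  # cap the prompt at the newest lines
    return (
        "You are monitoring a homelab server. If these logs look abnormal, "
        "reply ALERT plus one sentence; otherwise reply OK.\n\n" + logs
    )

def should_alert(model_reply):
    # Only fire the notification node on an explicit ALERT
    return model_reply.strip().upper().startswith("ALERT")

print(should_alert("OK, routine cron noise"))        # False
print(should_alert("ALERT: repeated SSH failures"))  # True
```

Keeping the "is this an alert?" decision to a strict string check makes the workflow deterministic even when the model gets chatty.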
What Got Replaced in 2026
LocalAI → Ollama
LocalAI was great for compatibility. Ollama is better for everything else.
Flowise → n8n
Flowise is excellent for RAG pipelines. But n8n handles AI and everything else — one tool beats two.
Custom Python scripts → n8n workflows
Maintainability. An n8n workflow is inspectable, editable, and debuggable without touching code.
What Got Added in 2026
Whisper.cpp — Local Audio Transcription
brew install whisper-cpp
# or build from source for maximum performance
# Download a ggml model once, then transcribe
# (Homebrew installs the binary as whisper-cli; feed it 16 kHz WAV)
ffmpeg -i audio.mp3 -ar 16000 -ac 1 audio.wav
whisper-cli -m ggml-base.en.bin -f audio.wav
Use cases: meeting transcription, voice notes → text, local podcast search.
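For the voice-notes use case, the CLI wraps cleanly in a script. A sketch that batch-transcribes a folder, assuming Homebrew's `whisper-cli` binary and a downloaded ggml model (both paths are examples):

```python
# Sketch: batch-transcribe voice notes with whisper.cpp's CLI. Assumes
# the whisper-cli binary (from Homebrew's whisper-cpp) and a downloaded
# ggml model file; the paths here are examples.
import pathlib
import subprocess

def transcribe_cmd(audio, model="models/ggml-base.en.bin"):
    # -m model file, -f input audio, -otxt writes transcript to <audio>.txt
    return ["whisper-cli", "-m", model, "-f", str(audio), "-otxt"]

def transcribe_folder(folder):
    for wav in sorted(pathlib.Path(folder).glob("*.wav")):
        subprocess.run(transcribe_cmd(wav), check=True)

print(transcribe_cmd("note.wav"))
```

Pipe the resulting .txt files into an Ollama summary workflow and voice notes become searchable text.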
LiteLLM — The Unified Proxy
This one changed everything. LiteLLM sits in front of all your AI models and presents a single OpenAI-compatible API endpoint.
# All your apps point to one endpoint
services:
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000"
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- OPENAI_API_KEY=${OPENAI_API_KEY}
volumes:
- ./litellm_config.yaml:/app/config.yaml
Now every app in your stack — n8n, Open WebUI, your scripts — uses http://litellm:4000 and you switch models in one config file.
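The mounted litellm_config.yaml is where the routing lives. A minimal example; the aliases and model names are illustrative, and `os.environ/...` is LiteLLM's syntax for reading a key from the environment:

```yaml
# litellm_config.yaml: map friendly aliases to local and cloud models
model_list:
  - model_name: local-qwen            # what your apps request
    litellm_params:
      model: ollama/qwen2.5:14b       # served locally by Ollama
      api_base: http://host.docker.internal:11434
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
```

Swapping a workflow from local to cloud (or back) is then a one-word change to the alias it requests.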
ChromaDB + LlamaIndex — Private RAG
Search your own documents with AI. All local, all private.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Index your documents
docs = SimpleDirectoryReader('/your/docs/folder').load_data()
db = chromadb.PersistentClient(path='./chroma_db')
collection = db.get_or_create_collection('my_docs')
store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=store)

# Query them (point Settings.llm / Settings.embed_model at local models
# first; LlamaIndex defaults to OpenAI's API otherwise)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
engine = index.as_query_engine()
response = engine.query('What did we decide about the API architecture?')
print(response)
The Hardware Question
Everyone asks: GPU server vs Apple Silicon?
In 2026, for pure AI inference at homelab scale: Apple Silicon wins on value.
M4 Mac mini (24GB, ~$800):
- Runs 32B models at 10-15 tokens/sec
- Silent, 30W idle
- No separate GPU needed
- macOS means easy maintenance
NVIDIA GPU server (RTX 4090, 24GB VRAM):
- Faster inference on large batches
- Better for fine-tuning
- Loud, 450W under load
- Linux-only for serious use
For a homelab running 1-5 concurrent users doing text tasks: Mac mini M4. For serious inference throughput or training: GPU server.
The Monitoring Stack
Don't run AI services without knowing when they break:
- Uptime Kuma — check if Ollama/n8n/Open WebUI are responding
- Netdata — per-container resource usage
- Loki + Grafana — aggregate logs from all containers
# Add to any docker-compose.yml for log collection
labels:
  - logging=promtail
  - logging_jobname=containerlogs
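For a quick sanity check outside Uptime Kuma, the same health endpoints can be polled from a script. The paths below are the ones these projects commonly expose; treat them as assumptions and verify against the versions you run:

```python
# Quick sanity check for the stack's health endpoints (stdlib only).
# The paths are the ones these projects commonly expose; verify them
# against your installed versions.
import urllib.request

CHECKS = {
    "ollama":     "http://localhost:11434/api/tags",
    "open-webui": "http://localhost:3000/health",
    "n8n":        "http://localhost:5678/healthz",
}

def is_up(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection refused, DNS failures, timeouts
        return False

if __name__ == "__main__":
    for name, url in CHECKS.items():
        print(f"{name}: {'UP' if is_up(url) else 'DOWN'}")
```

Drop it in cron with a Telegram ping on DOWN and you have a poor man's pager.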
What I'd Set Up First on a New Server
In order, if starting from scratch:
- Traefik — reverse proxy + automatic HTTPS (everything else goes behind it)
- Ollama — pull qwen2.5:14b first, add others as needed
- Open WebUI — immediate usable interface
- n8n — automation brain
- LiteLLM — unified API proxy
- Uptime Kuma — monitoring
- Vaultwarden — password manager (you'll need it)
The One Thing Most People Miss
Running models locally is only half the value.
The other half is connecting them to your actual workflow: your email, your calendar, your codebase, your documents. A local LLM that only answers questions in a chat window is just a slower, private ChatGPT.
A local LLM wired into n8n that automatically triages your email, monitors your servers, and summarizes your notes — that's actual leverage.
SIGNAL publishes weekly. Follow @signal-weekly for more practical builder content.
Next: How I use AI agents to automate the boring parts of running a homelab — specific n8n workflows, working code.