The Homelab AI Stack in 2026: What Self-Hosters Are Actually Running

SIGNAL — Weekly intelligence for builders


Spend five minutes on r/selfhosted and you'll notice: the conversations have changed.

Two years ago everyone asked "what should I run?" Now they're sharing sophisticated stacks that rival small business infrastructure. The self-hosting AI movement has matured. Here's what's actually worth deploying in 2026.

The Core Stack (What Stayed)

Ollama — Local LLM Runtime

Ollama won. It beat LocalAI on simplicity, beat llama.cpp on UX, and its model library makes pulling new models trivial.

# Install
curl -fsSL https://ollama.ai/install.sh | sh

# Pull the best value model for 16GB RAM
ollama pull qwen2.5:14b

# Or for 24GB+ (M4 Mac mini, high-RAM PC)
ollama pull qwen2.5:32b

# Test immediately
ollama run qwen2.5:14b "Explain what makes a good Docker Compose file"

Hardware reality check:

  • 8GB RAM → 7B models, basic tasks
  • 16GB RAM → 14B models, solid capability
  • 24GB RAM (M4 Mac mini sweet spot) → 32B models, near GPT-4 quality
  • 32GB+ → 70B models, excellent for everything
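The arithmetic behind those tiers: a Q4-quantized model needs roughly half a gigabyte per billion parameters, plus headroom for the KV cache and runtime. A rough sketch (the 20% overhead factor is a rule of thumb, not a spec):

```python
def approx_model_ram_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Rough RAM for a quantized model: weight bytes plus ~20% headroom
    for the KV cache and runtime overhead."""
    weights_gb = params_billions * bits_per_weight / 8
    return round(weights_gb * 1.2, 1)

print(approx_model_ram_gb(14))   # ~8.4 GB — a 14B Q4 model fits in 16 GB with room for the OS
print(approx_model_ram_gb(32))   # ~19.2 GB — a 32B Q4 model wants the 24 GB tier
```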

Open WebUI — The Interface

Deploys in two minutes and gives you a ChatGPT-equivalent interface, fully local:

# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    volumes:
      - open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

volumes:
  open-webui:

n8n — Automation Brain

For connecting AI to everything else. Self-hosted, no per-workflow limits, full control.

The killer use case in 2026: n8n + Ollama = private AI automations that cost $0/month to run.

My actual running workflows:

  • Gmail → Ollama triage → priority flag → Telegram alert
  • RSS feeds → Ollama summary → daily digest at 7am
  • Server logs → Ollama anomaly check → alert if weird
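The triage step in the first workflow boils down to one constrained LLM call. A minimal sketch against Ollama's `/api/generate` endpoint (the prompt wording and the URGENT/NORMAL/IGNORE labels are illustrative; the Gmail trigger and Telegram alert stay in n8n):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def triage_prompt(subject: str, body: str) -> str:
    """Build a constrained prompt so the model answers with a single label."""
    return (
        "Classify this email as exactly one of: URGENT, NORMAL, IGNORE.\n"
        f"Subject: {subject}\n"
        f"Body: {body}\n"
        "Answer with one word only."
    )

def classify(subject: str, body: str, model: str = "qwen2.5:14b") -> str:
    """Send the prompt to Ollama (non-streaming) and return the label."""
    payload = json.dumps({
        "model": model,
        "prompt": triage_prompt(subject, body),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

In n8n this is just an HTTP Request node with the same JSON body; the Python version is handy for testing the prompt outside the workflow.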

What Got Replaced in 2026

LocalAI → Ollama

LocalAI was great for compatibility. Ollama is better for everything else.

Flowise → n8n

Flowise is excellent for RAG pipelines. But n8n handles AI and everything else — one tool beats two.

Custom Python scripts → n8n workflows

Maintainability. An n8n workflow is inspectable, editable, and debuggable without touching code.


What Got Added in 2026

Whisper.cpp — Local Audio Transcription

brew install whisper-cpp
# or build from source for maximum performance

# Download a ggml model first (whisper.cpp ships a download script),
# then transcribe — the CLI expects 16 kHz WAV input
whisper-cpp -m ggml-base.en.bin -f audio.wav

Use cases: meeting transcription, voice notes → text, local podcast search.

LiteLLM — The Unified Proxy

This one changed everything. LiteLLM sits in front of all your AI models and presents a single OpenAI-compatible API endpoint.

# All your apps point to one endpoint
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]

Now every app in your stack — n8n, Open WebUI, your scripts — uses http://litellm:4000 and you switch models in one config file.
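For reference, litellm_config.yaml maps friendly model names onto providers. A sketch (the model names here are examples, not a recommendation):

```yaml
# litellm_config.yaml — route one alias to local Ollama, another to a cloud model
model_list:
  - model_name: local-default
    litellm_params:
      model: ollama/qwen2.5:14b
      api_base: http://host.docker.internal:11434
  - model_name: claude
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
```

Swap a `model` line here and every consumer of the proxy picks up the change without redeploying.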

ChromaDB + LlamaIndex — Private RAG

Search your own documents with AI. All local, all private.

import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Index your documents
docs = SimpleDirectoryReader('/your/docs/folder').load_data()
db = chromadb.PersistentClient(path='./chroma_db')
collection = db.get_or_create_collection('my_docs')
store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=store)

# Query them (note: LlamaIndex defaults to OpenAI embeddings —
# configure a local embedding model to stay fully offline)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)
engine = index.as_query_engine()
response = engine.query('What did we decide about the API architecture?')
print(response)

The Hardware Question

Everyone asks: GPU server vs Apple Silicon?

In 2026, for pure AI inference at homelab scale: Apple Silicon wins on value.

M4 Mac mini (24GB, ~$800):

  • Runs 32B models at 10-15 tokens/sec
  • Silent, 30W idle
  • No separate GPU needed
  • macOS means easy maintenance

NVIDIA GPU server (RTX 4090, 24GB VRAM):

  • Faster inference on large batches
  • Better for fine-tuning
  • Loud, 450W under load
  • Linux-only for serious use

For a homelab running 1-5 concurrent users doing text tasks: Mac mini M4. For serious inference throughput or training: GPU server.


The Monitoring Stack

Don't run AI services without knowing when they break:

  • Uptime Kuma — check if Ollama/n8n/Open WebUI are responding
  • Netdata — per-container resource usage
  • Loki + Grafana — aggregate logs from all containers
# Add to any docker-compose.yml for log collection
labels:
  - logging=promtail
  - logging_jobname=containerlogs
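Before the full Loki setup is in place, a tiny probe script covers the "is it responding" question. A sketch assuming the ports used in this post (5678 for n8n is its usual default, an assumption here):

```python
import urllib.request

# Services to probe; Ollama answers at its root URL, the others serve their UI.
SERVICES = {
    "ollama": "http://localhost:11434/",
    "open-webui": "http://localhost:3000/",
    "n8n": "http://localhost:5678/",
}

def check(url: str) -> bool:
    """Return True if the endpoint answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status < 400
    except OSError:
        return False

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if check(url) else 'DOWN'}")
```

Drop it in cron or an n8n schedule trigger and you have a poor man's Uptime Kuma until the real one is running.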

What I'd Set Up First on a New Server

In order, if starting from scratch:

  1. Traefik — reverse proxy + automatic HTTPS (everything else goes behind it)
  2. Ollama — pull qwen2.5:14b first, add others as needed
  3. Open WebUI — immediate usable interface
  4. n8n — automation brain
  5. LiteLLM — unified API proxy
  6. Uptime Kuma — monitoring
  7. Vaultwarden — password manager (you'll need it)
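Step 1 is the only item without code elsewhere in this post, so here's a minimal sketch of a Traefik v3 compose file with automatic HTTPS via the TLS-ALPN challenge (the email is a placeholder, and you'd add your own domains per service):

```yaml
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=you@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped
```

With `exposedbydefault=false`, each service opts in explicitly via a `traefik.enable=true` label, which keeps internal containers off the proxy.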

The One Thing Most People Miss

Running models locally is only half the value.

The other half is connecting them to your actual workflow: your email, your calendar, your codebase, your documents. A local LLM that only answers questions in a chat window is just a slower, private ChatGPT.

A local LLM wired into n8n that automatically triages your email, monitors your servers, and summarizes your notes — that's actual leverage.


SIGNAL publishes weekly. Follow @signal-weekly for more practical builder content.

Next: How I use AI agents to automate the boring parts of running a homelab — specific n8n workflows, working code.
