DEV Community: SIGNAL

Self-Host Your AI Code Assistant With Continue.dev + Ollama — VS Code Copilot Without the Subscription

SIGNAL — Wed, 25 Mar 2026 09:03:01 +0000

You're paying $19/month for GitHub Copilot. Your code is leaving your machine, hitting someone else's servers, and coming back as suggestions. It works. But you could also run the same workflow locally — for free, with full privacy, on hardware you probably already own.

This guide sets up Continue.dev with Ollama so you get AI code completion, chat, and refactoring directly in VS Code — no API keys, no subscriptions, no data leaving your network.

What You Need

A machine with 16GB+ RAM (Mac mini M-series is ideal, but any modern desktop works)
VS Code or a fork (Cursor users: you already have this built in, but keep reading for the self-hosted angle)
Docker (optional, for running Ollama in a container)
10 minutes

Step 1: Install Ollama

Ollama makes running local LLMs trivially simple. One binary, one command.

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or with Docker
docker run -d --name ollama \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

Pull a model that's good at code. For 16GB machines, deepseek-coder-v2:16b hits the sweet spot between quality and speed:

# Primary code model
ollama pull deepseek-coder-v2:16b

# Smaller alternative if RAM is tight
ollama pull codellama:7b

# For chat/explanations (optional)
ollama pull llama3.1:8b

Verify it works:

ollama run deepseek-coder-v2:16b "Write a Python function that retries HTTP requests with exponential backoff"

You should get a working function back in a few seconds.

Step 2: Install Continue.dev

Continue is an open-source AI code assistant that plugs into VS Code. It supports any OpenAI-compatible API — which Ollama exposes out of the box.

Open VS Code
Extensions → Search "Continue" → Install
You'll see a new sidebar icon (the Continue logo)

Or from the terminal:

code --install-extension continue.continue

Step 3: Configure Continue for Ollama

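Open Continue's config. Press Cmd+Shift+P (or Ctrl+Shift+P) → "Continue: Open Config File". Replace or merge with the following. This is the config.json format; if your Continue version opens a config.yaml instead, carry the same provider, model, and apiBase values over to its YAML layout: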

{
  "models": [
    {
      "title": "DeepSeek Coder v2 (Local)",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    },
    {
      "title": "Llama 3.1 Chat (Local)",
      "provider": "ollama",
      "model": "llama3.1:8b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "DeepSeek Autocomplete",
    "provider": "ollama",
    "model": "deepseek-coder-v2:16b",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false
}

The key settings:

models: What shows up in Continue's chat sidebar. You can add multiple and switch between them.
tabAutocompleteModel: The model that powers inline completions (the Copilot-like experience).
allowAnonymousTelemetry: Disable it. You're self-hosting for privacy — act like it.

Step 4: Use It Like Copilot

Once configured, you get three workflows:

Inline Completions

Just type. Continue will suggest completions as you code, exactly like Copilot. Press Tab to accept.

Chat

Open the Continue sidebar and ask questions about your code:

Explain what this function does and suggest improvements

You can highlight code, right-click → "Continue: Add to Chat" to give it context.

Refactoring

Highlight a block of code, press Cmd+I, and type what you want:

Refactor this to use async/await instead of callbacks

Continue rewrites the selection in-place with a diff view.

Step 5: Serve It Across Your Network

If Ollama runs on a dedicated machine (like a Mac mini), expose it to your local network:

# Set Ollama to listen on all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

Or in Docker:

docker run -d --name ollama \
  -p 0.0.0.0:11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:latest

Then update Continue's config on your other machines to point at the server:

"apiBase": "http://192.168.1.100:11434"

Now every machine in your house gets AI code assistance, powered by one box. No internet required.
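Before repointing every editor, it can help to confirm the server is actually reachable from another machine. Here is a minimal sketch, assuming Ollama's /api/tags endpoint (which lists the models the server has pulled). The helper names parse_model_names and list_models are illustrative, not part of Ollama or Continue.

```python
import json
import urllib.request


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return sorted(m["name"] for m in payload.get("models", []))


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Ask an Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.loads(resp.read()))
```

Calling list_models("http://192.168.1.100:11434") from a client machine (using the example address from this guide) should return the models you pulled in Step 1. A timeout or connection error usually points at a firewall or at OLLAMA_HOST not being set on the server.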

Performance Tips

Model selection matters more than hardware. A well-quantized 16B model on Apple Silicon will outperform a 70B model struggling on insufficient VRAM.

# Check what's loaded and how much memory it uses
ollama ps

# Pre-load your coding model so first completion is fast
curl http://localhost:11434/api/generate \
  -d '{"model": "deepseek-coder-v2:16b", "keep_alive": "24h"}'

Keep the model warm. By default, Ollama unloads models after 5 minutes of inactivity. The keep_alive parameter above keeps it in memory for 24 hours — instant completions all day.

Use different models for different tasks. Fast 7B model for autocomplete, larger 16B for chat and complex refactoring. Continue lets you configure this separately.
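To make the split concrete, here is a small sketch that generates a config with a larger chat model and a smaller autocomplete model. It assumes the same config.json keys used in Step 3; build_continue_config is a hypothetical helper, and the model tags are simply the ones pulled earlier in this guide.

```python
import json


def build_continue_config(chat_model: str, autocomplete_model: str,
                          api_base: str = "http://localhost:11434") -> dict:
    """Build a Continue config that uses separate chat and autocomplete models."""
    def entry(title: str, model: str) -> dict:
        return {"title": title, "provider": "ollama",
                "model": model, "apiBase": api_base}

    return {
        "models": [entry("Chat (Local)", chat_model)],
        "tabAutocompleteModel": entry("Autocomplete (Local)", autocomplete_model),
        "allowAnonymousTelemetry": False,
    }


# Larger model for chat and refactoring, smaller one for inline completions
config = build_continue_config("deepseek-coder-v2:16b", "codellama:7b")
print(json.dumps(config, indent=2))
```

Paste the printed JSON into Continue's config file, or adapt the two model names to whatever you have pulled.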

How Does It Compare to Copilot?

Honestly? For autocomplete, Copilot is still faster and more accurate, since GitHub has the training data advantage. But the gap has narrowed dramatically. DeepSeek Coder v2 handles roughly 80-90% of what Copilot suggests, and for the remaining 10-20%, the chat workflow compensates.

What you gain:

Zero data leaves your machine. Your proprietary code stays private.
No subscription. One-time hardware cost, then it's free forever.
Works offline. Airplane mode? Still coding with AI.
No rate limits. Hit it as hard as you want.
Full control. Swap models, tweak parameters, add custom prompts.

What's Next

Once this is running, you can extend it:

Add context providers in Continue to index your codebase (docs, Git history, file tree)
Set up a reverse proxy with auth if you want to expose it outside your LAN
Try Qwen2.5-Coder — another strong option that's gaining ground fast
Connect the same Ollama instance to other tools (n8n, Open WebUI, your own scripts)

The AI code assistant market is a $19/month tax on developers who don't know they can run the same thing locally. Now you know. Set it up once, and every keystroke stays yours.

Building things that run on your hardware? Follow @signal-weekly for practical homelab + AI guides every week.

Self-Host Langfuse With Docker — LLM Observability Without the Cloud Bill

SIGNAL — Mon, 23 Mar 2026 09:02:20 +0000

Why You Need LLM Observability

You're running Ollama, maybe piping it through LangChain or LiteLLM, and everything seems fine — until it isn't. A prompt that worked yesterday now returns garbage. Costs on your OpenAI fallback are creeping up. A chain silently swallows an error and returns a confident-sounding hallucination.

This is the observability gap. In traditional backend services, we have Grafana, Prometheus, Jaeger. For LLM workloads, we need something purpose-built: tracing every call, measuring latency and token usage, scoring output quality, and catching regressions before users do.

Langfuse is the open-source answer. MIT-licensed, self-hostable, and as of v3, genuinely production-ready with ClickHouse-backed analytics. Let's set it up.

What Langfuse Actually Does

Think of it as Jaeger for LLM calls. Every interaction becomes a trace — a tree of spans showing:

Which model was called, with what parameters
Input/output tokens and estimated cost
Latency per step in a chain
Custom scores (human feedback, LLM-as-judge, regex checks)
Prompt versions and A/B test metadata

It integrates natively with LangChain, LlamaIndex, LiteLLM, the OpenAI SDK, and Vercel AI. If you're building anything with LLMs, it probably fits.
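If the "tree of spans" idea is new, a toy model helps. The sketch below is not Langfuse's data model or SDK; Span and its fields are hypothetical names, used only to show why a tree is handy: totals roll up from the leaves, and the slow step is easy to find.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a trace: an LLM call, a retrieval, a tool call, and so on."""
    name: str
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    children: list["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Tokens used by this span plus everything nested under it."""
        own = self.input_tokens + self.output_tokens
        return own + sum(c.total_tokens() for c in self.children)

    def slowest(self) -> "Span":
        """The single slowest span in this subtree, by its own latency."""
        return max([self] + [c.slowest() for c in self.children],
                   key=lambda s: s.latency_ms)


trace = Span("answer-question", latency_ms=40, children=[
    Span("retrieve-docs", latency_ms=120),
    Span("llm-call", latency_ms=2300, input_tokens=850, output_tokens=210),
])
print(trace.total_tokens(), trace.slowest().name)
```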

Prerequisites

Docker and Docker Compose (v2)
A machine with at least 4 GB RAM (ClickHouse is hungry)
10 minutes

Step 1: Clone and Configure

git clone https://github.com/langfuse/langfuse.git
cd langfuse

Langfuse v3 ships a docker-compose.yml that bundles PostgreSQL, ClickHouse, Redis, MinIO, and the Langfuse web/worker services. Before starting, create a .env file:

# Generate .env from the shell. Compose reads .env literally and does not
# run $(...), so the secrets must be expanded before they are written.
cat > .env <<EOF
NEXTAUTH_SECRET=$(openssl rand -base64 32)
SALT=$(openssl rand -base64 32)
ENCRYPTION_KEY=$(openssl rand -hex 32)
NEXTAUTH_URL=http://localhost:3000
LANGFUSE_INIT_ORG_ID=my-org
LANGFUSE_INIT_ORG_NAME=MyOrg
LANGFUSE_INIT_PROJECT_ID=my-project
LANGFUSE_INIT_PROJECT_NAME=default
LANGFUSE_INIT_PROJECT_PUBLIC_KEY=pk-lf-local
LANGFUSE_INIT_PROJECT_SECRET_KEY=sk-lf-local
EOF

The INIT variables auto-create your first org and project — no manual setup in the UI.

Step 2: Start It Up

docker compose up -d

Give it 30-60 seconds. ClickHouse takes a moment to initialize. Then open http://localhost:3000, create an admin account, and you're in.

Verify everything is healthy:

docker compose ps

You should see langfuse-web, langfuse-worker, postgres, clickhouse, redis, and minio all running.

Step 3: Instrument Your Code

Here's the payoff. Install the Python SDK:

pip install langfuse

Basic tracing with the OpenAI SDK

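import os

# Set credentials before importing the wrapper so the client picks them up.
# OPENAI_API_KEY must also be set for the gpt-4o call below.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-local"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-local"
os.environ["LANGFUSE_HOST"] = "http://localhost:3000"

from langfuse.openai import openai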

# This is a drop-in replacement — same API, automatic tracing
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain Docker volumes in one paragraph"}],
    metadata={"environment": "dev", "feature": "docs-assistant"}
)

print(response.choices[0].message.content)

That's it. One import change. Every call now appears in your Langfuse dashboard with full token counts, latency, and cost.

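Tracing Ollama (Directly or via LiteLLM)

Running local models? Point the traced client at any OpenAI-compatible endpoint and Langfuse picks it up. Ollama serves one at /v1; a LiteLLM proxy works the same way with its own base URL:

from langfuse.openai import OpenAI

# base_url targets Ollama's OpenAI-compatible API; the client requires an
# api_key value, but Ollama ignores it
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "What is eBPF?"}],
)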

Now you can compare latency and quality between your local Llama and a cloud fallback — with actual data, not vibes.

Scoring Outputs

The killer feature. You can attach scores to any trace:

from langfuse import Langfuse

langfuse = Langfuse()

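# After getting user feedback or running an eval.
# Note: score() is the v2 Python SDK call. Newer SDK releases expose the same
# operation as create_score(...), so check which version pip installed.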
langfuse.score(
    trace_id="your-trace-id",
    name="helpfulness",
    value=0.9,
    comment="Accurate and concise"
)

Combine this with an LLM-as-judge pattern (have GPT-4o score your Llama outputs) and you've got automated quality monitoring.
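The fiddly part of LLM-as-judge is turning the judge's free-text reply into a number you can store as a score. Here is a rough sketch of that step, assuming you ask the judge for a 0 to 10 rating. JUDGE_PROMPT and parse_judge_score are hypothetical names, and the call to the judge model itself is left out.

```python
import re
from typing import Optional

# Hypothetical prompt; fill the placeholders and send it to your judge model
JUDGE_PROMPT = (
    "Rate the answer from 0 to 10 for helpfulness. Reply with the number only.\n\n"
    "Question: {question}\nAnswer: {answer}"
)


def parse_judge_score(reply: str, scale_max: float = 10.0) -> Optional[float]:
    """Pull the first number out of a judge's reply and map it to 0..1."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None  # judge did not return a number; skip scoring this trace
    raw = float(match.group())
    return max(0.0, min(raw / scale_max, 1.0))


print(parse_judge_score("Score: 8/10"))
```

The returned value can go straight into the value field of the scoring call shown earlier. Returning None for unparseable replies keeps bad judge output from polluting your averages.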

Step 4: Set Up Alerts (Don't Skip This)

Langfuse doesn't have built-in alerting yet, but the data is in PostgreSQL and ClickHouse. A simple cron job covers the gap:

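#!/bin/bash
# check-llm-errors.sh: alert on high error rate
# Assumptions to verify on your install: the container name (docker compose ps),
# the ClickHouse credentials, and the schema. In the v3 schema as I understand
# it, level is stored on observations (with start_time), not on traces.
ERROR_COUNT=$(docker exec langfuse-clickhouse-1 clickhouse-client \
  --query "SELECT count() FROM observations WHERE start_time > now() - INTERVAL 1 HOUR AND level = 'ERROR'")

# Default to 0 so an empty result from a failed query does not break the test
if [ "${ERROR_COUNT:-0}" -gt 10 ]; then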
  curl -X POST "$WEBHOOK_URL" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Warning: LLM error spike: $ERROR_COUNT errors in the last hour\"}"
fi

What You Get

After a day of running this, you'll have:

Cost tracking — know exactly what each feature costs in tokens
Latency baselines — spot regressions when you change prompts or models
Quality scores — track output quality over time, not just "it works"
Prompt versioning — Langfuse's prompt management lets you A/B test without redeploying
Full data ownership — everything stays on your hardware

Resource Usage

On my homelab (Ryzen 5600G, 32 GB RAM), the full stack idles at about 1.5 GB RAM and negligible CPU. ClickHouse is the biggest consumer. For a solo dev or small team, any machine that can run Ollama can run Langfuse alongside it.

When NOT to Self-Host

Be honest with yourself:

If you're a solo dev experimenting, Langfuse Cloud's free tier (50K events/month) is perfectly fine
If you need SSO, audit logs, or SLA guarantees, the managed plan is worth it
If you don't want to maintain PostgreSQL backups, don't pretend you will

Self-hosting is for people who either need data sovereignty or enjoy running infrastructure. If that's you, Langfuse is one of the best-maintained open-source projects in the LLM tooling space.

Wrapping Up

LLM observability isn't optional anymore — not when models are non-deterministic, costs are real, and quality can silently degrade. Langfuse gives you production-grade tracing with a single import change and a docker compose up. That's a good trade.

The repo: github.com/langfuse/langfuse
Docs: langfuse.com/docs

SIGNAL is a weekly series for builders who self-host, automate, and actually ship. Published Mon/Wed/Fri.

Self-Host Paperless-ngx With Local AI — Private Documents, Better Search, Zero Cloud

SIGNAL — Fri, 20 Mar 2026 09:01:46 +0000

Your documents deserve better than a folder called Scans_2024_FINAL_v2.

Paperless-ngx has been the gold standard for self-hosted document management for a while now. But the 2026 version of this stack hits different — you can wire it up to a local LLM for automatic classification, smarter tagging, and search that actually understands what's in your documents. No cloud. No API fees. Everything stays on your hardware.

Here's how to set it up from scratch.

What You're Building

A Docker stack with three components:

Paperless-ngx — document ingestion, OCR, search, and web UI
Ollama — local LLM inference (we'll use mistral for classification)
A small Python classifier — bridges Paperless webhooks to Ollama for auto-tagging

The result: drop a PDF into a folder (or email it), and it gets OCR'd, classified by AI, tagged, and made searchable — all within your local network.

Prerequisites

Docker + Docker Compose
At least 8 GB RAM (16 GB recommended if running 7B+ models)
~10 GB disk for Ollama models
A machine that stays on (mini PC, NAS, old laptop — anything works)

Step 1: The Docker Compose Stack

Create a docker-compose.yml:

version: "3.8"

services:
  paperless-broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redis-data:/data

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - paperless-broker
    ports:
      - "8000:8000"
    volumes:
      - paperless-data:/usr/src/paperless/data
      - paperless-media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume
      - ./export:/usr/src/paperless/export
      # Mount the hook scripts from Step 2 so the post-consume path exists
      - ./scripts:/usr/src/paperless/scripts
    environment:
      PAPERLESS_REDIS: redis://paperless-broker:6379
      PAPERLESS_OCR_LANGUAGE: eng
      # PAPERLESS_TIKA_ENABLED needs separate Tika and Gotenberg services;
      # leave it off unless you add them
      PAPERLESS_POST_CONSUME_SCRIPT: /usr/src/paperless/scripts/classify.sh
      USERMAP_UID: 1000
      USERMAP_GID: 1000

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama

volumes:
  redis-data:
  paperless-data:
  paperless-media:
  ollama-data:

Spin it up:

docker compose up -d

Then pull the model you'll use for classification:

docker compose exec ollama ollama pull mistral

Step 2: The AI Classification Script

This is where it gets interesting. Paperless-ngx supports post-consume scripts — code that runs after every document is ingested. We'll use this hook to send the extracted text to Ollama and get back structured tags.

Create scripts/classify.py:

#!/usr/bin/env python3
"""
Post-consume classifier for Paperless-ngx.
Sends document text to Ollama, gets back tags and document type.
"""

import json
import os
import sys
import urllib.request

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://ollama:11434")
PAPERLESS_URL = os.getenv("PAPERLESS_URL", "http://localhost:8000")
PAPERLESS_TOKEN = os.getenv("PAPERLESS_TOKEN", "")

DOCUMENT_ID = os.getenv("DOCUMENT_ID")
DOCUMENT_FILE_NAME = os.getenv("DOCUMENT_FILE_NAME", "unknown")

SYSTEM_PROMPT = """You are a document classifier. Given the text of a document,
return a JSON object with:
- "tags": array of 1-4 relevant tags (e.g., "invoice", "medical", "tax", "receipt", "contract", "insurance")
- "doc_type": a single document type (e.g., "Invoice", "Letter", "Receipt", "Report", "Contract")
- "correspondent": who sent this document (company or person name), or null

Return ONLY valid JSON. No explanation."""


def classify_document(text: str) -> dict:
    """Send document text to Ollama for classification."""
    payload = json.dumps({
        "model": "mistral",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify this document:\n\n{text[:3000]}"}
        ],
        "stream": False,
        "format": "json"
    }).encode()

    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        result = json.loads(resp.read())

    return json.loads(result["message"]["content"])


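def api(method: str, path: str, body=None) -> dict:
    """Call the Paperless REST API and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{PAPERLESS_URL}{path}",
        data=json.dumps(body).encode() if body is not None else None,
        method=method,
        headers={
            "Authorization": f"Token {PAPERLESS_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def apply_tags(doc_id: str, classification: dict):
    """Apply AI-generated tags back to Paperless via API."""
    tags = classification.get("tags", [])
    doc_type = classification.get("doc_type")

    tag_ids = []
    for tag_name in tags:
        # URL-encode the name so tags with spaces or "&" don't break the query
        query = urllib.parse.urlencode({"name__iexact": tag_name})
        existing = api("GET", f"/api/tags/?{query}")
        if existing["count"] == 0:
            tag_ids.append(api("POST", "/api/tags/", {"name": tag_name})["id"])
        else:
            tag_ids.append(existing["results"][0]["id"])

    # Creating a tag does not attach it: merge the IDs into the document's tags
    doc = api("GET", f"/api/documents/{doc_id}/")
    merged = sorted(set(doc.get("tags", [])) | set(tag_ids))
    api("PATCH", f"/api/documents/{doc_id}/", {"tags": merged})

    print(f"[AI] Doc {doc_id}: type={doc_type}, tags={tags}")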


if __name__ == "__main__":
    if not DOCUMENT_ID:
        print("No DOCUMENT_ID set, skipping classification")
        sys.exit(0)

    req = urllib.request.Request(
        f"{PAPERLESS_URL}/api/documents/{DOCUMENT_ID}/",
        headers={"Authorization": f"Token {PAPERLESS_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        doc = json.loads(resp.read())

    text = doc.get("content", "")
    if len(text) < 50:
        print(f"[AI] Doc {DOCUMENT_ID}: too short, skipping")
        sys.exit(0)

    classification = classify_document(text)
    apply_tags(DOCUMENT_ID, classification)
    print(f"[AI] Classified: {DOCUMENT_FILE_NAME} -> {classification}")
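Small local models do not always honor "Return ONLY valid JSON" to the letter: tags may come back as a comma-separated string, in mixed case, or with more entries than requested. Below is a defensive sketch you could run on the result of classify_document before applying tags. normalize_classification is a hypothetical helper, not something Paperless or Ollama provides.

```python
def normalize_classification(raw: object, max_tags: int = 4) -> dict:
    """Coerce model output into the shape the rest of the script expects."""
    data = raw if isinstance(raw, dict) else {}

    tags = data.get("tags")
    if isinstance(tags, str):          # model returned "invoice, tax"
        tags = tags.split(",")
    if not isinstance(tags, list):
        tags = []

    cleaned = []
    for tag in tags:
        name = str(tag).strip().lower()
        if name and name not in cleaned:   # drop blanks and duplicates
            cleaned.append(name)

    def text_or_none(value: object):
        """Return a stripped string, or None for empty or non-string values."""
        if isinstance(value, str) and value.strip():
            return value.strip()
        return None

    return {
        "tags": cleaned[:max_tags],
        "doc_type": text_or_none(data.get("doc_type")),
        "correspondent": text_or_none(data.get("correspondent")),
    }


print(normalize_classification({"tags": "Invoice, Tax ,invoice", "doc_type": " Invoice "}))
```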

And the shell wrapper scripts/classify.sh:

#!/bin/bash
# Post-consume hook for Paperless-ngx
export OLLAMA_URL="http://ollama:11434"
export PAPERLESS_URL="http://localhost:8000"
export PAPERLESS_TOKEN="${PAPERLESS_API_TOKEN}"

python3 /usr/src/paperless/scripts/classify.py

Make it executable:

chmod +x scripts/classify.sh scripts/classify.py

Step 3: Generate Your API Token

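# Create a user first if you have not already
docker compose exec paperless python3 manage.py createsuperuser

# The service is named "paperless" in the compose file above
docker compose exec paperless \
  python3 manage.py shell -c \
  "from rest_framework.authtoken.models import Token; \
   from django.contrib.auth.models import User; \
   u = User.objects.first(); \
   t, _ = Token.objects.get_or_create(user=u); \
   print(t.key)"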

Add the token to your compose environment:

environment:
  PAPERLESS_API_TOKEN: "your-token-here"

Step 4: Test the Pipeline

Drop a PDF into the consume/ folder:

cp ~/Downloads/some-invoice.pdf ./consume/

Watch the logs:

docker compose logs -f paperless

You should see the document get ingested, OCR'd, then classified:

[AI] Classified: some-invoice.pdf -> {'tags': ['invoice', 'utilities'], 'doc_type': 'Invoice', 'correspondent': 'Electric Company Inc'}

Making It Smarter

Swap the model. Mistral works fine for classification, but if you have the VRAM, try llama3:8b or phi3 for better accuracy on mixed-language documents.

Add a feedback loop. When you manually correct a tag in Paperless, log it. After enough corrections, you can fine-tune your prompt or switch to a specialized model.
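One low-effort way to start that log is a JSONL file with one line per correction. This is a sketch under assumptions: you already have the predicted and corrected tag lists in hand (how you detect a manual correction is up to you), and record_correction and the log path are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def record_correction(log_path: Path, doc_id: int,
                      predicted: list[str], corrected: list[str]) -> dict:
    """Append one correction to a JSONL log and return the stored record."""
    record = {
        "doc_id": doc_id,
        "predicted": sorted(predicted),
        "corrected": sorted(corrected),
        "missed": sorted(set(corrected) - set(predicted)),  # tags the model should have added
        "wrong": sorted(set(predicted) - set(corrected)),   # tags the model should not have added
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Demo writes to the system temp directory; point this at a persistent path in practice
log = Path(tempfile.gettempdir()) / "paperless-corrections.jsonl"
entry = record_correction(log, 42, ["invoice", "tax"], ["invoice", "utilities"])
print(entry["missed"], entry["wrong"])
```

Counting the most frequent "missed" and "wrong" tags after a few weeks tells you which examples to add to the system prompt.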

Email ingestion. Paperless-ngx can fetch mail over IMAP out of the box. It is configured in the web UI rather than through environment variables: under Mail, add a mail account (IMAP server, username, password), then add a mail rule that says which folder to watch and what to do with attachments.

Forward receipts and invoices to a dedicated email, and they land in Paperless, classified and tagged, without you lifting a finger.

Why Local Matters

Every time you upload a document to a cloud service, you're trusting someone else with your tax returns, medical records, and contracts. Running this locally means:

Zero data leaves your network — not even for OCR
No monthly fees — Ollama is free, Paperless is free
No rate limits — classify 1,000 documents at 3 AM if you want
Full control — swap models, change prompts, add custom logic

The hardware cost? A used mini PC with 16 GB RAM runs this stack comfortably. That's a one-time $150-200 investment vs. $10-20/month for cloud document management that still can't auto-tag your stuff.
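The break-even arithmetic behind that claim is simple enough to check yourself. This sketch uses only the price ranges quoted above and ignores electricity, which you may want to add for a machine that stays on.

```python
def break_even_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription fees needed to equal a one-time hardware cost."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return hardware_cost / monthly_fee


# Best and worst case from the ranges quoted above
fastest = break_even_months(150, 20)   # cheap box vs. expensive plan
slowest = break_even_months(200, 10)   # pricier box vs. cheap plan
print(f"Break-even after {fastest:.1f} to {slowest:.1f} months")
```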

What's Next

This is a foundation. From here you can add:

Semantic search with embeddings (pipe document text through nomic-embed-text and store vectors in pgvector)
Multi-language support by switching OCR languages and using multilingual models
Mobile scanning with apps that upload directly to your consume folder via WebDAV

The self-hosted document stack in 2026 is genuinely better than most paid alternatives. The AI layer just makes it unfair.

SIGNAL covers practical AI and self-hosting for builders. No hype, no fluff — just things that work.

Build a Local RAG Pipeline With Ollama + pgvector — No API Keys, No Cloud

SIGNAL — Wed, 18 Mar 2026 09:04:19 +0000

Retrieval-Augmented Generation is one of those ideas that sounds complex until you actually build it. At its core: shove documents into a vector database, embed a user query the same way, find the closest matches, and feed them to an LLM as context. That's it.

The problem? Most tutorials wire this to OpenAI embeddings and Pinecone, meaning you pay per token and your data leaves your machine. Let's fix that.

This guide builds a fully local RAG pipeline:

Ollama for the LLM and embeddings
PostgreSQL + pgvector as the vector store
Python to glue it together

100% offline. No API keys. No cloud.

What You Need

Docker (for Postgres + pgvector)
Ollama installed locally
Python 3.11+
~4 GB free RAM

Pull the models first:

ollama pull nomic-embed-text   # 274MB embedding model
ollama pull llama3.2           # ~2GB, fast on CPU

Step 1: Spin Up pgvector

pgvector adds a vector column type and similarity search operators to Postgres:

docker run -d \
  --name pgvector \
  -e POSTGRES_PASSWORD=localrag \
  -e POSTGRES_DB=ragdb \
  -p 5432:5432 \
  pgvector/pgvector:pg16

Create the schema (connect via docker exec -it pgvector psql -U postgres -d ragdb):

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  source TEXT,
  chunk_index INT,
  content TEXT,
  embedding vector(768)
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

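The ivfflat index makes similarity search fast even with thousands of chunks. One caveat: ivfflat chooses its cluster centers from the rows present when the index is built, so for good recall run the CREATE INDEX statement (or REINDEX) after Step 2 has loaded your documents.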

Step 2: Ingest Documents

pip install psycopg2-binary requests

# ingest.py
import os, sys, requests, psycopg2

OLLAMA_URL = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"
CHUNK_SIZE = 400
CHUNK_OVERLAP = 50

DB = psycopg2.connect(
    host="localhost", port=5432,
    dbname="ragdb", user="postgres", password="localrag"
)

def embed(text):
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBED_MODEL, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def chunk(text):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + CHUNK_SIZE])
        start += CHUNK_SIZE - CHUNK_OVERLAP
    return chunks

def ingest_file(path):
    # Read as UTF-8 explicitly and close the file when done
    with open(path, encoding="utf-8") as f:
        text = f.read()
    chunks = chunk(text)
    cur = DB.cursor()
    for i, c in enumerate(chunks):
        cur.execute(
            "INSERT INTO documents (source, chunk_index, content, embedding) "
            "VALUES (%s, %s, %s, %s)",
            (path, i, c, embed(c))
        )
    DB.commit()
    print(f"  {path} -> {len(chunks)} chunks")

if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for root, _, files in os.walk(folder):
        for fname in files:
            if fname.endswith((".txt", ".md")):
                ingest_file(os.path.join(root, fname))

python ingest.py ./my-docs/

A 50-page doc takes about 30 seconds on a Mac mini M4.

Step 3: Query It

# query.py
import sys, requests, psycopg2

OLLAMA_URL = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"
LLM_MODEL   = "llama3.2"
TOP_K = 4

DB = psycopg2.connect(
    host="localhost", port=5432,
    dbname="ragdb", user="postgres", password="localrag"
)

def embed(text):
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBED_MODEL, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def retrieve(question):
    vec = embed(question)
    cur = DB.cursor()
    cur.execute(
        "SELECT content FROM documents "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (vec, TOP_K)
    )
    return [row[0] for row in cur.fetchall()]

def ask(question):
    context = "\n\n---\n\n".join(retrieve(question))
    prompt = (
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    r = requests.post(f"{OLLAMA_URL}/api/generate",
        json={"model": LLM_MODEL, "prompt": prompt, "stream": False})
    r.raise_for_status()
    return r.json()["response"]

if __name__ == "__main__":
    print(ask(" ".join(sys.argv[1:])))

python query.py "What does the deployment guide say about rollbacks?"

The <=> operator is pgvector's cosine distance. ORDER BY ... LIMIT 4 pulls the 4 most relevant chunks and feeds them as context to the LLM.
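If cosine distance feels abstract, here is the same ranking done in plain Python on toy vectors. Cosine distance is 1 minus cosine similarity, so smaller means closer. The 2-D vectors and the top_k helper are made up for illustration; real nomic-embed-text embeddings have 768 dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[str]:
    """Python equivalent of ORDER BY embedding <=> query LIMIT k."""
    return sorted(rows, key=lambda name: cosine_distance(query, rows[name]))[:k]


# Toy 2-D "embeddings" keyed by a label standing in for the chunk text
rows = {"rollbacks": [0.9, 0.1], "backups": [0.5, 0.5], "pricing": [0.0, 1.0]}
print(top_k([1.0, 0.0], rows, k=2))
```

The database does the same comparison, just with an index so it does not have to scan every row.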

Step 4: Expose It as an API

# server.py
from fastapi import FastAPI
from pydantic import BaseModel
from query import ask

app = FastAPI()

class Q(BaseModel):
    question: str

@app.post("/ask")
def ask_endpoint(q: Q):
    return {"answer": ask(q.question)}

pip install fastapi uvicorn
uvicorn server:app --port 8080

curl -s -X POST http://localhost:8080/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the backup procedures?"}' | jq .answer

Performance on Real Hardware

Mac mini M4, 16GB RAM:

Operation	Time
Embed 1000 chunks	~2 min
Query (embed + retrieve + generate)	3-5 sec
pgvector search on 10k rows	<50ms

For 100k+ chunks, switch to HNSW indexing:

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

HNSW has faster queries but slower builds and higher RAM usage. ivfflat is fine for most homelab workloads.
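If you stay on ivfflat, the two knobs worth knowing are lists (set at index build time) and probes (set at query time). The sketch below encodes the starting points suggested in the pgvector README: rows / 1000 lists up to 1M rows, sqrt(rows) beyond that, and probes of about sqrt(lists). Treat the output as a first guess to benchmark, not a tuned answer; suggest_ivfflat_params is a hypothetical helper.

```python
import math


def suggest_ivfflat_params(row_count: int) -> dict:
    """Starting points for ivfflat tuning, following the pgvector README's rule of thumb."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = int(math.sqrt(row_count))
    probes = max(1, int(math.sqrt(lists)))
    return {"lists": lists, "probes": probes}


for rows in (10_000, 100_000, 4_000_000):
    print(rows, suggest_ivfflat_params(rows))
```

By this rule, the lists = 100 used in Step 1 suits a table of around 100k chunks. Probes is applied per session with SET ivfflat.probes = 10; before running the query.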

What to Do Next

Chunk smarter — use sentence boundaries instead of fixed character counts (nltk.sent_tokenize())
Hybrid search — combine pgvector with tsvector full-text search for better recall on exact terms
Re-ranking — after top-K retrieval, use a cross-encoder to re-rank before sending to the LLM
Source citations — the source column is already there; return it so you know which file the answer came from
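As a starting point for the first item, here is a sentence-aware chunker that needs no extra dependencies. It is a sketch, not a drop-in for nltk.sent_tokenize(): the regex splitter is naive (it will trip on abbreviations like "e.g."), it does not add overlap between chunks, and chunk_by_sentence is a hypothetical name.

```python
import re


def chunk_by_sentence(text: str, max_chars: int = 400) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible."""
    # Naive splitter: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = ("Rollbacks use the previous image tag. "
          "Run the deploy script with --rollback. Backups run nightly.")
for c in chunk_by_sentence(sample, max_chars=60):
    print(repr(c))
```

A single sentence longer than max_chars becomes its own oversized chunk rather than being cut mid-sentence, so keep an eye on the embedding model's input limit.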

The whole stack runs on hardware you probably already own. No API bills, no data leaving your network, no vendor lock-in.

pgvector + nomic-embed-text + llama3.2 = private RAG that actually works.

Build a Self-Hosted AI Code Review Bot With Ollama and Gitea Webhooks

SIGNAL — Mon, 16 Mar 2026 09:02:23 +0000

Every pull request deserves a second pair of eyes. But what if those eyes belong to a local LLM running on your own hardware — no API keys, no data leaving your network?

In this guide, we'll build a self-hosted AI code review bot that listens for Gitea pull request webhooks, sends diffs to Ollama, and posts review comments back — all running in Docker on your homelab.

Why Self-Hosted Code Review?

Cloud-based AI review tools are great, but they come with trade-offs:

Privacy: Your proprietary code goes to someone else's servers
Cost: API calls add up fast on large teams
Control: Rate limits, model changes, and outages are out of your hands

With Ollama + Gitea, you own the entire pipeline. Your code stays on your metal.

Architecture Overview

Gitea PR Event → Webhook → review-bot (Flask) → Ollama API → Gitea Comment API

Three moving parts, four steps:

Gitea sends a webhook when a PR is opened or updated
review-bot (a small Python Flask app) receives the webhook, fetches the diff, and sends it to Ollama
Ollama runs a code-capable model (we'll use codellama:13b) and returns the review
The bot posts the review as a PR comment via Gitea's API

Prerequisites

Docker and Docker Compose
A running Gitea instance (or spin one up — we'll include it in the compose file)
~16 GB RAM for codellama:13b (or use codellama:7b for lighter setups)

Step 1: Docker Compose Stack

Create docker-compose.yml:

version: '3.8'

services:
  gitea:
    image: gitea/gitea:latest
    ports:
      - '3000:3000'
    volumes:
      - gitea_data:/data
    environment:
      - GITEA__server__ROOT_URL=http://localhost:3000

  ollama:
    image: ollama/ollama:latest
    ports:
      - '11434:11434'
    volumes:
      - ollama_data:/root/.ollama
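    # GPU reservation for NVIDIA hosts (needs the NVIDIA Container Toolkit);
    # remove this deploy block to run on CPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]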

  review-bot:
    build: ./review-bot
    ports:
      - '5000:5000'
    environment:
      - OLLAMA_URL=http://ollama:11434
      - GITEA_URL=http://gitea:3000
      - GITEA_TOKEN=${GITEA_TOKEN}
      - MODEL=codellama:13b
    depends_on:
      - ollama
      - gitea

volumes:
  gitea_data:
  ollama_data:

Step 2: The Review Bot

Create review-bot/app.py:

import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

OLLAMA_URL = os.environ.get('OLLAMA_URL', 'http://localhost:11434')
GITEA_URL = os.environ.get('GITEA_URL', 'http://localhost:3000')
GITEA_TOKEN = os.environ['GITEA_TOKEN']
MODEL = os.environ.get('MODEL', 'codellama:13b')

SYSTEM_PROMPT = """You are a senior code reviewer. Analyze the following git diff and provide:
1. A brief summary of the changes
2. Potential bugs or issues
3. Suggestions for improvement
4. Security concerns if any

Be concise. Focus on what matters. Skip obvious stuff."""


def fetch_pr_diff(owner, repo, pr_number):
    # Gitea serves the raw diff at .../pulls/{index}.diff; the plain
    # .../pulls/{index} endpoint returns JSON metadata instead
    url = f"{GITEA_URL}/api/v1/repos/{owner}/{repo}/pulls/{pr_number}.diff"
    headers = {'Authorization': f'token {GITEA_TOKEN}'}
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.text


def review_with_ollama(diff_text):
    if len(diff_text) > 8000:
        diff_text = diff_text[:8000] + '\n\n... (diff truncated)'

    resp = requests.post(f"{OLLAMA_URL}/api/generate", json={
        'model': MODEL,
        'system': SYSTEM_PROMPT,
        'prompt': f"Review this diff:\n\n```diff\n{diff_text}\n```",
        'stream': False,
        'options': {'temperature': 0.3, 'num_predict': 1024}
    })
    resp.raise_for_status()
    return resp.json()['response']


def post_comment(owner, repo, pr_number, body):
    url = f"{GITEA_URL}/api/v1/repos/{owner}/{repo}/issues/{pr_number}/comments"
    headers = {'Authorization': f'token {GITEA_TOKEN}'}
    resp = requests.post(url, headers=headers, json={
        'body': f"🤖 **AI Code Review**\n\n{body}"
    })
    resp.raise_for_status()
    return resp.json()


@app.route('/webhook', methods=['POST'])
def handle_webhook():
    payload = request.json
    action = payload.get('action')
    if action not in ('opened', 'synchronized'):
        return jsonify({'status': 'skipped'}), 200

    pr = payload['pull_request']
    repo = payload['repository']
    owner = repo['owner']['login']
    repo_name = repo['name']
    pr_number = pr['number']

    print(f"Reviewing PR #{pr_number} in {owner}/{repo_name}...")

    try:
        diff = fetch_pr_diff(owner, repo_name, pr_number)
        review = review_with_ollama(diff)
        post_comment(owner, repo_name, pr_number, review)
        return jsonify({'status': 'reviewed'}), 200
    except Exception as e:
        print(f"Error: {e}")
        return jsonify({'status': 'error', 'message': str(e)}), 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

And review-bot/Dockerfile:

FROM python:3.12-slim
WORKDIR /app
RUN pip install flask requests
COPY app.py .
CMD ["python", "app.py"]

Step 3: Pull the Model

Before the bot can review anything, Ollama needs the model:

docker compose up -d ollama
docker compose exec ollama ollama pull codellama:13b

This downloads ~7 GB. Grab some coffee.

Step 4: Configure the Gitea Webhook

Go to your Gitea repo → Settings → Webhooks → Add Webhook → Gitea
Target URL: http://review-bot:5000/webhook
Trigger on: Pull Request Events
Content Type: application/json
Save

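Since all services are on the same Docker network, Gitea can reach the bot by service name. If deliveries are rejected, check Gitea's webhook ALLOWED_HOST_LIST setting: by default it permits only external hosts, so a private address such as review-bot has to be allowed explicitly.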
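
Anything that can reach port 5000 can trigger a review, so it may be worth verifying that a request really came from Gitea. When the webhook's Secret field is set, Gitea sends an X-Gitea-Signature header containing the hex HMAC-SHA256 of the raw request body. The sketch below shows one way to check it; the function name and the demo secret are illustrative and not part of the original bot.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if signature_header is the hex HMAC-SHA256 of body."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak
    # how many leading characters matched
    return hmac.compare_digest(
        expected.encode(), (signature_header or "").encode()
    )


# Self-check with made-up values (hypothetical secret and payload)
demo_secret = "change-me"
demo_body = b'{"action": "opened"}'
demo_sig = hmac.new(demo_secret.encode(), demo_body, hashlib.sha256).hexdigest()
print(signature_is_valid(demo_secret, demo_body, demo_sig))    # True
print(signature_is_valid(demo_secret, demo_body, "deadbeef"))  # False
```

In the Flask handler, the raw body is available as request.get_data() and the header as request.headers.get('X-Gitea-Signature'); returning 403 when the check fails keeps unsigned requests away from the model.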

Step 5: Launch and Test

export GITEA_TOKEN=your_token_here
docker compose up -d

Open a PR in your Gitea repo. Once the model finishes generating (a few seconds on a GPU, possibly a minute or more on CPU), you'll see a comment from the bot with a structured code review.

Tuning Tips

Model selection matters. codellama:13b gives solid reviews. For faster but less detailed feedback, drop to codellama:7b. If you have 32+ GB RAM, codellama:34b is noticeably better at catching subtle bugs.

Temperature at 0.3 keeps reviews focused and fairly consistent from run to run, though not strictly deterministic. Bump to 0.5 for more varied suggestions.

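Diff truncation at 8000 chars is conservative. If your model has a 16k context window, push higher, and set num_ctx in the request options to match, because Ollama's default context length is smaller than the model's maximum. Watch for quality degradation on very long diffs.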

Add file filtering to skip reviews on generated files:

SKIP_PATTERNS = ['package-lock.json', '.min.js', '.generated.']

def should_review(filename):
    return not any(p in filename for p in SKIP_PATTERNS)
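
One way to wire that check in is to drop whole per-file sections from the diff before it reaches the model. This is a minimal sketch, assuming the diff uses git-style "diff --git a/path b/path" headers; filter_diff is a hypothetical helper, not part of the bot above.

```python
SKIP_PATTERNS = ['package-lock.json', '.min.js', '.generated.']


def should_review(filename):
    return not any(p in filename for p in SKIP_PATTERNS)


def filter_diff(diff_text):
    """Drop per-file sections whose path matches SKIP_PATTERNS."""
    kept = []
    keep_current = True  # anything before the first file header is kept
    for line in diff_text.splitlines(keepends=True):
        if line.startswith('diff --git '):
            # Header looks like: diff --git a/path b/path
            path = line.rstrip('\n').split(' b/', 1)[-1]
            keep_current = should_review(path)
        if keep_current:
            kept.append(line)
    return ''.join(kept)


sample = (
    "diff --git a/app.py b/app.py\n"
    "+print('hi')\n"
    "diff --git a/package-lock.json b/package-lock.json\n"
    "+{}\n"
)
print(filter_diff(sample))
```

Calling filter_diff on the fetched diff before review_with_ollama also leaves more of the 8000-character budget for code a human actually wrote.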

What's Next?

This is a foundation. Ideas to build on:

Line-level comments using Gitea's review API instead of issue comments
Multiple models — route security-sensitive files to a specialized model
Review history — store past reviews in SQLite to track recurring issues
Slack/Discord notifications when a review finds critical issues

The Bottom Line

You don't need a $20/month SaaS subscription for AI code review. A spare machine with 16 GB of RAM, Docker, and about 100 lines of Python gives you a private, customizable review bot that runs on your terms.

The best part? Every review makes your code better, and none of your code leaves your network.

SIGNAL is a weekly series about building practical tools with AI — no hype, just stuff that works. Follow for more self-hosted builds.

Build an AI Agent Watchdog With Python and SQLite — Catch Silent Failures Before Users Do

SIGNAL — Fri, 13 Mar 2026 09:01:48 +0000

Everyone's deploying AI agents in 2026. Almost nobody is watching what those agents actually do.

I've been running local AI agents on my homelab for months — automating emails, summarizing docs, managing tasks. They work great. Until they don't. And when an agent silently starts hallucinating responses or hitting rate limits without retrying, you only find out when someone asks "why did you send that weird email?"

So I built a watchdog. It's ~120 lines of Python, stores everything in SQLite, and has saved me from at least three embarrassing agent failures. Here's how to build your own.

The Problem: Agents Fail Quietly

Traditional software crashes loudly — exceptions, error codes, stack traces. AI agents fail politely. They return confident-sounding garbage. They skip steps without complaining. They retry infinitely or not at all.

You need three things to catch this:

Structured logging of every agent action
Anomaly detection on response patterns
Alerts when something looks off

Step 1: The Action Logger

Every agent action goes through a single logging function. No exceptions.

import sqlite3
import json
import time
from pathlib import Path

DB_PATH = Path.home() / '.agent-watchdog' / 'actions.db'
DB_PATH.parent.mkdir(exist_ok=True)

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute('''
        CREATE TABLE IF NOT EXISTS actions (
            id INTEGER PRIMARY KEY,
            timestamp REAL,
            agent TEXT,
            action TEXT,
            input_hash TEXT,
            output_length INTEGER,
            duration_ms REAL,
            status TEXT,
            metadata TEXT
        )
    ''')
    conn.execute('''
        CREATE INDEX IF NOT EXISTS idx_agent_time 
        ON actions(agent, timestamp)
    ''')
    conn.commit()
    return conn

def log_action(conn, agent, action, input_data, output, 
               duration_ms, status='ok', meta=None):
    import hashlib
    input_hash = hashlib.sha256(
        json.dumps(input_data, sort_keys=True).encode()
    ).hexdigest()[:16]

    conn.execute(
        'INSERT INTO actions VALUES (NULL,?,?,?,?,?,?,?,?)',
        (time.time(), agent, action, input_hash, 
         len(str(output)), duration_ms, status,
         json.dumps(meta or {}))
    )
    conn.commit()

The key insight: we hash inputs and measure output length. This lets us detect when the same input suddenly produces wildly different output sizes — a classic hallucination signal.
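
The checks later in this article group by agent and action only, so the stored input_hash goes unused. A query along the following lines could compare output sizes for identical inputs directly. It is a sketch: find_divergent_inputs, the 5x ratio, and the sample rows are assumptions, and it runs against an in-memory copy of the schema above.

```python
import sqlite3


def find_divergent_inputs(conn, ratio=5.0, min_calls=2):
    """Return (agent, action, input_hash, min_len, max_len) rows where the
    same input produced output sizes that differ by more than `ratio`."""
    return conn.execute('''
        SELECT agent, action, input_hash,
               MIN(output_length), MAX(output_length)
        FROM actions
        WHERE status = 'ok'
        GROUP BY agent, action, input_hash
        HAVING COUNT(*) >= ?
           AND MAX(output_length) > ? * MAX(MIN(output_length), 1)
    ''', (min_calls, ratio)).fetchall()


# Demo data: input 'aaaa' once produced an empty output, 'bbbb' is stable
conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE actions (
    id INTEGER PRIMARY KEY, timestamp REAL, agent TEXT, action TEXT,
    input_hash TEXT, output_length INTEGER, duration_ms REAL,
    status TEXT, metadata TEXT)''')
rows = [
    (1.0, 'email-agent', 'summarize_email', 'aaaa', 400, 900.0, 'ok', '{}'),
    (2.0, 'email-agent', 'summarize_email', 'aaaa', 0, 850.0, 'ok', '{}'),
    (3.0, 'email-agent', 'summarize_email', 'bbbb', 380, 910.0, 'ok', '{}'),
    (4.0, 'email-agent', 'summarize_email', 'bbbb', 420, 880.0, 'ok', '{}'),
]
conn.executemany('INSERT INTO actions VALUES (NULL,?,?,?,?,?,?,?,?)', rows)
print(find_divergent_inputs(conn))
# [('email-agent', 'summarize_email', 'aaaa', 0, 400)]
```

The two-argument MAX(..., 1) is SQLite's scalar max, used here so that an empty output (length 0) still counts as a divergence.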

Step 2: Wrapping Your Agent Calls

Wrap every agent call with timing and logging:

import functools

def watched(agent_name):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            conn = init_db()
            start = time.time()
            status = 'ok'
            output = None
            try:
                output = func(*args, **kwargs)
                return output
            except Exception as e:
                status = f'error:{type(e).__name__}'
                raise
            finally:
                duration = (time.time() - start) * 1000
                log_action(
                    conn, agent_name, func.__name__,
                    {'args': str(args)[:200], 
                     'kwargs': str(kwargs)[:200]},
                    output or '', duration, status
                )
        return wrapper
    return decorator

# Usage:
@watched('email-agent')
def summarize_email(email_body: str) -> str:
    # your LLM call here
    return llm.complete(f'Summarize: {email_body}')

That's it. Every call gets logged with timing, status, and output size. Zero changes to your agent logic.

Step 3: The Anomaly Detector

Now the fun part. Run this periodically (I use cron every 15 minutes):

def check_anomalies(conn, window_hours=2):
    cutoff = time.time() - (window_hours * 3600)
    alerts = []

    # Check 1: Error rate spike
    rows = conn.execute('''
        SELECT agent,
               COUNT(*) as total,
               SUM(CASE WHEN status != 'ok' THEN 1 ELSE 0 END) as errors
        FROM actions WHERE timestamp > ?
        GROUP BY agent
    ''', (cutoff,)).fetchall()

    for agent, total, errors in rows:
        if total > 5 and errors / total > 0.3:
            alerts.append(
                f'🔴 {agent}: {errors}/{total} actions failed '
                f'({errors/total:.0%} error rate)'
            )

    # Check 2: Response size anomaly
    rows = conn.execute('''
        SELECT agent, action,
               AVG(output_length) as avg_len,
               MIN(output_length) as min_len,
               MAX(output_length) as max_len
        FROM actions 
        WHERE timestamp > ? AND status = 'ok'
        GROUP BY agent, action
        HAVING COUNT(*) > 3
    ''', (cutoff,)).fetchall()

    for agent, action, avg_len, min_len, max_len in rows:
        if max_len > avg_len * 5 or (avg_len > 100 and min_len < avg_len * 0.1):
            alerts.append(
                f'⚠️ {agent}.{action}: output size varies wildly '
                f'(min={min_len}, avg={avg_len:.0f}, max={max_len})'
            )

    # Check 3: Unusual latency
    rows = conn.execute('''
        SELECT agent, action,
               AVG(duration_ms) as avg_ms,
               MAX(duration_ms) as max_ms
        FROM actions
        WHERE timestamp > ? AND status = 'ok'
        GROUP BY agent, action
        HAVING COUNT(*) > 3
    ''', (cutoff,)).fetchall()

    for agent, action, avg_ms, max_ms in rows:
        if max_ms > avg_ms * 10:
            alerts.append(
                f'🐌 {agent}.{action}: latency spike '
                f'(avg={avg_ms:.0f}ms, max={max_ms:.0f}ms)'
            )

    return alerts

Three simple checks that catch most agent failures:

Error rate above 30% → something is broken
Output size variance → possible hallucination or empty responses
Latency spikes → rate limiting, timeouts, or upstream issues
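
To get a feel for how the output-size rule behaves, here is the same arithmetic as Check 2 applied to plain lists. size_anomaly is a hypothetical helper that mirrors the thresholds above; the sample lengths are made up.

```python
def size_anomaly(lengths):
    """Apply the Check 2 thresholds to a list of output lengths."""
    avg_len = sum(lengths) / len(lengths)
    too_big = max(lengths) > avg_len * 5
    too_small = avg_len > 100 and min(lengths) < avg_len * 0.1
    return too_big or too_small


print(size_anomaly([400, 420, 380, 410]))          # False: all close to the mean
print(size_anomaly([400, 420, 380, 0]))            # True: avg 300, min 0 < 30
print(size_anomaly([1000] * 4 + [20000]))          # False: avg 4800, 5x avg = 24000
print(size_anomaly([1000] * 9 + [20000]))          # True: avg 2900, 5x avg = 14500
```

The third case shows a limitation worth knowing: one large value pulls the average up with it, so the "max > 5x average" branch cannot fire with five or fewer samples in the window. Comparing against a median or a longer baseline may be more robust for low-traffic agents.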

Step 4: Alerting

Keep it simple. I push alerts to a Discord webhook, but a file works too:

import os
import urllib.request

def send_alert(alerts, webhook_url=None):
    message = '\n'.join(alerts)
    print(f'[WATCHDOG] {message}')

    if webhook_url:
        payload = json.dumps({
            'content': f'🐕 **Agent Watchdog Alert**\n```\n{message}\n```'
        }).encode()
        req = urllib.request.Request(
            webhook_url, data=payload,
            headers={'Content-Type': 'application/json'}
        )
        urllib.request.urlopen(req)

# Main loop
if __name__ == '__main__':
    conn = init_db()
    alerts = check_anomalies(conn)
    if alerts:
        send_alert(alerts, 
                   webhook_url=os.environ.get('DISCORD_WEBHOOK'))
    else:
        print('[WATCHDOG] All agents nominal ✅')

Putting It Together

Add to your crontab:

*/15 * * * * cd ~/agent-watchdog && python3 watchdog.py

Query your data anytime (unixepoch() requires SQLite 3.38 or newer):

# Last 24h summary per agent
sqlite3 ~/.agent-watchdog/actions.db \
  "SELECT agent, COUNT(*), 
   SUM(CASE WHEN status='ok' THEN 1 ELSE 0 END) as ok,
   ROUND(AVG(duration_ms)) as avg_ms
   FROM actions 
   WHERE timestamp > unixepoch()-86400 
   GROUP BY agent"

What I've Caught So Far

An email agent returning empty summaries after an Ollama model update (output_length dropped to 0)
A task agent retrying the same failed API call 47 times in 10 minutes (error rate spike)
A summarizer taking 30+ seconds per call because the context window was accidentally set to 128k (latency alert)

All of these would have gone unnoticed for hours without the watchdog.

The Takeaway

AI agents are not fire-and-forget. They need the same monitoring discipline we learned (painfully) with microservices a decade ago. The good news: you don't need Datadog or Grafana to start. SQLite, Python, and 15 minutes of setup get you 80% of the way there.

Start logging. Start watching. Your future self will thank you.

This is part of the SIGNAL Weekly series — practical takes on AI, automation, and building things that work. Follow for more.

Run Local AI Agents with n8n + Ollama in Docker — No API Keys, No Limits

SIGNAL — Wed, 11 Mar 2026 09:01:44 +0000

You've probably automated a webhook or two with n8n. But have you connected it to a local LLM that runs entirely on your own hardware — no OpenAI bill, no data leaving your network, no rate limits?

This is that article.

By the end you'll have a Docker Compose stack running:

n8n — the visual workflow engine
Ollama — local LLM runtime (runs Llama 3, Mistral, Qwen, etc.)
A working n8n workflow that uses a local AI model to summarize text, classify data, or answer questions

Total setup time: ~20 minutes on any machine with 8+ GB RAM.

Why This Combo Works

n8n's HTTP Request node can call any REST API. Ollama exposes a clean REST API on localhost:11434. So connecting them is just a matter of pointing n8n at Ollama's endpoint — no plugin, no special node needed.

The result: you get AI-powered automation that:

Runs offline
Has no per-request cost
Never sends your data to a third party
Can run 24/7 on a homelab box or spare PC

Step 1: Docker Compose Setup

Create a docker-compose.yml:

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped
    # For GPU acceleration (optional):
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  n8n:
    image: n8nio/n8n:latest
    container_name: n8n
    environment:
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=changeme
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=http://localhost:5678/
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  n8n_data:

Start it:

docker compose up -d

Step 2: Pull a Model into Ollama

Once the ollama container is running, pull a model. For most homelab hardware, llama3.2:3b is a good balance of speed and capability:

docker exec -it ollama ollama pull llama3.2:3b

For beefier machines (16+ GB RAM), try mistral:7b or qwen2.5:7b.

Verify it works:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama3.2:3b",
    "prompt": "Summarize this in one sentence: The sky is blue because of Rayleigh scattering.",
    "stream": false
  }'

You should get a JSON response with a response field. That's Ollama talking.
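
The same call from Python, for scripts that live outside n8n. This is a sketch using only the standard library; build_generate_request and generate are illustrative names, and the base URL assumes the port mapping from the Compose file above.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3.2:3b",
                           base_url="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the text in the "response" field."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Inspect the request without touching the network
req = build_generate_request("Summarize: the sky is blue.")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

With the stack running, generate("Summarize this in one sentence: ...") returns the model's text. json.dumps takes care of escaping quotes and newlines in the prompt, which is easy to get wrong when a JSON body is templated by hand.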

Step 3: Build the n8n Workflow

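Open n8n at http://localhost:5678. On n8n 1.0 and later the N8N_BASIC_AUTH_* variables are ignored and the first visit asks you to create an owner account; older releases accept the admin / changeme login from the Compose file.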

Create a new workflow with these nodes:

Node 1: Webhook (trigger)

Method: POST
Path: /summarize
Response mode: Using 'Respond to Webhook' Node (Node 3 sends the reply)

This gives you a URL like http://localhost:5678/webhook/summarize to POST text to.

Node 2: HTTP Request (calls Ollama)

Method: POST
URL: http://ollama:11434/api/generate

Note: Inside Docker Compose, containers talk to each other by service name. So n8n reaches Ollama at http://ollama:11434, not localhost.

Body (JSON):

{
  "model": "llama3.2:3b",
  "prompt": "Summarize the following in 2-3 sentences:\n\n{{ $json.body.text }}",
  "stream": false
}

Node 3: Respond to Webhook

Response Body:

{
  "summary": "{{ $json.response }}"
}

Activate the workflow. Now test it:

curl -X POST http://localhost:5678/webhook/summarize \
  -H "Content-Type: application/json" \
  -d '{"text": "Docker is an open-source platform that automates the deployment of applications inside lightweight, portable containers. It has become the standard tool for packaging software in modern development pipelines."}'

Response:

{
  "summary": "Docker is an open-source containerization platform that packages applications for portable deployment. It is widely adopted in modern software development pipelines."
}

Local AI. Zero cloud. Done.

Step 4: Practical Workflow Ideas

Once the plumbing works, here's what you can actually build:

Email triage agent — Webhook receives email body → Ollama classifies as urgent/normal/spam → n8n routes accordingly (write to Notion, send Slack alert, etc.)

RSS summarizer — Schedule node fetches RSS feed → Loop over items → Ollama writes a one-line summary → Append to a daily digest file or send to your phone

Code review helper — GitHub webhook sends PR diff → Ollama reviews it → n8n posts a comment back to the PR via GitHub API

Log anomaly detector — n8n reads from a log file on a schedule → Sends batches to Ollama with "flag anything unusual" → Notifies you only when something interesting shows up

All of these run locally, handle your private data, and cost $0/month beyond electricity.

Performance Notes

On a Mac mini M4 (16 GB unified memory), llama3.2:3b responds in ~1-2 seconds per request. mistral:7b takes ~4-6 seconds. For async workflows that don't need real-time response, even slower is fine — n8n will just wait.

If you're on a GPU-equipped machine, uncomment the deploy block in the Compose file and you'll see 3-5x faster responses.

For production use, add a proper reverse proxy (Traefik or Caddy) in front of n8n and don't expose port 5678 directly.

The Takeaway

n8n + Ollama is one of those combinations that sounds more complicated than it is. Two Docker containers, one HTTP Request node, and you have a fully local AI automation engine.

No subscriptions. No terms of service to worry about. No vendor throttling your requests at 3 AM when your workflow actually runs.

The hard part isn't the setup — it's resisting the urge to automate everything once you realize how easy it is.

Want more homelab AI setups and builder-first automation guides? Follow SIGNAL on Dev.to — practical pieces, three times a week.

Build Your Own MCP Server in 30 Minutes — Give AI Real Access to Your Systems

SIGNAL — Mon, 09 Mar 2026 09:03:59 +0000

If you've been watching the AI tooling space for the past six months, you've probably noticed Model Context Protocol (MCP) quietly becoming the backbone of how developers give AI assistants real power. It's the difference between a chatbot that talks about your codebase and one that actually reads, searches, and acts on it.

The best part? You can build your own MCP server in about 30 minutes. Here's exactly how.

What Is MCP (and Why Should You Care)?

MCP is an open standard (originally by Anthropic, now broadly adopted) that defines how AI models talk to external tools and data sources. Think of it like a USB-C port for AI — instead of every app having its own proprietary integration, MCP gives you a universal protocol.

When your Claude Desktop, Cursor, VS Code, or any MCP-compatible client connects to an MCP server, it gains access to tools — functions the AI can call. Your server defines what those tools do. That could be reading files, querying a database, calling an API, running a script, or anything else you can code.

Client (Claude, Cursor, VS Code)
    │
    ▼ MCP Protocol (stdio / HTTP)
    │
MCP Server ──► Your Tools (files, DBs, APIs, scripts)

What We're Building

A lightweight MCP server in Python that exposes three practical tools:

read_file — Read any file from allowed directories
run_command — Execute whitelisted shell commands
search_logs — Grep through log files with pattern matching

Real homelab stuff. No fluff. Let's build it.

Prerequisites

Python 3.10+
pip install mcp (the official MCP SDK)
Claude Desktop or any MCP-compatible client

Step 1: Install the SDK

pip install mcp

That's it. The mcp package handles all the protocol boilerplate — you just define your tools.

Step 2: Write the Server

Create homelab_mcp.py:

import subprocess
import asyncio
from pathlib import Path
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

app = Server("homelab-tools")

ALLOWED_DIRS = [
    Path.home() / "logs",
    Path("/var/log"),
    Path.home() / "projects",
]

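def is_safe_path(path: str) -> bool:
    # Resolve symlinks and ".." first, then compare whole path components
    # so that /var/log_backup does not pass as a child of /var/log.
    target = Path(path).resolve()
    return any(
        target.is_relative_to(d.resolve()) for d in ALLOWED_DIRS
    )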

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="read_file",
            description="Read a file from allowed directories",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute file path"
                    }
                },
                "required": ["path"]
            }
        ),
        Tool(
            name="run_command",
            description="Run a whitelisted shell command",
            inputSchema={
                "type": "object",
                "properties": {
                    "command": {"type": "string"},
                    "args": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["command"]
            }
        ),
        Tool(
            name="search_logs",
            description="Search log files for a pattern",
            inputSchema={
                "type": "object",
                "properties": {
                    "log_path": {"type": "string"},
                    "pattern": {"type": "string"},
                    "lines": {"type": "integer", "default": 50}
                },
                "required": ["log_path", "pattern"]
            }
        ),
    ]

SAFE_COMMANDS = {"df", "free", "uptime", "docker", "ps"}

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "read_file":
        path = arguments["path"]
        if not is_safe_path(path):
            return [TextContent(
                type="text",
                text=f"Denied: {path} outside allowed dirs"
            )]
        content = Path(path).read_text(errors="replace")
        return [TextContent(type="text", text=content[:10000])]

    elif name == "run_command":
        cmd = arguments["command"]
        args = arguments.get("args", [])
        if cmd not in SAFE_COMMANDS:
            return [TextContent(
                type="text",
                text=f"'{cmd}' is not whitelisted"
            )]
        result = subprocess.run(
            [cmd] + args,
            capture_output=True, text=True, timeout=10
        )
        return [TextContent(
            type="text", text=result.stdout or result.stderr
        )]

    elif name == "search_logs":
        log_path = arguments["log_path"]
        pattern = arguments["pattern"]
        lines = arguments.get("lines", 50)
        if not is_safe_path(log_path):
            return [TextContent(type="text", text="Access denied")]
        result = subprocess.run(
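            # -e and -- stop a pattern or path that starts with "-"
            # from being parsed as a grep option
            ["grep", "-n", "-e", pattern, "--", log_path],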
            capture_output=True, text=True, timeout=5
        )
        output = "\n".join(result.stdout.splitlines()[-lines:])
        return [TextContent(
            type="text", text=output or "No matches"
        )]

async def main():
    async with stdio_server() as streams:
        await app.run(
            *streams,
            app.create_initialization_options()
        )

if __name__ == "__main__":
    asyncio.run(main())

Step 3: Connect It to Claude Desktop

Open your Claude Desktop config file:

Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Add your server:

{
  "mcpServers": {
    "homelab": {
      "command": "python3",
      "args": ["/path/to/homelab_mcp.py"]
    }
  }
}

Restart Claude Desktop. You'll see a hammer icon in the chat — that's your MCP server connected and ready.

Step 4: Test It

Ask Claude something like:

"Check my disk usage and search ~/logs/app.log for any ERROR lines from today"

Claude will call run_command with df -h, then search_logs with your pattern — and synthesize the results into a coherent answer. No copy-pasting between terminals. No context switching.

Security: Don't Skip This

Four non-negotiable rules for any MCP server that touches your system:

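1. Whitelist commands aggressively. The SAFE_COMMANDS set should stay minimal. Never include rm, curl, bash, or sh. Whitelisting a binary is not enough when its args pass through unchecked: docker alone permits docker rm, or docker run with the host filesystem mounted, so validate arguments too. If you need destructive ops, build a separate server with extra auth.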

2. Sandbox file access. ALLOWED_DIRS is your perimeter. Keep it tight. Adding your entire home directory is asking for trouble.

3. Add auth for remote deployments. If you switch from stdio to HTTP transport (for remote clients), add bearer token verification:

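import hmac
import os

def verify_request(headers: dict):
    token = headers.get("Authorization", "")
    expected = f"Bearer {os.environ['MCP_SECRET']}"
    # Constant-time comparison avoids leaking the secret through timing
    if not hmac.compare_digest(token.encode(), expected.encode()):
        raise PermissionError("Unauthorized")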

4. Log everything. Every tool call should be logged with timestamp, tool name, and arguments. When something goes wrong — and it will — you want the audit trail.
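
Rule 1 can be taken one step further by checking the first argument as well as the command name. The sketch below is one possible policy, not the server's actual behavior: ALLOWED_ARGS and is_allowed are hypothetical names, and the permitted values are examples to adapt.

```python
# Hypothetical policy: for each whitelisted command, the values allowed
# as its first argument. None means the command takes no arguments.
ALLOWED_ARGS = {
    "uptime": None,
    "free": {"-h", "-m"},
    "df": {"-h"},
    "docker": {"ps", "logs"},  # read-only subcommands only
}


def is_allowed(cmd: str, args: list[str]) -> bool:
    """Allow cmd only if it is listed and its first argument is permitted."""
    if cmd not in ALLOWED_ARGS:
        return False
    if not args:
        return True  # bare command, e.g. "uptime"
    allowed = ALLOWED_ARGS[cmd]
    if allowed is None:
        return False  # arguments given, but none are permitted
    return args[0] in allowed


print(is_allowed("docker", ["ps"]))                               # True
print(is_allowed("docker", ["run", "-v", "/:/host", "alpine"]))   # False
print(is_allowed("rm", ["-rf", "/"]))                             # False
```

Only the first argument is inspected here, which blocks other subcommands and global flags placed before the subcommand. Later arguments are still passed through, so a stricter server would validate those too.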

What to Build Next

Once the basics are working, the fun starts:

Docker tools — docker ps, docker logs, restart containers by name
Home Assistant bridge — query sensor states, trigger automations
Database reader — safe read-only queries to SQLite or Postgres
Git dashboard — git log, git diff, status across your repos
Network scanner — who's on your LAN right now?

The MCP ecosystem is still young. Anthropic's GitHub has 40+ reference servers (filesystem, GitHub, Postgres, Slack), but the most interesting homelab integrations haven't been built yet. That's the opportunity.

The Bigger Picture

MCP solves a problem that's been bugging builders since GPT-3: how do you give an AI real context about your systems without dumping everything into a prompt? The answer isn't bigger context windows — it's structured tool access with proper security boundaries.

A 100-line Python file and 5 minutes of config turns your AI assistant from a smart chatbot into something that can actually navigate your infrastructure. That's a meaningful upgrade.

Build your own server first. You'll understand the protocol better than any tutorial can teach, and you'll have something genuinely useful running on your homelab by lunch.

SIGNAL is a weekly dispatch for builders working at the intersection of AI, dev tools, and self-hosted infrastructure. New articles every Monday, Wednesday, and Friday.

Homelab AI stack 2026 — what to run and in what order

SIGNAL — Sun, 08 Mar 2026 19:08:10 +0000

TL;DR

Stop running your AI brain on someone else's servers.

Here's the exact stack I run on my homelab — in the order that actually makes sense to deploy it.

Why self-hosted AI in 2026?

The models crossed a threshold. qwen2.5:32b running locally on a decent machine beats GPT-3.5 on most developer tasks. It's free, private, offline, and you own every token.

Self-hosting your AI stack isn't a nerd flex anymore. It's good engineering hygiene. You wouldn't run prod on someone else's laptop. Why run your reasoning on their servers?

The Stack (in order)

1. Traefik — everything behind HTTPS first

Before anything else gets internet-exposed, Traefik goes in. Automatic TLS, reverse proxy, single entrypoint.

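docker run -d \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v3.0 \
  --providers.docker=true \
  --entrypoints.web.address=:80 \
  --entrypoints.websecure.address=:443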

Don't skip this step. Everything else sits behind it.

2. Ollama — your local LLM engine

curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen2.5:32b

Swap model names freely: gemma3, mistral, phi4, llama3.2. All free. No API key.

Minimum viable hardware: 16GB RAM for 7B models, 32GB+ for 32B. Apple Silicon M-series handles this well.

3. Open WebUI — the interface

ChatGPT-style interface that connects directly to Ollama. Supports multiple models, conversation history, document upload.

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

4. n8n — the automation brain

This is where local AI stops being a toy and becomes a workflow tool. n8n connects your LLM to everything: email, webhooks, APIs, databases, smart home.

docker run -d -p 5678:5678 \
  -v n8n_data:/home/node/.n8n \
  n8nio/n8n

One workflow that changed my setup: email arrives → n8n sends it to Ollama → Ollama categorizes and drafts a reply → I review. Zero cloud, full privacy.

5. LiteLLM — unified proxy

Once you have multiple models, LiteLLM gives you one OpenAI-compatible endpoint. Your apps stop caring which backend they hit.

model_list:
  - model_name: local-fast
    litellm_params:
      model: ollama/qwen2.5:7b
      api_base: http://localhost:11434
  - model_name: local-heavy
    litellm_params:
      model: ollama/qwen2.5:32b
      api_base: http://localhost:11434
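
From an application's point of view, the two aliases then look like ordinary OpenAI-style model names. The sketch below builds such a request with the standard library; it assumes the proxy listens on http://localhost:4000 and serves /v1/chat/completions, and build_chat_request and chat are illustrative names.

```python
import json
import urllib.request


def build_chat_request(model_name, user_message,
                       base_url="http://localhost:4000", api_key=None):
    """Build an OpenAI-style chat completion request for the proxy."""
    payload = {
        "model": model_name,  # an alias from model_list, e.g. "local-fast"
        "messages": [{"role": "user", "content": user_message}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed if the proxy is configured with a key
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )


def chat(model_name, user_message, **kwargs):
    """Send the request and return the first choice's message text."""
    req = build_chat_request(model_name, user_message, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]


# Inspect the request without touching the network
req = build_chat_request("local-fast", "Explain what a reverse proxy does.")
print(req.full_url)
print(req.data.decode())
```

Switching a task from the 7B to the 32B model is then a one-word change, "local-fast" to "local-heavy", with no other code touched.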

The point

The local LLM alone is not the value. Connecting it to your workflow is.

Anyone can run ollama run llama3.2 and ask it questions. The interesting part is when your homelab starts doing things autonomously — reading your emails, monitoring your services, briefing you every morning — with no data leaving your network.

That's the stack that gets you there.

SIGNAL covers AI tools, automation and homelab — what actually works, tested on real hardware. No hype.

I Built a 500-Prompt AI Toolkit for Developers

SIGNAL — Fri, 06 Mar 2026 20:19:20 +0000

After months of copy-pasting the same prompts across projects, I finally organized everything into one place.

500 battle-tested AI prompts, 10 categories.

What is inside

Debugging, Docker, Homelab, AI agents, Writing, System design, SQL, Security, Local LLMs (Ollama), Productivity.

Why prompts still matter in 2026

Everyone has AI access. The gap is knowing what to ask.

A good prompt is the difference between generic code and a precise answer to your exact problem.

Three prompts, free

Debug any error:

You are a senior engineer. I have this error: [ERROR]. My stack: [STACK].
Give me: 1) Root cause 2) Fix 3) How to prevent it. Be direct.

Dockerfile security audit:

Audit this Dockerfile: root user, exposed secrets, missing health check.
Output: critical/medium/low findings with fixes. [DOCKERFILE]

Local LLM system prompt:

You are a senior DevOps engineer, 15 years experience.
Direct answers. No disclaimers. Say when you do not know.

Full pack is 29 USD one-time. No subscription. Instant download.

https://signalweekly.gumroad.com/l/signal-prompt-pack

I Built a 500-Prompt AI Toolkit for Developers — Here's What's Inside

SIGNAL — Fri, 06 Mar 2026 20:18:51 +0000

After months of copy-pasting the same prompts across projects, I finally organized everything into one place.

500 battle-tested AI prompts, organized into 10 categories.

What's inside

Every prompt is written for developers who actually build things — not marketing fluff.

Categories:

🐛 Debugging & code review — rubber duck that actually helps
🐳 Docker & containers — compose files, health checks, multi-stage builds
🏠 Homelab & self-hosting — Traefik, PiHole, monitoring stacks
🤖 AI agents & automation — n8n flows, cron logic, webhook handlers
✍️ Writing & docs — READMEs, changelogs, technical specs
🏗️ System design — architecture reviews, trade-off analysis
🗄️ Data & SQL — query optimization, schema design, migrations
🔒 Security hardening — audits, Dockerfile hardening, secret management
🧠 Local LLMs (Ollama) — prompts tuned for Mistral, Llama, Qwen
⚡ Productivity — daily planning, code reviews, PR descriptions

Why prompts still matter in 2026

Everyone has AI access. The gap is knowing what to ask.

A good prompt is the difference between "here's some code" and "here's why your Docker container exits with code 137, here's how to fix it, and here's how to prevent it."

Three prompts, free

Debug any error instantly:

You are a senior engineer. I have this error: [ERROR]. 
My stack: [STACK]. 
Give me: 1) Root cause, 2) Fix, 3) How to prevent it. Be direct.

Dockerfile security audit:

Audit this Dockerfile for security issues. Check: root user, exposed secrets, unnecessary packages, missing health check, image size. Output: critical/medium/low findings with fixes.
[DOCKERFILE]

Local LLM system prompt (Ollama):

You are a senior DevOps engineer with 15 years of experience. 
You give direct, opinionated answers. No disclaimers. No "it depends" without specifics.
When you don't know something, say so. Prefer battle-tested solutions over cutting-edge.

The full pack is $29 one-time — no subscription, instant download, free updates.

👉 signalweekly.gumroad.com/l/signal-prompt-pack

More prompts, tools and homelab deep-dives every week on SIGNAL.

Automate Docker Container Health Checks With a 50-Line Bash Script

SIGNAL — Fri, 06 Mar 2026 09:01:27 +0000

If you're running any self-hosted services — a media server, a VPN, a database, a reverse proxy — you already know the silent killer: a container that's technically "Up" but completely broken inside. docker ps shows green. Your users see errors. You find out an hour later.

This article shows you how to build a real health-check automation script that goes beyond docker ps and actually verifies your containers are working.

The Problem With docker ps

Docker's default status is based on process state, not service health. A container running Nginx can be "Up" while Nginx itself has crashed and a supervisor process inside the container keeps failing to restart it. A Postgres container can be "Up" while still initializing and refusing connections.

Docker has a built-in HEALTHCHECK instruction for Dockerfiles, but:

Most third-party images don't define one
It only checks per-container, not across your whole stack
It doesn't send you alerts
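To see which of your images fall into the "no HEALTHCHECK" group, you can ask Docker for the image config. This is a minimal sketch, not part of the original script: the function names has_healthcheck and audit_running_images are made up for illustration, and it assumes the image metadata exposes the health check under .Config.Healthcheck. An image built with HEALTHCHECK NONE may still report "yes" here.

```bash
# has_healthcheck IMAGE
# Prints "yes" if the image config carries a HEALTHCHECK entry, "no" otherwise.
# (Hypothetical helper name; relies on `docker image inspect`.)
has_healthcheck() {
  local image="$1"
  docker image inspect \
    --format '{{if .Config.Healthcheck}}yes{{else}}no{{end}}' \
    "$image"
}

# audit_running_images
# Lists each image used by a running container next to the yes/no answer.
audit_running_images() {
  local image
  docker ps --format '{{.Image}}' | sort -u | while IFS= read -r image; do
    printf '%-40s %s\n' "$image" "$(has_healthcheck "$image")"
  done
}
```

Paste both functions into a shell and run audit_running_images to get a quick list of the services that need a healthcheck block of their own.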

What we want: a script that polls all running containers, checks if they're healthy, and fires a notification if something's wrong.

The Script

Save this as ~/bin/docker-health-check.sh:

#!/usr/bin/env bash
# docker-health-check.sh — Check running containers and alert on issues
# Usage: ./docker-health-check.sh [--notify]

set -euo pipefail

NOTIFY=${1:-""}
FAILED=()
WARNING=()

# --- Helpers ---
log() { echo "[$(date +%H:%M:%S)] $*"; }

alert() {
  local msg="$1"
  log "ALERT: $msg"
  if [[ "$NOTIFY" == "--notify" ]] && command -v notify-send &>/dev/null; then
    notify-send "Docker Health" "$msg" --urgency=critical
  fi
  # Optional: pipe to curl for webhook (Slack, Discord, ntfy.sh)
  # curl -s -X POST "$WEBHOOK_URL" -d "{\"text\": \"$msg\"}" > /dev/null
}

# --- Check Docker is running ---
if ! docker info &>/dev/null; then
  alert "Docker daemon is not running!"
  exit 1
fi

# --- Iterate running containers ---
while IFS=$'\t' read -r STATUS NAME; do
  # docker ps prints e.g. "Up 3 days (healthy)"; drop the parenthesised suffix
  STATUS=${STATUS%% (*}
  # inspect can fail if the container disappears mid-run; don't let set -e abort the script
  HEALTH=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$NAME" 2>/dev/null) || HEALTH="unknown"

  log "$NAME — status: $STATUS, health: $HEALTH"

  case "$HEALTH" in
    unhealthy)
      FAILED+=("$NAME (unhealthy)")
      alert "Container $NAME is UNHEALTHY"
      ;;
    starting)
      WARNING+=("$NAME (still starting)")
      ;;
    none|unknown)
      # No usable health status: fall back to the status text.
      # Running containers start with "Up"; a crash-looping one shows "Restarting".
      if [[ "$STATUS" != Up* ]]; then
        FAILED+=("$NAME (status: $STATUS)")
        alert "Container $NAME has unexpected status: $STATUS"
      fi
      ;;
  esac

# docker ps (without -a) already omits exited containers, so no grep filter is needed
done < <(docker ps --format '{{.Status}}\t{{.Names}}')

# --- Summary ---
echo ""
if [[ ${#FAILED[@]} -gt 0 ]]; then
  log "FAILED containers: ${FAILED[*]}"
  exit 2
elif [[ ${#WARNING[@]} -gt 0 ]]; then
  log "WARNING containers: ${WARNING[*]}"
  exit 1
else
  log "All containers healthy ✓"
  exit 0
fi

Make it executable:

chmod +x ~/bin/docker-health-check.sh

Adding a HEALTHCHECK to Your Compose Services

For services you control, add a healthcheck block to your docker-compose.yml:

services:
  api:
    image: my-api:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  postgres:
    image: postgres:16
    healthcheck:
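      # $$ stops Compose from interpolating the variable on the host;
      # the shell inside the container expands it instead
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER:-postgres}"]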
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

Once these are in place, docker inspect <container> returns a real health status, and our script above can act on it.
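For a quick spot check between scheduled runs, docker ps can filter on health state directly. The helper below is a small sketch under that assumption; the name containers_with_health is invented here and is not part of the script above.

```bash
# containers_with_health STATE
# Prints the names of running containers in the given health state.
# STATE must be one of: starting, healthy, unhealthy, none
containers_with_health() {
  local state="$1"
  case "$state" in
    starting|healthy|unhealthy|none) ;;
    *)
      echo "unknown health state: $state" >&2
      return 64
      ;;
  esac
  docker ps --filter "health=$state" --format '{{.Names}}'
}
```

Running containers_with_health unhealthy prints nothing when everything is fine, and containers_with_health none shows which services still lack a health check.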

Running It on a Schedule

Two options depending on your setup:

Option A: cron (classic)

crontab -e
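# Check every 5 minutes and append the output to a log file.
# The file must be writable by the crontab's owner; a regular user usually cannot create files under /var/log.
*/5 * * * * /home/you/bin/docker-health-check.sh --notify >> /home/you/docker-health.log 2>&1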

Option B: systemd timer (more control)

Create /etc/systemd/system/docker-health.service:

[Unit]
Description=Docker Container Health Check
After=docker.service

[Service]
Type=oneshot
User=your-user
ExecStart=/home/you/bin/docker-health-check.sh --notify
StandardOutput=journal
StandardError=journal

And /etc/systemd/system/docker-health.timer:

[Unit]
Description=Run Docker health check every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min
Unit=docker-health.service

[Install]
WantedBy=timers.target

Enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now docker-health.timer

# Verify
systemctl list-timers docker-health.timer

Push Alerts With ntfy.sh

For real-time push notifications to your phone, swap the alert function body for an ntfy.sh call:

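alert() {
  local msg="$1"
  log "ALERT: $msg"
  # --max-time and the || fallback keep a network failure from aborting the script under set -e
  curl -fs --max-time 10 \
    -H "Title: Docker Health Alert" \
    -H "Priority: urgent" \
    -H "Tags: warning,whale" \
    -d "$msg" \
    https://ntfy.sh/your-private-topic-name > /dev/null || log "ntfy push failed"
}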

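ntfy.sh is free and open source, and you can self-host it too. On the public server, anyone who knows a topic name can read and publish to it, so pick a long, hard-to-guess one. Subscribe to your topic in the mobile app and you're done.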
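With a 5-minute schedule, a container that stays unhealthy sends a push on every run. One way to avoid that is to remember the last state seen per container and alert only when it changes. The sketch below assumes a writable state directory; STATE_DIR and should_alert are hypothetical names, not something the script above defines.

```bash
# Directory holding one small file per container with its last seen state.
# (Assumed location; point it anywhere your user can write.)
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/docker-health-state}"

# should_alert NAME STATE
# Returns 0 when STATE differs from the state recorded on the previous call
# (or when nothing was recorded yet), and stores STATE for next time.
# Returns 1 when the state is unchanged.
should_alert() {
  local name="$1" state="$2" file
  mkdir -p "$STATE_DIR"
  file="$STATE_DIR/$name"
  if [[ -f "$file" && "$(<"$file")" == "$state" ]]; then
    return 1
  fi
  printf '%s\n' "$state" > "$file"
  return 0
}

# Possible use inside the loop, replacing the unconditional alert call:
#   if should_alert "$NAME" "$HEALTH" && [[ "$HEALTH" == "unhealthy" ]]; then
#     alert "Container $NAME is UNHEALTHY"
#   fi
```

Call should_alert for every container on every run, healthy ones included, so the stored state resets once a container recovers and the next failure alerts again.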

Real-World Usage Example

Here's what the output looks like on a typical homelab running 7 containers:

[09:15:02] nginx-proxy — status: Up 3 days, health: healthy
[09:15:02] immich-server — status: Up 3 days, health: healthy
[09:15:03] immich-db — status: Up 3 days, health: healthy
[09:15:03] vaultwarden — status: Up 3 days, health: none
[09:15:03] uptime-kuma — status: Up 3 days, health: none
[09:15:04] jellyfin — status: Up 3 days, health: healthy
[09:15:04] paperless-ngx — status: Up 2 minutes, health: starting
[09:15:04] WARNING containers: paperless-ngx (still starting)

The starting state on paperless-ngx is fine — it just restarted. If it's still starting 15 minutes later, that's when you care.
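The "still starting after 15 minutes" rule can be checked mechanically by comparing the container's start time with the current time. This is a rough sketch, assuming GNU date (for date -d) and the .State.StartedAt field from docker inspect; seconds_since and stuck_starting are illustrative names, and the 900-second default simply mirrors the 15 minutes mentioned above.

```bash
# seconds_since TIMESTAMP [NOW_EPOCH]
# Prints how many seconds lie between TIMESTAMP and now.
# Assumes GNU date, which can parse ISO 8601 timestamps via -d.
seconds_since() {
  local then_epoch now_epoch
  then_epoch=$(date -d "$1" +%s) || return 1
  now_epoch=${2:-$(date +%s)}
  echo $(( now_epoch - then_epoch ))
}

# stuck_starting NAME [LIMIT_SECONDS]
# Returns 0 if the container's health is "starting" and it was started
# more than LIMIT_SECONDS ago (default 900 = 15 minutes); 1 otherwise.
stuck_starting() {
  local name="$1" limit="${2:-900}" health started age
  health=$(docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
    "$name") || return 1
  [[ "$health" == "starting" ]] || return 1
  started=$(docker inspect --format '{{.State.StartedAt}}' "$name") || return 1
  age=$(seconds_since "$started") || return 1
  (( age > limit ))
}
```

In the starting branch of the script you could then escalate with something like: if stuck_starting "$NAME"; then alert "Container $NAME has been starting for over 15 minutes"; fi. A restart resets the start time, so a crash-looping container never trips this check and needs the status fallback instead.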

What to Do When Something Fails

If the script exits with code 2 (FAILED), here's a quick triage sequence:

# What is the container's last health check output?
docker inspect --format='{{json .State.Health}}' <name> | jq '.Log[-1]'

# Last 50 lines of container logs
docker logs --tail 50 <name>

# Restart and watch
docker restart <name> && docker logs -f <name>
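If jq isn't installed on the host, the same health log can be read with a Go template alone. A sketch, assuming the log entries expose ExitCode and Output as they appear in the docker inspect JSON; health_history is a made-up helper name. It prints every entry Docker has retained (only the most recent few are kept) rather than just the last one.

```bash
# health_history NAME
# Prints the retained health check results for a container, one per entry,
# as "exit=<code> <probe output>". Prints a notice if no check is defined.
health_history() {
  local name="$1"
  docker inspect --format \
    '{{if .State.Health}}{{range .State.Health.Log}}{{printf "exit=%d %s\n" .ExitCode .Output}}{{end}}{{else}}no healthcheck defined{{end}}' \
    "$name"
}
```

The last line of output corresponds to the most recent probe, which is usually the one that explains why the container was marked unhealthy.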

Summary

The whole setup takes about 10 minutes to deploy. It saves you from learning about outages from angry users, and from never noticing that a database has been silently crashing and restarting all night.

The script is intentionally simple: well under 100 lines, with no dependencies beyond bash and Docker. Extend it as you need by adding HTTP endpoint checks, disk space warnings, or container CPU/memory thresholds with docker stats.
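Two of those extensions fit in a few lines each. The helpers below are a sketch rather than part of the original script: http_ok and disk_usage_pct are hypothetical names, the 5-second timeout is an arbitrary default, and only 2xx/3xx responses are treated as healthy.

```bash
# http_ok URL [TIMEOUT_SECONDS]
# Returns 0 if the URL answers with a 2xx or 3xx status within the timeout.
http_ok() {
  local url="$1" timeout="${2:-5}" code
  code=$(curl -s -o /dev/null --max-time "$timeout" -w '%{http_code}' "$url") || code="000"
  [[ "$code" =~ ^[23][0-9][0-9]$ ]]
}

# disk_usage_pct PATH
# Prints the used-space percentage (digits only) of the filesystem holding PATH.
# df -P forces the POSIX single-line layout, so column 5 is always the capacity.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Possible use near the end of the script, before the summary:
#   if ! http_ok "http://localhost:3000/health"; then
#     alert "api /health is not answering"
#   fi
#   if (( $(disk_usage_pct /var/lib/docker) > 90 )); then
#     alert "Docker data directory is over 90% full"
#   fi
```

The URL matches the api service from the Compose example, and /var/lib/docker is Docker's default data directory on Linux; adjust both to your setup.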

If you're self-hosting anything seriously, health-check automation is table stakes. Now you have no excuse.

SIGNAL is a weekly digest for builders. If you found this useful, check out the archive on Dev.to.