My homelab runs the usual stack — Sonarr, Radarr, Prowlarr, qBittorrent, Plex. I was getting ntfy alerts at all hours for things like ffprobe metadata reads and HTTP 429s from indexers. Not actionable, just noise.
So I built Cortex: a monitoring layer that sends Docker logs through a local LLM (Ollama) every 30 minutes, filters the noise, and routes only meaningful alerts to my phone.
The problem with threshold-based monitoring
Standard monitoring tools watch numbers. CPU > 80%? Alert. Disk > 90%? Alert. That works for infrastructure — it doesn't work for application logs.
A Sonarr log line like:
[Warn] NzbDrone.Core.Download.TrackedDownloads.TrackedDownloadService:
Couldn't import album track / No files found are eligible for import
Is that a problem? Maybe. Depends on context. Is it a one-off, or has it been happening for 6 hours? Is the download queue healthy? Did the episode actually get imported by another path?
A fixed threshold can't answer that. A language model can.
Architecture
Docker logs → Cortex → Ollama (local LLM) → parsed report → ntfy
                 ↓
         Prometheus metrics
Every 30 minutes, cortex-monitor.py runs via cron:
- Collects recent log lines from each monitored container
- Filters known noise patterns (ffprobe, VideoFileInfoReader, HTTP 429, etc.)
- Sends the filtered logs to a local Ollama endpoint
- Parses the LLM response into structured alerts
- Routes alerts by priority — INFO goes to the daily digest, WARNING/CRITICAL go to ntfy immediately
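The Ollama call in step 3 is a single POST to the stock /api/generate endpoint. A minimal sketch, assuming the default port and a custom model named "cortex" (the function names are illustrative, not the actual cortex-monitor.py internals):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_request(logs: list[str], model: str = "cortex") -> dict:
    """Payload for /api/generate; stream=False returns one JSON object."""
    return {
        "model": model,
        "prompt": "Analyse these container logs:\n" + "\n".join(logs),
        "stream": False,
    }

def analyse(logs: list[str]) -> str:
    """Send filtered logs to the local model and return its report text."""
    payload = json.dumps(build_request(logs)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With stream=False the whole report arrives as one JSON object, which keeps the parsing step simple.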
The Ollama Modelfile
The key is giving the LLM enough context to understand what it's reading. The Modelfile bakes in knowledge of the stack:
SYSTEM """
You are an infrastructure monitoring assistant for a self-hosted homelab.
You analyse log output from Docker containers running *arr media services.
NOISE — these are NOT alerts:
- ffprobe metadata reads
- VideoFileInfoReader routine scans
- HTTP 429 rate limiting from indexers (expected, indexers throttle)
- Prowlarr health check on port 9696
SIGNAL — these ARE worth reporting:
- Import failures after successful downloads
- Indexer connectivity issues lasting > 30 minutes
- Download client queue stalls
- Authentication errors
- Database errors
Output format:
ALERT_LEVEL: INFO|WARNING|CRITICAL
SUMMARY: one sentence
DETAIL: what happened and why it matters
RECOMMENDATION: what to check or do
"""
A temperature of 0.2 keeps the output consistent and near-deterministic — you don't want creative variation in monitoring alerts.
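Because the output format is fixed, parsing the reply needs nothing fancier than string splits. A minimal sketch (the continuation-line handling is an assumption about how the model wraps long DETAIL fields, not the actual parser):

```python
FIELDS = ("ALERT_LEVEL", "SUMMARY", "DETAIL", "RECOMMENDATION")

def parse_report(text: str) -> dict:
    """Turn the model's fixed-format reply into a structured alert dict."""
    alert = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() in FIELDS:
            # New field: remember which one so wrapped lines can be folded in.
            current = key.strip().lower()
            alert[current] = value.strip()
        elif current and line.strip():
            # Continuation line: append to the field it belongs to.
            alert[current] += " " + line.strip()
    return alert
```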
Noise filtering before the LLM
The LLM call costs time (2-4 seconds on a local GPU). Filtering before sending keeps the context window clean and the latency low:
NOISE_PATTERNS = [
    "ffprobe",
    "VideoFileInfoReader",
    "429",
    "invalid torrent",
    "9696/",
]

def filter_noise(log_lines: list) -> list:
    return [
        line for line in log_lines
        if not any(pattern in line for pattern in NOISE_PATTERNS)
    ]
On a normal day, this drops 60-70% of log volume before it ever reaches Ollama.
Alert routing with cooldown
Not every WARNING needs an immediate ntfy push. Cortex uses a cooldown per alert type to avoid notification fatigue:
import time

# Per-level cooldown in seconds (values illustrative; CRITICAL always goes out).
COOLDOWNS = {"INFO": 86400, "WARNING": 3600, "CRITICAL": 0}

def route_alert(alert: dict, state: dict) -> bool:
    key = f"{alert['container']}:{alert['alert_level']}"
    last_sent = state.get(key, 0)
    cooldown = COOLDOWNS.get(alert['alert_level'], 3600)
    if time.time() - last_sent < cooldown:
        return False  # still in cooldown
    state[key] = time.time()
    return True
INFO alerts accumulate and go into the daily digest at 09:00. WARNING and CRITICAL go out immediately, unless an identical alert already fired within its cooldown window.
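One wrinkle with cron: each run is a fresh process, so the state dict has to survive between invocations. A minimal sketch using a JSON file (the path and helper names are assumptions, not the actual implementation):

```python
import json
import os
import tempfile

STATE_FILE = "/var/lib/cortex/alert_state.json"  # illustrative path

def load_state(path: str = STATE_FILE) -> dict:
    """Read the cooldown state left by the previous run; empty on first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

def save_state(state: dict, path: str = STATE_FILE) -> None:
    """Write atomically so a crash mid-write can't corrupt the state file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)
```

The atomic write matters more than it looks: a half-written state file would reset every cooldown and re-fire every alert on the next run.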
The daily digest
Every morning at 09:00, cortex-digest.py sends a summary via ntfy:
📊 Cortex Daily Digest — 2026-04-17
Containers: 5/5 healthy
Alerts last 24h: 2 (1 WARNING, 1 INFO)
Noise filtered: 847 log entries
Top event: prowlarr indexer timeout on NZBgeek (non-critical)
Recommendation: check NZBgeek API key expiry
Imports: 4 episodes, 3 movies — all clean
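Assembling that message is plain string formatting over the alerts accumulated during the day. A sketch of the top half (function name and signature are illustrative, not the actual cortex-digest.py):

```python
from datetime import date

def build_digest(alerts: list[dict], noise_filtered: int,
                 healthy: int, total: int) -> str:
    """Build the daily ntfy digest body in the format shown above."""
    by_level: dict[str, int] = {}
    for a in alerts:
        by_level[a["alert_level"]] = by_level.get(a["alert_level"], 0) + 1
    breakdown = ", ".join(f"{n} {lvl}" for lvl, n in sorted(by_level.items()))
    lines = [
        f"📊 Cortex Daily Digest — {date.today().isoformat()}",
        f"Containers: {healthy}/{total} healthy",
        f"Alerts last 24h: {len(alerts)}" + (f" ({breakdown})" if alerts else ""),
        f"Noise filtered: {noise_filtered} log entries",
    ]
    return "\n".join(lines)
```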
One message per day with everything that actually happened. No alert fatigue.
Prometheus metrics
cortex-exporter.py exposes metrics on port 9192 for Grafana:
cortex_alerts_total
cortex_last_run_timestamp
cortex_containers_monitored
cortex_noise_filtered_total
cortex_digest_last_sent
The "last run age" gauge is particularly useful — if Cortex stops running, the gauge climbs and you get a Grafana alert.
Hardware requirements
- CPU-only: 16GB RAM minimum — runs qwen2.5:7b adequately
- GPU: 8GB VRAM — runs qwen2.5:14b comfortably (recommended)
I run it on a machine with a modest GPU. The 30-minute cron cadence means inference load is negligible — one batch call every half hour, not a continuous service.
Getting started
git clone https://github.com/pdegidio/cortex-homelab.git
cd cortex-homelab
bash install.sh
The installer walks you through Ollama endpoint, ntfy config, container names, and cron setup. Done in ~15 minutes.
Full repo: github.com/pdegidio/cortex-homelab — MIT license.
What's your biggest source of homelab alert noise? I'm curious whether the noise filter patterns generalise beyond my stack or if everyone's list is completely different.