Paolo D'Egidio
I built an AI log monitor for my homelab — local LLM reads my *arr logs so I don't have to

My homelab runs the usual stack — Sonarr, Radarr, Prowlarr, qBittorrent, Plex. I was getting ntfy alerts at all hours for things like ffprobe metadata reads and HTTP 429s from indexers. Not actionable, just noise.

So I built Cortex: a monitoring layer that sends Docker logs through a local LLM (Ollama) every 30 minutes, filters the noise, and routes only meaningful alerts to my phone.

The problem with threshold-based monitoring

Standard monitoring tools watch numbers. CPU > 80%? Alert. Disk > 90%? Alert. That works for infrastructure — it doesn't work for application logs.

A Sonarr log line like:

[Warn] NzbDrone.Core.Download.TrackedDownloads.TrackedDownloadService: 
Couldn't import album track / No files found are eligible for import

Is that a problem? Maybe. Depends on context. Is it a one-off, or has it been happening for 6 hours? Is the download queue healthy? Did the episode actually get imported by another path?

A fixed threshold can't answer that. A language model can.

Architecture

Docker logs → Cortex → Ollama (local LLM) → parsed report → ntfy
                                                    ↓
                                           Prometheus metrics

Every 30 minutes, cortex-monitor.py runs via cron:

  1. Collects recent log lines from each monitored container
  2. Filters known noise patterns (ffprobe, VideoFileInfoReader, HTTP 429, etc.)
  3. Sends the filtered logs to a local Ollama endpoint
  4. Parses the LLM response into structured alerts
  5. Routes alerts by priority — INFO goes to the daily digest, WARNING/CRITICAL go to ntfy immediately
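Step 3 is a single HTTP POST to Ollama's /api/generate endpoint. A minimal stdlib-only sketch, assuming a custom model named `cortex` built from the Modelfile and the default local Ollama endpoint (both names are assumptions, not the repo's actual identifiers):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(log_text: str, model: str = "cortex") -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": log_text,
        "stream": False,                  # one complete response, not a token stream
        "options": {"temperature": 0.2},  # match the Modelfile's determinism
    }

def analyse_logs(log_text: str, model: str = "cortex") -> str:
    """POST filtered log lines to Ollama and return the raw model output."""
    data = json.dumps(build_payload(log_text, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

`"stream": False` returns one JSON object containing the full completion, which is much easier to handle in a cron job than a token stream.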

The Ollama Modelfile

The key is giving the LLM enough context to understand what it's reading. The Modelfile bakes in knowledge of the stack:

FROM qwen2.5:14b

PARAMETER temperature 0.2

SYSTEM """
You are an infrastructure monitoring assistant for a self-hosted homelab.
You analyse log output from Docker containers running *arr media services.

NOISE — these are NOT alerts:
- ffprobe metadata reads
- VideoFileInfoReader routine scans
- HTTP 429 rate limiting from indexers (expected, indexers throttle)
- Prowlarr health check on port 9696

SIGNAL — these ARE worth reporting:
- Import failures after successful downloads
- Indexer connectivity issues lasting > 30 minutes
- Download client queue stalls
- Authentication errors
- Database errors

Output format:
ALERT_LEVEL: INFO|WARNING|CRITICAL
SUMMARY: one sentence
DETAIL: what happened and why it matters
RECOMMENDATION: what to check or do
"""

Temperature 0.2 keeps the output deterministic and consistent — you don't want creative variation in monitoring alerts.
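Step 4 turns that fixed KEY: value layout back into a dict. A sketch, assuming the model sticks to the format the system prompt requests (a retry, or a fallback WARNING, is worth having for the times it doesn't):

```python
# Keys the system prompt instructs the model to emit, one per line.
ALERT_KEYS = {"ALERT_LEVEL", "SUMMARY", "DETAIL", "RECOMMENDATION"}

def parse_report(text: str) -> dict:
    """Parse the model's KEY: value lines into a lowercase-keyed alert dict."""
    alert = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ALERT_KEYS:
            alert[key.strip().lower()] = value.strip()
    return alert
```

Splitting on the first colon only means a DETAIL line containing colons of its own (timestamps, URLs) still parses cleanly.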

Noise filtering before the LLM

The LLM call costs time (2-4 seconds on a local GPU). Filtering before sending keeps the context window clean and the latency low:

NOISE_PATTERNS = [
    "ffprobe",
    "VideoFileInfoReader",
    "429",
    "invalid torrent",
    "9696/",
]

def filter_noise(log_lines: list[str]) -> list[str]:
    # Plain substring match: a line is dropped if it contains any pattern.
    return [
        line for line in log_lines
        if not any(pattern in line for pattern in NOISE_PATTERNS)
    ]

On a normal day, this drops 60-70% of log volume before it ever reaches Ollama.

Alert routing with cooldown

Not every WARNING needs an immediate ntfy push. Cortex uses a cooldown per alert type to avoid notification fatigue:

import time

# Seconds before the same container/level pair may fire again.
# Example values — CRITICAL is 0 so it always goes straight out.
COOLDOWNS = {"INFO": 86400, "WARNING": 1800, "CRITICAL": 0}

def route_alert(alert: dict, state: dict) -> bool:
    key = f"{alert['container']}:{alert['alert_level']}"
    last_sent = state.get(key, 0)
    cooldown = COOLDOWNS.get(alert['alert_level'], 3600)

    if time.time() - last_sent < cooldown:
        return False  # still in cooldown

    state[key] = time.time()
    return True

INFO alerts accumulate and go into the daily digest at 09:00. WARNING alerts respect their cooldown so a flapping service doesn't spam the phone; CRITICAL carries a zero cooldown and goes out immediately.
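Publishing through ntfy is a plain HTTP POST to the topic URL, with the title and priority passed as headers. A sketch, with a hypothetical topic name and an assumed mapping of Cortex levels onto ntfy's 1-5 priority scale:

```python
import urllib.request

NTFY_URL = "https://ntfy.sh/cortex-alerts"                # hypothetical topic
PRIORITY = {"INFO": "3", "WARNING": "4", "CRITICAL": "5"}  # ntfy's 1-5 scale

def build_notification(alert: dict) -> tuple[bytes, dict]:
    """Body and headers for an ntfy publish request."""
    headers = {
        "Title": f"[{alert['alert_level']}] {alert['summary']}",
        "Priority": PRIORITY.get(alert["alert_level"], "3"),
    }
    return alert["detail"].encode(), headers

def push_alert(alert: dict) -> None:
    body, headers = build_notification(alert)
    urllib.request.urlopen(urllib.request.Request(NTFY_URL, data=body, headers=headers))
```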

The daily digest

Every morning at 09:00, cortex-digest.py sends a summary via ntfy:

📊 Cortex Daily Digest — 2026-04-17

Containers: 5/5 healthy
Alerts last 24h: 2 (1 WARNING, 1 INFO)
Noise filtered: 847 log entries

Top event: prowlarr indexer timeout on NZBgeek (non-critical)
Recommendation: check NZBgeek API key expiry

Imports: 4 episodes, 3 movies — all clean

One message per day with everything that actually happened. No alert fatigue.

Prometheus metrics

cortex-exporter.py exposes metrics on port 9192 for Grafana:

cortex_alerts_total
cortex_last_run_timestamp
cortex_containers_monitored
cortex_noise_filtered_total
cortex_digest_last_sent

The last-run metric is the most useful one: graph time() - cortex_last_run_timestamp in Grafana and alert when the age climbs, which catches Cortex itself silently dying.
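A minimal version of the exporter, assuming the prometheus_client library and covering three of the metrics above:

```python
from prometheus_client import Counter, Gauge, start_http_server

ALERTS_TOTAL = Counter(
    "cortex_alerts_total", "Alerts emitted, by level", ["level"]
)
LAST_RUN = Gauge(
    "cortex_last_run_timestamp", "Unix time of the last monitor run"
)
NOISE_FILTERED = Counter(
    "cortex_noise_filtered_total", "Log lines dropped by the noise filter"
)

def record_run(alerts: list[dict], filtered: int) -> None:
    """Update the metrics after each 30-minute monitoring pass."""
    for alert in alerts:
        ALERTS_TOTAL.labels(level=alert["alert_level"]).inc()
    NOISE_FILTERED.inc(filtered)
    LAST_RUN.set_to_current_time()

def serve(port: int = 9192) -> None:
    """Expose /metrics on the article's port for Prometheus to scrape."""
    start_http_server(port)
```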

Hardware requirements

  • CPU-only: 16GB RAM minimum — runs qwen2.5:7b adequately
  • GPU: 8GB VRAM — runs qwen2.5:14b comfortably (recommended)

I run it on a machine with a modest GPU. The 30-minute cron cadence means inference load is negligible — one batch call every half hour, not a continuous service.

Getting started

git clone https://github.com/pdegidio/cortex-homelab.git
cd cortex-homelab
bash install.sh

The installer walks you through Ollama endpoint, ntfy config, container names, and cron setup. Done in ~15 minutes.

Full repo: github.com/pdegidio/cortex-homelab — MIT license.


What's your biggest source of homelab alert noise? I'm curious whether the noise filter patterns generalise beyond my stack or if everyone's list is completely different.
