<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Paolo D'Egidio</title>
    <description>The latest articles on DEV Community by Paolo D'Egidio (@pdegidio).</description>
    <link>https://dev.to/pdegidio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3885199%2Fdb5bb172-fa6c-4742-9e7e-48f49b44133f.png</url>
      <title>DEV Community: Paolo D'Egidio</title>
      <link>https://dev.to/pdegidio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pdegidio"/>
    <language>en</language>
    <item>
      <title>I built an AI log monitor for my homelab — local LLM reads my *arr logs so I don't have to</title>
      <dc:creator>Paolo D'Egidio</dc:creator>
      <pubDate>Fri, 17 Apr 2026 21:35:43 +0000</pubDate>
      <link>https://dev.to/pdegidio/i-built-an-ai-log-monitor-for-my-homelab-local-llm-reads-my-arr-logs-so-i-dont-have-to-503n</link>
      <guid>https://dev.to/pdegidio/i-built-an-ai-log-monitor-for-my-homelab-local-llm-reads-my-arr-logs-so-i-dont-have-to-503n</guid>
      <description>&lt;p&gt;My homelab runs the usual stack — Sonarr, Radarr, Prowlarr, qBittorrent, Plex. I was getting ntfy alerts at all hours for things like ffprobe metadata reads and HTTP 429s from indexers. Not actionable, just noise.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Cortex&lt;/strong&gt;: a monitoring layer that sends Docker logs through a local LLM (Ollama) every 30 minutes, filters the noise, and routes only meaningful alerts to my phone.&lt;/p&gt;

&lt;h2&gt;The problem with threshold-based monitoring&lt;/h2&gt;

&lt;p&gt;Standard monitoring tools watch numbers. CPU &amp;gt; 80%? Alert. Disk &amp;gt; 90%? Alert. That works for infrastructure — it doesn't work for application logs.&lt;/p&gt;

&lt;p&gt;A Sonarr log line like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Warn] NzbDrone.Core.Download.TrackedDownloads.TrackedDownloadService: 
Couldn't import album track / No files found are eligible for import
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is that a problem? Maybe. Depends on context. Is it a one-off, or has it been happening for 6 hours? Is the download queue healthy? Did the episode actually get imported by another path?&lt;/p&gt;

&lt;p&gt;A fixed threshold can't answer that. A language model can.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Docker logs → Cortex → Ollama (local LLM) → parsed report → ntfy
                                                    ↓
                                           Prometheus metrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every 30 minutes, &lt;code&gt;cortex-monitor.py&lt;/code&gt; runs via cron:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collects recent log lines from each monitored container&lt;/li&gt;
&lt;li&gt;Filters known noise patterns (ffprobe, VideoFileInfoReader, HTTP 429, etc.)&lt;/li&gt;
&lt;li&gt;Sends the filtered logs to a local Ollama endpoint&lt;/li&gt;
&lt;li&gt;Parses the LLM response into structured alerts&lt;/li&gt;
&lt;li&gt;Routes alerts by priority — INFO goes to the daily digest, WARNING/CRITICAL go to ntfy immediately&lt;/li&gt;
&lt;/ol&gt;
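&lt;p&gt;Steps 1 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not Cortex's actual code: the helper names, the &lt;code&gt;30m&lt;/code&gt; window passed to &lt;code&gt;docker logs --since&lt;/code&gt;, and the default Ollama address &lt;code&gt;http://localhost:11434&lt;/code&gt; are all hypothetical choices.&lt;/p&gt;

```python
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def docker_logs_cmd(container, since="30m"):
    """Build the docker CLI call that fetches recent log lines for one container."""
    return ["docker", "logs", "--since", since, container]


def collect_logs(container, since="30m"):
    """Run docker logs and return its output as a list of lines (hypothetical helper)."""
    result = subprocess.run(
        docker_logs_cmd(container, since),
        capture_output=True, text=True, check=False,
    )
    # docker logs replays the container's stdout and stderr on the matching streams
    return (result.stdout + result.stderr).splitlines()


def ollama_request(model, log_lines):
    """Build (but do not send) a non-streaming request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": "\n".join(log_lines), "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

&lt;p&gt;Passing the request to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; and reading the &lt;code&gt;response&lt;/code&gt; field of the returned JSON yields the model's reply as one string.&lt;/p&gt;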

&lt;h2&gt;The Ollama Modelfile&lt;/h2&gt;

&lt;p&gt;The key is giving the LLM enough context to understand what it's reading. The Modelfile bakes in knowledge of the stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SYSTEM """
You are an infrastructure monitoring assistant for a self-hosted homelab.
You analyse log output from Docker containers running *arr media services.

NOISE — these are NOT alerts:
- ffprobe metadata reads
- VideoFileInfoReader routine scans  
- HTTP 429 rate limiting from indexers (expected, indexers throttle)
- Prowlarr health check on port 9696

SIGNAL — these ARE worth reporting:
- Import failures after successful downloads
- Indexer connectivity issues lasting &amp;gt; 30 minutes
- Download client queue stalls
- Authentication errors
- Database errors

Output format:
ALERT_LEVEL: INFO|WARNING|CRITICAL
SUMMARY: one sentence
DETAIL: what happened and why it matters
RECOMMENDATION: what to check or do
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A temperature of 0.2 keeps the output nearly deterministic and consistent. You don't want creative variation in monitoring alerts.&lt;/p&gt;
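&lt;p&gt;Even at a low temperature the model can drift from the requested output format, so the parsing step is worth making strict. A minimal sketch, assuming the four-field format from the SYSTEM prompt; the function name and the choice to reject malformed replies are my assumptions, not necessarily how Cortex does it.&lt;/p&gt;

```python
import re

FIELDS = ("ALERT_LEVEL", "SUMMARY", "DETAIL", "RECOMMENDATION")
LEVELS = {"INFO", "WARNING", "CRITICAL"}


def parse_report(text):
    """Turn the model's four-field reply into a dict, or return None if it is malformed."""
    alert = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Z_]+)\s*:\s*(.*)", line)
        if match and match.group(1) in FIELDS:
            alert[match.group(1).lower()] = match.group(2).strip()
    # Reject replies that miss a field or invent a level, rather than guessing
    if set(alert) != {name.lower() for name in FIELDS}:
        return None
    alert["alert_level"] = alert["alert_level"].upper()
    if alert["alert_level"] not in LEVELS:
        return None
    return alert
```

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; result can be logged and skipped, so a garbled reply never turns into a push notification.&lt;/p&gt;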

&lt;h2&gt;Noise filtering before the LLM&lt;/h2&gt;

&lt;p&gt;The LLM call costs time (2-4 seconds on a local GPU). Filtering before sending keeps the context window clean and the latency low:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NOISE_PATTERNS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffprobe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VideoFileInfoReader&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;429&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invalid torrent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9696/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filter_noise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;log_lines&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;NOISE_PATTERNS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a normal day, this drops 60-70% of log volume before it ever reaches Ollama.&lt;/p&gt;
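&lt;p&gt;One caveat with plain substring matching: &lt;code&gt;"429"&lt;/code&gt; also matches any line that merely contains those digits, such as an ID or a byte count. A possible variant, sketched here with regular expressions and a dropped-line counter (both are my additions, not part of the snippet above):&lt;/p&gt;

```python
import re

# Assumed regex versions of the substring patterns shown above
NOISE_REGEXES = [
    re.compile(r"ffprobe"),
    re.compile(r"VideoFileInfoReader"),
    re.compile(r"\b429\b"),  # the status code, not any number containing 429
    re.compile(r"invalid torrent"),
    re.compile(r"9696/"),
]


def split_noise(log_lines):
    """Return (kept_lines, dropped_count) so the dropped count can be reported."""
    kept = [
        line for line in log_lines
        if not any(regex.search(line) for regex in NOISE_REGEXES)
    ]
    return kept, len(log_lines) - len(kept)
```

&lt;p&gt;The dropped count is the kind of number a counter like &lt;code&gt;cortex_noise_filtered_total&lt;/code&gt; could be fed from.&lt;/p&gt;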

&lt;h2&gt;Alert routing with cooldown&lt;/h2&gt;

&lt;p&gt;Not every WARNING needs an immediate ntfy push. Cortex uses a cooldown per alert type to avoid notification fatigue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;container&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alert_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;last_sent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cooldown&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;COOLDOWNS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;alert_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_sent&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;cooldown&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# still in cooldown
&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;INFO alerts accumulate and go into the daily digest at 09:00. WARNING and CRITICAL skip the digest and are pushed immediately, unless an alert for the same container and level was already sent within the cooldown window.&lt;/p&gt;
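&lt;p&gt;Because the monitor runs from cron, each run is a fresh process, so the &lt;code&gt;state&lt;/code&gt; dict only works as a cooldown if it survives between runs. One way to do that is a small JSON file. This is a sketch under that assumption; the helper names and the file location are hypothetical.&lt;/p&gt;

```python
import json
import os
import tempfile


def load_state(path):
    """Read the last-sent timestamps, or start empty if the file does not exist yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}


def save_state(path, state):
    """Write atomically so an interrupted cron run cannot leave a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, encoding="utf-8"
    ) as handle:
        json.dump(state, handle)
    # os.replace swaps the finished file into place in one step
    os.replace(handle.name, path)
```

&lt;p&gt;A run would call &lt;code&gt;load_state&lt;/code&gt; once at startup, pass the dict to &lt;code&gt;route_alert&lt;/code&gt; for each alert, and call &lt;code&gt;save_state&lt;/code&gt; before exiting.&lt;/p&gt;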

&lt;h2&gt;The daily digest&lt;/h2&gt;

&lt;p&gt;Every morning at 09:00, &lt;code&gt;cortex-digest.py&lt;/code&gt; sends a summary via ntfy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📊 Cortex Daily Digest — 2026-04-17

Containers: 5/5 healthy
Alerts last 24h: 2 (1 WARNING, 1 INFO)
Noise filtered: 847 log entries

Top event: prowlarr indexer timeout on NZBgeek (non-critical)
Recommendation: check NZBgeek API key expiry

Imports: 4 episodes, 3 movies — all clean
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One message per day with everything that actually happened. No alert fatigue.&lt;/p&gt;
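&lt;p&gt;Publishing to ntfy is a plain HTTP POST to the topic URL, with the message as the body and optional &lt;code&gt;Title&lt;/code&gt; and &lt;code&gt;Priority&lt;/code&gt; headers. A minimal sketch using only the standard library; the function names, server address, and topic are placeholders rather than Cortex's real configuration.&lt;/p&gt;

```python
import urllib.request


def ntfy_request(base_url, topic, message, title, priority="default"):
    """Build a POST for ntfy: the body is the message, headers carry title and priority."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        # Keep the title ASCII here; HTTP header values are not UTF-8 safe as-is
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send(request, timeout=10):
    """Deliver the notification and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

&lt;p&gt;For example, &lt;code&gt;send(ntfy_request("http://your-ntfy:8080", "cortex", digest_text, "Cortex Daily Digest"))&lt;/code&gt; would publish the digest, with &lt;code&gt;digest_text&lt;/code&gt; holding the summary shown above.&lt;/p&gt;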

&lt;h2&gt;Prometheus metrics&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;cortex-exporter.py&lt;/code&gt; exposes metrics on port 9192 for Grafana:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight prometheus"&gt;&lt;code&gt;&lt;span class="n"&gt;cortex_alerts_total&lt;/span&gt;
&lt;span class="n"&gt;cortex_last_run_timestamp&lt;/span&gt;
&lt;span class="n"&gt;cortex_containers_monitored&lt;/span&gt;
&lt;span class="n"&gt;cortex_noise_filtered_total&lt;/span&gt;
&lt;span class="n"&gt;cortex_digest_last_sent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "last run age" gauge is particularly useful — if Cortex stops running, the gauge climbs and you get a Grafana alert.&lt;/p&gt;

&lt;h2&gt;Hardware requirements&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU-only:&lt;/strong&gt; 16GB RAM minimum — runs &lt;code&gt;qwen2.5:7b&lt;/code&gt; adequately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU:&lt;/strong&gt; 8GB VRAM — runs &lt;code&gt;qwen2.5:14b&lt;/code&gt; comfortably (recommended)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I run it on a machine with a modest GPU. The 30-minute cron cadence means inference load is negligible — one batch call every half hour, not a continuous service.&lt;/p&gt;

&lt;h2&gt;Getting started&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/pdegidio/cortex-homelab.git
&lt;span class="nb"&gt;cd &lt;/span&gt;cortex-homelab
bash install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer walks you through the Ollama endpoint, ntfy config, container names, and cron setup. Done in ~15 minutes.&lt;/p&gt;

&lt;p&gt;Full repo: &lt;a href="https://github.com/pdegidio/cortex-homelab" rel="noopener noreferrer"&gt;github.com/pdegidio/cortex-homelab&lt;/a&gt; — MIT license.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's your biggest source of homelab alert noise? I'm curious whether the noise filter patterns generalise beyond my stack or if everyone's list is completely different.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>selfhosted</category>
      <category>docker</category>
      <category>python</category>
    </item>
    <item>
      <title>How to get individual SMART data from a TerraMaster DAS (and build a failure forecaster around it)</title>
      <dc:creator>Paolo D'Egidio</dc:creator>
      <pubDate>Fri, 17 Apr 2026 21:27:29 +0000</pubDate>
      <link>https://dev.to/pdegidio/how-to-get-individual-smart-data-from-a-terramaster-das-and-build-a-failure-forecaster-around-it-5b2j</link>
      <guid>https://dev.to/pdegidio/how-to-get-individual-smart-data-from-a-terramaster-das-and-build-a-failure-forecaster-around-it-5b2j</guid>
      <description>&lt;p&gt;If you have a TerraMaster D5-300 — or any DAS with a JMicron JMB576 bridge chip — you've probably hit this wall:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; /dev/sdb
...
SMART overall-health self-assessment &lt;span class="nb"&gt;test &lt;/span&gt;result: PASSED
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One result. For the whole enclosure. Not the 5 individual WD Red Pros inside it.&lt;/p&gt;

&lt;p&gt;Most SMART monitoring guides stop here and tell you to use the vendor app. I went a different way.&lt;/p&gt;

&lt;h2&gt;The fix: JMicron pass-through&lt;/h2&gt;

&lt;p&gt;The JMB576 chip supports SMART pass-through via a specific &lt;code&gt;smartctl&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; jmb39x,0 /dev/sdb   &lt;span class="c"&gt;# slot 1&lt;/span&gt;
smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; jmb39x,1 /dev/sdb   &lt;span class="c"&gt;# slot 2&lt;/span&gt;
smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; jmb39x,2 /dev/sdb   &lt;span class="c"&gt;# slot 3&lt;/span&gt;
smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; jmb39x,3 /dev/sdb   &lt;span class="c"&gt;# slot 4&lt;/span&gt;
smartctl &lt;span class="nt"&gt;-a&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; jmb39x,4 /dev/sdb   &lt;span class="c"&gt;# slot 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;N&lt;/code&gt; in &lt;code&gt;jmb39x,N&lt;/code&gt; is zero-based and maps directly to the physical slot: &lt;code&gt;N = 0&lt;/code&gt; is slot 1, &lt;code&gt;N = 4&lt;/code&gt; is slot 5. Run these commands and you get full per-disk SMART output: temperature, reallocated sectors, CRC errors, pending sectors, everything.&lt;/p&gt;
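&lt;p&gt;If you want to script this, the slot-to-index mapping is the only part that is easy to get wrong. A small sketch, using the same flags as the commands above; the function names are made up for illustration, and the five-slot limit assumes a D5-300.&lt;/p&gt;

```python
import subprocess


def smartctl_cmd(device, slot):
    """Build the smartctl call for a physical slot (1-based) behind the JMicron bridge."""
    if slot not in range(1, 6):
        raise ValueError("the D5-300 has slots 1 to 5")
    # smartctl's jmb39x index is zero-based, so slot 1 is jmb39x,0
    return ["smartctl", "-a", "-d", f"jmb39x,{slot - 1}", device]


def read_slot(device, slot):
    """Run smartctl (needs root) and return its text output."""
    # check=False: smartctl uses non-zero exit codes to flag disk problems too
    result = subprocess.run(
        smartctl_cmd(device, slot), capture_output=True, text=True, check=False
    )
    return result.stdout
```

&lt;p&gt;Calling &lt;code&gt;read_slot("/dev/sdb", 5)&lt;/code&gt; runs the same command as the slot 5 line above.&lt;/p&gt;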

&lt;p&gt;Tested on: TerraMaster D5-300, Debian 12.5, smartctl 7.3.&lt;/p&gt;

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;With per-disk SMART data, you can actually monitor disk health instead of just hoping the enclosure is OK. But raw SMART numbers aren't enough — what you really want is &lt;em&gt;trends&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A single &lt;code&gt;udma_crc_error_count = 1&lt;/code&gt; on a 4-year-old disk is probably fine. &lt;code&gt;udma_crc_error_count&lt;/code&gt; going from 1 to 3 to 7 to 12 over 60 days is not fine — it's a cable or backplane issue that will get worse.&lt;/p&gt;

&lt;h2&gt;Building a forecaster&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Argus&lt;/strong&gt; around this discovery. It collects SMART attributes every 6 hours, builds a 180-day rolling history, and runs a linear regression over the last 30 days to forecast when each attribute will hit its critical threshold.&lt;/p&gt;

&lt;p&gt;The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;👁️  Argus SMART Analysis — 2026-04-17T09:00:00+00:00
   Samples: 28 (forecast window: 30d)
   Overall: WARNING

✅ ssd-system (SanDisk Ultra II 960GB)   health=95/100  status=OK
✅ das-slot1  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot2  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot3  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot4  (WD Red Pro 8TB)           health=100/100 status=OK
🟡 das-slot5  (WD Red Pro 8TB)           health=70/100  status=WARNING
    🟡 udma_crc_error_count=1 ≥ WARN (5)
    📈 udma_crc_error_count: 1→100 forecast in 142d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Slot 5 is trending. Not critical yet — but I know about it 142 days before it becomes a problem.&lt;/p&gt;

&lt;h2&gt;How the forecast works&lt;/h2&gt;

&lt;p&gt;The core is a simple linear regression over &lt;code&gt;(days_elapsed, attribute_value)&lt;/code&gt; pairs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;linear_forecast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;xs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;mean_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="n"&gt;mean_y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ys&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
    &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean_y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ys&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;den&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;den&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slope&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;den&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;x_target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean_y&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;slope&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;mean_x&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;slope&lt;/span&gt;
    &lt;span class="n"&gt;days_until&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_target&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days_until&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;days_until&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing fancy. But with a 6-hour collection interval, a full 30-day window holds up to 120 data points per attribute (4 per day), which is enough to catch real trends while ignoring one-off noise.&lt;/p&gt;
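&lt;p&gt;To make the arithmetic concrete, here is a worked example on the CRC series mentioned earlier (1, 3, 7, 12 over 60 days). It re-implements the same least-squares fit in compact form so it runs on its own; the 20-day sample spacing and the target of 20 are made-up values for illustration, not Argus defaults.&lt;/p&gt;

```python
def days_until_target(points, target, horizon=180):
    """Least-squares line through (day, value) points; days until it reaches target."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    if den == 0 or not num / den > 0:
        return None  # flat or improving attribute: nothing to forecast
    slope = num / den
    intercept = mean_y - slope * mean_x
    remaining = (target - intercept) / slope - max(x for x, _ in points)
    return round(remaining, 1) if horizon >= remaining > 0 else None


# CRC errors growing 1, 3, 7, 12 over 60 days, sampled every 20 days
crc_history = [(0, 1), (20, 3), (40, 7), (60, 12)]

print(days_until_target(crc_history, target=20))   # 47.0
print(days_until_target(crc_history, target=100))  # None, beyond the 180-day horizon
```

&lt;p&gt;The fitted slope is 0.185 errors per day with an intercept of 0.2, so the line reaches 20 at about day 107, which is 47 days after the last sample. Reaching 100 would take roughly 479 days, so the 180-day cap suppresses that forecast.&lt;/p&gt;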

&lt;h2&gt;Thresholds&lt;/h2&gt;

&lt;p&gt;Argus uses Backblaze-calibrated thresholds rather than vendor defaults. A few notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;seek_error_rate&lt;/code&gt; is excluded&lt;/strong&gt;: Seagate packs two counters into the 48-bit raw value (seek errors in the upper 16 bits, total seeks in the lower 32), making the raw number meaningless for cross-vendor comparison. Backblaze doesn't use it either.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;udma_crc_error_count&lt;/code&gt; warns at 5, not 1&lt;/strong&gt;: a single historical CRC error on a multi-year disk is normal. Growth is what matters, and the forecast captures it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temperature anomaly&lt;/strong&gt; uses z-score over 10+ samples, not a fixed threshold — so a disk that normally runs at 28°C will alert at 36°C, while one that normally runs at 38°C won't.&lt;/li&gt;
&lt;/ul&gt;
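&lt;p&gt;The temperature rule can be sketched as a per-disk z-score check. This is only an illustration: the limit of 3 standard deviations, the one-sided test (hotter than usual), and the sample values are my assumptions; the 10-sample minimum comes from the note above.&lt;/p&gt;

```python
import statistics


def temperature_anomaly(history, current, z_limit=3.0, min_samples=10):
    """Flag a reading that is unusually hot for this particular disk."""
    if min_samples > len(history):
        return False  # not enough history to judge
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return current > mean  # perfectly flat history: any rise stands out
    return (current - mean) / spread >= z_limit


cool_disk = [27, 28, 29, 28, 27, 28, 29, 28, 28, 28]  # normally around 28 C
warm_disk = [37, 38, 39, 38, 37, 38, 39, 38, 38, 38]  # normally around 38 C

print(temperature_anomaly(cool_disk, 36))  # True
print(temperature_anomaly(warm_disk, 36))  # False
```

&lt;p&gt;The same 36°C reading is about 12 standard deviations above the cool disk's baseline but below the warm disk's, which is the behaviour a fixed threshold cannot express.&lt;/p&gt;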

&lt;h2&gt;Config for a TerraMaster D5-300&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /opt/argus/config/argus.conf
&lt;/span&gt;
&lt;span class="nn"&gt;[argus]&lt;/span&gt;
&lt;span class="py"&gt;history_file&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/var/lib/argus/argus-history.json&lt;/span&gt;
&lt;span class="py"&gt;retention_days&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;180&lt;/span&gt;

&lt;span class="nn"&gt;[ntfy]&lt;/span&gt;
&lt;span class="py"&gt;url&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;http://your-ntfy:8080&lt;/span&gt;
&lt;span class="py"&gt;topic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;argus-disk&lt;/span&gt;

&lt;span class="nn"&gt;[disk:ssd-system]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sda&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;sat&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;ssd&lt;/span&gt;

&lt;span class="nn"&gt;[disk:das-slot1]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sdb&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jmb39x,0&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;hdd&lt;/span&gt;

&lt;span class="nn"&gt;[disk:das-slot2]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sdb&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jmb39x,1&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;hdd&lt;/span&gt;

&lt;span class="nn"&gt;[disk:das-slot3]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sdb&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jmb39x,2&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;hdd&lt;/span&gt;

&lt;span class="nn"&gt;[disk:das-slot4]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sdb&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jmb39x,3&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;hdd&lt;/span&gt;

&lt;span class="nn"&gt;[disk:das-slot5]&lt;/span&gt;
&lt;span class="py"&gt;device&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/dev/sdb&lt;/span&gt;
&lt;span class="py"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jmb39x,4&lt;/span&gt;
&lt;span class="py"&gt;class&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;hdd&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
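&lt;p&gt;A file in this shape can be read with Python's standard &lt;code&gt;configparser&lt;/code&gt;. The sketch below shows one way each &lt;code&gt;[disk:NAME]&lt;/code&gt; section could map to a &lt;code&gt;smartctl&lt;/code&gt; call; it is a guess at the idea, not Argus's actual loader, and the function name is hypothetical.&lt;/p&gt;

```python
import configparser

SAMPLE = """
[argus]
retention_days = 180

[disk:ssd-system]
device = /dev/sda
type   = sat
class  = ssd

[disk:das-slot1]
device = /dev/sdb
type   = jmb39x,0
class  = hdd
"""


def load_disks(text):
    """Return {name: smartctl argv} for every [disk:NAME] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    disks = {}
    for section in parser.sections():
        kind, _, name = section.partition(":")
        if kind == "disk" and name:
            disk = parser[section]
            # The type value is passed straight to smartctl's -d option
            disks[name] = ["smartctl", "-a", "-d", disk["type"], disk["device"]]
    return disks
```

&lt;p&gt;All five DAS slots share &lt;code&gt;/dev/sdb&lt;/code&gt;; only the &lt;code&gt;type&lt;/code&gt; value tells them apart.&lt;/p&gt;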



&lt;h2&gt;Getting started&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/pdegidio/argus-disk.git
&lt;span class="nb"&gt;cd &lt;/span&gt;argus-disk
bash install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer auto-discovers your disks via &lt;code&gt;smartctl --scan&lt;/code&gt;, walks you through the config, and sets up cron jobs for collection and alerting.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/pdegidio/argus-disk" rel="noopener noreferrer"&gt;github.com/pdegidio/argus-disk&lt;/a&gt; — MIT license.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Has anyone else found JMicron pass-through working on other enclosure brands? Curious how far jmb39x,N generalises beyond TerraMaster.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>homelab</category>
      <category>selfhosted</category>
      <category>linux</category>
      <category>python</category>
    </item>
  </channel>
</rss>
