DEV Community

Paolo D'Egidio


How to get individual SMART data from a TerraMaster DAS (and build a failure forecaster around it)

If you have a TerraMaster D5-300 — or any DAS with a JMicron JMB576 bridge chip — you've probably hit this wall:

$ smartctl -a /dev/sdb
...
SMART overall-health self-assessment test result: PASSED
...

One result. For the whole enclosure. Not the 5 individual WD Red Pros inside it.

Most SMART monitoring guides stop here and tell you to use the vendor app. I went a different way.

The fix: JMicron pass-through

The JMB576 chip supports SMART pass-through, which smartctl exposes through its -d jmb39x,N device type:

smartctl -a -d jmb39x,0 /dev/sdb   # slot 1
smartctl -a -d jmb39x,1 /dev/sdb   # slot 2
smartctl -a -d jmb39x,2 /dev/sdb   # slot 3
smartctl -a -d jmb39x,3 /dev/sdb   # slot 4
smartctl -a -d jmb39x,4 /dev/sdb   # slot 5

The N in jmb39x,N is the zero-based port number and maps directly to the physical slot: N = 0 is slot 1, N = 4 is slot 5. Each command returns the full per-disk SMART output: temperature, reallocated sectors, CRC errors, pending sectors, everything.

Tested on: TerraMaster D5-300, Debian 12.5, smartctl 7.3.
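
To script the five commands above, a minimal Python sketch could look like the following. It assumes smartctl's JSON output mode (-j, available since smartmontools 7.0) and root privileges. The helper names (smartctl_cmd, raw_attributes, read_slot, print_crc_by_slot) are made up for illustration and are not taken from Argus.

```python
import json
import subprocess

PORTS = range(5)  # jmb39x ports 0-4 correspond to physical slots 1-5


def smartctl_cmd(device: str, port: int) -> list[str]:
    """Build the smartctl invocation for one port behind the JMicron bridge."""
    # -j requests JSON output, -a requests all SMART information.
    return ["smartctl", "-j", "-a", "-d", f"jmb39x,{port}", device]


def raw_attributes(report: dict) -> dict[str, int]:
    """Map attribute name -> raw value from a parsed smartctl JSON report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {row["name"]: row["raw"]["value"] for row in table}


def read_slot(device: str, port: int) -> dict[str, int]:
    """Query one slot. Needs root and smartctl on PATH."""
    # smartctl's exit status is a bitmask that can be non-zero even when a
    # report was produced, so the return code is deliberately not checked.
    proc = subprocess.run(smartctl_cmd(device, port), capture_output=True, text=True)
    return raw_attributes(json.loads(proc.stdout))


def print_crc_by_slot(device: str) -> None:
    """Print the CRC error count for every slot of the enclosure."""
    for port in PORTS:
        attrs = read_slot(device, port)
        print(f"slot {port + 1}: CRC errors = {attrs.get('UDMA_CRC_Error_Count')}")
```

Calling print_crc_by_slot("/dev/sdb") as root would then walk all five slots. An empty slot or a failed pass-through returns no attribute table, in which case the count prints as None.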

Why this matters

With per-disk SMART data, you can actually monitor disk health instead of just hoping the enclosure is OK. But raw SMART numbers aren't enough — what you really want is trends.

A single udma_crc_error_count = 1 on a 4-year-old disk is probably fine. udma_crc_error_count going from 1 to 3 to 7 to 12 over 60 days is not fine — it's a cable or backplane issue that will get worse.

Building a forecaster

I built Argus around this discovery. It collects SMART attributes every 6 hours, builds a 180-day rolling history, and runs a linear regression over the last 30 days to forecast when each attribute will hit its critical threshold.

The output looks like this:

👁️  Argus SMART Analysis — 2026-04-17T09:00:00+00:00
   Samples: 28 (forecast window: 30d)
   Overall: WARNING

✅ ssd-system (SanDisk Ultra II 960GB)   health=95/100  status=OK
✅ das-slot1  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot2  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot3  (WD Red Pro 8TB)           health=100/100 status=OK
✅ das-slot4  (WD Red Pro 8TB)           health=100/100 status=OK
🟡 das-slot5  (WD Red Pro 8TB)           health=70/100  status=WARNING
    🟡 udma_crc_error_count=1 ≥ WARN (5)
    📈 udma_crc_error_count: 1→100 forecast in 142d

Slot 5 is trending. Not critical yet — but I know about it 142 days before it becomes a problem.

How the forecast works

The core is a simple linear regression over (days_elapsed, attribute_value) pairs:

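def linear_forecast(points: list[tuple[float, float]], target: float) -> float | None:
    """Days until the fitted line reaches target, or None if there is no usable upward trend."""
    n = len(points)
    if n < 2:
        return None  # a trend needs at least two samples
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope: covariance(x, y) / variance(x)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    if den == 0 or (slope := num / den) <= 0:
        return None  # all samples share one timestamp, or the value is flat or falling
    intercept = mean_y - slope * mean_x
    x_target = (target - intercept) / slope
    days_until = x_target - max(xs)
    # Only report crossings that fall inside the 180-day horizon
    return round(days_until, 1) if 0 < days_until <= 180 else None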

Nothing fancy. But collecting every 6 hours means four samples per day, so a full 30-day window holds up to 120 data points per attribute: enough to catch real trends while ignoring one-off noise.

Thresholds

Argus uses Backblaze-calibrated thresholds rather than vendor defaults. A few notes:

  • seek_error_rate is excluded: Seagate packs the total seek count into the lower 32 bits of the 48-bit raw value, with the error count in the upper 16 bits, making the raw value meaningless for cross-vendor comparison. Backblaze doesn't use it either.
  • udma_crc_error_count warns at 5, not 1: a single historical CRC error on a multi-year disk is normal. Growth is what matters, and the forecast captures it.
  • Temperature anomaly uses z-score over 10+ samples, not a fixed threshold — so a disk that normally runs at 28°C will alert at 36°C, while one that normally runs at 38°C won't.
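
The temperature rule can be sketched as below. Only the z-score idea and the 10-sample minimum come from the notes above. The cut-off of 3.0, the 1 °C floor on the standard deviation, and the choice to flag only hotter-than-usual readings are assumptions made for this example, and the function name is hypothetical.

```python
from statistics import mean, stdev

MIN_SAMPLES = 10   # from the rule above: the z-score needs 10+ samples
Z_LIMIT = 3.0      # assumed cut-off, not taken from Argus
MIN_STDEV_C = 1.0  # assumed floor so a very steady disk does not divide by ~0


def is_temp_anomaly(history: list[float], current: float) -> bool:
    """True if current is unusually hot relative to this disk's own history."""
    if len(history) < MIN_SAMPLES:
        return False  # not enough data for a meaningful baseline
    baseline = mean(history)
    spread = max(stdev(history), MIN_STDEV_C)
    z = (current - baseline) / spread
    return z > Z_LIMIT  # only readings hotter than usual are flagged


cool_disk = [27, 28, 29, 28, 27, 28, 29, 28, 28, 28]  # averages 28 °C
warm_disk = [37, 38, 39, 38, 37, 38, 39, 38, 38, 38]  # averages 38 °C
print(is_temp_anomaly(cool_disk, 36))  # True: 8 °C above its own baseline
print(is_temp_anomaly(warm_disk, 36))  # False: cooler than its baseline
```

The same 36 °C reading is an alert for one disk and unremarkable for the other, which is the point of a per-disk baseline over a fixed threshold.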

Config for a TerraMaster D5-300

# /opt/argus/config/argus.conf

[argus]
history_file = /var/lib/argus/argus-history.json
retention_days = 180

[ntfy]
url   = http://your-ntfy:8080
topic = argus-disk

[disk:ssd-system]
device = /dev/sda
type   = sat
class  = ssd

[disk:das-slot1]
device = /dev/sdb
type   = jmb39x,0
class  = hdd

[disk:das-slot2]
device = /dev/sdb
type   = jmb39x,1
class  = hdd

[disk:das-slot3]
device = /dev/sdb
type   = jmb39x,2
class  = hdd

[disk:das-slot4]
device = /dev/sdb
type   = jmb39x,3
class  = hdd

[disk:das-slot5]
device = /dev/sdb
type   = jmb39x,4
class  = hdd
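
To show how a file in this layout turns into smartctl calls, here is a small sketch using Python's configparser. It is not the Argus loader; load_disks and smartctl_args are hypothetical helpers, and the sketch assumes every [disk:...] section carries device, type, and class keys as in the config above.

```python
import configparser


def load_disks(config_text: str) -> list[dict[str, str]]:
    """Collect every [disk:<name>] section into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    disks = []
    for section in parser.sections():
        if not section.startswith("disk:"):
            continue  # skip [argus], [ntfy] and any other non-disk section
        entry = parser[section]
        disks.append({
            "name": section.removeprefix("disk:"),  # Python 3.9+
            "device": entry["device"],
            "type": entry["type"],
            "class": entry["class"],
        })
    return disks


def smartctl_args(disk: dict[str, str]) -> list[str]:
    """Turn one disk entry into the matching smartctl command line."""
    return ["smartctl", "-a", "-d", disk["type"], disk["device"]]
```

All five DAS entries share /dev/sdb and differ only in type, so the slot number travels with the -d argument and never with the device path.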

Getting started

git clone https://github.com/pdegidio/argus-disk.git
cd argus-disk
bash install.sh

The installer auto-discovers your disks via smartctl --scan, walks you through the config, and sets up cron jobs for collection and alerting.
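
For a rough idea of what discovery via smartctl --scan can look like, the sketch below parses the JSON form of the scan (smartctl --scan -j). It is an illustration only, not the installer's logic: the function names are hypothetical, and the devices, name, and type fields reflect my understanding of smartctl 7.x JSON output, so check them against your version.

```python
import json
import subprocess


def parse_scan(scan_json: str) -> list[tuple[str, str]]:
    """Return (device, type) pairs from `smartctl --scan -j` output."""
    report = json.loads(scan_json)
    return [(dev["name"], dev["type"]) for dev in report.get("devices", [])]


def discover() -> list[tuple[str, str]]:
    """Run the scan. Needs smartctl on PATH and usually root."""
    proc = subprocess.run(
        ["smartctl", "--scan", "-j"], capture_output=True, text=True
    )
    return parse_scan(proc.stdout)
```

Each pair gives a device path and the device type smartctl detected for it, which is a starting point for the device and type keys of a config entry.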

Source: github.com/pdegidio/argus-disk — MIT license.


Has anyone else found JMicron pass-through working on other enclosure brands? Curious how far jmb39x,N generalises beyond TerraMaster.
