
Mohamed Zrouga


I'm not an ML engineer. I built an anomaly detector anyway.

Not because I wanted to — but because every tool I tried on ARM edge devices either needed the cloud, needed a GPU, or needed more RAM than the service it was supposed to be watching.

So this post isn't really about Cerberus. It's about the problem it forced me to actually understand: what does anomaly detection require on constrained hardware, and how do you get there without black-box ML?

Here's what I learned.

The observability stack was heavier than the workload

When you're deploying on cloud VMs, the weight of your tooling is invisible. You have RAM to spare. You have fast networks. You have Prometheus scraping endpoints on a LAN that never goes offline.

Drop the same assumptions onto an ARM gateway at a remote industrial site and things break differently. The telemetry pipeline competes with the workload for CPU cycles. The collector needs connectivity that doesn't exist. The ML inference endpoint is somewhere in a cloud region the device can't reach.

The problem isn't the tools — they were built for a different environment. The problem is treating cloud-native observability as a default rather than a choice.

Once I asked "what does edge observability actually need?" the answer was much smaller than I expected:

Did traffic behavior change?
Is something probing unusual ports?
Are protocol patterns different from yesterday?
Is there unexplained traffic acceleration?
Which specific device changed?

That's not a distributed tracing problem. It's a behavioral signal problem.

Why eBPF is the right layer for this

The kernel already sees everything. Every packet, every connection, every flag — it all passes through the network stack before any userspace process touches it.

eBPF lets you attach small programs directly to that stack using TC (Traffic Control) or XDP hooks. Instead of running tcpdump through a pipe, or copying full payloads into userspace for inspection, you write a kernel-side filter that extracts only the metadata you care about and hands it to you via a ring buffer.

For Cerberus, that's roughly 208 bytes per event:

struct network_event {
    __u8  event_type;       // ARP / TCP / UDP / DNS / TLS / HTTP / ICMP
    __u32 src_ip;
    __u32 dst_ip;
    __u16 src_port;
    __u16 dst_port;
    __u8  tcp_flags;
    __u8  l7_payload[128];  // first 128 bytes for L7 inspection
    // ...
};

The kernel filters. The ring buffer delivers. Userspace gets a clean event stream at near-zero overhead — no full payload copies, no extra processes, no agents fighting the workload for CPU.

On ARM systems, this difference is measurable.
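
On the userspace side, each ring-buffer record is just a fixed-size blob of bytes. Here is a rough sketch of decoding the fields shown in the struct above. Cerberus itself is written in Go; Python is used here only because it keeps the example short. The layout is an assumption: natural C alignment (three padding bytes after event_type), a little-endian host, IP addresses kept in packet byte order, and ports already in host order. The fields elided by the `// ...` are not modelled, and `decode_event` is a hypothetical name.

```python
import socket
import struct
from typing import NamedTuple

# Assumed layout of the visible fields only:
# u8 event_type, 3 pad bytes, 4-byte src_ip, 4-byte dst_ip,
# u16 src_port, u16 dst_port, u8 tcp_flags, 128-byte payload prefix.
VISIBLE_FIELDS = struct.Struct("<B3x4s4sHHB128s")


class NetworkEvent(NamedTuple):
    event_type: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    tcp_flags: int
    l7_payload: bytes


def decode_event(record: bytes) -> NetworkEvent:
    """Decode the leading fields of one record; trailing bytes are ignored."""
    etype, src, dst, sport, dport, flags, payload = VISIBLE_FIELDS.unpack_from(record)
    return NetworkEvent(
        etype,
        socket.inet_ntoa(src),  # 4 raw bytes -> dotted quad
        socket.inet_ntoa(dst),
        sport,
        dport,
        flags,
        payload,
    )
```

Under these assumptions the visible fields occupy 145 of the roughly 208 bytes; the remainder belongs to the fields the post does not list.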

[Figure: ML-Lite architecture flow]

What "ML-Lite" actually means

Before I go further: I want to be clear that I'm not an ML engineer. What I built is better described as applied statistics with some online learning layered on top. I'm calling it ML-Lite because that's what it is, not because it sounds impressive.

The instinct when building anomaly detection is to reach for a neural network or a heavy ML runtime. On constrained hardware that's a dead end — both because of resource cost and because the explainability disappears. An operator at 2am staring at an alert doesn't want a confidence score. They want to know what changed.

So the system works in three stages.

Stage 1: Aggregate into windows

Every 30 seconds, the event stream is compressed into a feature vector:

[packet_rate, dns_rate, tls_rate, syn_rate, entropy, unusual_ports]

This is the "network behavior as numbers" step. Each window becomes a compact snapshot of what the device was doing.
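
A minimal sketch of that aggregation step, in Python for brevity rather than the project's Go. The `Event` shape, the string labels, and the `USUAL_PORTS` allowlist are all hypothetical: the post does not say how "unusual" is defined, so here it simply means "distinct destination ports outside a small allowlist".

```python
import math
from collections import Counter
from dataclasses import dataclass

WINDOW_SECONDS = 30  # window length from the text


@dataclass
class Event:
    kind: str          # e.g. "TCP", "UDP", "DNS", "TLS" (hypothetical labels)
    dst_port: int
    syn: bool = False  # True for a TCP segment with the SYN flag set


# Hypothetical allowlist used to decide which ports count as "unusual".
USUAL_PORTS = frozenset({53, 80, 123, 443})


def port_entropy(ports):
    """Shannon entropy (bits) of the destination-port distribution."""
    counts = Counter(ports)
    total = len(ports)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def window_features(events):
    """Collapse one window of events into the six-element feature vector."""
    ports = [e.dst_port for e in events]
    return [
        len(events) / WINDOW_SECONDS,                           # packet_rate
        sum(e.kind == "DNS" for e in events) / WINDOW_SECONDS,  # dns_rate
        sum(e.kind == "TLS" for e in events) / WINDOW_SECONDS,  # tls_rate
        sum(e.syn for e in events) / WINDOW_SECONDS,            # syn_rate
        port_entropy(ports),                                    # entropy
        float(len(set(ports) - USUAL_PORTS)),                   # unusual_ports
    ]
```

Everything downstream only ever sees these six numbers per window, which is what keeps the memory footprint small.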

Stage 2: Build a baseline

As windows accumulate, the system learns what normal looks like using three tools:

  • Median + MAD (Median Absolute Deviation): robust to outliers in a way mean/stddev aren't. If one window has a traffic spike, the baseline barely moves.
  • EWMA (Exponentially Weighted Moving Average): gives recent windows more weight than old ones, so the baseline tracks gradual change without jumping on a single window.
  • Centroid distance: tracks how far the current feature vector is from the center of historical observations.
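
The last two tools can be sketched together. This is an illustration under assumptions, not the project's code: the running EWMA vector doubles as the "center" (Cerberus may well compute its centroid differently, for example as a plain mean), `alpha=0.1` is an arbitrary smoothing factor, and `EwmaBaseline` is a hypothetical name.

```python
import math


class EwmaBaseline:
    """Per-feature EWMA, plus Euclidean distance from that running center."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # higher alpha = faster adaptation (illustrative value)
        self.center = None

    def update(self, features):
        """Fold one window's feature vector into the running average."""
        if self.center is None:
            self.center = list(features)  # first window seeds the baseline
        else:
            a = self.alpha
            self.center = [a * x + (1 - a) * c for x, c in zip(features, self.center)]

    def distance(self, features):
        """How far the given vector sits from the current center."""
        return math.sqrt(sum((x - c) ** 2 for x, c in zip(features, self.center)))


baseline = EwmaBaseline(alpha=0.5)
baseline.update([10.0, 2.0])
baseline.update([20.0, 4.0])
print(baseline.center)                  # [15.0, 3.0]
print(baseline.distance([18.0, 7.0]))   # 5.0
```

Raw Euclidean distance lets whichever feature has the largest numeric range dominate, so in practice the features would likely need scaling before this distance means much.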

The scoring formula for each feature is:

robust_z = |x - median| / MAD
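
A worked example of that formula, as a small Python sketch. The handling of MAD = 0 is an assumption on my part (the post does not say what Cerberus does there). Many implementations also rescale MAD by 1.4826 to make it comparable to a standard deviation; the formula above does not, so neither does this.

```python
from statistics import median


def robust_z(x, history):
    """|x - median| / MAD over a history of one feature's past values."""
    med = median(history)
    mad = median(abs(v - med) for v in history)
    if mad == 0:
        # Assumed behaviour for a perfectly flat history.
        return 0.0 if x == med else float("inf")
    return abs(x - med) / mad


# Six quiet windows and one spike: the median stays at 10 and the MAD at 1,
# whereas the mean of the same values would be pulled up to about 22.4.
history = [10, 11, 9, 10, 12, 10, 95]
print(robust_z(14, history))  # 4.0
```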

And entropy is computed as:

H(X) = -Σ p(x) log₂ p(x)

Here x ranges over the destination ports seen in the window, and p(x) is the fraction of that window's packets sent to port x. Normal traffic hits the same handful of ports repeatedly, so entropy stays low. A port scan touches 22, 23, 80, 443, 445, 3389 in sequence, so entropy spikes.
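
To put numbers on that, here is a short Python sketch comparing a quiet window with the scan from the example. The traffic mix in `normal` is made up for illustration.

```python
import math
from collections import Counter


def port_entropy(dst_ports):
    """Shannon entropy in bits: sum of p(x) * log2(1 / p(x)) over ports."""
    counts = Counter(dst_ports)
    total = len(dst_ports)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


normal = [443] * 90 + [53] * 10          # hypothetical quiet window
scan = [22, 23, 80, 443, 445, 3389]      # one packet to each scanned port

print(round(port_entropy(normal), 3))  # 0.469
print(round(port_entropy(scan), 3))    # 2.585
```

Six equally likely ports give log₂ 6 ≈ 2.585 bits, which is the maximum possible for six distinct values; a window that only ever hits one port scores 0.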

Stage 3: Explain the score

This is the part I cared most about. The system doesn't just surface a number — it surfaces which features drove it:

WHY?
+ High SYN rate
+ Port entropy spike
+ Traffic acceleration

An operator can act on that immediately.
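
One simple way to produce that kind of output is to keep the per-feature scores around and report the ones over a threshold, strongest first. This is a sketch, not the project's implementation: the threshold of 3.5, the label text for each feature, and the choice to tie "Traffic acceleration" to `packet_rate` are all assumptions.

```python
# Hypothetical mapping from feature name to operator-facing wording.
LABELS = {
    "packet_rate": "Traffic acceleration",
    "dns_rate": "High DNS rate",
    "tls_rate": "High TLS rate",
    "syn_rate": "High SYN rate",
    "entropy": "Port entropy spike",
    "unusual_ports": "Unusual destination ports",
}
THRESHOLD = 3.5  # illustrative cut-off on the robust z-score


def explain(z_scores, threshold=THRESHOLD):
    """Return the reasons behind an alert, strongest contributor first."""
    flagged = sorted(
        ((z, name) for name, z in z_scores.items() if z >= threshold),
        reverse=True,
    )
    return [LABELS.get(name, name) for _, name in flagged]


def render(reasons):
    """Format reasons in the WHY? layout shown above."""
    return "\n".join(["WHY?"] + [f"+ {r}" for r in reasons])


window_scores = {
    "packet_rate": 4.0, "dns_rate": 0.4, "tls_rate": 1.1,
    "syn_rate": 9.2, "entropy": 6.5, "unusual_ports": 2.0,
}
print(render(explain(window_scores)))
```

With these made-up scores, the output is exactly the three lines in the example above. The point of the design is that the explanation falls out of the scoring for free: no separate interpretability step is needed when each feature is scored on its own.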

[Figure: How Cerberus learns normal behavior]

The evolution wasn't planned

The detection model went through several iterations, each adding a layer without replacing what came before:

v1 — Statistical detection: Median, MAD, thresholds, entropy. Worked. Noisy on IoT networks.

v2 — Adaptive learning: EWMA, rolling baselines, per-device profiles. Reduced false positives significantly once the baseline had enough history.

v3 — Isolation Forest: Unsupervised ML, tree isolation, outlier scoring. Doesn't need labeled attack data. Effective for genuinely novel patterns.

v4 — Tiny autoencoders: Architecture is 9→16→4→16→9. The bottleneck (4 dimensions) forces the model to compress normal behavior into a compact representation. Reconstruction error is the anomaly signal — if the current window can't be reconstructed well, it's unusual.

v5 (in progress) — Temporal Graph ML: Device graphs, sequence analysis, behavior prediction. The goal is to model relationships between devices over time, not just each device in isolation.

[Figure: Evolution of Cerberus ML-Lite]

The design constraint throughout: offline, CPU/ARM friendly, explainable, hackable. No iteration added a cloud dependency.
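
To show how little the v3 idea needs, here is a dependency-free Isolation Forest sketch in Python. It is a teaching version under stated simplifications (a node stops splitting if the randomly chosen feature is constant, and the harmonic number is approximated), not Cerberus's Go implementation; `TinyIsolationForest` and its defaults are hypothetical. Where dependencies are acceptable, scikit-learn's `sklearn.ensemble.IsolationForest` is the usual route.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of a failed binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split on a random feature at a random value until rows are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop rather than try another feature
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(row, tree, depth=0):
    """Depth at which the row lands, adjusted for unsplit leaf size."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(row, child, depth + 1)


class TinyIsolationForest:
    def __init__(self, n_trees=50, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees = []

    def fit(self, rows):
        """Build the trees from unlabeled windows (needs at least 2 rows)."""
        rng = random.Random(self.seed)
        self.sample_size = min(self.sample_size, len(rows))
        max_depth = math.ceil(math.log2(self.sample_size))
        self.trees = [
            build_tree(rng.sample(rows, self.sample_size), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, row):
        """Anomaly score in (0, 1]; values near 1 mean the row isolates quickly."""
        mean_path = sum(path_length(row, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c_factor(self.sample_size))
```

Usage is `TinyIsolationForest().fit(windows).score(window)`, where each window is a feature vector. No labels are involved at any point: a window that differs sharply from the rest gets separated in very few random splits, and that short path is the whole signal.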

[Figure: Why this is not AI magic]

Honest limitations

Behavioral models come with real tradeoffs I haven't fully solved:

  • Baseline drift: Normal behavior changes over time. A device that starts a new cron job at 3am will generate false positives until the baseline adapts.
  • Encrypted traffic: TLS SNI is visible at the handshake, but payload content isn't. The entropy signals still work on port distributions, but deep inspection has limits.
  • Noisy IoT environments: Some IoT devices have genuinely chaotic traffic patterns. Per-device profiles help, but they need enough history to be meaningful.
  • Cold start: Until a device accumulates enough windows to build a stable baseline, scoring is unreliable.
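
Two of these, cold start and drift, can at least be made explicit in code. The sketch below is one common mitigation, not a description of what Cerberus does: a per-device profile that refuses to score until it has seen a minimum number of windows, and that keeps only a bounded history so old behavior eventually ages out. The class name and both defaults are hypothetical (120 windows is one hour and 2880 is one day at 30 seconds per window).

```python
from collections import deque
from statistics import median


class DeviceProfile:
    """Rolling history for one feature of one device, with a warm-up gate."""

    def __init__(self, min_windows=120, max_windows=2880):
        self.min_windows = min_windows
        self.history = deque(maxlen=max_windows)  # oldest windows fall out

    def observe(self, value):
        """Score the value against past windows, then remember it.

        Returns None during warm-up or when the history is too flat to score.
        """
        score = None
        if len(self.history) >= self.min_windows:
            med = median(self.history)
            mad = median(abs(v - med) for v in self.history)
            if mad > 0:
                score = abs(value - med) / mad
        self.history.append(value)
        return score


profile = DeviceProfile(min_windows=5, max_windows=10)
for rate in [10, 11, 9, 10, 12]:
    print(profile.observe(rate))  # None five times: still warming up
print(profile.observe(14))        # 4.0
```

Returning None rather than a number during warm-up keeps "no opinion yet" distinct from "looks normal". The bounded history only trades one problem for another, though: a shorter window adapts faster to a new cron job but also forgets a slow-moving attack faster.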

This is not a full IDS. It's an operational visibility tool that can surface unusual behavior — with the reasoning attached.

What I'm still trying to understand

This is where I'd genuinely value input from people with more ML background than me:

  • For edge anomaly detection, is unsupervised learning (Isolation Forest, autoencoders) fundamentally the right approach, or are there better-suited algorithms for streaming, low-resource environments?
  • How do you handle baseline drift without introducing false negatives for genuine behavioral shifts?
  • For temporal modeling on embedded systems, are there efficient graph-based approaches that don't require the full overhead of a GNN framework?
  • Is there a better feature set for network behavioral anomaly detection beyond what's described here?

I'm building in a space I find genuinely interesting but I'm not formally trained in — any feedback on the ML design is more valuable to me than stars.

Thank you

Cerberus wouldn't be what it is without the people who showed up and contributed real work to it.

Huge thanks to @SvenNellerz and @alexmchughdev — your contributions made the project meaningfully better and I genuinely appreciate it.

The project also stands on cilium/ebpf, BuntDB, and golang-lru — solid libraries that made the Go + eBPF combination practical without CGO headaches.

If you're working in this space — embedded Linux, ARM infrastructure, eBPF, IoT security, or lightweight ML — the repo is at github.com/zrougamed/cerberus and I'd genuinely love to hear how you're approaching the same problems.
