TL;DR — I built a full-stack intrusion detection platform: a hybrid ML model (Focal Loss classifier + autoencoder for zero-days), a 4-service Go/Python backend, and an 11-page React dashboard. R2L recall doubled (14% → 29%), zero-day detection works via reconstruction error, and the whole pipeline runs in <2ms per prediction.
The Problem No One Talks About
Every network security tutorial shows you how to train a classifier on NSL-KDD and call it done. But classifiers have a fatal blind spot: they can only detect attacks they've trained on.
The NSL-KDD test set alone contains 17 attack subtypes that don't appear in training. In production the gap only grows — every new CVE, every new botnet, every novel C2 protocol is invisible to a classifier until you retrain.
I wanted to build something that catches both: known attacks with high precision and novel threats it's never seen before.
Architecture at a Glance
┌─────────────────────────┐
│ React Dashboard :5173 │ 11 pages, ECharts, AG Grid, Firebase Auth
│ Apollo Client (GQL+WS)│
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ GraphQL Gateway :4000 │ gqlgen, Firebase Admin SDK, SQLite
│ Auth · Persistence │
└───────────┬─────────────┘
│ HTTP proxy
┌───────────▼─────────────┐
│ Go API Backend :8080 │ REST, WebSocket broadcast, Simulation
│ Packet capture control │
└───────────┬─────────────┘
│ HTTP /api/predict
┌───────────▼─────────────┐
│ FastAPI ML Server :8000│ PyTorch → ONNX inference, <2ms/sample
└─────────────────────────┘
Why 4 services? The ML server needs high memory and bursty CPU. The Go API needs low memory and high concurrency. Scaling them as one binary wastes money. Each service deploys and scales independently.
The Hybrid ML Model
Problem 1: Rare attacks are invisible to standard loss functions
NSL-KDD class distribution:
| Class | Training % | What it is |
|---|---|---|
| Normal | 53% | Legitimate traffic |
| DoS | 36% | Denial of service |
| Probe | 9.4% | Port scanning, reconnaissance |
| R2L | 0.8% | Remote-to-Local exploits |
| U2R | 0.04% | User-to-Root privilege escalation |
Standard CrossEntropy loss treats all samples equally — so the model essentially ignores R2L and U2R. Why learn to detect 0.04% of the data when you can boost accuracy by getting the 53% right?
Fix: Focal Loss.
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
Focal Loss down-weights easy examples (the Normal traffic the model already handles) and forces it to focus on hard, rare samples. The γ parameter controls how aggressively.
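A minimal NumPy sketch of the formula above (the real training loop is PyTorch; function and argument names here are illustrative):

```python
import numpy as np

def focal_loss(probs, targets, alpha, gamma=2.0):
    """Per-sample focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    probs:   (N, C) softmax probabilities
    targets: (N,) integer class labels
    alpha:   (C,) per-class weights (higher for rare classes like R2L/U2R)
    gamma:   focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    p_t = probs[np.arange(len(targets)), targets]  # probability of the true class
    a_t = np.asarray(alpha)[targets]               # per-class weight for each sample
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With γ = 2, a Normal sample the model already predicts at p = 0.9 is scaled by (1 − 0.9)² = 0.01, while a rare attack predicted at p = 0.4 keeps 0.36 of its loss — the gradient budget shifts to the hard, rare classes.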
Result:
| Metric | CrossEntropy (V1) | Focal Loss (V2) | Change |
|---|---|---|---|
| Probe Recall | 68% | 88% | +20 pts |
| R2L Recall | 14% | 29% | +15 pts (~2×) |
| Macro F1 | 0.558 | 0.589 | +0.031 (+5.5%) |
Problem 2: Zero-days are invisible to any classifier
A classifier is a lookup table — "I've seen this pattern before, it's DoS." For the 17 novel attack subtypes in the test set (and the infinite unknown attacks in the wild), we need something fundamentally different.
Fix: Autoencoder trained only on Normal traffic.
The autoencoder learns to compress and reconstruct "what normal looks like." At inference:
- Low reconstruction error → looks normal → pass
- High reconstruction error → looks like nothing I've ever seen → anomaly flag
This is zero-day detection by definition: it catches anything that deviates from normal, regardless of whether it matches any known signature.
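The decision rule is only a few lines: calibrate a threshold on reconstruction errors from held-out *normal* traffic, then flag anything above it. A sketch with illustrative names (the `reconstruct` callable stands in for the trained autoencoder):

```python
import numpy as np

def fit_threshold(normal_errors, percentile=99.0):
    """Calibrate on reconstruction errors of held-out NORMAL traffic only."""
    return np.percentile(normal_errors, percentile)

def is_anomaly(x, reconstruct, threshold):
    """Flag a sample whose mean squared reconstruction error exceeds the threshold."""
    error = np.mean((x - reconstruct(x)) ** 2)
    return error > threshold
```

The percentile is the knob: 99th keeps the false-positive rate on normal traffic near 1% by construction, regardless of what the attacks look like.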
The 3-Vote Consensus
Two detectors means disagreements. Policy:
| Classifier | Autoencoder | Action |
|---|---|---|
| Attack | High error | High confidence — auto-block above threshold |
| Normal | High error | Possible zero-day — escalate to operator |
| Attack | Low error | Known pattern — treat at face value |
| Normal | Low error | Green — pass |
The second row is the gold: a connection that looks normal per-feature but whose overall shape is anomalous. That's where zero-days hide.
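As code, the policy table above is one small pure function. The labels, `Action` names, and thresholds are illustrative:

```python
from enum import Enum

class Action(Enum):
    AUTO_BLOCK = "auto-block"
    ESCALATE = "escalate: possible zero-day"
    ALERT = "alert: known attack pattern"
    PASS = "pass"

def consensus(label, confidence, recon_error, error_threshold, block_threshold=0.95):
    """Combine the classifier verdict with the autoencoder's reconstruction error."""
    is_attack = label != "normal"
    high_error = recon_error > error_threshold
    if is_attack and high_error:
        # Both detectors agree: block automatically above the confidence threshold.
        return Action.AUTO_BLOCK if confidence >= block_threshold else Action.ALERT
    if not is_attack and high_error:
        # The gold row: per-feature "normal" but anomalous overall shape.
        return Action.ESCALATE
    if is_attack:
        return Action.ALERT  # known pattern, take at face value
    return Action.PASS
```

Keeping this as a pure function of (label, confidence, error) also makes the SOAR auto-block rules trivially unit-testable.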
The Dashboard — 11 Pages
Not just a prediction endpoint. Operators need context:
| Page | Why it exists |
|---|---|
| World Map | Animated attack routes with GeoIP — "where is this coming from?" |
| Traffic Feed | AG Grid, bulk select, CSV export — "show me everything" |
| Incidents | AI threat summary + timeline by class — "what's happening now?" |
| Analytics | Time-series, attack distribution — "what's the trend?" |
| Network Graph | Force-directed IP topology — "who's talking to whom?" |
| Model Compare | Radar chart, confusion matrix — "how good are we?" |
| SOAR | Auto-block on confidence thresholds — "respond automatically" |
| Settings | Profile, theme, webhooks, Slack — "configure everything" |
| System Status | Service health, latency sparklines — "is the platform healthy?" |
Firebase Auth handles Google Sign-In + Email/Password. First user = admin in SQLite; subsequent = viewers. RBAC enforced at the gateway, not just the UI.
Performance
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Normal | 91.0% | 88.9% | 90.0% |
| DoS | 95.8% | 85.6% | 90.4% |
| Probe | 60.4% | 88.3% | 71.7% |
| R2L | 53.9% | 29.4% | 38.0% |
| U2R | 2.2% | 41.8% | 4.2% |
Overall: 80.0% accuracy · 0.589 Macro F1
R2L/U2R are still the weakest — genuinely hard with <1% of training data. The autoencoder compensates by catching them as distribution anomalies even when the classifier misses.
What I'd Do Differently
Add a χ² frequency-distribution detector — the same math that cracks a Caesar cipher (compare observed letter frequencies to English) applied to network feature distributions. Catches slow horizontal scans that per-packet models miss.
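For intuition, the χ² statistic against a learned baseline is a few lines (the bucket scheme and baseline numbers are illustrative):

```python
import numpy as np

def chi2_stat(observed_counts, baseline_freqs):
    """Pearson chi-square distance between an observed histogram and a
    baseline frequency distribution (e.g. destination-port buckets)."""
    observed = np.asarray(observed_counts, dtype=float)
    expected = np.asarray(baseline_freqs, dtype=float) * observed.sum()
    return float(np.sum((observed - expected) ** 2 / expected))
```

A slow horizontal scan flattens a normally skewed port distribution, so the statistic climbs even though every individual packet looks benign to a per-packet model.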
Per-host behavioral baselines — a global Normal model doesn't notice "this IP was quiet for 6 months and just started scanning port 445." A per-host baseline does.
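One cheap way to get per-host baselines is an exponentially weighted mean/variance per source IP, scoring each new observation as a z-score against that host's own history (a sketch; the smoothing factor and feature choice are assumptions):

```python
import math
from collections import defaultdict

class HostBaseline:
    """Per-host EWMA baseline: score deviations from each host's own history."""
    def __init__(self, alpha=0.1):
        self.mean = defaultdict(float)
        self.var = defaultdict(lambda: 1.0)
        self.alpha = alpha

    def update_and_score(self, host, value):
        m, v = self.mean[host], self.var[host]
        z = (value - m) / math.sqrt(v)  # deviation from this host's history
        self.mean[host] = (1 - self.alpha) * m + self.alpha * value
        self.var[host] = (1 - self.alpha) * v + self.alpha * (value - m) ** 2
        return z
```

A host that has been quiet for months has a tiny learned variance, so its first burst of SMB connections scores enormous — exactly the "suddenly scanning port 445" case a global model misses.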
Adversarial training — FGSM/PGD perturbations during training to harden against evasion. If someone discovers your model, they'll craft packets to slip past it.
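To make the threat concrete, here is FGSM against a toy logistic model — one signed step along the input gradient that maximally increases the loss. The weights and ε are illustrative; in adversarial training you generate these perturbed batches on the fly and train on them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad_x(x, w, b, y):
    """Gradient of binary cross-entropy w.r.t. the INPUT, for p = sigmoid(w.x + b)."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def fgsm_perturb(x, grad_wrt_x, epsilon=0.05):
    """FGSM: step epsilon in the sign of the loss gradient w.r.t. the input."""
    return x + epsilon * np.sign(grad_wrt_x)
```

PGD is the same idea iterated several times with a projection back into an ε-ball; both give the model hard examples right at the decision boundary.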
Stack
Go · Python · PyTorch · ONNX · FastAPI · React 19 · Vite · Tailwind CSS · ECharts · AG Grid · Apollo Client · GraphQL (gqlgen) · Firebase Auth · SQLite · scapy · GeoIP2
If you're building something similar or want to discuss the architecture — connect with me on LinkedIn.