TL;DR — I built a full-stack intrusion detection platform: a hybrid ML model (Focal Loss classifier + autoencoder for zero-days), a 4-service Go/Python backend, and an 11-page React dashboard. R2L recall doubled (14% → 29%), zero-day detection works via reconstruction error, and the whole pipeline runs in <2ms per prediction.
The Problem No One Talks About
Every network security tutorial shows you how to train a classifier on NSL-KDD and call it done. But classifiers have a fatal blind spot: they can only detect attacks they've trained on.
The NSL-KDD test set alone contains 17 attack subtypes that don't appear in training. In production the gap only grows — every new CVE, every new botnet, every novel C2 protocol is invisible to a classifier until you retrain.
I wanted to build something that catches both: known attacks with high precision and novel threats it's never seen before.
Architecture at a Glance
┌─────────────────────────┐
│ React Dashboard :5173 │ 11 pages, ECharts, AG Grid, Firebase Auth
│ Apollo Client (GQL+WS)│
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ GraphQL Gateway :4000 │ gqlgen, Firebase Admin SDK, SQLite
│ Auth · Persistence │
└───────────┬─────────────┘
│ HTTP proxy
┌───────────▼─────────────┐
│ Go API Backend :8080 │ REST, WebSocket broadcast, Simulation
│ Packet capture control │
└───────────┬─────────────┘
│ HTTP /api/predict
┌───────────▼─────────────┐
│ FastAPI ML Server :8000│ PyTorch → ONNX inference, <2ms/sample
└─────────────────────────┘
Why 4 services? The ML server needs high memory and bursty CPU. The Go API needs low memory and high concurrency. Scaling them as one binary wastes money. Each service deploys and scales independently.
The Hybrid ML Model
Problem 1: Rare attacks are invisible to standard loss functions
NSL-KDD class distribution:
| Class | Training % | What it is |
|---|---|---|
| Normal | 53% | Legitimate traffic |
| DoS | 36% | Denial of service |
| Probe | 9.4% | Port scanning, reconnaissance |
| R2L | 0.8% | Remote-to-Local exploits |
| U2R | 0.04% | User-to-Root privilege escalation |
Standard CrossEntropy loss treats all samples equally — so the model essentially ignores R2L and U2R. Why learn to detect 0.04% of the data when you can boost accuracy by getting the 53% right?
Fix: Focal Loss.
FL(p_t) = -α_t (1 - p_t)^γ log(p_t)
Focal Loss down-weights easy examples (the Normal traffic the model already handles) and forces it to focus on hard, rare samples. The γ parameter controls how aggressively.
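A minimal NumPy sketch of the formula above (the real training loop is PyTorch; function and argument names here are illustrative):

```python
import numpy as np

def focal_loss(probs, targets, alpha, gamma=2.0):
    """Per-sample focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    probs:   (N, C) softmax probabilities
    targets: (N,) integer class labels
    alpha:   (C,) per-class weights (higher for rare classes like R2L/U2R)
    gamma:   focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    p_t = probs[np.arange(len(targets)), targets]  # probability of the true class
    a_t = np.asarray(alpha)[targets]               # per-class weight for each sample
    return -a_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With γ = 2, a Normal sample the model already predicts at p = 0.9 is scaled by (1 − 0.9)² = 0.01, while a rare attack predicted at p = 0.4 keeps 0.36 of its loss — the gradient budget shifts to the hard, rare classes.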
Result:
| Metric | CrossEntropy (V1) | Focal Loss (V2) | Change |
|---|---|---|---|
| Probe Recall | 68% | 88% | +20 pts |
| R2L Recall | 14% | 29% | +15 pts (~2×) |
| Macro F1 | 0.558 | 0.589 | +0.031 (+5.5%) |
Problem 2: Zero-days are invisible to any classifier
A classifier is a lookup table — "I've seen this pattern before, it's DoS." For the 17 novel attack subtypes in the test set (and the infinite unknown attacks in the wild), we need something fundamentally different.
Fix: Autoencoder trained only on Normal traffic.
The autoencoder learns to compress and reconstruct "what normal looks like." At inference:
- Low reconstruction error → looks normal → pass
- High reconstruction error → looks like nothing I've ever seen → anomaly flag
This is zero-day detection by definition: it catches anything that deviates from normal, regardless of whether it matches any known signature.
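The decision rule is only a few lines: calibrate a threshold on reconstruction errors from held-out *normal* traffic, then flag anything above it. A sketch with illustrative names (the `reconstruct` callable stands in for the trained autoencoder):

```python
import numpy as np

def fit_threshold(normal_errors, percentile=99.0):
    """Calibrate on reconstruction errors of held-out NORMAL traffic only."""
    return np.percentile(normal_errors, percentile)

def is_anomaly(x, reconstruct, threshold):
    """Flag a sample whose mean squared reconstruction error exceeds the threshold."""
    error = np.mean((x - reconstruct(x)) ** 2)
    return error > threshold
```

The percentile is the knob: 99th keeps the false-positive rate on normal traffic near 1% by construction, regardless of what the attacks look like.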
The 3-Vote Consensus
Two detectors means disagreements. Policy:
| Classifier | Autoencoder | Action |
|---|---|---|
| Attack | High error | High confidence — auto-block above threshold |
| Normal | High error | Possible zero-day — escalate to operator |
| Attack | Low error | Known pattern — treat at face value |
| Normal | Low error | Green — pass |
The second row is the gold: a connection that looks normal per-feature but whose overall shape is anomalous. That's where zero-days hide.
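As code, the policy table above is one small pure function. The labels, `Action` names, and thresholds are illustrative:

```python
from enum import Enum

class Action(Enum):
    AUTO_BLOCK = "auto-block"
    ESCALATE = "escalate: possible zero-day"
    ALERT = "alert: known attack pattern"
    PASS = "pass"

def consensus(label, confidence, recon_error, error_threshold, block_threshold=0.95):
    """Combine the classifier verdict with the autoencoder's reconstruction error."""
    is_attack = label != "normal"
    high_error = recon_error > error_threshold
    if is_attack and high_error:
        # Both detectors agree: block automatically above the confidence threshold.
        return Action.AUTO_BLOCK if confidence >= block_threshold else Action.ALERT
    if not is_attack and high_error:
        # The gold row: per-feature "normal" but anomalous overall shape.
        return Action.ESCALATE
    if is_attack:
        return Action.ALERT  # known pattern, take at face value
    return Action.PASS
```

Keeping this as a pure function of (label, confidence, error) also makes the SOAR auto-block rules trivially unit-testable.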
The Dashboard — 11 Pages
Not just a prediction endpoint. Operators need context:
| Page | Why it exists |
|---|---|
| World Map | Animated attack routes with GeoIP — "where is this coming from?" |
| Traffic Feed | AG Grid, bulk select, CSV export — "show me everything" |
| Incidents | AI threat summary + timeline by class — "what's happening now?" |
| Analytics | Time-series, attack distribution — "what's the trend?" |
| Network Graph | Force-directed IP topology — "who's talking to whom?" |
| Model Compare | Radar chart, confusion matrix — "how good are we?" |
| SOAR | Auto-block on confidence thresholds — "respond automatically" |
| Settings | Profile, theme, webhooks, Slack — "configure everything" |
| System Status | Service health, latency sparklines — "is the platform healthy?" |
Firebase Auth handles Google Sign-In + Email/Password. First user = admin in SQLite; subsequent = viewers. RBAC enforced at the gateway, not just the UI.
Performance
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Normal | 91.0% | 88.9% | 90.0% |
| DoS | 95.8% | 85.6% | 90.4% |
| Probe | 60.4% | 88.3% | 71.7% |
| R2L | 53.9% | 29.4% | 38.0% |
| U2R | 2.2% | 41.8% | 4.2% |
Overall: 80.0% accuracy · 0.589 Macro F1
R2L/U2R are still the weakest — genuinely hard with <1% of training data. The autoencoder compensates by catching them as distribution anomalies even when the classifier misses.
What I'd Do Differently
Add a χ² frequency-distribution detector — the same math that cracks a Caesar cipher (compare observed letter frequencies to English) applied to network feature distributions. Catches slow horizontal scans that per-packet models miss.
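For intuition, the χ² statistic against a learned baseline is a few lines (the bucket scheme and baseline numbers are illustrative):

```python
import numpy as np

def chi2_stat(observed_counts, baseline_freqs):
    """Pearson chi-square distance between an observed histogram and a
    baseline frequency distribution (e.g. destination-port buckets)."""
    observed = np.asarray(observed_counts, dtype=float)
    expected = np.asarray(baseline_freqs, dtype=float) * observed.sum()
    return float(np.sum((observed - expected) ** 2 / expected))
```

A slow horizontal scan flattens a normally skewed port distribution, so the statistic climbs even though every individual packet looks benign to a per-packet model.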
Per-host behavioral baselines — a global Normal model doesn't notice "this IP was quiet for 6 months and just started scanning port 445." A per-host baseline does.
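One cheap way to get per-host baselines is an exponentially weighted mean/variance per source IP, scoring each new observation as a z-score against that host's own history (a sketch; the smoothing factor and feature choice are assumptions):

```python
import math
from collections import defaultdict

class HostBaseline:
    """Per-host EWMA baseline: score deviations from each host's own history."""
    def __init__(self, alpha=0.1):
        self.mean = defaultdict(float)
        self.var = defaultdict(lambda: 1.0)
        self.alpha = alpha

    def update_and_score(self, host, value):
        m, v = self.mean[host], self.var[host]
        z = (value - m) / math.sqrt(v)  # deviation from this host's history
        self.mean[host] = (1 - self.alpha) * m + self.alpha * value
        self.var[host] = (1 - self.alpha) * v + self.alpha * (value - m) ** 2
        return z
```

A host that has been quiet for months has a tiny learned variance, so its first burst of SMB connections scores enormous — exactly the "suddenly scanning port 445" case a global model misses.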
Adversarial training — FGSM/PGD perturbations during training to harden against evasion. If someone discovers your model, they'll craft packets to slip past it.
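To make the threat concrete, here is FGSM against a toy logistic model — one signed step along the input gradient that maximally increases the loss. The weights and ε are illustrative; in adversarial training you generate these perturbed batches on the fly and train on them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad_x(x, w, b, y):
    """Gradient of binary cross-entropy w.r.t. the INPUT, for p = sigmoid(w.x + b)."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def fgsm_perturb(x, grad_wrt_x, epsilon=0.05):
    """FGSM: step epsilon in the sign of the loss gradient w.r.t. the input."""
    return x + epsilon * np.sign(grad_wrt_x)
```

PGD is the same idea iterated several times with a projection back into an ε-ball; both give the model hard examples right at the decision boundary.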
Stack
Go · Python · PyTorch · ONNX · FastAPI · React 19 · Vite · Tailwind CSS · ECharts · AG Grid · Apollo Client · GraphQL (gqlgen) · Firebase Auth · SQLite · scapy · GeoIP2
If you're building something similar or want to discuss the architecture — connect with me on LinkedIn.