Vaibhav binwal

Posted on Jun 5

I Built a DDoS Mitigation Engine That Drops Packets Before the Kernel Sees Them

#ebpf #linux #security #cloudflare

I'm a first-year undergraduate. Last month I built a DDoS mitigation engine that operates at a layer most developers never touch — inside the NIC driver, before the Linux kernel has done anything at all with a packet.

No sk_buff. No netfilter. No routing lookup. If the packet is malicious, it gets dropped in the driver's receive loop and the kernel never finds out it existed.

This is eBPF/XDP — eXpress Data Path — and it's one of the most interesting pieces of Linux infrastructure. Here's what I built, why the architecture works, and the things that went badly wrong along the way.

TL;DR
Sentinel-X drops packets at the XDP hook before the Linux kernel allocates a single byte of memory for them. p99 verdict latency: 0.066 ms vs iptables' 2–15 ms — up to 225× faster at the tail. 96.51% drop accuracy across 45,327,065 packets. ML feedback loop auto-updates blacklists in ~1.2 seconds from attack onset.
Code: github.com/Vaibhav805/sentinel-x

The numbers first

All benchmarks on an IdeaPad Slim 3 (AMD Ryzen, 4 cores) using veth pairs and kernel network namespaces — no physical 10Gbps NIC, just software-emulated interfaces.

Metric	Value
Total packets processed	45,327,065
Drop accuracy	96.51%
False positive rate	0.31%
XDP verdict latency p50	0.012 ms
XDP verdict latency p99	0.066 ms
iptables under same flood	2–15 ms
Kernel memory footprint	5.2 MB
ML response time	~1.2 s

The latency comparison is what matters. iptables at p99 is 30–225× slower than Sentinel-X under the same flood. That gap isn't a tuning difference — it's architectural. iptables runs after the kernel has already paid the full cost for every packet. XDP runs before.

Why every existing tool breaks under a real flood

When a packet arrives at your NIC under normal Linux networking:

NIC receives packet
      │
      ▼
Driver NAPI poll loop
      │
      ▼  ← sk_buff allocated HERE (~256 bytes per packet)
GRO coalescing
      │
      ▼
Netfilter / iptables  ← evaluated unconditionally
      │
      ▼
Routing table lookup
      │
      ▼
Your application

At 10Gbps line-rate, a flood of 64-byte packets hits ~14.88 million packets per second. The kernel allocates a fresh sk_buff for every single one — including the 13 million that are malicious. Netfilter evaluates its ruleset on every single one. The routing table is consulted on every single one.
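The 14.88 Mpps figure follows from Ethernet framing: on the wire, each 64-byte frame also occupies 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap. Here is a quick check of that arithmetic. The helper name is made up for illustration, and the 256-byte sk_buff figure is the approximate one from the diagram above.

```python
def line_rate_pps(link_bps: float, frame_bytes: int, wire_overhead_bytes: int = 20) -> float:
    """Maximum Ethernet frames per second on a link.

    wire_overhead_bytes = 7 B preamble + 1 B start-of-frame delimiter
    + 12 B inter-frame gap.
    """
    bits_per_frame = (frame_bytes + wire_overhead_bytes) * 8
    return link_bps / bits_per_frame


pps = line_rate_pps(10e9, 64)
# Approximate sk_buff metadata cost per packet, taken from the diagram above.
skb_bytes_per_sec = pps * 256

print(f"{pps / 1e6:.2f} Mpps")
print(f"{skb_bytes_per_sec / 1e9:.2f} GB/s of sk_buff metadata")
```

At that rate the kernel would be touching roughly 3.8 GB of sk_buff metadata per second before a single filtering rule has run.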

This isn't a bug in iptables. It's a consequence of where in the stack it sits. By the time iptables sees a packet, the kernel has already paid the full cost. You're not filtering the flood — you're processing it and discarding the result.

The XDP hook changes where the decision happens:

NIC receives packet
      │
      ▼
Driver NAPI poll loop
      │
      ▼  ← XDP hook fires HERE — Sentinel-X runs here
      │       XDP_DROP? → buffer recycled. kernel allocates nothing.
      │       XDP_PASS? → continue below
      ▼
sk_buff allocated  ← only happens for legitimate traffic

The XDP hook runs inside the driver's receive loop, operating directly on the DMA buffer the NIC wrote into. No copy. No allocation. If the verdict is XDP_DROP, the buffer is recycled and the packet disappears. The Linux networking stack never knew it arrived.

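A packet dropped before sk_buff allocation costs the kernel no memory allocation and no trip through the stack, only the few dozen nanoseconds the XDP program itself takes. This is the axiom the entire project is built around.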

Architecture

Sentinel-X has two completely separate layers that never block each other.

Layer 1 — the data plane (`sentinel_x.c`)

A C program that runs in kernel context, JIT-compiled by the kernel's BPF infrastructure. No userspace memory access. No system calls. No dynamic allocation.

Every packet runs this pipeline:

Packet arrives
      │
      ▼
STEP 1: Header parse            (~3–5 ns)
        Malformed packet → XDP_PASS
      │
      ▼
STEP 2: LPM trie blacklist      (~15–40 ns)
        Match → XDP_DROP
      │
      ▼
STEP 3: Per-IP rate limit       (~10–20 ns)
        Exceeds threshold → XDP_DROP
      │
      ▼
STEP 4: Per-CPU stats update    (~2–5 ns)
      │
      ▼
STEP 5: Ring buffer push        (~5–10 ns)
        Non-blocking async
      │
      ▼
      XDP_PASS → Linux network stack

Total cost for a dropped packet: 30–70 nanoseconds. The kernel never allocates memory. Netfilter never fires.
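The order of the checks is easier to see in ordinary code. What follows is a userspace Python model of the verdict logic, not the eBPF program itself. The blacklist entry, the rate limit, and the container names are hypothetical stand-ins for the real maps, and the windowing of the rate limiter is left out.

```python
import ipaddress
from collections import defaultdict

XDP_DROP, XDP_PASS = "XDP_DROP", "XDP_PASS"

# Hypothetical values, not Sentinel-X's real configuration.
BLACKLIST = [ipaddress.ip_network("192.168.100.0/24")]
RATE_LIMIT = 3  # max packets per source IP (window reset omitted)

ip_counts = defaultdict(int)          # stands in for the ip_counts hash map
stats = {"passed": 0, "dropped": 0}   # stands in for the per-CPU counters
events = []                           # stands in for the ring buffer


def verdict(src_ip):
    """Return the verdict for one packet, following steps 1-5."""
    # Step 1: header parse. Anything unparseable is passed up the stack.
    try:
        addr = ipaddress.ip_address(src_ip)
    except ValueError:
        return XDP_PASS
    # Step 2: blacklist lookup (linear scan here, LPM trie in the kernel).
    if any(addr in net for net in BLACKLIST):
        stats["dropped"] += 1
        return XDP_DROP
    # Step 3: per-IP rate limit.
    ip_counts[addr] += 1
    if ip_counts[addr] > RATE_LIMIT:
        stats["dropped"] += 1
        return XDP_DROP
    # Steps 4 and 5: count the packet and queue an event for userspace.
    stats["passed"] += 1
    events.append(str(addr))
    return XDP_PASS


packets = ["192.168.100.7", "10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1", "not-an-ip"]
results = [verdict(p) for p in packets]
print(results)
```

The cheapest and most decisive checks come first, so a blacklisted source never reaches the rate limiter or the ring buffer.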

Layer 2 — the control plane (Python)

Two processes run in userspace, communicating with the kernel only through BPF maps:

flux.py — compiles and attaches sentinel_x.c to the interface, initializes all maps, runs the live stats dashboard.

bridge.py — polls the BPF ring buffer for packet events, extracts features, runs ML inference, writes new CIDR blacklist entries into the kernel map when an attack is detected.

The kernel program never waits for the ML engine. The ML engine never slows the fast path. This decoupling is the entire reason p99 stays at 0.066ms under full flood load.
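The decoupling can be modelled with a bounded queue and a producer that never blocks. This is a sketch of the pattern only, using Python threads in place of the kernel and bridge.py. The queue size and function names are assumptions, and a real BPF ring buffer has its own API.

```python
import queue
import threading

ring = queue.Queue(maxsize=1024)   # bounded, like a fixed-size ring buffer
lost_events = 0


def fast_path(event):
    """Producer: never blocks. A full buffer loses the event, not the verdict."""
    global lost_events
    try:
        ring.put_nowait(event)
    except queue.Full:
        lost_events += 1


def slow_path(consumed, stop):
    """Consumer: drains events at its own pace until told to stop."""
    while not (stop.is_set() and ring.empty()):
        try:
            consumed.append(ring.get(timeout=0.05))
        except queue.Empty:
            continue


consumed, stop = [], threading.Event()
worker = threading.Thread(target=slow_path, args=(consumed, stop))
worker.start()
for i in range(5000):
    fast_path(i)
stop.set()
worker.join()

# Every event was either consumed or counted as lost; the producer never waited.
print(len(consumed), lost_events)
```

If the consumer falls behind, the cost is missing telemetry for the ML side, not added latency on the packet path.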

The BPF maps

BPF maps are the nervous system of Sentinel-X: kernel memory accessible from both the eBPF program and userspace.

Map	Type	Purpose	Size
`blacklist_map`	`LPM_TRIE`	CIDR-aware IP blacklist	~4.0 MB
`ip_counts`	`HASH`	Per-IP rate limiting	~0.8 MB
`global_stats`	`PERCPU_ARRAY`	Aggregate counters	~0.2 MB
`drop_stats`	`PERCPU_ARRAY`	Per-reason drop counters	~0.2 MB
`ring_buf`	`RINGBUF`	Async event stream	variable
		Total	~5.2 MB

5.2 MB total. A single Nginx worker uses 8–20 MB at idle.

Why PERCPU_ARRAY instead of shared atomics?

Under flood conditions, 4 cores each receiving 3.5M packets per second all want to increment the same counter. With a shared counter updated by an atomic instruction such as lock xadd, every increment bounces the cache line between cores. At 14M PPS this destroys performance.

Per-CPU arrays give each core its own slot. Zero coordination. Zero cache bouncing:

CPU 0: global_stats[0].packets = 11,331,766
CPU 1: global_stats[1].packets = 11,331,766
CPU 2: global_stats[2].packets = 11,331,767
CPU 3: global_stats[3].packets = 11,331,766
                                 ──────────
                     Total:      45,327,065
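The only cost of per-CPU counters is that userspace has to add the slots up when it reads them. A minimal sketch of that aggregation, using the numbers from the example above (the function name is illustrative):

```python
# Per-CPU packet counts as userspace would read them: one value per core
# for the same counter (numbers taken from the example above).
per_cpu_packets = [11_331_766, 11_331_766, 11_331_767, 11_331_766]


def aggregate(per_cpu_values):
    """Sum the per-core copies of one counter into a single total."""
    return sum(per_cpu_values)


total = aggregate(per_cpu_packets)
print(f"{total:,}")
```

The summation happens once per dashboard refresh in userspace, so it costs the packet path nothing.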

Why LPM_TRIE instead of a hash map?

Botnet IPs often share a subnet, for example 192.168.100.0/24. An LPM trie handles CIDR ranges natively: one insertion blocks all 256 addresses. A flat hash map needs 256 separate insertions for the same range and has no way to express a subnet as a single entry.
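To make longest-prefix matching concrete, here is a toy binary trie in Python. It illustrates the data structure only, not the kernel's LPM_TRIE implementation, and the class and method names are made up for this sketch.

```python
import ipaddress


class LpmTrie:
    """Binary trie keyed on IPv4 prefix bits; a toy stand-in for an LPM trie."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["end"] = net.prefixlen  # marks a stored prefix

    def longest_match(self, ip):
        """Return the length of the longest stored prefix covering ip, or None."""
        bits = format(int(ipaddress.ip_address(ip)), "032b")
        node, best = self.root, None
        for bit in bits:
            if "end" in node:
                best = node["end"]
            node = node.get(bit)
            if node is None:
                return best
        return node.get("end", best)


trie = LpmTrie()
trie.insert("192.168.100.0/24")    # one entry covers 256 addresses
trie.insert("192.168.100.128/25")  # a more specific entry inside it

print(trie.longest_match("192.168.100.57"))   # 24
print(trie.longest_match("192.168.100.200"))  # 25
print(trie.longest_match("192.168.101.1"))    # None
```

A lookup walks at most 32 bits for IPv4 regardless of how many prefixes are stored, which is why one trie entry can replace hundreds of hash entries.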

The ML feedback loop

The kernel data plane is fast but static — it only applies rules that already exist. The ML loop makes the system adaptive.

Timing hierarchy

XDP verdict:                   ~50–100 ns
Ring buffer production:        ~10 ns
Ring buffer consumption:       ~1–10 ms
ML inference window:           ~100–500 ms
Blacklist update round-trip:   ~1–5 ms
Attack onset → blacklist:      ~1.2 s

The fast path is never gated on any of this. XDP decides from maps right now. The ML engine updates them on its own clock. No lock between them.

Feature extraction

Every 5 seconds, bridge.py aggregates ring buffer events into a 6-dimensional feature vector:

pkt_rate — packets per second
unique_src_ips — distinct source IPs in the window
proto_entropy — Shannon entropy of protocol distribution (uniform = attack)
port_entropy — entropy of destination port distribution
byte_rate — bytes per second
syn_ratio — fraction of TCP packets with SYN set
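One way the six features could be computed from a window of events is sketched below. The event layout (a dict with src_ip, proto, dst_port, length and syn keys) is an assumption made for this example and may not match what bridge.py reads from the ring buffer.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    if not counts:
        return 0.0
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


def extract_features(events, window_s):
    """Build the 6-dimensional feature vector for one window of events."""
    tcp = [e for e in events if e["proto"] == 6]  # IP protocol 6 = TCP
    return {
        "pkt_rate": len(events) / window_s,
        "unique_src_ips": len({e["src_ip"] for e in events}),
        "proto_entropy": shannon_entropy(e["proto"] for e in events),
        "port_entropy": shannon_entropy(e["dst_port"] for e in events),
        "byte_rate": sum(e["length"] for e in events) / window_s,
        "syn_ratio": sum(e["syn"] for e in tcp) / len(tcp) if tcp else 0.0,
    }


window = [
    {"src_ip": "10.0.0.1", "proto": 6, "dst_port": 80, "length": 60, "syn": 1},
    {"src_ip": "10.0.0.2", "proto": 6, "dst_port": 80, "length": 60, "syn": 1},
    {"src_ip": "10.0.0.3", "proto": 6, "dst_port": 443, "length": 60, "syn": 0},
    {"src_ip": "10.0.0.3", "proto": 17, "dst_port": 53, "length": 120, "syn": 0},
]
features = extract_features(window, window_s=5.0)
print(features)
```

Entropy is 0 when every packet in the window shares one value and grows as the distribution spreads out, so it compresses the spread of protocols or ports into a single number.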

Why two models, not one

XGBoost is supervised — trained on labeled traffic windows. Excellent at known attack archetypes: SYN floods, UDP amplification, ICMP floods. Under 1ms inference for a 6-feature vector.

Isolation Forest is unsupervised — learns what normal traffic looks like and flags anomalies without labeled examples. A safety net for novel attack patterns XGBoost has never seen.

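The conjunction rule: blacklist updates only fire when both models agree. XGBoost alone misses zero-day vectors. Isolation Forest alone generates too many false positives during flash crowds. Requiring agreement favors precision: traffic that only one model flags does not trigger a blacklist update. The trade-off is recall, since an attack that only one model catches goes unblocked by the ML loop until the other agrees.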
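The rule itself is a single boolean AND. Here it is as a sketch with its truth table; the function and argument names are illustrative.

```python
def should_blacklist(xgb_is_attack: bool, iforest_is_anomaly: bool) -> bool:
    """Conjunction rule: act only when both models flag the window."""
    return xgb_is_attack and iforest_is_anomaly


# Truth table for the four possible outcomes of one traffic window.
for xgb in (False, True):
    for iso in (False, True):
        print(f"xgboost={xgb!s:5} iforest={iso!s:5} -> update={should_blacklist(xgb, iso)}")
```

Only the last row produces an update. A window flagged by a single model is logged but never acted on.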

The hardest bugs I hit

Bug 1 — the orphaned XDP program that killed my network

Nobody warns you about this upfront.

If your loader process dies without cleaning up (a crash, or kill -9) while an XDP program is attached, the program stays attached. An XDP program returning XDP_DROP for every packet will black-hole 100% of traffic on that interface until someone manually detaches it.

This happened to me. My machine lost all network connectivity mid-session. No ping. No SSH. Nothing.

The fix:

# Emergency detach
sudo ip link set dev eth0 xdp off

# Verify
sudo bpftool net show dev eth0
# The xdp: section should no longer list a program for eth0

I now handle SIGINT and SIGTERM in flux.py so it always calls BPF.remove_xdp(dev) before exit. SIGKILL (kill -9) cannot be caught, so the emergency detach above remains the fallback. Graceful shutdown is a safety feature, not a nicety.
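A minimal sketch of that shutdown pattern follows. The detach argument is a hypothetical zero-argument callable standing in for the real XDP removal call, so the example runs without BCC or root. The helper names are also assumptions.

```python
import signal
import sys


def make_shutdown_handler(detach):
    """Build a signal handler that calls detach() exactly once, then exits."""
    state = {"done": False}

    def handler(signum, frame):
        if not state["done"]:
            state["done"] = True
            detach()          # e.g. remove the XDP program from the interface
        sys.exit(0)

    return handler


def install(handler):
    """Register the handler for the signals that can be caught."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)


calls = []
handler = make_shutdown_handler(lambda: calls.append("detached"))
install(handler)

try:
    handler(signal.SIGTERM, None)   # simulate delivery of SIGTERM
except SystemExit:
    pass
print(calls)
```

Guarding with the done flag keeps a second Ctrl-C from running the detach twice while the first one is still in progress.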

Bug 2 — the BPF verifier rejecting valid-looking code

My first LPM trie lookup:

struct bpf_lpm_trie_key *val = bpf_map_lookup_elem(&blacklist_map, &key);
if (val->data[0]) {  // verifier rejects this
    return XDP_DROP;
}

Verifier error: R0 invalid mem access 'map_value_or_null'

bpf_map_lookup_elem can return NULL and the verifier tracks this through every code path. You must null-check unconditionally before any dereference:

struct bpf_lpm_trie_key *val = bpf_map_lookup_elem(&blacklist_map, &key);
if (val) {  // required — no exceptions
    return XDP_DROP;
}

Once I understood the pattern — every map lookup returns a nullable pointer, every dereference must be guarded — the verifier stopped being my enemy and started catching my bugs.

Bug 3 — stack overflow in kernel context

BPF programs have a hard 512-byte stack limit. No exceptions. I hit this building a packet struct:

struct pkt_info {
    __u32 src_ip, dst_ip;
    __u16 src_port, dst_port;
    __u8  proto, flags;
    __u64 timestamp;
    char  src_str[64];    // this killed me
    char  dst_str[64];
};

Fix: stop storing formatted strings in kernel context. Push raw integers to the ring buffer, format in userspace.
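To see how much of the stack budget the strings consume, the struct layout can be mirrored with Python's ctypes. This is a sketch that assumes the usual natural alignment on a 64-bit build; the class names are made up for the example.

```python
import ctypes

BPF_STACK_LIMIT = 512  # bytes available to a BPF program's stack


class PktInfoWithStrings(ctypes.Structure):
    """Mirror of the original struct, including the two formatted strings."""
    _fields_ = [
        ("src_ip", ctypes.c_uint32), ("dst_ip", ctypes.c_uint32),
        ("src_port", ctypes.c_uint16), ("dst_port", ctypes.c_uint16),
        ("proto", ctypes.c_uint8), ("flags", ctypes.c_uint8),
        ("timestamp", ctypes.c_uint64),
        ("src_str", ctypes.c_char * 64),
        ("dst_str", ctypes.c_char * 64),
    ]


class PktInfoRaw(ctypes.Structure):
    """Same fields with the strings removed: raw integers only."""
    _fields_ = PktInfoWithStrings._fields_[:7]


for cls in (PktInfoWithStrings, PktInfoRaw):
    size = ctypes.sizeof(cls)
    print(f"{cls.__name__}: {size} bytes ({size / BPF_STACK_LIMIT:.0%} of the stack limit)")
```

With this layout the string version is 152 bytes per instance against 24 for the raw version. That leaves far more of the 512 bytes for the program's other local variables.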

Bug 4 — the false positive cascade

Early ML logic had OR instead of AND for the conjunction rule. A legitimate traffic spike triggered XGBoost's volumetric class without triggering Isolation Forest. Result: 47 real IPs auto-blacklisted including actual users.

The fix was to switch the rule to AND and to add a --dry-run flag, which runs inference and logs what would happen without touching any maps. I now dry-run for at least an hour after any model change before enabling enforcement.
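A dry-run switch can be as small as one branch around the write. In the sketch below a Python set stands in for the kernel blacklist map, and the function name and the example CIDR (a documentation range) are assumptions.

```python
import argparse


def apply_decisions(cidrs, blacklist, dry_run):
    """Add flagged CIDRs to blacklist, or only report them when dry_run is set."""
    log = []
    for cidr in cidrs:
        if dry_run:
            log.append(f"[dry-run] would blacklist {cidr}")
        else:
            blacklist.add(cidr)
            log.append(f"blacklisted {cidr}")
    return log


parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true",
                    help="log decisions without touching the blacklist")
args = parser.parse_args(["--dry-run"])   # simulate passing the flag

blacklist = set()   # stands in for the kernel blacklist map
log = apply_decisions(["203.0.113.0/24"], blacklist, args.dry_run)
for line in log:
    print(line)
```

Because the inference path is identical in both modes, the dry-run log shows exactly which entries enforcement would have written.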

Why this is actually safe — the BPF verifier contract

Before any eBPF program runs, the verifier proves:

Every memory access is within bounds
Every map lookup is null-checked before use
No unbounded loops — the program terminates
Stack usage stays under 512 bytes
No calls to arbitrary kernel functions

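Because of these checks, a program that loads should not be able to crash the kernel or touch memory it does not own, barring bugs in the verifier itself. That is a much stronger guarantee than "unlikely to crash". What the verifier does not check is intent: a verified program can still drop every packet, as Bug 1 showed.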

Performance comparison

Solution	Architecture	p99 latency
Sentinel-X	XDP pre-stack	0.066 ms
tc eBPF	TC ingress hook	~0.5–2 ms
iptables / nftables	Netfilter hook	~2–15 ms
Snort / Suricata	Userspace queue	~5–50 ms

Even eBPF at the tc layer is slower than XDP because it runs after sk_buff allocation. Stack placement is everything.

What I actually learned

The BPF verifier is not your enemy. Every rejection pointed at a real bug. Read the error, fix the code.

Decouple fast path from slow path. ML inference takes hundreds of milliseconds. XDP decides in nanoseconds. They coexist only because they never share a critical section.

Graceful shutdown is a safety feature. An orphaned XDP program drops all traffic. Handle your signals.

Per-CPU structures are the correct default at high PPS. Shared atomics cause cache contention that destroys performance. Per-CPU arrays eliminate the problem entirely.

Dry-run before live enforcement, always. The blast radius of a misconfigured ML model is very real.

What's next

Prometheus + Grafana — real-time attack dashboards from existing per-CPU stats
Online learning — replace static XGBoost with River ML for incremental model updates
BGP Blackhole integration — announce /32 blackhole routes upstream via GoBGP when flood volume crosses a threshold
eBPF CO-RE — migrate from BCC to libbpf + BTF for portable pre-compiled binaries on any kernel 5.8+

If you've done production eBPF work — especially around XDP attachment modes, CO-RE portability, or AF_XDP zero-copy — I'm genuinely curious what failure modes look like at scale beyond a veth testbed.

GitHub: github.com/Vaibhav805/sentinel-x

The README has complete architecture diagrams, full CLI reference, and the operational runbook including the emergency detach procedure.
