<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vaibhav binwal</title>
    <description>The latest articles on DEV Community by Vaibhav binwal (@vaibhav_binwal_c827618548).</description>
    <link>https://dev.to/vaibhav_binwal_c827618548</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3967653%2F46dee0ef-2c82-41d0-a3cd-e151211e753b.png</url>
      <title>DEV Community: Vaibhav binwal</title>
      <link>https://dev.to/vaibhav_binwal_c827618548</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vaibhav_binwal_c827618548"/>
    <language>en</language>
    <item>
      <title>I Built a DDoS Mitigation Engine That Drops Packets Before the Kernel Sees Them</title>
      <dc:creator>Vaibhav binwal</dc:creator>
      <pubDate>Fri, 05 Jun 2026 08:04:49 +0000</pubDate>
      <link>https://dev.to/vaibhav_binwal_c827618548/i-built-a-ddos-mitigation-engine-that-drops-packets-before-the-kernel-sees-them-575h</link>
      <guid>https://dev.to/vaibhav_binwal_c827618548/i-built-a-ddos-mitigation-engine-that-drops-packets-before-the-kernel-sees-them-575h</guid>
      <description>&lt;p&gt;I'm a first-year undergraduate. Last month I built a DDoS mitigation engine that operates at a layer most developers never touch — inside the NIC driver, before the Linux kernel has done anything at all with a packet.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;sk_buff&lt;/code&gt;. No netfilter. No routing lookup. If the packet is malicious, it gets dropped in the driver's receive loop and the kernel never finds out it existed.&lt;/p&gt;

&lt;p&gt;This is eBPF/XDP — eXpress Data Path — and it's one of the most interesting pieces of Linux infrastructure. Here's what I built, why the architecture works, and the things that went badly wrong along the way.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;br&gt;
Sentinel-X drops packets at the XDP hook before the Linux kernel allocates a single byte of memory for them. p99 verdict latency: &lt;strong&gt;0.066 ms&lt;/strong&gt; vs iptables' &lt;strong&gt;2–15 ms&lt;/strong&gt; — up to 225× faster at the tail. 96.51% drop accuracy across 45,327,065 packets. ML feedback loop auto-updates blacklists in ~1.2 seconds from attack onset.&lt;br&gt;
Code: &lt;a href="https://github.com/Vaibhav805/sentinel-x" rel="noopener noreferrer"&gt;github.com/Vaibhav805/sentinel-x&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The numbers first
&lt;/h2&gt;

&lt;p&gt;All benchmarks on an IdeaPad Slim 3 (AMD Ryzen, 4 cores) using &lt;code&gt;veth&lt;/code&gt; pairs and kernel network namespaces — no physical 10Gbps NIC, just software-emulated interfaces.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total packets processed&lt;/td&gt;
&lt;td&gt;45,327,065&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drop accuracy&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.51%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;False positive rate&lt;/td&gt;
&lt;td&gt;0.31%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XDP verdict latency p50&lt;/td&gt;
&lt;td&gt;0.012 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XDP verdict latency p99&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.066 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iptables under same flood&lt;/td&gt;
&lt;td&gt;2–15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel memory footprint&lt;/td&gt;
&lt;td&gt;5.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ML response time&lt;/td&gt;
&lt;td&gt;~1.2 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency comparison is what matters. iptables at p99 is &lt;strong&gt;30–225× slower&lt;/strong&gt; than Sentinel-X under the same flood. That gap isn't a tuning difference — it's architectural. iptables runs after the kernel has already paid the full cost for every packet. XDP runs before.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why every existing tool breaks under a real flood
&lt;/h2&gt;

&lt;p&gt;When a packet arrives at your NIC under normal Linux networking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NIC receives packet
      │
      ▼
Driver NAPI poll loop
      │
      ▼  ← sk_buff allocated HERE (~256 bytes per packet)
GRO coalescing
      │
      ▼
Netfilter / iptables  ← evaluated unconditionally
      │
      ▼
Routing table lookup
      │
      ▼
Your application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At 10Gbps line-rate, a flood of 64-byte packets hits ~14.88 million packets per second. The kernel allocates a fresh &lt;code&gt;sk_buff&lt;/code&gt; for &lt;strong&gt;every single one&lt;/strong&gt; — including the 13 million that are malicious. Netfilter evaluates its ruleset on every single one. The routing table is consulted on every single one.&lt;/p&gt;
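
&lt;p&gt;The 14.88 Mpps figure follows from Ethernet framing: on the wire, every 64-byte frame also occupies 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap. A quick check of the arithmetic (the function name is only for illustration):&lt;/p&gt;

```python
# Per-frame overhead on the wire: 7 B preamble + 1 B SFD + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame

pps = line_rate_pps(10e9, 64)
print(f"{pps / 1e6:.2f} Mpps")  # 14.88 Mpps
```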

&lt;p&gt;This isn't a bug in iptables. It's a consequence of where in the stack it sits. By the time iptables sees a packet, the kernel has already paid the full cost. You're not filtering the flood — you're processing it and discarding the result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The XDP hook changes where the decision happens:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NIC receives packet
      │
      ▼
Driver NAPI poll loop
      │
      ▼  ← XDP hook fires HERE — Sentinel-X runs here
      │       XDP_DROP? → buffer recycled. kernel allocates nothing.
      │       XDP_PASS? → continue below
      ▼
sk_buff allocated  ← only happens for legitimate traffic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The XDP hook runs inside the driver's receive loop, operating directly on the DMA buffer the NIC wrote into. No copy. No allocation. If the verdict is &lt;code&gt;XDP_DROP&lt;/code&gt;, the buffer is recycled and the packet disappears. The Linux networking stack never knew it arrived.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A packet dropped before &lt;code&gt;sk_buff&lt;/code&gt; allocation costs the kernel nothing.&lt;/strong&gt; This is the axiom the entire project is built around.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;Sentinel-X has two completely separate layers that never block each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1 — the data plane (&lt;code&gt;sentinel_x.c&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;A C program compiled to BPF bytecode, checked by the verifier, and JIT-compiled by the kernel, where it runs in kernel context. No userspace memory access. No system calls. No dynamic allocation.&lt;/p&gt;

&lt;p&gt;Every packet runs this pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Packet arrives
      │
      ▼
STEP 1: Header parse            (~3–5 ns)
        Malformed packet → XDP_PASS
      │
      ▼
STEP 2: LPM trie blacklist      (~15–40 ns)
        Match → XDP_DROP
      │
      ▼
STEP 3: Per-IP rate limit       (~10–20 ns)
        Exceeds threshold → XDP_DROP
      │
      ▼
STEP 4: Per-CPU stats update    (~2–5 ns)
      │
      ▼
STEP 5: Ring buffer push        (~5–10 ns)
        Non-blocking async
      │
      ▼
      XDP_PASS → Linux network stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total cost for a dropped packet: &lt;strong&gt;30–70 nanoseconds&lt;/strong&gt;. The kernel never allocates memory. Netfilter never fires.&lt;/p&gt;
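
&lt;p&gt;The decision order in steps 2 and 3 can be modelled in a few lines of Python. This is only a sketch of the control flow, not the C code in &lt;code&gt;sentinel_x.c&lt;/code&gt;: the function name and arguments are hypothetical, a plain set stands in for the LPM trie, and the rate-limit window reset is left out.&lt;/p&gt;

```python
from collections import defaultdict

# Numeric values of the kernel's xdp_action enum.
XDP_DROP, XDP_PASS = 1, 2

def verdict(src_ip, blacklist, counts, threshold):
    """Return the XDP action for one packet, mirroring steps 2 and 3."""
    if src_ip in blacklist:          # step 2: blacklist hit, earliest exit
        return XDP_DROP
    counts[src_ip] += 1              # step 3: per-source packet counter
    if counts[src_ip] > threshold:
        return XDP_DROP
    return XDP_PASS                  # steps 4 and 5 (stats, ring buffer) omitted

counts = defaultdict(int)
blacklist = {"203.0.113.7"}
results = [verdict("198.51.100.1", blacklist, counts, threshold=2) for _ in range(3)]
print(results)                                       # [2, 2, 1]
print(verdict("203.0.113.7", blacklist, counts, 2))  # 1
```

&lt;p&gt;The third packet from the same source exceeds the threshold of 2 and is dropped; the blacklisted source is dropped before its counter is ever touched.&lt;/p&gt;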

&lt;h3&gt;
  
  
  Layer 2 — the control plane (Python)
&lt;/h3&gt;

&lt;p&gt;Two processes run in userspace, communicating with the kernel only through BPF maps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;flux.py&lt;/code&gt;&lt;/strong&gt; — compiles and attaches &lt;code&gt;sentinel_x.c&lt;/code&gt; to the interface, initializes all maps, runs the live stats dashboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;bridge.py&lt;/code&gt;&lt;/strong&gt; — polls the BPF ring buffer for packet events, extracts features, runs ML inference, writes new CIDR blacklist entries into the kernel map when an attack is detected.&lt;/p&gt;

&lt;p&gt;The kernel program never waits for the ML engine. The ML engine never slows the fast path. This decoupling is the entire reason p99 stays at 0.066 ms under full flood load.&lt;/p&gt;




&lt;h2&gt;
  
  
  The BPF maps
&lt;/h2&gt;

&lt;p&gt;BPF maps are Sentinel-X's nervous system: kernel memory accessible from both the eBPF program and userspace.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Map&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;blacklist_map&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;LPM_TRIE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CIDR-aware IP blacklist&lt;/td&gt;
&lt;td&gt;~4.0 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ip_counts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HASH&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-IP rate limiting&lt;/td&gt;
&lt;td&gt;~0.8 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;global_stats&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PERCPU_ARRAY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aggregate counters&lt;/td&gt;
&lt;td&gt;~0.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;drop_stats&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;PERCPU_ARRAY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-reason drop counters&lt;/td&gt;
&lt;td&gt;~0.2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ring_buf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;RINGBUF&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Async event stream&lt;/td&gt;
&lt;td&gt;variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Total (excluding ring buffer)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~5.2 MB&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;5.2 MB total. A single Nginx worker uses 8–20 MB at idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;PERCPU_ARRAY&lt;/code&gt; instead of shared atomics?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Under flood conditions, 4 cores each receiving 3.5M packets per second all want to increment the same counter. With &lt;code&gt;lock xadd&lt;/code&gt;, every increment bounces the cache line between cores. At 14M PPS this destroys performance.&lt;/p&gt;

&lt;p&gt;Per-CPU arrays give each core its own slot. Zero coordination. Zero cache bouncing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CPU 0: global_stats[0].packets = 11,331,766
CPU 1: global_stats[1].packets = 11,331,766
CPU 2: global_stats[2].packets = 11,331,767
CPU 3: global_stats[3].packets = 11,331,766
                                 ──────────
                     Total:      45,327,065
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
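
&lt;p&gt;The aggregation happens on the userspace side: a per-CPU map read yields one value per core, and the reader sums them. With the counters shown above:&lt;/p&gt;

```python
# One slot per CPU, as in the listing above; userspace aggregates on read.
per_cpu_packets = [11_331_766, 11_331_766, 11_331_767, 11_331_766]

total = sum(per_cpu_packets)
print(f"{total:,}")  # 45,327,065
```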



&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;LPM_TRIE&lt;/code&gt; instead of a hash map?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Botnet IPs often share a subnet, for example &lt;code&gt;192.168.100.0/24&lt;/code&gt;. An LPM trie handles CIDR ranges natively: one insertion blocks all 256 addresses. A flat hash map cannot express a prefix at all, so covering the same range takes 256 separate entries.&lt;/p&gt;
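
&lt;p&gt;Longest-prefix-match semantics are easy to try out in userspace with Python's &lt;code&gt;ipaddress&lt;/code&gt; module. The sketch below uses a linear scan rather than a trie, so it illustrates the lookup result only, not the kernel map's data structure or cost; the second prefix is an assumption added to show that the most specific match wins.&lt;/p&gt;

```python
import ipaddress

def longest_prefix_match(addr, prefixes):
    """Return the most specific network containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in prefixes if ip in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

blacklist = [
    ipaddress.ip_network("192.168.100.0/24"),
    ipaddress.ip_network("192.168.0.0/16"),
]

print(blacklist[0].num_addresses)                          # 256
print(longest_prefix_match("192.168.100.42", blacklist))   # 192.168.100.0/24
print(longest_prefix_match("192.168.7.1", blacklist))      # 192.168.0.0/16
print(longest_prefix_match("10.0.0.1", blacklist))         # None
```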




&lt;h2&gt;
  
  
  The ML feedback loop
&lt;/h2&gt;

&lt;p&gt;The kernel data plane is fast but static — it only applies rules that already exist. The ML loop makes the system adaptive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timing hierarchy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;XDP verdict:                   ~50–100 ns
Ring buffer production:        ~10 ns
Ring buffer consumption:       ~1–10 ms
ML inference window:           ~100–500 ms
Blacklist update round-trip:   ~1–5 ms
Attack onset → blacklist:      ~1.2 s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fast path is never gated on any of this. XDP decides from maps right now. The ML engine updates them on its own clock. No lock between them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Feature extraction
&lt;/h3&gt;

&lt;p&gt;Every 5 seconds, &lt;code&gt;bridge.py&lt;/code&gt; aggregates ring buffer events into a 6-dimensional feature vector:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;pkt_rate&lt;/code&gt; — packets per second&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unique_src_ips&lt;/code&gt; — distinct source IPs in the window&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;proto_entropy&lt;/code&gt; — Shannon entropy of protocol distribution &lt;em&gt;(uniform = attack)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;port_entropy&lt;/code&gt; — entropy of destination port distribution&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;byte_rate&lt;/code&gt; — bytes per second&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;syn_ratio&lt;/code&gt; — fraction of TCP packets with SYN set&lt;/li&gt;
&lt;/ol&gt;
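
&lt;p&gt;A minimal sketch of how such a vector could be computed from one window of events. The event field names (&lt;code&gt;src_ip&lt;/code&gt;, &lt;code&gt;proto&lt;/code&gt;, &lt;code&gt;dst_port&lt;/code&gt;, &lt;code&gt;length&lt;/code&gt;, &lt;code&gt;syn&lt;/code&gt;) and both function names are assumptions for illustration; the actual record layout in &lt;code&gt;bridge.py&lt;/code&gt; may differ.&lt;/p&gt;

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def extract_features(events, window_s):
    """Aggregate one window of packet events into the six features above."""
    tcp = [e for e in events if e["proto"] == 6]  # IP protocol 6 is TCP
    return {
        "pkt_rate": len(events) / window_s,
        "unique_src_ips": len({e["src_ip"] for e in events}),
        "proto_entropy": shannon_entropy(e["proto"] for e in events),
        "port_entropy": shannon_entropy(e["dst_port"] for e in events),
        "byte_rate": sum(e["length"] for e in events) / window_s,
        "syn_ratio": sum(e["syn"] for e in tcp) / max(len(tcp), 1),
    }

events = [
    {"src_ip": "198.51.100.1", "proto": 6, "dst_port": 80, "length": 60, "syn": 1},
    {"src_ip": "198.51.100.2", "proto": 6, "dst_port": 80, "length": 60, "syn": 1},
    {"src_ip": "198.51.100.1", "proto": 6, "dst_port": 443, "length": 1500, "syn": 0},
    {"src_ip": "198.51.100.3", "proto": 17, "dst_port": 53, "length": 80, "syn": 0},
]
features = extract_features(events, window_s=5.0)
print(features["unique_src_ips"], features["port_entropy"], features["byte_rate"])
# 3 1.5 340.0
```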

&lt;h3&gt;
  
  
  Why two models, not one
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;XGBoost&lt;/strong&gt; is supervised — trained on labeled traffic windows. Excellent at known attack archetypes: SYN floods, UDP amplification, ICMP floods. Under 1ms inference for a 6-feature vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Isolation Forest&lt;/strong&gt; is unsupervised — learns what normal traffic looks like and flags anomalies without labeled examples. A safety net for novel attack patterns XGBoost has never seen.&lt;/p&gt;

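&lt;p&gt;The conjunction rule: blacklist updates only fire when &lt;strong&gt;both models agree&lt;/strong&gt;. XGBoost alone misses zero-day vectors. Isolation Forest alone generates too many false positives during flash crowds. Requiring agreement favors precision: an address is auto-blacklisted only when the supervised and unsupervised signals point the same way. The cost is recall, since an attack that only one model flags does not trigger an automatic update.&lt;/p&gt;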
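
&lt;p&gt;As a sketch, the rule reduces to a logical AND over the two model outputs. The function name and the 0.5 probability threshold are assumptions; the &lt;code&gt;-1&lt;/code&gt; label is what scikit-learn's &lt;code&gt;IsolationForest.predict&lt;/code&gt; returns for an anomaly.&lt;/p&gt;

```python
def should_blacklist(xgb_attack_prob, iforest_label, prob_threshold=0.5):
    """Conjunction rule: act only when both detectors flag the window."""
    xgb_says_attack = xgb_attack_prob >= prob_threshold
    iforest_says_anomaly = iforest_label == -1  # -1 = anomaly, 1 = inlier
    return xgb_says_attack and iforest_says_anomaly

# Flash crowd: unusual traffic, but the classifier sees no known attack.
print(should_blacklist(0.10, -1))  # False
# Known flood pattern that also looks anomalous.
print(should_blacklist(0.97, -1))  # True
# Classifier fires on traffic the anomaly detector considers normal.
print(should_blacklist(0.97, 1))   # False
```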




&lt;h2&gt;
  
  
  The hardest bugs I hit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bug 1 — the orphaned XDP program that killed my network
&lt;/h3&gt;

&lt;p&gt;Nobody warns you about this upfront.&lt;/p&gt;

&lt;p&gt;If your loader process crashes with &lt;code&gt;kill -9&lt;/code&gt; while an XDP program is attached, &lt;strong&gt;the program stays attached&lt;/strong&gt;. An XDP program returning &lt;code&gt;XDP_DROP&lt;/code&gt; for every packet will black-hole 100% of traffic on that interface until someone manually detaches it.&lt;/p&gt;

&lt;p&gt;This happened to me. My machine lost all network connectivity mid-session. No ping. No SSH. Nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Emergency detach&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;ip &lt;span class="nb"&gt;link set &lt;/span&gt;dev eth0 xdp off

&lt;span class="c"&gt;# Verify&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;bpftool net show dev eth0
&lt;span class="c"&gt;# Should show: xdp: &amp;lt;none&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



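&lt;p&gt;I now handle &lt;code&gt;SIGINT&lt;/code&gt; and &lt;code&gt;SIGTERM&lt;/code&gt; in &lt;code&gt;flux.py&lt;/code&gt; to always call &lt;code&gt;BPF.remove_xdp(dev)&lt;/code&gt; before exit. &lt;code&gt;SIGKILL&lt;/code&gt; (&lt;code&gt;kill -9&lt;/code&gt;) cannot be caught, so the manual detach above stays in the runbook as the fallback. Graceful shutdown is a safety feature, not a nicety.&lt;/p&gt;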
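
&lt;p&gt;A minimal sketch of that shutdown pattern, assuming the detach step is passed in as a callable. Here a list append stands in for the real call so the example runs without BCC or root; &lt;code&gt;install_detach_handlers&lt;/code&gt; is a hypothetical name, not a function from the repository.&lt;/p&gt;

```python
import signal
import sys

def install_detach_handlers(detach):
    """Run detach() and exit when the process receives SIGINT or SIGTERM."""
    def handler(signum, frame):
        detach()
        sys.exit(0)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)  # must be called from the main thread
    return handler

# Stand-in for the real detach step, e.g. lambda: BPF.remove_xdp(dev).
detached = []
handler = install_detach_handlers(lambda: detached.append("eth0"))
```

&lt;p&gt;Pressing Ctrl+C or sending &lt;code&gt;SIGTERM&lt;/code&gt; now runs the detach step before the interpreter exits.&lt;/p&gt;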

&lt;h3&gt;
  
  
  Bug 2 — the BPF verifier rejecting valid-looking code
&lt;/h3&gt;

&lt;p&gt;My first LPM trie lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_lpm_trie_key&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;blacklist_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// verifier rejects this&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;XDP_DROP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verifier error: &lt;code&gt;R0 invalid mem access 'map_value_or_null'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;bpf_map_lookup_elem&lt;/code&gt; can return NULL and the verifier tracks this through every code path. You must null-check unconditionally before any dereference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;bpf_lpm_trie_key&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bpf_map_lookup_elem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;blacklist_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;  &lt;span class="c1"&gt;// required — no exceptions&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;XDP_DROP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once I understood the pattern — every map lookup returns a nullable pointer, every dereference must be guarded — the verifier stopped being my enemy and started catching my bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bug 3 — stack overflow in kernel context
&lt;/h3&gt;

&lt;p&gt;BPF programs have a hard &lt;strong&gt;512-byte stack limit&lt;/strong&gt;. No exceptions. I hit this building a packet struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;pkt_info&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__u32&lt;/span&gt; &lt;span class="n"&gt;src_ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dst_ip&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u16&lt;/span&gt; &lt;span class="n"&gt;src_port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dst_port&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u8&lt;/span&gt;  &lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;__u64&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt;  &lt;span class="n"&gt;src_str&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;    &lt;span class="c1"&gt;// this killed me&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt;  &lt;span class="n"&gt;dst_str&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix: stop storing formatted strings in kernel context. Push raw integers to the ring buffer, format in userspace.&lt;/p&gt;
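
&lt;p&gt;To see what the two string fields cost, the struct can be mirrored with &lt;code&gt;ctypes&lt;/code&gt; and measured. This is a sketch assuming natural alignment on a 64-bit platform such as x86-64 (an 8-byte aligned &lt;code&gt;__u64&lt;/code&gt;); the class names are made up for the comparison.&lt;/p&gt;

```python
import ctypes

class PktInfoWithStrings(ctypes.Structure):
    """Mirror of the struct above, including the two formatted strings."""
    _fields_ = [
        ("src_ip", ctypes.c_uint32), ("dst_ip", ctypes.c_uint32),
        ("src_port", ctypes.c_uint16), ("dst_port", ctypes.c_uint16),
        ("proto", ctypes.c_uint8), ("flags", ctypes.c_uint8),
        ("timestamp", ctypes.c_uint64),
        ("src_str", ctypes.c_char * 64),
        ("dst_str", ctypes.c_char * 64),
    ]

class PktInfoRaw(ctypes.Structure):
    """Same record with only the raw integer fields."""
    _fields_ = PktInfoWithStrings._fields_[:7]

with_strings = ctypes.sizeof(PktInfoWithStrings)
raw_only = ctypes.sizeof(PktInfoRaw)
print(with_strings, raw_only)  # 152 24 on x86-64
```

&lt;p&gt;Under that assumption, the strings account for 128 bytes of the record, a quarter of the 512-byte budget that every other local in the program has to share.&lt;/p&gt;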

&lt;h3&gt;
  
  
  Bug 4 — the false positive cascade
&lt;/h3&gt;

&lt;p&gt;Early ML logic had OR instead of AND for the conjunction rule. A legitimate traffic spike triggered XGBoost's volumetric class without triggering Isolation Forest. Result: 47 real IPs auto-blacklisted including actual users.&lt;/p&gt;

&lt;p&gt;Beyond correcting the rule to AND, the fix was a &lt;code&gt;--dry-run&lt;/code&gt; flag: run inference and log what &lt;em&gt;would&lt;/em&gt; happen without touching any maps. I now dry-run for at least an hour after any model change before enabling enforcement.&lt;/p&gt;
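
&lt;p&gt;One way to wire up such a flag, as a sketch: the write to the kernel map is passed in as a callable so that dry-run mode can skip it and only log. The names &lt;code&gt;apply_blacklist&lt;/code&gt; and &lt;code&gt;write_entry&lt;/code&gt; are hypothetical, and a plain list stands in for the BPF map.&lt;/p&gt;

```python
import argparse
import logging

log = logging.getLogger("bridge")

def apply_blacklist(cidrs, write_entry, dry_run):
    """Write each CIDR via write_entry unless dry_run is set; return the count written."""
    written = 0
    for cidr in cidrs:
        if dry_run:
            log.warning("dry-run: would blacklist %s", cidr)
            continue
        write_entry(cidr)
        written += 1
    return written

parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args(["--dry-run"])  # simulate passing the flag

kernel_map = []  # stand-in for the blacklist map
print(apply_blacklist(["192.168.100.0/24"], kernel_map.append, args.dry_run))  # 0
print(kernel_map)                                                              # []
print(apply_blacklist(["192.168.100.0/24"], kernel_map.append, False))         # 1
print(kernel_map)                                                              # ['192.168.100.0/24']
```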




&lt;h2&gt;
  
  
  Why this is actually safe — the BPF verifier contract
&lt;/h2&gt;

&lt;p&gt;Before any eBPF program runs, the verifier proves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every memory access is within bounds&lt;/li&gt;
&lt;li&gt;Every map lookup is null-checked before use&lt;/li&gt;
&lt;li&gt;No unbounded loops — the program terminates&lt;/li&gt;
&lt;li&gt;Stack usage stays under 512 bytes&lt;/li&gt;
&lt;li&gt;No calls to arbitrary kernel functions&lt;/li&gt;
&lt;/ul&gt;

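&lt;p&gt;Within those guarantees, Sentinel-X's kernel component &lt;strong&gt;cannot crash the kernel&lt;/strong&gt; through a bad memory access or a loop that never ends. That is a checked property, not a hope: short of a bug in the verifier or JIT itself, a program that loads is memory-safe to run. It says nothing about whether the verdicts are right, which is why an attached program that drops every packet still passes verification.&lt;/p&gt;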




&lt;h2&gt;
  
  
  Performance comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;p99 latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sentinel-X&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;XDP pre-stack&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.066 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tc eBPF&lt;/td&gt;
&lt;td&gt;TC ingress hook&lt;/td&gt;
&lt;td&gt;~0.5–2 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iptables / nftables&lt;/td&gt;
&lt;td&gt;Netfilter hook&lt;/td&gt;
&lt;td&gt;~2–15 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snort / Suricata&lt;/td&gt;
&lt;td&gt;Userspace queue&lt;/td&gt;
&lt;td&gt;~5–50 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even eBPF at the tc layer is slower than XDP because it runs after &lt;code&gt;sk_buff&lt;/code&gt; allocation. Stack placement is everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I actually learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The BPF verifier is not your enemy.&lt;/strong&gt; Every rejection pointed at a real bug. Read the error, fix the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decouple fast path from slow path.&lt;/strong&gt; ML inference takes hundreds of milliseconds. XDP decides in nanoseconds. They coexist only because they never share a critical section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Graceful shutdown is a safety feature.&lt;/strong&gt; An orphaned XDP program drops all traffic. Handle your signals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-CPU structures are the correct default at high PPS.&lt;/strong&gt; Shared atomics cause cache contention that destroys performance. Per-CPU arrays eliminate the problem entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dry-run before live enforcement, always.&lt;/strong&gt; The blast radius of a misconfigured ML model is very real.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus + Grafana&lt;/strong&gt; — real-time attack dashboards from existing per-CPU stats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Online learning&lt;/strong&gt; — replace static XGBoost with River ML for incremental model updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BGP Blackhole integration&lt;/strong&gt; — announce &lt;code&gt;/32&lt;/code&gt; blackhole routes upstream via GoBGP when flood volume crosses a threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eBPF CO-RE&lt;/strong&gt; — migrate from BCC to libbpf + BTF for portable pre-compiled binaries on any kernel 5.8+&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;If you've done production eBPF work — especially around XDP attachment modes, CO-RE portability, or AF_XDP zero-copy — I'm genuinely curious what failure modes look like at scale beyond a veth testbed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub: &lt;a href="https://github.com/Vaibhav805/sentinel-x" rel="noopener noreferrer"&gt;github.com/Vaibhav805/sentinel-x&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The README has complete architecture diagrams, full CLI reference, and the operational runbook including the emergency detach procedure.&lt;/p&gt;

</description>
      <category>ebpf</category>
      <category>linux</category>
      <category>security</category>
      <category>cloudflare</category>
    </item>
  </channel>
</rss>
