Prachi Jha

My Logs Lied: How I Used eBPF to Find the Truth

A while back, I wrote about the time I accidentally DDoSed my own laptop while load testing a Go auction server. 1,000 concurrent clients generated connections so fast that the kernel's TCP listen queue overflowed - silently - dropping packets before they ever reached my application.

The bug itself had a simple fix. But the experience left me with a harder problem: I had no way to see it happening in real time. Application logs showed nothing. netstat -s gave me a system-wide counter ("11,053 listen queue overflows"), but not which process, not when, not why.

So I built a tool to see inside the kernel. This is how it works, what I learned, and one genuinely weird thing I found while benchmarking it.


The Problem with Debugging Network Issues

When a distributed system behaves badly - a database replica lags, a microservice times out, a message queue backs up - the first instinct is to look at application logs. But a whole category of problems lives below the application, in the kernel's networking stack, invisible to anything your code can observe directly.

TCP is designed to be resilient. It retransmits. It backs off. It recovers. But when it can't recover - when the listen queue overflows, when a firewall rule drops a packet, when a checksum fails - it just... drops the packet. No log entry. No error propagated upward. The application eventually sees a timeout, but the actual cause happened microseconds earlier, deep in kernel code.

The tools most developers reach for don't help here.

tcpdump captures packets on the wire, but it's too heavyweight to run continuously in production. It also shows you what arrived, not what got dropped. netstat -s gives you aggregate counters - "11,053 listen queue overflows" - but nothing about which process, when, or why. You're left guessing.

What I needed was something that could sit right at the point where the kernel drops a packet and report back: who was affected, why it happened, and exactly where in kernel code the decision was made.


Enter eBPF

eBPF (extended Berkeley Packet Filter) is a technology that lets you run small, sandboxed programs inside the Linux kernel without modifying kernel source code or loading kernel modules (which is a lot more painful). It's been around for years, used by tools like Cilium and Datadog, but it's surprisingly accessible for individual developers once you understand the basics.

The key insight is that the Linux kernel has built-in attachment points called tracepoints - stable, documented hooks left by kernel developers for exactly this kind of observability. For packet drops, the relevant tracepoint is skb/kfree_skb. Every time the kernel frees a socket buffer (which is what happens when a packet is dropped), this tracepoint fires.
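If you want to see what this tracepoint exposes before writing any eBPF, you can dump its format file from tracefs. Here's a quick sketch (the mount point varies: /sys/kernel/tracing on recent systems, /sys/kernel/debug/tracing on older ones, and reading it usually requires root):

package main

import (
    "fmt"
    "os"
)

func main() {
    // The format file lists every field the tracepoint records,
    // including the "location" and "reason" fields used below.
    data, err := os.ReadFile("/sys/kernel/tracing/events/skb/kfree_skb/format")
    if err != nil {
        fmt.Fprintln(os.Stderr, "could not read tracepoint format:", err)
        os.Exit(1)
    }
    fmt.Print(string(data))
}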

So the plan was straightforward: hook kfree_skb, capture the information I needed, and get it out to my Go application in real time.


Building the Monitor

The Kernel Side

The eBPF program itself is surprisingly small — about 30 lines of C:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
    u32 pid;        // Process context when drop occurred
    u32 reason;     // Why the kernel dropped it
    u64 location;   // Instruction pointer — where in kernel code
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 16);  // 64KB ring buffer shared with userspace
} events SEC(".maps");

SEC("tracepoint/skb/kfree_skb")
int trace_tcp_drop(struct trace_event_raw_kfree_skb *ctx) {
    if (ctx->reason <= 1) return 0;  // Not a real drop, bail immediately

    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e) return 0;  // Ring buffer full — skip silently

    e->pid      = bpf_get_current_pid_tgid() >> 32;
    e->reason   = ctx->reason;
    e->location = (u64)ctx->location;
    bpf_ringbuf_submit(e, 0);
    return 0;
}

A few things worth noting here. The filter at the top of the handler (reason <= 1) is critical - kfree_skb fires for every packet that gets freed, including ones that completed successfully. Reason 0 (SKB_NOT_DROPPED_YET) and reason 1 (SKB_CONSUMED on recent kernels) are normal lifecycle values: the kernel freed the buffer because it was done with the packet, not because something went wrong. We only care about actual drops, so we bail out immediately for everything else. This keeps the overhead minimal even though we're hooking a very hot kernel path.

The ring buffer is a 64KB circular queue shared between kernel and userspace. When we call bpf_ringbuf_reserve, we claim space in it. When we call bpf_ringbuf_submit, the data becomes visible to userspace. If the buffer is full because userspace isn't reading fast enough, reserve returns NULL and we silently skip that event — no blocking, no spinning. The eBPF verifier enforces this: our program must terminate quickly, no exceptions.

Before any of this runs, the kernel's eBPF verifier statically analyzes the program. It proves there are no infinite loops, no unsafe memory accesses, no calls to unapproved functions. If verification fails, the program doesn't load. This is why eBPF is safe to run in production — you literally cannot get dangerous code past the verifier.

The Userspace Side

The Go side reads from the ring buffer and turns raw kernel events into something useful.

rd, err := ringbuf.NewReader(objs.Events)
if err != nil {
    log.Fatalf("opening ring buffer reader: %v", err)
}
defer rd.Close()

for {
    record, err := rd.Read()  // Blocks until an event is available
    if err != nil {
        return  // Reader was closed; shut down cleanly
    }
    event := *(*monitorEvent)(unsafe.Pointer(&record.RawSample[0]))

    symbolName := findNearestSymbol(event.Location)
    reasonStr := dropReasons[event.Reason]

    fmt.Fprintf(bufferedWriter, "[%s] Drop | PID: %-6d | Reason: %-18s | Function: %s\n",
        time.Now().Format("15:04:05"), event.Pid, reasonStr, symbolName)
}

The unsafe.Pointer cast deserves explanation. The ring buffer gives us raw bytes. We know the layout matches our C struct exactly - same fields, same order, same sizes. Rather than parsing the bytes manually (slow, error-prone), we reinterpret them directly as a Go struct. Zero allocation, zero copying. It's the only unsafe usage in the codebase, and it's justified.
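For reference, the Go struct being cast to has to mirror the C struct's layout exactly. A sketch of what that looks like (the name monitorEvent comes from the snippet above; what matters is the field order and widths):

// monitorEvent mirrors the C "struct event": two 32-bit fields followed by
// one 64-bit field, 16 bytes total, with no padding on either side.
type monitorEvent struct {
    Pid      uint32
    Reason   uint32
    Location uint64
}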

Symbol resolution is where the interesting work happens. The kernel gave us a raw instruction pointer, something like 0xffffffff81a2b574. Meaningless to a human. To translate it, we load /proc/kallsyms at startup, around 200,000 kernel symbols, sorted by address. Then for each event, we do a binary search to find the function that contains our address, calculate the offset, and produce output like tcp_v4_syn_recv_sock+0x234. Now you know exactly which kernel function dropped the packet.
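Here's a minimal sketch of how that lookup can be built. The helper name matches the snippet above, but this is a simplified reconstruction, not the tool's exact code:

package main

import (
    "bufio"
    "fmt"
    "os"
    "sort"
    "strconv"
    "strings"
)

type ksym struct {
    addr uint64
    name string
}

var symbols []ksym // sorted by address, loaded once at startup

// loadKallsyms parses /proc/kallsyms lines like
// "ffffffff81a2b340 T tcp_v4_syn_recv_sock" into the symbols slice.
func loadKallsyms() error {
    f, err := os.Open("/proc/kallsyms")
    if err != nil {
        return err
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) < 3 {
            continue
        }
        addr, err := strconv.ParseUint(fields[0], 16, 64)
        if err != nil {
            continue
        }
        symbols = append(symbols, ksym{addr: addr, name: fields[2]})
    }
    sort.Slice(symbols, func(i, j int) bool { return symbols[i].addr < symbols[j].addr })
    return sc.Err()
}

// findNearestSymbol maps a raw kernel instruction pointer to "function+0xoffset".
func findNearestSymbol(ip uint64) string {
    // First symbol whose start address is beyond ip; the one before it contains ip.
    i := sort.Search(len(symbols), func(i int) bool { return symbols[i].addr > ip })
    if i == 0 {
        return fmt.Sprintf("0x%x", ip)
    }
    s := symbols[i-1]
    return fmt.Sprintf("%s+0x%x", s.name, ip-s.addr)
}

func main() {
    if err := loadKallsyms(); err != nil {
        fmt.Fprintln(os.Stderr, "loading kallsyms:", err)
        os.Exit(1)
    }
    // Example: resolve the raw instruction pointer from the prose above.
    fmt.Println(findNearestSymbol(0xffffffff81a2b574))
}

One caveat: without root, /proc/kallsyms may report every address as zero because of kptr_restrict. An eBPF tool needs elevated privileges to load its programs anyway, so in practice this isn't a problem.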

The output is written through a 256KB bufio.Writer. This matters more than it might seem, and it connects to something I discovered later.
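The setup for that is tiny. A sketch, assuming output goes to stdout via the standard bufio package:

// Batch formatted lines into 256KB chunks so a burst of drops doesn't turn
// into thousands of tiny write syscalls. Flush on shutdown.
bufferedWriter := bufio.NewWriterSize(os.Stdout, 256*1024)
defer bufferedWriter.Flush()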


What It Looks Like in Practice

[15:04:23] Drop | PID: 1234 | Reason: TCP_LISTEN_OVERFLOW | Function: tcp_v4_syn_recv_sock+0x234
[15:04:23] Drop | PID: 1234 | Reason: TCP_LISTEN_OVERFLOW | Function: tcp_v4_syn_recv_sock+0x234
[15:04:23] Drop | PID: 5678 | Reason: NETFILTER_DROP      | Function: nf_hook_slow+0x12a

Each line tells you: when it happened, which process was in context, why the kernel dropped it, and exactly where in kernel code the decision was made.


A Note on PID Accuracy

bpf_get_current_pid_tgid() returns the PID of whichever process the kernel happens to be running when the drop occurs. For TCP_LISTEN_OVERFLOW, this is typically the listening process, since the drop happens in its context. But for other drop types - particularly ones that occur during interrupt handling or in kernel threads - the PID might not correspond to the actual owner of the dropped packet.

This is a fundamental limitation of the approach. The kernel doesn't always know which userspace process "owns" a packet at the point it gets dropped. For debugging specific issues like my listen queue overflow, the PID is accurate and useful. For a general-purpose production monitoring tool, you'd want to validate accuracy per drop type before relying on it.


Beyond the Code

This project started as a way to fix a single bug, but it ended up being a masterclass in how much complexity lives just beneath our main() functions.

While you might reach for a platform like Cilium or Datadog for 24/7 production observability, there is something incredibly powerful about writing 30 lines of C that can peer into the heart of the kernel. It turns the "black box" of networking into a transparent stream of events.

The source for this project is open on GitHub.


Presented at Bengaluru Systems Meetup, January 2026. Thanks to the organizers for the welcoming "just show up and talk about what you built" energy; it made all the difference.
