Building a Real-Time Network Monitor with eBPF: Lessons from Cerberus

As a platform engineer working with critical infrastructure, I needed deep visibility into network behavior without the overhead of traditional packet capture tools. The result? Cerberus - an eBPF-based network monitor that processes packets at the kernel level with near-zero overhead.

This post walks through the architecture, challenges, and lessons learned from building a production-grade network monitoring tool using eBPF and Go.

Cerberus in Action

Why eBPF for Network Monitoring?

Traditional network monitoring tools like tcpdump and Wireshark are powerful but come with drawbacks:

  • Context switches: Every captured packet is copied from kernel space to user space
  • Performance overhead: All filtering and analysis happens in user space, after the copy
  • Security: Full packet capture requires elevated privileges everywhere
  • Scalability: High-traffic networks can overwhelm traditional tools

eBPF (Extended Berkeley Packet Filter) solves these by running sandboxed programs directly in the Linux kernel, allowing you to:

  • Filter and process packets at line rate
  • Extract only the data you need (no full packet copies)
  • Aggregate statistics in kernel space
  • Minimize context switches with ring buffers

Architecture Overview

Cerberus uses a two-layer architecture:

Cerberus user space and kernel space
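
For orientation, here's a minimal sketch of the user-space bootstrap, assuming the libbpfgo bindings the later snippets appear to use (paths, names, and error handling are illustrative, not the exact Cerberus code):

package main

import (
    "log"

    bpf "github.com/aquasecurity/libbpfgo"
)

func main() {
    // Load the compiled eBPF object (path is illustrative)
    module, err := bpf.NewModuleFromFile("cerberus.bpf.o")
    if err != nil {
        log.Fatal(err)
    }
    defer module.Close()

    // Load maps and programs into the kernel; the verifier runs at this point
    if err := module.BPFLoadObject(); err != nil {
        log.Fatal(err)
    }

    // Attaching the TC classifier to a network interface happens here (omitted)

    // Ring buffer records arrive on this channel as raw byte slices,
    // which the event loop shown later decodes and processes
    events := make(chan []byte, 1024)
    rb, err := module.InitRingBuf("events", events)
    if err != nil {
        log.Fatal(err)
    }
    defer rb.Close()
    rb.Poll(300)

    // ... run the event loop until shutdown ...
}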

The eBPF Program: Kernel-Space Magic

The heart of Cerberus is a TC (Traffic Control) classifier written in C that attaches to network interfaces. Here's what it does:

1. Packet Parsing at Line Rate

struct network_event {
    __u8 event_type;       // Event classification
    __u8 src_mac[6];       // Source MAC
    __u8 dst_mac[6];       // Destination MAC
    __u32 src_ip;          // Source IP
    __u32 dst_ip;          // Destination IP
    __u16 src_port;        // Source port
    __u16 dst_port;        // Destination port
    __u8 protocol;         // IP protocol
    __u8 tcp_flags;        // TCP flags
    __u16 arp_op;          // ARP operation
    __u8 icmp_type;        // ICMP type
    __u8 l7_payload[32];   // Layer 7 inspection data
} __attribute__((packed));

Key Design Decision: We only capture 75 bytes per event. This is enough for classification and L7 inspection without the overhead of full packet capture.
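
On the Go side, the same layout has to be mirrored field for field so events can be decoded from the raw ring buffer bytes. A minimal sketch (type and function names here are illustrative, not the exact Cerberus code):

import (
    "bytes"
    "encoding/binary"
)

// Mirrors the packed C struct; field order and widths must match exactly
type rawNetworkEvent struct {
    EventType uint8
    SrcMAC    [6]byte
    DstMAC    [6]byte
    SrcIP     uint32
    DstIP     uint32
    SrcPort   uint16
    DstPort   uint16
    Protocol  uint8
    TCPFlags  uint8
    ArpOp     uint16
    ICMPType  uint8
    L7Payload [32]byte
}

func decodeEvent(data []byte) (*rawNetworkEvent, error) {
    var ev rawNetworkEvent
    // binary.Read packs fields back to back, matching __attribute__((packed));
    // IPs and ports may still be in network byte order, depending on whether
    // the eBPF side converts them before submitting the event
    err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &ev)
    return &ev, err
}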

2. Multi-Protocol Detection

The eBPF program identifies 7 different protocol types:

// Simplified version of the protocol detection logic
SEC("tc")
int monitor_ingress(struct __sk_buff *skb) {
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;

    // ARP detection
    if (eth->h_proto == bpf_htons(ETH_P_ARP)) {
        return handle_arp(skb, eth, data_end);
    }

    // IP packet processing
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return TC_ACT_OK;

        switch (ip->protocol) {
            case IPPROTO_TCP:
                return handle_tcp(skb, eth, ip, data_end);
            case IPPROTO_UDP:
                return handle_udp(skb, eth, ip, data_end);
            case IPPROTO_ICMP:
                return handle_icmp(skb, eth, ip, data_end);
        }
    }

    return TC_ACT_OK;
}

3. Layer 7 Inspection

Here's where it gets interesting. We extract application-layer data for DNS, HTTP, and TLS:

// Ring buffer map used to push events to user space
// (assumed definition; the real map lives alongside these handlers)
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

// DNS query extraction
static __always_inline int handle_dns(struct __sk_buff *skb, 
                                       struct udphdr *udp,
                                       void *data_end) {
    void *dns_data = (void *)(udp + 1);

    // Reserve an event slot directly in the ring buffer (zero-copy path)
    struct network_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event)
        return TC_ACT_OK;

    event->event_type = EVENT_DNS;

    // Copy up to 32 bytes of DNS payload (bounds-checked for the verifier)
    if (dns_data + 32 <= data_end)
        __builtin_memcpy(event->l7_payload, dns_data, 32);

    bpf_ringbuf_submit(event, 0);
    return TC_ACT_OK;
}

// HTTP detection (simplified)
static __always_inline int handle_http(struct __sk_buff *skb,
                                        struct tcphdr *tcp,
                                        void *data_end) {
    char *http_data = (void *)(tcp + 1);

    // Check for an HTTP method ("GET ") at the start of the TCP payload
    if ((void *)(http_data + 32) <= data_end &&
        http_data[0] == 'G' && http_data[1] == 'E' &&
        http_data[2] == 'T' && http_data[3] == ' ') {
        struct network_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
        if (!event)
            return TC_ACT_OK;

        event->event_type = EVENT_HTTP;
        __builtin_memcpy(event->l7_payload, http_data, 32);
        bpf_ringbuf_submit(event, 0);
    }

    return TC_ACT_OK;
}

// TLS handshake detection
static __always_inline int handle_tls(struct __sk_buff *skb,
                                       struct tcphdr *tcp,
                                       void *data_end) {
    __u8 *tls_data = (void *)(tcp + 1);

    // 0x16 = TLS Handshake record type
    if ((void *)(tls_data + 32) <= data_end && tls_data[0] == 0x16) {
        struct network_event *event = bpf_ringbuf_reserve(&events, sizeof(*event), 0);
        if (!event)
            return TC_ACT_OK;

        event->event_type = EVENT_TLS;
        __builtin_memcpy(event->l7_payload, tls_data, 32);
        bpf_ringbuf_submit(event, 0);
    }

    return TC_ACT_OK;
}

Challenge: The Verifier

The eBPF verifier is strict. Every memory access must be bounds-checked:

// This will fail verification:
char *data = packet + offset;
*data = value;  // ❌ No bounds check

// This passes:
if ((void *)(packet + offset + 1) <= data_end) {
    char *data = packet + offset;
    *data = value;  // ✅ Verified safe
}

User-Space Processing: Go Application

The Go application reads events from the ring buffer and performs higher-level analysis:

1. Ring Buffer Polling

func (m *NetworkMonitor) pollEvents(ctx context.Context) {
    rb, err := m.module.InitRingBuf("events", m.eventsChannel)
    if err != nil {
        log.Fatal(err)
    }
    defer rb.Close()

    rb.Poll(300) // 300ms timeout

    for {
        select {
        case <-ctx.Done():
            return
        case record := <-m.eventsChannel:
            m.processEvent(record)
        }
    }
}

2. Smart Deduplication with LRU Cache

To avoid alert fatigue, we only show new traffic patterns:

type NetworkMonitor struct {
    deviceCache  *lru.Cache        // Device tracking
    patternCache *lru.Cache        // Unique communication patterns
    db           *buntdb.DB        // Persistent storage
}

func (m *NetworkMonitor) processEvent(data []byte) {
    event := parseNetworkEvent(data)

    // Track device
    deviceKey := event.SrcMAC.String()
    if !m.deviceCache.Contains(deviceKey) {
        m.announceNewDevice(event)
        m.deviceCache.Add(deviceKey, true)
    }

    // Track pattern (first occurrence only)
    patternKey := fmt.Sprintf("%s:%s:%d:%s",
        event.SrcMAC, event.DstIP, event.DstPort, event.EventType)

    if !m.patternCache.Contains(patternKey) {
        m.printTrafficPattern(event)
        m.patternCache.Add(patternKey, true)
    }

    // Update statistics
    m.updateStats(event)
}

3. Layer 7 Parsing in User Space

func parseL7Payload(event *NetworkEvent) string {
    switch event.EventType {
    case EVENT_DNS:
        return parseDNSQuery(event.L7Payload)
    case EVENT_HTTP:
        return parseHTTPRequest(event.L7Payload)
    case EVENT_TLS:
        return "TLS"
    default:
        return ""
    }
}

func parseDNSQuery(payload []byte) string {
    // Skip DNS header (12 bytes)
    offset := 12

    var domain []byte
    for offset < len(payload) {
        length := int(payload[offset])
        if length == 0 {
            break
        }

        if len(domain) > 0 {
            domain = append(domain, '.')
        }

        offset++
        if offset+length > len(payload) {
            break
        }

        domain = append(domain, payload[offset:offset+length]...)
        offset += length
    }

    return string(domain)
}
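
parseHTTPRequest isn't shown above; a minimal sketch of what it could look like, given that the payload is a truncated 32-byte slice starting at the request line:

func parseHTTPRequest(payload []byte) string {
    // The request line may be cut off mid-way; stop at the first CR, LF, or NUL
    for i, b := range payload {
        if b == '\r' || b == '\n' || b == 0 {
            return string(payload[:i])
        }
    }
    return string(payload)
}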

Performance Optimizations

1. Zero-Copy with Ring Buffers

Ring buffers let the eBPF program reserve an event slot and write it in place in memory shared with user space, so no intermediate copy is needed:

// Kernel writes events in place (bpf_ringbuf_reserve/submit);
// user space just polls the shared buffer
rb.Poll(300)

2. Batch Database Writes

Instead of writing every event to disk, we batch them:

func (m *NetworkMonitor) persistStats() {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        // deviceStats is also written by processEvent, so concurrent access
        // needs to be guarded (e.g. with a mutex) before snapshotting here
        m.db.Update(func(tx *buntdb.Tx) error {
            for mac, stats := range m.deviceStats {
                data, _ := json.Marshal(stats)
                tx.Set(mac, string(data), nil)
            }
            return nil
        })
    }
}

3. LRU Cache Tuning

// Balance memory vs. accuracy
deviceCache, _ := lru.New(1000)   // Track 1000 devices
patternCache, _ := lru.New(10000) // Track 10k patterns

Real-World Output

Here's what you see when running Cerberus:

NEW DEVICE DETECTED!
   MAC:     dc:62:79:2f:39:28
   IP:      192.168.0.108
   Vendor:  Apple
   First Seen: 2024-12-06 16:51:12

[DNS] 192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 8.8.8.8:53 [google.com]
[HTTP] 192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 93.184.216.34:80 [GET /api/v1/users]
[TLS] 192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 142.250.185.46:443 [TLS]
[TCP] 192.168.0.50 (11:22:33:44:55:66) [Raspberry Pi] → 192.168.0.200:22 (SSH)

╔════════════════════════════════════════════════════╗
║         NETWORK STATISTICS SUMMARY                 ║
╠════════════════════════════════════════════════════╣
║ Total Devices: 15                                  ║
║ Total Packets: 45821                               ║
║   - TCP:  38456                                    ║
║   - UDP:  6120                                     ║
║   - DNS:  892                                      ║
║   - HTTP: 156                                      ║
║   - TLS:  1834                                     ║
╚════════════════════════════════════════════════════╝

Challenges and Solutions

Challenge #1: eBPF Complexity Limit

eBPF programs are limited in complexity: the verifier allows roughly 1 million processed instructions on modern kernels (older kernels allowed only 4,096). Solution: Keep parsing logic simple and move complex analysis to user space.

Challenge #2: Payload Size Tradeoff

More payload = better L7 inspection, but higher overhead. Solution: 32 bytes is enough for DNS queries, HTTP methods, and TLS detection. A query for example.com, for instance, is a 12-byte DNS header plus a 13-byte encoded name, which fits comfortably; longer names simply get truncated.

Challenge #3: TC vs XDP

Initially I considered XDP (eXpress Data Path), but TC offered better compatibility and easier development. XDP runs earlier (before the kernel even allocates an sk_buff) and is faster, but it only covers ingress traffic and is more restrictive; TC classifiers can attach to both ingress and egress.

Challenge #4: Cross-Kernel Compatibility

Different kernel versions support different eBPF features. Solution: Use CO-RE (Compile Once, Run Everywhere) via libbpf.

Lessons Learned

  1. Start simple: Basic packet capture first, then add L7 inspection
  2. Trust the verifier: If it rejects your code, there's probably a real bug
  3. Test incrementally: Use bpftool to inspect maps and programs
  4. Memory matters: Every byte in your event structure adds overhead
  5. User space is your friend: Don't try to do everything in eBPF

Production Considerations

For critical infrastructure monitoring (my day job), I've learned:

  • Alert fatigue is real: Smart deduplication is essential
  • Performance monitoring: Track your own overhead
  • Graceful degradation: Handle packet loss at high traffic volumes
  • Security: Limit L7 payload capture to metadata only

What's Next?

The roadmap for Cerberus includes:

  • Redis backend for distributed deployments
  • Prometheus metrics export
  • Anomaly detection using pattern baselines
  • IPv6 support
  • Extended L7 inspection (128-256 byte payloads)
  • Web dashboard for visualization

Try It Yourself

git clone https://github.com/zrougamed/cerberus.git
cd cerberus
make
sudo ./build/cerberus
Enter fullscreen mode Exit fullscreen mode

Requirements:

  • Linux kernel 4.18+ (5.8+ for the BPF ring buffer)
  • Go 1.24+
  • Root privileges

Contributing

Cerberus is open source and looking for contributors, whether you're interested in:

  • eBPF kernel programming
  • Go application development
  • Network protocol analysis
  • Performance optimization
  • Documentation

Check out the GitHub repository and open an issue or PR.

Conclusion

Building Cerberus taught me that eBPF isn't just about performance - it's about rethinking how we approach observability. By moving intelligence into the kernel, we can monitor networks at scale without sacrificing visibility.

The combination of eBPF's efficiency and Go's expressiveness creates a powerful platform for building modern monitoring tools. Whether you're securing critical infrastructure or just curious about network traffic, eBPF opens up possibilities that weren't feasible before.

What would you build with eBPF? Let me know in the comments!


Mohamed Zrouga is a Senior Platform Engineer at Deltaflare, specializing in critical infrastructure protection and DevSecOps. Connect on GitHub or visit zrouga.email
