As a platform engineer working with critical infrastructure, I needed deep visibility into network behavior without the overhead of traditional packet capture tools. The result? Cerberus - an eBPF-based network monitor that processes packets at the kernel level with near-zero overhead.
This post walks through the architecture, challenges, and lessons learned from building a production-grade network monitoring tool using eBPF and Go.
## Why eBPF for Network Monitoring?
Traditional network monitoring tools like tcpdump and Wireshark are powerful but come with drawbacks:
- Context switches: Every packet copies data from kernel to user space
- Performance overhead: Processing in user space is slower
- Security: Full packet capture requires elevated privileges everywhere
- Scalability: High-traffic networks can overwhelm traditional tools
eBPF (Extended Berkeley Packet Filter) solves these by running sandboxed programs directly in the Linux kernel, allowing you to:
- Filter and process packets at line rate
- Extract only the data you need (no full packet copies)
- Aggregate statistics in kernel space
- Minimize context switches with ring buffers
## Architecture Overview
Cerberus uses a two-layer architecture: an eBPF program that captures and filters packets in the kernel, and a Go application that consumes events and performs the higher-level analysis in user space.
## The eBPF Program: Kernel-Space Magic
The heart of Cerberus is a TC (Traffic Control) classifier written in C that attaches to network interfaces. Here's what it does:
### 1. Packet Parsing at Line Rate
```c
struct network_event {
    __u8  event_type;      // Event classification
    __u8  src_mac[6];      // Source MAC
    __u8  dst_mac[6];      // Destination MAC
    __u32 src_ip;          // Source IP
    __u32 dst_ip;          // Destination IP
    __u16 src_port;        // Source port
    __u16 dst_port;        // Destination port
    __u8  protocol;        // IP protocol
    __u8  tcp_flags;       // TCP flags
    __u16 arp_op;          // ARP operation
    __u8  icmp_type;       // ICMP type
    __u8  l7_payload[32];  // Layer 7 inspection data
} __attribute__((packed));
```
Key Design Decision: We capture only 62 bytes per event (the packed struct above). This is enough for classification and L7 inspection without the overhead of full packet capture.
### 2. Multi-Protocol Detection
The eBPF program identifies 7 different protocol types:
```c
// Simplified version of the protocol detection logic
SEC("tc")
int monitor_ingress(struct __sk_buff *skb) {
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return TC_ACT_OK;

    // ARP detection
    if (eth->h_proto == bpf_htons(ETH_P_ARP)) {
        return handle_arp(skb, eth, data_end);
    }

    // IP packet processing
    if (eth->h_proto == bpf_htons(ETH_P_IP)) {
        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return TC_ACT_OK;

        switch (ip->protocol) {
        case IPPROTO_TCP:
            return handle_tcp(skb, eth, ip, data_end);
        case IPPROTO_UDP:
            return handle_udp(skb, eth, ip, data_end);
        case IPPROTO_ICMP:
            return handle_icmp(skb, eth, ip, data_end);
        }
    }
    return TC_ACT_OK;
}
```
### 3. Layer 7 Inspection
Here's where it gets interesting. We extract application-layer data for DNS, HTTP, and TLS:
```c
// Ring buffer shared with user space (declared once at file scope)
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

// DNS query extraction
static __always_inline int handle_dns(struct __sk_buff *skb,
                                      struct udphdr *udp,
                                      void *data_end) {
    void *dns_data = (void *)(udp + 1);

    struct network_event *event =
        bpf_ringbuf_reserve(&events, sizeof(*event), 0);
    if (!event)
        return TC_ACT_OK;  // buffer full: drop the event, not the packet

    // Copy up to 32 bytes of DNS payload. After the bounds check,
    // direct packet access is allowed; no bpf_probe_read() needed.
    if (dns_data + 32 <= data_end) {
        __builtin_memcpy(event->l7_payload, dns_data, 32);
        event->event_type = EVENT_DNS;
    }
    bpf_ringbuf_submit(event, 0);
    return TC_ACT_OK;
}

// HTTP detection (simplified; event reservation/submission elided)
static __always_inline int handle_http(struct __sk_buff *skb,
                                       struct tcphdr *tcp,
                                       void *data_end,
                                       struct network_event *event) {
    void *http_data = (void *)(tcp + 1);  // simplified: assumes no TCP options

    // Check for HTTP methods
    if (http_data + 4 <= data_end) {
        char *method = http_data;
        if (method[0] == 'G' && method[1] == 'E' &&
            method[2] == 'T' && method[3] == ' ') {
            event->event_type = EVENT_HTTP;
            if (http_data + 32 <= data_end)
                __builtin_memcpy(event->l7_payload, http_data, 32);
        }
    }
    return TC_ACT_OK;
}

// TLS handshake detection
static __always_inline int handle_tls(struct __sk_buff *skb,
                                      struct tcphdr *tcp,
                                      void *data_end,
                                      struct network_event *event) {
    void *tls_data = (void *)(tcp + 1);

    if (tls_data + 5 <= data_end) {
        __u8 content_type = *(__u8 *)tls_data;
        // 0x16 = Handshake
        if (content_type == 0x16) {
            event->event_type = EVENT_TLS;
            if (tls_data + 32 <= data_end)
                __builtin_memcpy(event->l7_payload, tls_data, 32);
        }
    }
    return TC_ACT_OK;
}
```
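These first-bytes heuristics are easy to mirror in user space, which is handy for testing the classification logic against captured payloads without loading any eBPF. `classifyL7` below is an illustrative helper, not part of Cerberus, and the event-type constants are assumptions:

```go
package main

import (
	"bytes"
	"fmt"
)

// Assumed event-type constants (the real values live in the eBPF program).
const (
	EventUnknown = 0
	EventHTTP    = 2
	EventTLS     = 3
)

// classifyL7 applies the same first-bytes heuristics as the kernel
// handlers: an HTTP method keyword means HTTP, a leading record type
// of 0x16 means a TLS handshake. Anything else stays unclassified.
func classifyL7(payload []byte) int {
	for _, m := range [][]byte{[]byte("GET "), []byte("POST"), []byte("HEAD"), []byte("PUT ")} {
		if bytes.HasPrefix(payload, m) {
			return EventHTTP
		}
	}
	if len(payload) >= 5 && payload[0] == 0x16 {
		return EventTLS
	}
	return EventUnknown
}

func main() {
	fmt.Println(classifyL7([]byte("GET /api/v1/users HTTP/1.1")))       // 2
	fmt.Println(classifyL7([]byte{0x16, 0x03, 0x01, 0x00, 0xc8, 0x01})) // 3
}
```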
### Challenge: The Verifier
The eBPF verifier is strict. Every memory access must be bounds-checked:
```c
// This will fail verification:
char *data = packet + offset;
*data = value;  // ❌ No bounds check

// This passes:
if ((void *)(packet + offset + 1) <= data_end) {
    char *data = packet + offset;
    *data = value;  // ✅ Verified safe
}
```
## User-Space Processing: Go Application
The Go application reads events from the ring buffer and performs higher-level analysis:
### 1. Ring Buffer Polling
```go
func (m *NetworkMonitor) pollEvents(ctx context.Context) {
	rb, err := m.module.InitRingBuf("events", m.eventsChannel)
	if err != nil {
		log.Fatal(err)
	}
	defer rb.Close()

	rb.Poll(300) // 300ms timeout

	for {
		select {
		case <-ctx.Done():
			return
		case record := <-m.eventsChannel:
			m.processEvent(record)
		}
	}
}
```
### 2. Smart Deduplication with LRU Cache
To avoid alert fatigue, we only show new traffic patterns:
```go
type NetworkMonitor struct {
	deviceCache  *lru.Cache // Device tracking
	patternCache *lru.Cache // Unique communication patterns
	db           *buntdb.DB // Persistent storage
}

func (m *NetworkMonitor) processEvent(data []byte) {
	event := parseNetworkEvent(data)

	// Track device
	deviceKey := event.SrcMAC.String()
	if !m.deviceCache.Contains(deviceKey) {
		m.announceNewDevice(event)
		m.deviceCache.Add(deviceKey, true)
	}

	// Track pattern (first occurrence only)
	patternKey := fmt.Sprintf("%s:%s:%d:%s",
		event.SrcMAC, event.DstIP, event.DstPort, event.EventType)
	if !m.patternCache.Contains(patternKey) {
		m.printTrafficPattern(event)
		m.patternCache.Add(patternKey, true)
	}

	// Update statistics
	m.updateStats(event)
}
```
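The cache-then-announce flow can be exercised without the hashicorp/golang-lru dependency. Here is a stdlib-only stand-in that shows the same deduplication behavior; `lruSet` is a hypothetical helper for illustration, not Cerberus code:

```go
package main

import (
	"container/list"
	"fmt"
)

// lruSet is a tiny stand-in for the LRU cache Cerberus uses: a bounded
// set that evicts the least recently seen key once capacity is reached.
type lruSet struct {
	cap   int
	order *list.List // front = most recently seen
	items map[string]*list.Element
}

func newLRUSet(cap int) *lruSet {
	return &lruSet{cap: cap, order: list.New(), items: map[string]*list.Element{}}
}

// Seen reports whether key was already present, and marks it as
// recently used either way. A first sighting may evict the oldest key.
func (s *lruSet) Seen(key string) bool {
	if el, ok := s.items[key]; ok {
		s.order.MoveToFront(el)
		return true
	}
	if s.order.Len() >= s.cap {
		oldest := s.order.Back()
		s.order.Remove(oldest)
		delete(s.items, oldest.Value.(string))
	}
	s.items[key] = s.order.PushFront(key)
	return false
}

func main() {
	patterns := newLRUSet(2)
	key := "aa:bb:cc:dd:ee:ff:8.8.8.8:53:DNS"
	if !patterns.Seen(key) {
		fmt.Println("new pattern:", key) // printed on first sighting
	}
	if !patterns.Seen(key) {
		fmt.Println("new pattern:", key) // suppressed on repeat
	}
}
```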
### 3. Layer 7 Parsing in User Space
```go
func parseL7Payload(event *NetworkEvent) string {
	switch event.EventType {
	case EVENT_DNS:
		return parseDNSQuery(event.L7Payload)
	case EVENT_HTTP:
		return parseHTTPRequest(event.L7Payload)
	case EVENT_TLS:
		return "TLS"
	default:
		return ""
	}
}

func parseDNSQuery(payload []byte) string {
	// Skip DNS header (12 bytes)
	offset := 12
	var domain []byte
	for offset < len(payload) {
		length := int(payload[offset])
		if length == 0 {
			break
		}
		if len(domain) > 0 {
			domain = append(domain, '.')
		}
		offset++
		if offset+length > len(payload) {
			break
		}
		domain = append(domain, payload[offset:offset+length]...)
		offset += length
	}
	return string(domain)
}
```
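One way to sanity-check the label walker is to hand-build a query payload: 12 zero bytes standing in for the DNS header, followed by length-prefixed labels and a terminating zero:

```go
package main

import "fmt"

// parseDNSQuery walks the length-prefixed labels of the first question
// name, exactly as in the user-space parser above.
func parseDNSQuery(payload []byte) string {
	offset := 12 // skip the 12-byte DNS header
	var domain []byte
	for offset < len(payload) {
		length := int(payload[offset])
		if length == 0 {
			break
		}
		if len(domain) > 0 {
			domain = append(domain, '.')
		}
		offset++
		if offset+length > len(payload) {
			break
		}
		domain = append(domain, payload[offset:offset+length]...)
		offset += length
	}
	return string(domain)
}

func main() {
	// 12 zero bytes for the header, then "google" and "com" as labels.
	payload := append(make([]byte, 12),
		6, 'g', 'o', 'o', 'g', 'l', 'e', 3, 'c', 'o', 'm', 0)
	fmt.Println(parseDNSQuery(payload)) // google.com
}
```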
## Performance Optimizations
### 1. Zero-Copy with Ring Buffers
Ring buffers allow the kernel to write directly to memory shared with user space:
```go
// No data copying - just reading shared memory
rb.Poll(300)
```
### 2. Batch Database Writes
Instead of writing every event to disk, we batch them:
```go
func (m *NetworkMonitor) persistStats() {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		m.db.Update(func(tx *buntdb.Tx) error {
			for mac, stats := range m.deviceStats {
				data, _ := json.Marshal(stats)
				tx.Set(mac, string(data), nil)
			}
			return nil
		})
	}
}
```
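The batching pattern decouples hot-path counting from persistence: counters mutate in memory per event, and one transaction serializes everything every 30 seconds. A stripped-down sketch with an in-memory map standing in for buntdb (`statsBatcher` is illustrative, and `Flush` is called manually here instead of from the ticker):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// DeviceStats is a minimal per-device counter bundle.
type DeviceStats struct {
	Packets uint64 `json:"packets"`
	TCP     uint64 `json:"tcp"`
}

type statsBatcher struct {
	mu    sync.Mutex
	stats map[string]*DeviceStats // hot-path counters, updated per event
	store map[string]string       // stand-in for the buntdb key/value store
}

func newStatsBatcher() *statsBatcher {
	return &statsBatcher{stats: map[string]*DeviceStats{}, store: map[string]string{}}
}

// Record updates in-memory counters only; no disk I/O on the hot path.
func (b *statsBatcher) Record(mac string, tcp bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	s, ok := b.stats[mac]
	if !ok {
		s = &DeviceStats{}
		b.stats[mac] = s
	}
	s.Packets++
	if tcp {
		s.TCP++
	}
}

// Flush serializes every device's counters in one batch, the way the
// ticker-driven persistStats loop writes to the database.
func (b *statsBatcher) Flush() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for mac, s := range b.stats {
		data, _ := json.Marshal(s)
		b.store[mac] = string(data)
	}
}

func main() {
	b := newStatsBatcher()
	b.Record("aa:bb:cc:dd:ee:ff", true)
	b.Record("aa:bb:cc:dd:ee:ff", false)
	b.Flush()
	fmt.Println(b.store["aa:bb:cc:dd:ee:ff"]) // {"packets":2,"tcp":1}
}
```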
### 3. LRU Cache Tuning
```go
// Balance memory vs. accuracy
deviceCache, _ := lru.New(1000)   // Track 1000 devices
patternCache, _ := lru.New(10000) // Track 10k patterns
```
## Real-World Output
Here's what you see when running Cerberus:
```
NEW DEVICE DETECTED!
  MAC:        dc:62:79:2f:39:28
  IP:         192.168.0.108
  Vendor:     Apple
  First Seen: 2024-12-06 16:51:12

[DNS]  192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 8.8.8.8:53 [google.com]
[HTTP] 192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 93.184.216.34:80 [GET /api/v1/users]
[TLS]  192.168.0.100 (aa:bb:cc:dd:ee:ff) [Apple] → 142.250.185.46:443 [TLS]
[TCP]  192.168.0.50 (11:22:33:44:55:66) [Raspberry Pi] → 192.168.0.200:22 (SSH)

╔════════════════════════════════════════════════════╗
║             NETWORK STATISTICS SUMMARY             ║
╠════════════════════════════════════════════════════╣
║  Total Devices:  15                                ║
║  Total Packets:  45821                             ║
║    - TCP:        38456                             ║
║    - UDP:        6120                              ║
║    - DNS:        892                               ║
║    - HTTP:       156                               ║
║    - TLS:        1834                              ║
╚════════════════════════════════════════════════════╝
```
## Challenges and Solutions
### Challenge #1: eBPF Complexity Limit

The verifier analyzes at most ~1 million instructions per program (and only 4,096 on kernels before 5.2). Solution: keep the in-kernel parsing logic simple and move complex analysis to user space.
### Challenge #2: Payload Size Tradeoff
More payload = better L7 inspection, but higher overhead. Solution: 32 bytes is enough for DNS queries, HTTP methods, and TLS detection.
### Challenge #3: TC vs. XDP

We initially considered XDP (eXpress Data Path), but TC offered better compatibility and easier development. XDP is faster because it runs before the kernel allocates socket buffers, but it is more restrictive: it only hooks ingress traffic, while TC classifiers can attach to both ingress and egress.
### Challenge #4: Cross-Kernel Compatibility
Different kernel versions support different eBPF features. Solution: Use CO-RE (Compile Once, Run Everywhere) via libbpf.
## Lessons Learned
- Start simple: Basic packet capture first, then add L7 inspection
- Trust the verifier: If it rejects your code, there's probably a real bug
- Test incrementally: Use `bpftool` to inspect maps and programs
- Memory matters: Every byte in your event structure adds overhead
- User space is your friend: Don't try to do everything in eBPF
## Production Considerations
For critical infrastructure monitoring (my day job), I've learned:
- Alert fatigue is real: Smart deduplication is essential
- Performance monitoring: Track your own overhead
- Graceful degradation: Handle packet loss at high traffic volumes
- Security: Limit L7 payload capture to metadata only
## What's Next?
The roadmap for Cerberus includes:
- Redis backend for distributed deployments
- Prometheus metrics export
- Anomaly detection using pattern baselines
- IPv6 support
- Extended L7 inspection (128-256 byte payloads)
- Web dashboard for visualization
## Try It Yourself
```bash
git clone https://github.com/zrougamed/cerberus.git
cd cerberus
make
sudo ./build/cerberus
```
Requirements:
- Linux kernel 4.18+
- Go 1.24+
- Root privileges
## Contributing
Cerberus is open source and looking for contributors! Help is welcome in any of these areas:
- eBPF kernel programming
- Go application development
- Network protocol analysis
- Performance optimization
- Documentation
Check out the GitHub repository and open an issue or PR.
## Conclusion
Building Cerberus taught me that eBPF isn't just about performance - it's about rethinking how we approach observability. By moving intelligence into the kernel, we can monitor networks at scale without sacrificing visibility.
The combination of eBPF's efficiency and Go's expressiveness creates a powerful platform for building modern monitoring tools. Whether you're securing critical infrastructure or just curious about network traffic, eBPF opens up possibilities that weren't feasible before.
What would you build with eBPF? Let me know in the comments!
Mohamed Zrouga is a Senior Platform Engineer at Deltaflare, specializing in critical infrastructure protection and DevSecOps. Connect on GitHub or visit zrouga.email

