Samson Tanimawo

Posted on Apr 27

eBPF for SREs: Observability Without Agents

#ebpf #observability #linux #kernel

The Agent Problem

Traditional monitoring means shipping an agent with every service. That agent:

Adds memory and CPU overhead
Needs to be updated, and drifts out of date when it is not
Can break on kernel upgrades
Often needs instrumentation code in the application

eBPF says: what if the kernel itself could emit observability data?

What eBPF Actually Is

eBPF (extended Berkeley Packet Filter) lets you run sandboxed, verifier-checked programs inside the Linux kernel without recompiling the kernel or loading kernel modules. It began as a packet-filtering mechanism. Now it powers Cilium, Pixie, Falco, and dozens of other tools.

From an SRE perspective: you get deep visibility into syscalls, network traffic, process behavior, and filesystem operations with zero code changes to your applications.

What You Can Observe

network:
- every TCP connection (src, dst, bytes, duration)
- DNS queries and response times
- TLS handshake failures
- HTTP request/response cycles

application:
- function call latencies (uprobes)
- memory allocations
- lock contention
- GC pauses

security:
- syscall audit trails
- privilege escalations
- suspicious file access
- container escape attempts

performance:
- CPU scheduling delays
- I/O wait time per process
- disk latency histograms
- page fault patterns

All of this without modifying your application code.

A Practical Example: Detecting Slow HTTP Requests

Traditional approach: instrument your HTTP framework with OpenTelemetry, deploy a collector, ship traces.

eBPF approach:

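# Install bpftrace (Debian/Ubuntu)
sudo apt install bpftrace

# Count SSL_write calls per PID and sum the bytes passed to it per process name.
# This shows TLS write volume, not latency; timing a call also needs a uretprobe.
# The libssl path varies by distro; adjust it to match your system.
sudo bpftrace -e '
uprobe:/usr/lib/libssl.so:SSL_write {
  @http_writes[pid] = count();
  @http_bytes[comm] = sum(arg2);
}
'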

No code changes. No restarts. Real-time visibility.
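
To get closer to the "slow request" question, one option is to time each SSL_write call with a uprobe/uretprobe pair. The sketch below is an illustration under assumptions, not a full HTTP tracer: it reuses the libssl path from the example above (which may differ on your distro), the file name ssl_write_latency.bt and the map names are hypothetical, and time spent inside SSL_write is only a rough proxy for response latency.

```shell
# Write a small bpftrace script to disk; nothing is traced until you run it.
# The file name and map names are illustrative, not part of any tool.
cat > ssl_write_latency.bt <<'EOF'
// Record the entry timestamp of each SSL_write call, keyed by thread ID.
uprobe:/usr/lib/libssl.so:SSL_write
{
    @start[tid] = nsecs;
}

// On return, add the elapsed time to a per-process histogram (microseconds).
uretprobe:/usr/lib/libssl.so:SSL_write
/@start[tid]/
{
    @write_latency_us[comm] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
EOF

echo "wrote ssl_write_latency.bt; run it with: sudo bpftrace ssl_write_latency.bt"
```

Stop the script with Ctrl-C and bpftrace prints one power-of-two histogram per process name, which makes slow outliers easy to spot.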

Tools Worth Knowing

1. Pixie (acquired by New Relic, now a CNCF sandbox project)

Auto-instruments every service in your K8s cluster
No code changes, no sidecars
Full HTTP, MySQL, Postgres, DNS tracing
Open source

2. Cilium

Network observability + security policy enforcement
Can replace kube-proxy
Hubble UI for service-to-service traffic visualization

3. Falco

Runtime security detection
"Alert if a process inside a container spawns a shell"
Rules are written in YAML

4. Parca

Continuous profiling via eBPF
See CPU flame graphs across your entire fleet
Identify the most expensive code paths

5. Tracee

Security-focused eBPF tracing
Detects privilege escalations, cryptojacking, suspicious syscalls

The Tradeoffs

Pros:

Zero app code changes
Low overhead for most probes (programs run in kernel context), though frequently hit uprobes can add measurable cost
Unified view across languages (Go, Python, Java, Rust, all seen the same way)
No per-service agent or sidecar lifecycle to manage

Cons:

Requires Linux 4.14+ (5.0+ preferred)
Steep learning curve for custom probes
Limited visibility into in-process logic (you see syscalls and function boundaries, not business context)
The eBPF verifier can reject programs for subtle reasons (unbounded loops, unchecked pointer access)
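
The kernel requirement is easy to check before rolling anything out. Here is a minimal sketch, assuming the 4.14 minimum from the list above; the helper name kernel_at_least is hypothetical and only the major and minor fields of the release string are compared.

```shell
# Hypothetical helper: succeeds when a kernel release string is at least MAJOR.MINOR.
# $1 = release string (e.g. "5.15.0-91-generic"), $2 = required major, $3 = required minor
kernel_at_least() {
    ka_major="${1%%.*}"              # text before the first dot
    ka_minor="${1#*.}"               # text after the first dot
    ka_minor="${ka_minor%%[!0-9]*}"  # keep only the leading digits
    [ "$ka_major" -gt "$2" ] && return 0
    [ "$ka_major" -eq "$2" ] && [ "$ka_minor" -ge "$3" ] && return 0
    return 1
}

release="$(uname -r)"
if kernel_at_least "$release" 4 14; then
    echo "kernel $release meets the 4.14 minimum"
else
    echo "kernel $release is older than 4.14"
fi
```

Individual tools may need newer kernels than this baseline, so treat a pass here as necessary rather than sufficient and check each tool's own requirements.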

When eBPF Shines

Network debugging: "Why is service A slow to reach service B?"
Security auditing: "What containers are making unexpected syscalls?"
Performance profiling: "Where is the cluster CPU time actually going?"
Incident forensics: "Reconstruct the syscall timeline during the outage"

When eBPF Is the Wrong Tool

Business logic observability: you still need OpenTelemetry for spans
Application errors: your logs and exception tracking still matter
Multi-region correlation: eBPF is node-local

Use eBPF for infrastructure and network. Use OpenTelemetry for application logic. They complement each other.

Getting Started

Deploy Pixie in a dev cluster (1-line install)
Open the UI, watch real-time HTTP traffic
Try a bpftrace one-liner to trace a specific syscall
Read the Cilium + Hubble docs
Replace one agent-based tool with its eBPF equivalent
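
For the bpftrace step, a first one-liner could look like the sketch below. It counts openat() calls per process name for ten seconds and then exits; the choice of openat, the ten-second window, and the map name @opens are arbitrary assumptions for illustration.

```shell
# bpftrace program: count openat() syscalls per process name, then exit after 10 seconds.
PROGRAM='tracepoint:syscalls:sys_enter_openat { @opens[comm] = count(); } interval:s:10 { exit(); }'

if [ "$(id -u)" -eq 0 ] && command -v bpftrace >/dev/null 2>&1; then
    # bpftrace prints the @opens map automatically when the program exits.
    bpftrace -e "$PROGRAM" || echo "bpftrace could not attach; check kernel support and privileges"
else
    echo "needs root and bpftrace; try: sudo bpftrace -e '$PROGRAM'"
fi
```

Swapping sys_enter_openat for another syscall tracepoint is usually enough to aim the same pattern at a different question.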

The future of observability is kernel-native. Agent-based tools will still exist, but the gap between what they provide and what eBPF can collect without them will keep shrinking.

Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
