The Agent Problem
Traditional monitoring means shipping an agent with every service. That agent:
- Adds memory overhead
- Needs to be updated
- Gets out of date
- Breaks with kernel upgrades
- Needs instrumentation code
eBPF says: what if the kernel itself could emit observability data?
What eBPF Actually Is
eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs inside the Linux kernel without recompiling the kernel or loading kernel modules. It grew out of the classic BPF packet filter; today it powers Cilium, Pixie, Falco, and dozens of other tools.
From an SRE perspective: you get deep visibility into syscalls, network traffic, process behavior, and filesystem operations with zero code changes to your applications.
What You Can Observe
network:
- every TCP connection (src, dst, bytes, duration)
- DNS queries and response times
- TLS handshake failures
- HTTP request/response cycles
application:
- function call latencies (uprobes)
- memory allocations
- lock contention
- GC pauses
security:
- syscall audit trails
- privilege escalations
- suspicious file access
- container escape attempts
performance:
- CPU scheduling delays
- I/O wait time per process
- disk latency histograms
- page fault patterns
All of this without modifying your application code.
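Much of this stays cheap because eBPF tools aggregate inside the kernel. bpftrace's `hist()` and BCC's `biolatency`, for example, count latencies into power-of-two buckets held in a kernel map, so only the bucket counts cross into user space. The Python below is only an illustration of that bucketing idea, not kernel code and not the exact labels either tool prints. The function names are hypothetical and the latencies are made up:

```python
from collections import Counter


def log2_bucket(value):
    """Return the [low, high) power-of-two bucket that holds a latency value.

    0 gets its own bucket [0, 1); any other value v lands in
    [2**k, 2**(k+1)) where k = floor(log2(v)).
    """
    if value < 0:
        raise ValueError("latency must be non-negative")
    if value == 0:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, low * 2)


def histogram(latencies_us):
    """Count raw latencies per bucket, the way an in-kernel map would."""
    return Counter(log2_bucket(v) for v in latencies_us)


def render(hist, width=40):
    """Print one ASCII bar per bucket, scaled to the fullest bucket."""
    if not hist:
        return
    peak = max(hist.values())
    for low, high in sorted(hist):
        count = hist[(low, high)]
        bar = "@" * max(1, count * width // peak)
        print(f"[{low}, {high}) {count:6d} |{bar:<{width}}|")


# Example with made-up disk latencies in microseconds.
render(histogram([120, 250, 260, 900, 4100, 4500]))
```

The design point is that a histogram has a fixed, small number of buckets no matter how many events occur, which is why kernel-side aggregation scales to high event rates.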
A Practical Example: Watching HTTPS Traffic
Traditional approach: instrument your HTTP framework with OpenTelemetry, deploy a collector, ship traces.
eBPF approach:
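# Install bpftrace (Debian/Ubuntu)
sudo apt install bpftrace
# Count TLS writes per PID and TLS bytes written per process name by hooking
# OpenSSL's SSL_write(ssl, buf, num); arg2 is num, the bytes passed in.
# The libssl path varies by distro; locate it with: ldconfig -p | grep libssl
# Press Ctrl-C to stop; bpftrace prints both maps on exit.
sudo bpftrace -e '
uprobe:/usr/lib/libssl.so:SSL_write {
  @tls_writes[pid] = count();
  @tls_bytes[comm] = sum(arg2);
}
'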
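No code changes. No restarts. Real-time visibility for any process that uses a dynamically linked OpenSSL. (Go programs, for example, use crypto/tls instead and need different probes.)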
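bpftrace prints every map when the one-liner exits, one `@map[key]: value` line per entry. If you would rather post-process that text than eyeball it, a small parser is enough. A minimal Python sketch, assuming single-key maps with integer values (histograms, multi-key maps, and bpftrace's `-f json` output mode are not handled). The function names are hypothetical and the sample output is made up:

```python
import re
from collections import defaultdict

# Matches single-key map lines such as "@bytes[nginx]: 1048576".
MAP_LINE = re.compile(r"^@(?P<map>\w*)\[(?P<key>[^\]]*)\]:\s*(?P<value>-?\d+)$")


def parse_maps(text):
    """Return {map_name: {key: value}} for every single-key map line."""
    maps = defaultdict(dict)
    for line in text.splitlines():
        match = MAP_LINE.match(line.strip())
        if match:
            maps[match["map"]][match["key"]] = int(match["value"])
    return dict(maps)


def top_talkers(values, n=3):
    """Return the n (key, value) pairs with the largest values."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up output in the shape bpftrace prints on exit.
sample = """
@bytes[curl]: 5120
@bytes[nginx]: 1048576
@writes[4242]: 17
"""
maps = parse_maps(sample)
print(top_talkers(maps["bytes"]))  # [('nginx', 1048576), ('curl', 5120)]
```

Keys are kept as strings, so a PID key such as `4242` is looked up as `"4242"`.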
Tools Worth Knowing
1. Pixie (acquired by New Relic, now a CNCF sandbox project)
- Auto-instruments every service in your K8s cluster
- No code changes, no sidecars
- Protocol-level tracing for HTTP, MySQL, Postgres, and DNS
- Open source
2. Cilium
- Network observability + security policy enforcement
- Can replace kube-proxy
- Hubble UI for service-to-service traffic visualization
3. Falco
- Runtime security detection
- "Alert if a process inside a container spawns a shell"
- Rules are written in YAML
4. Parca
- Continuous profiling via eBPF
- See CPU flame graphs across your entire fleet
- Identify the most expensive code paths
5. Tracee
- Security-focused eBPF tracing
- Detects privilege escalations, cryptojacking, suspicious syscalls
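Falco's example rule boils down to a predicate over process-exec events. A minimal Python sketch of that matching logic, purely for illustration: the `ExecEvent` record, its fields, and the shell list are hypothetical simplifications, and real Falco rules are written in its own YAML-based rule language, not Python:

```python
from dataclasses import dataclass

# Hypothetical shell list; extend it for your images.
SHELLS = frozenset({"sh", "bash", "dash", "ash", "zsh"})


@dataclass(frozen=True)
class ExecEvent:
    """Hypothetical, simplified record of one process exec."""
    pid: int
    proc_name: str
    parent_name: str
    container_id: str  # empty string for processes on the host


def shell_in_container(event):
    """True if a shell binary was exec'd inside a container."""
    return bool(event.container_id) and event.proc_name in SHELLS


def alerts(events):
    """Yield one alert message per matching event."""
    for e in events:
        if shell_in_container(e):
            yield (f"shell {e.proc_name} (pid {e.pid}) spawned by "
                   f"{e.parent_name} in container {e.container_id}")


events = [
    ExecEvent(101, "bash", "containerd-shim", "3f2a9c"),   # alert
    ExecEvent(102, "bash", "sshd", ""),                    # host shell: ignored
    ExecEvent(103, "nginx", "containerd-shim", "3f2a9c"),  # not a shell
]
for message in alerts(events):
    print(message)
```

The value of the eBPF sensor is that the exec events arrive from the kernel for every container on the node, so the rule needs no cooperation from the workload.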
The Tradeoffs
Pros:
- Zero app code changes
- Low overhead, because data is filtered and aggregated in the kernel instead of being copied to user space event by event
- Unified view across languages (Go, Python, Java, Rust, all seen the same way)
- No per-language agents or SDKs to manage (the eBPF tool itself still runs a daemon on each node)
Cons:
- Requires Linux 4.14+ (5.0+ preferred)
- Steep learning curve for custom probes
- Limited visibility into in-process logic (you see syscalls, not business logic)
- The eBPF verifier can reject programs for subtle reasons, such as loops it cannot prove terminate or memory accesses it cannot prove safe
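The kernel-version requirement is easy to check before rolling a tool out. A minimal Python sketch that classifies a `uname -r`-style release string against the 4.14 floor and 5.0 preference listed above; the function names are hypothetical, and real tools also check kernel config options, which this does not:

```python
import os
import platform
import re


def parse_release(release):
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def ebpf_readiness(release, minimum=(4, 14), preferred=(5, 0)):
    """Classify a kernel release against the minimum and preferred versions."""
    version = parse_release(release)
    if version < minimum:
        return "unsupported"
    if version < preferred:
        return "minimum"
    return "preferred"


def check_this_host():
    """Report on the running kernel (only meaningful on Linux)."""
    release = platform.release()
    print(f"kernel {release}: {ebpf_readiness(release)}")
    # Kernels built with CONFIG_DEBUG_INFO_BTF expose their type info here;
    # many newer eBPF tools depend on it for portable (CO-RE) programs.
    print("BTF available:", os.path.exists("/sys/kernel/btf/vmlinux"))
```

On a Linux host, call `check_this_host()`. Comparing `(major, minor)` tuples avoids the classic string-comparison bug where "4.9" sorts after "4.14".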
When eBPF Shines
- Network debugging: "Why is service A slow to reach service B?"
- Security auditing: "What containers are making unexpected syscalls?"
- Performance profiling: "Where is the cluster CPU time actually going?"
- Incident forensics: "Reconstruct the syscall timeline during the outage"
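For the first question above, the raw material is per-connection data collected on each node (BCC's `tcplife`, for example, reports the duration of every TCP session). Once those records are gathered in one place, ranking service pairs by tail latency is straightforward. A minimal Python sketch; the `Conn` fields and the sample records are hypothetical, and it uses a simple nearest-rank percentile:

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Conn(NamedTuple):
    """One finished TCP connection (hypothetical field names)."""
    src: str
    dst: str
    duration_ms: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def slowest_paths(conns, pct=99, top=3):
    """Rank (src, dst) pairs by their pct-th percentile connection duration."""
    by_pair = defaultdict(list)
    for c in conns:
        by_pair[(c.src, c.dst)].append(c.duration_ms)
    ranked = [
        (pair, percentile(sorted(durations), pct), len(durations))
        for pair, durations in by_pair.items()
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top]


conns = [
    Conn("checkout", "payments", 12.0),
    Conn("checkout", "payments", 480.0),
    Conn("checkout", "payments", 15.0),
    Conn("checkout", "inventory", 4.0),
    Conn("checkout", "inventory", 6.0),
]
for (src, dst), p99, n in slowest_paths(conns):
    print(f"{src} -> {dst}: p99 {p99:.0f} ms over {n} connections")
```

Ranking by a tail percentile rather than the mean keeps a single slow outlier path from hiding behind many fast connections.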
When eBPF Is the Wrong Tool
- Business logic observability: you still need OpenTelemetry for spans
- Application errors: your logs and exception tracking still matter
- Multi-region correlation: eBPF is node-local
Use eBPF for infrastructure and network. Use OpenTelemetry for application logic. They complement each other.
Getting Started
- Deploy Pixie in a dev cluster (1-line install)
- Open the UI, watch real-time HTTP traffic
- Try a bpftrace one-liner to trace a specific syscall
- Read the Cilium + Hubble docs
- Replace one agent-based tool with its eBPF equivalent
The future of observability is kernel-native. Agent-based tools will still exist, but the share of work they need to do will keep shrinking.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com