Sidecar proxies are dying. Not dramatically, not all at once, but the writing is on the wall. The shift to eBPF monitoring is already underway, and Datadog's aggressive investment in kernel-level observability is the clearest signal yet that the sidecar era is winding down.
I've spent the last two years watching platform engineering teams struggle with the operational weight of service mesh sidecars. The promise was elegant: drop an Envoy proxy next to every pod, get mutual TLS, traffic control, and observability for free. The reality was hundreds of sidecar containers consuming memory and CPU, adding latency to every request, and creating a debugging nightmare when things went sideways. eBPF offers a fundamentally different approach. And it's working.
What Is eBPF Monitoring and Why Does It Matter?
eBPF (extended Berkeley Packet Filter) lets you run sandboxed programs directly inside the Linux kernel without modifying kernel source code or loading kernel modules. Think of it as programmable hooks into the kernel's networking stack, system calls, and scheduler. As Brendan Gregg, author of BPF Performance Tools and now at OpenAI, famously put it: "eBPF does to Linux what JavaScript does to HTML." It makes the kernel programmable.
For monitoring, this changes everything in practice. Instead of intercepting traffic at the application layer through a sidecar proxy, eBPF programs observe packets, system calls, and socket events directly in kernel space. The Datadog Agent uses this to trace all network connections and build a comprehensive service dependency map without requiring any code instrumentation or sidecar injection.
The practical difference is enormous. eBPF programs run in a verified sandbox that the kernel guarantees won't crash your system. They attach to specific kernel events, collect data with near-zero overhead, and pass it to userspace for aggregation. No extra containers. No extra network hops. No YAML templating nightmares.
eBPF doesn't just make monitoring cheaper. It makes monitoring possible in places where sidecars were never practical.
How Sidecar Proxies Became the Problem They Solved
The sidecar model, popularized by Istio and Linkerd, was a genuine innovation. Inject a proxy container alongside every application pod, and suddenly you have observability, encryption, and traffic management without touching application code. For a while, it was the answer to everything.
Then reality hit at scale.
I've worked with Kubernetes clusters where sidecar proxies consumed 15-20% of total compute. Each Envoy sidecar sits in the hot path of every request. Independent benchmarks from the Istio and Linkerd communities have consistently shown that sidecar proxies add sub-millisecond to low-single-digit millisecond latency per hop, with the exact overhead varying based on configuration, traffic patterns, and payload sizes. In a deep microservices call chain with 8-10 hops, that tax compounds fast.
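The compounding math is worth making explicit. A back-of-the-envelope sketch (the per-proxy overhead number is an illustrative assumption, not a benchmark, and most meshes put a proxy on both the client and server side of each hop):

```python
# Back-of-the-envelope model of the sidecar latency tax across a call chain.
# In a typical mesh, each hop traverses TWO proxies: the caller's sidecar
# and the callee's sidecar. The overhead figure is illustrative only.

def sidecar_tax_ms(hops: int, per_proxy_ms: float, proxies_per_hop: int = 2) -> float:
    """Total latency added to an end-to-end request by sidecar proxies."""
    return hops * proxies_per_hop * per_proxy_ms

# A 10-hop call chain with 0.5 ms of overhead per proxy traversal:
print(sidecar_tax_ms(10, 0.5))  # 10.0 ms added before any application work
```

Even sub-millisecond per-proxy overhead turns into double-digit milliseconds on a deep call chain, which is exactly the tax the article describes.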
But the performance cost isn't even the worst part. The operational complexity is. Every sidecar is another container to schedule, monitor, and upgrade. Sidecar version mismatches cause silent failures. Debugging network issues means figuring out whether the problem is in your application, the sidecar, or the interaction between them. I've seen teams spend days chasing phantom latency spikes that turned out to be sidecar resource contention during garbage collection. Days.
Thomas Graf, CTO and Co-founder of Isovalent (the company behind Cilium), has been arguing for years that the sidecar model fundamentally conflicts with performance-sensitive workloads. Unlike sidecar proxies, which add a network hop for every packet, eBPF processes data directly in the kernel path. No context switching between userspace proxy containers and the kernel networking stack.
If you've ever dealt with kernel-level performance issues in production databases, you already know how much the kernel layer matters. Same principle applies here: the closer you are to where the work actually happens, the less overhead you introduce.
How Datadog Is Using eBPF for Deep Observability
Datadog's eBPF journey is one of the most instructive case studies in the industry right now. Laurent Bernaille and Tabitha Sable, Staff Software Engineers at Datadog, detailed the technical challenges of adopting eBPF across thousands of nodes in their keynote at the eBPF & Cilium Community event.
Here's that keynote, which walks through their real-world challenges and solutions:
[YOUTUBE:6nlv_VCsjpQ|Our eBPF Journey at Datadog - Laurent Bernaille & Tabitha Sable, Datadog - Full Keynote]
The Datadog Agent leverages eBPF to capture system calls, network traffic, and kernel-level events. This gives you visibility that goes far deeper than application-level metrics ever could. Their Universal Service Monitoring feature uses eBPF to automatically detect and monitor every service running on a host, including ones with zero instrumentation. No agent library. No code changes. No sidecars.
At Datadog's scale, managing infrastructure across thousands of customer nodes, the traditional approach of deploying and maintaining sidecar proxies was a non-starter. eBPF gave them a way to collect rich telemetry data from a single agent running on each node rather than injecting proxies into every pod.
The architecture is straightforward: eBPF programs attach to kernel tracepoints and kprobes, capturing TCP connection events, DNS lookups, HTTP request metadata, and TLS handshake information. The Datadog Agent in userspace aggregates this data and ships it to their platform. One agent per node replaces what would have been hundreds of sidecar containers.
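The userspace half of that architecture is conventional stream aggregation. A toy sketch of folding kernel-reported TCP connect events into a service dependency map (the event shape and service names are invented for illustration; the real agent's wire format differs):

```python
from collections import defaultdict

# Toy userspace aggregator: fold TCP connect events, as an eBPF program
# might report them via a ring buffer, into a service dependency map.
# Event fields and service names here are invented for illustration.

def build_dependency_map(events):
    """events: iterable of (source_service, dest_service, bytes_sent)."""
    edges = defaultdict(lambda: {"connections": 0, "bytes": 0})
    for src, dst, nbytes in events:
        edge = edges[(src, dst)]
        edge["connections"] += 1
        edge["bytes"] += nbytes
    return dict(edges)

events = [
    ("frontend", "cart", 512),
    ("frontend", "cart", 2048),
    ("cart", "postgres", 4096),
]
deps = build_dependency_map(events)
print(deps[("frontend", "cart")])  # {'connections': 2, 'bytes': 2560}
```

The key design point is that the expensive part (observing every connection) happens in the kernel, while this cheap reduction step is all that runs in userspace.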
And Datadog isn't the only one making this bet. Cilium is replacing kube-proxy with eBPF in production Kubernetes clusters at Google, Adobe, and Capital One. Grafana's Beyla project uses eBPF for auto-instrumentation. The entire observability industry is converging here.
Is eBPF Monitoring Secure?
One of the first questions I get from security-conscious platform teams is whether running programs in the kernel is safe. Fair question. Kernel modules have historically been a vector for instability and security vulnerabilities.
eBPF addresses this with a verification system that's actually well-designed. Before any eBPF program runs, the kernel's built-in verifier statically analyzes it to prove it will terminate (no infinite loops), won't access unauthorized memory, and won't crash the kernel. Programs that fail verification simply don't load. This is a completely different model from traditional kernel modules, which run with full kernel privileges and can bring down the entire system.
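The termination guarantee comes from static analysis: classic eBPF simply rejected any backward jump, so every accepted program was a loop-free DAG of instructions (since kernel 5.3, the verifier can also prove some bounded loops safe). A toy analyzer over a made-up instruction format shows the core idea:

```python
# Toy illustration of the eBPF verifier's classic termination check:
# reject any backward (or self-targeting) jump, so the instruction stream
# has no cycles and must terminate. The instruction format is invented;
# the real verifier also tracks register state, memory bounds, and more.

def passes_loop_check(program):
    """program: list of (opcode, jump_offset_or_None). True if accepted."""
    for index, (_opcode, offset) in enumerate(program):
        if offset is not None and offset <= 0:
            return False  # backward/self jump: potential infinite loop
    return True

safe = [("load", None), ("jump", 2), ("add", None), ("exit", None)]
looping = [("load", None), ("jump", -1), ("exit", None)]
print(passes_loop_check(safe))     # True
print(passes_loop_check(looping))  # False
```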
eBPF programs also run with restricted capabilities. They can read kernel data structures but can't arbitrarily write to them. They execute in a sandbox with strong isolation guarantees. This model is secure enough that major cloud providers run eBPF in production across their entire fleets.
Now, eBPF does require CAP_BPF (or root) privileges to load programs, which means your monitoring agent needs elevated permissions on each node. For teams already running DaemonSets with host-level access (which is most teams running any kind of node monitoring), this isn't a meaningful change in security posture. But you should understand what you're granting.
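If you want to audit what a process has actually been granted, the effective capability mask is visible in `/proc/<pid>/status` as a hex string under `CapEff:`. A small helper for decoding it (CAP_BPF is bit 39, introduced in kernel 5.8; on older kernels, loading eBPF programs falls back to requiring CAP_SYS_ADMIN):

```python
# Decode a Linux capability bitmask, as shown in /proc/<pid>/status under
# "CapEff:", and check individual capability bits. CAP_BPF is bit 39
# (kernel 5.8+); older kernels gate eBPF loading on CAP_SYS_ADMIN (bit 21).

CAP_SYS_ADMIN = 21
CAP_BPF = 39

def has_capability(cap_eff_hex: str, cap_bit: int) -> bool:
    """cap_eff_hex: hex mask string, e.g. '000001ffffffffff'."""
    return bool(int(cap_eff_hex, 16) >> cap_bit & 1)

# A typical root process holds the full capability set:
print(has_capability("000001ffffffffff", CAP_BPF))  # True
# A default unprivileged container mask (example value) does not:
print(has_capability("00000000a80425fb", CAP_BPF))  # False
```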
Similar to how supply chain attacks target trusted components in your software pipeline, the security of your eBPF programs depends on trusting the agent that loads them. Use signed, vendor-provided agents rather than arbitrary eBPF programs in production.
What Linux Kernel Version Do You Need?
This is the practical constraint that trips teams up. eBPF's capabilities depend heavily on your kernel version. The minimum viable kernel for serious eBPF monitoring is 4.15 (shipped with Ubuntu 18.04), but many advanced features require 5.8+.
Here's what you get at each level:
- Kernel 4.15+: Basic kprobes, socket filtering, and network tracing. Enough for connection-level monitoring.
- Kernel 5.3+: BTF (BPF Type Format) support, which makes eBPF programs portable across kernel versions without recompilation. This is a big deal for production.
- Kernel 5.8+: Ring buffer support, improved memory efficiency, and broader tracepoint access. This is where eBPF monitoring actually starts to shine.
- Kernel 5.15+ (LTS): The sweet spot for most production workloads. Full CO-RE (Compile Once, Run Everywhere) support, better security primitives.
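If you're auditing a fleet, this gating is easy to encode. A sketch that maps a kernel release string (as reported by `uname -r`) to the feature tiers above; note that distro kernels sometimes backport features, so treat the result as a floor, not a guarantee:

```python
# Map a kernel release string to the eBPF feature tiers listed above.
# Parsing assumes a standard "major.minor..." prefix. Distro-patched
# kernels may backport features, so this is a conservative floor.

FEATURE_TIERS = [
    ((5, 15), "full CO-RE, hardened security primitives"),
    ((5, 8),  "ring buffer, broad tracepoint access"),
    ((5, 3),  "BTF: portable programs without recompilation"),
    ((4, 15), "kprobes, socket filtering, network tracing"),
]

def ebpf_features(release: str) -> list[str]:
    major, minor = (int(part) for part in release.split(".")[:2])
    return [name for floor, name in FEATURE_TIERS if (major, minor) >= floor]

print(ebpf_features("6.1.0-amzn2023"))  # all four tiers
print(ebpf_features("4.14.0"))          # [] -- below the 4.15 floor
```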
If you're running a major managed Kubernetes service (EKS, GKE, AKS), you're almost certainly on a kernel that supports eBPF. Amazon Linux 2023 ships with kernel 6.1, GKE's Container-Optimized OS uses 5.15+, and AKS moved to 5.15 as the default. If you're on older infrastructure, this might be the push you need to finally upgrade.
Does eBPF Replace Service Mesh Entirely?
This is the question everyone asks, and the answer is: not yet, and maybe not ever.
eBPF excels at observability, network policy enforcement, and load balancing. Cilium has proven this at massive scale. But a full service mesh like Istio provides capabilities that eBPF alone doesn't replicate easily: complex traffic routing rules, circuit breaking with sophisticated retry policies, and protocol-aware request-level load balancing.
Here's the thing nobody's saying about service meshes, though: most teams never needed a full one in the first place. They needed observability and mutual TLS. eBPF gives you the observability piece with dramatically less overhead, and projects like Cilium's mutual authentication are closing the mTLS gap.
In my experience, the teams that benefit most from this shift are those running 50-500 microservices. Below that, you probably don't need either solution. Above that, you likely want a hybrid approach: eBPF for observability and network policy, with targeted sidecar injection only for services that genuinely need advanced traffic management.
If you're building distributed systems with complex workflow orchestration, you still need application-level observability and tracing. eBPF gives you the network-level picture. It doesn't replace your OpenTelemetry instrumentation for business logic tracing. The two are complementary.
The Kernel Is the New Control Plane
The trend line here is clear. Observability is moving from sidecar proxies to kernel programs. Networking is moving from userspace proxies to eBPF-based data planes. Security policy enforcement is moving from iptables to eBPF.
Datadog's investment in eBPF isn't just a product decision. It's a bet that the Linux kernel is the right abstraction layer for infrastructure observability. Every container, every pod, and every process on a node already passes through the kernel. Betting on that layer is hard to argue against.
My prediction: by the end of 2027, the default monitoring architecture for new Kubernetes deployments won't include sidecar proxies at all. eBPF-based agents running as DaemonSets will handle observability, and service mesh sidecars will be reserved for the small percentage of services that genuinely need protocol-level traffic manipulation.
If you're still planning a sidecar-based service mesh rollout, stop and ask yourself what you actually need. If the answer is observability and basic network policy, eBPF already does it better, cheaper, and with fewer moving parts. The kernel was always the right place to watch your network. We just didn't have the tools to do it safely until now.
Originally published on kunalganglani.com