---
title: "eBPF Observability That Replaced Our $4K/Month APM"
published: true
description: "Build an eBPF-based observability pipeline for Kubernetes: per-pod HTTP latency histograms and TCP retransmit tracking with zero sidecars, zero code changes."
tags: kubernetes, devops, cloud, architecture
canonical_url: https://blog.mvpfactory.co/ebpf-observability-replaced-4k-month-apm
---

## What We're Building

Let me show you how to replace sidecar-based service mesh observability (and expensive APM licensing) with an eBPF pipeline using BPF CO-RE portable probes. By the end, you'll have a clear blueprint for feeding per-pod HTTP latency histograms and TCP retransmit metrics into Prometheus/Grafana — kernel-level visibility with no application code changes, a fraction of the memory footprint of Istio sidecars, and a monitoring bill that drops from ~$4K/month to infrastructure you already own.

## Prerequisites

- A Kubernetes cluster with BTF-enabled kernels (5.8+) — GKE, EKS with AL2023, and AKS meet this today
- Familiarity with Prometheus and Grafana
- Basic understanding of how Linux syscalls work
- `libbpf` or `bpf2go` (Go) for compiling probes

## Step 1: Understand the Resource Tax You're Paying

Before writing any code, here is the gotcha that will save you hours of premature optimization debates. Look at these real numbers:

| Metric | Istio sidecar (Envoy) | Linkerd sidecar | eBPF DaemonSet |
|---|---|---|---|
| Memory per pod | 50–100 MB | 20–30 MB | 0 (per-node: ~40 MB) |
| Added latency per request | 1–3% | <1% | Negligible (kernel-space) |
| Deployment model | Per-pod sidecar | Per-pod sidecar | Per-node DaemonSet |
| 200 pods (total memory) | ~10–20 GB | ~4–6 GB | ~600 MB (15-node cluster) |

Sidecar models multiply overhead by **pod count**. eBPF multiplies by **node count**. At startup scale — dozens of nodes, hundreds of pods — that difference pays for an engineer.

## Step 2: Build Portable Probes with BPF CO-RE

The docs don't mention this, but before BPF CO-RE (Compile Once, Run Everywhere), eBPF programs needed kernel headers matched to each node's exact kernel version. In managed Kubernetes where node pools auto-update, that was a non-starter.

CO-RE uses BTF (BPF Type Format) type information embedded in modern kernels to relocate struct field accesses at load time. Your probe binary compiled on a CI machine runs on any BTF-enabled node without recompilation.

Here is the minimal setup to get TCP retransmit tracking working:

```c
// tcp_retransmit.bpf.c: compiled once, runs on any BTF-enabled kernel
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

struct retransmit_event { u32 daddr; u16 dport; u64 timestamp; };

// Perf buffer the userspace agent reads events from
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

SEC("tracepoint/tcp/tcp_retransmit_skb")
int trace_tcp_retransmit(struct trace_event_raw_tcp_event_sk_skb *ctx)
{
    struct sock *sk = (struct sock *)ctx->skaddr;
    u16 dport = BPF_CORE_READ(sk, __sk_common.skc_dport); // network byte order
    u32 daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);

    struct retransmit_event evt = {
        .dport = bpf_ntohs(dport),
        .daddr = daddr,
        .timestamp = bpf_ktime_get_ns(),
    };
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &evt, sizeof(evt));
    return 0;
}
```


This fires in kernel space on every TCP retransmit, with zero userspace overhead until the event buffer is read. A userspace agent then correlates each destination address with pod IPs from the Kubernetes API to label metrics per service.
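To make the userspace side concrete, here is a minimal sketch of a reader built with `bpf2go` and `github.com/cilium/ebpf`. The generated names (`retransmitObjects`, `loadRetransmitObjects`, `TraceTcpRetransmit`) and the event layout are assumptions that follow from the probe above, not a drop-in agent:

```go
// Hypothetical userspace reader sketch. The retransmit* identifiers are
// what bpf2go would generate from a directive like:
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go retransmit tcp_retransmit.bpf.c
package main

import (
	"bytes"
	"encoding/binary"
	"log"
	"net"

	"github.com/cilium/ebpf/link"
	"github.com/cilium/ebpf/perf"
)

// Mirrors struct retransmit_event in the probe (u32 daddr, u16 dport, u64 timestamp).
type retransmitEvent struct {
	Daddr     uint32
	Dport     uint16
	_         [2]byte // compiler padding before the 8-byte timestamp
	Timestamp uint64
}

func main() {
	var objs retransmitObjects // generated by bpf2go (assumed name)
	if err := loadRetransmitObjects(&objs, nil); err != nil {
		log.Fatal(err)
	}
	defer objs.Close()

	// Attach to the stable tcp_retransmit_skb tracepoint.
	tp, err := link.Tracepoint("tcp", "tcp_retransmit_skb", objs.TraceTcpRetransmit, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer tp.Close()

	rd, err := perf.NewReader(objs.Events, 4096)
	if err != nil {
		log.Fatal(err)
	}
	defer rd.Close()

	for {
		rec, err := rd.Read()
		if err != nil {
			log.Fatal(err)
		}
		var evt retransmitEvent
		if err := binary.Read(bytes.NewReader(rec.RawSample), binary.LittleEndian, &evt); err != nil {
			continue
		}
		// skc_daddr arrived in network byte order; re-emit the raw bytes as an IPv4 address.
		ip := make(net.IP, 4)
		binary.LittleEndian.PutUint32(ip, evt.Daddr)
		log.Printf("retransmit to %s:%d", ip, evt.Dport) // next: map ip -> pod via the k8s API
	}
}
```

A typical next step is a Pod informer that maintains an IP-to-{pod, namespace} cache from `status.podIP`, so each event can be labeled before export.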

## Step 3: Per-Pod HTTP Latency Without a Proxy

For HTTP latency histograms, attach kprobes (or raw syscall tracepoints) at the `accept` and `read`/`write` syscall boundaries, then parse enough of the request line in-kernel to extract the HTTP method and status code. Tools like Pixie (open-sourced and contributed to the CNCF) and Cilium's Hubble take this approach to varying degrees.

Your userspace agent running as a DaemonSet aggregates these into Prometheus histograms:

```prometheus
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.05"} 14210
http_request_duration_seconds_bucket{pod="api-server-7b4f",method="GET",status="200",le="0.1"} 15002
```


No instrumentation libraries. No language-specific agents. No application restarts. This works for Go, Rust, Python, Node — anything making syscalls, which is everything.
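
On the agent itself, the aggregation is plain `client_golang`. A minimal sketch, assuming decoded (pod, method, status, latency) tuples are already arriving from the perf loop; the `observe` helper and sample values are illustrative:

```go
// Hypothetical aggregation sketch using prometheus/client_golang.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var httpLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "Per-pod HTTP request latency observed via eBPF.",
		Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5},
	},
	[]string{"pod", "method", "status"},
)

func main() {
	prometheus.MustRegister(httpLatency)

	// observe() would be called from the perf-event loop shown earlier.
	observe := func(pod, method, status string, seconds float64) {
		httpLatency.WithLabelValues(pod, method, status).Observe(seconds)
	}
	observe("api-server-7b4f", "GET", "200", 0.042) // illustrative sample

	// The DaemonSet pod exposes /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```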

## Step 4: Compare the Real Costs

| Solution | Monthly cost (50-node cluster) | What you get |
|---|---|---|
| Commercial APM (per-host) | $3,000–5,000+ | Full tracing, dashboards, alerting, support |
| Istio + Prometheus/Grafana | ~$0 (licensing) + sidecar CPU/mem | L7 metrics, mTLS, traffic management |
| eBPF + Prometheus/Grafana | ~$0 (licensing) + minimal overhead | L4/L7 metrics, retransmit tracking, no sidecars |

For a startup watching burn rate, we picked eBPF without much debate.

## Gotchas

Let me show you a pattern I use in every project — documenting the blind spots before they bite you:

- **No distributed tracing out of the box.** eBPF sees network calls, not trace context headers. You still need OpenTelemetry SDKs or header propagation for cross-service trace IDs.
- **Encrypted payloads are opaque.** If services use mTLS (and they should), eBPF at the socket layer sees ciphertext. You need uprobes at the TLS library level (e.g., OpenSSL's `SSL_read`/`SSL_write`), which works but breaks across library versions. We've been bitten by this after routine base image updates.
- **Kernel version floor.** BTF support requires kernel 5.8+. Most managed Kubernetes offerings meet this today, but verify before committing; a quick node-side check is sketched below.
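
That verification is cheap: BTF-enabled kernels expose their type information at `/sys/kernel/btf/vmlinux`. A minimal sketch of a startup check for the agent (the path is the standard location; failing hard is a design choice, not a requirement):

```go
// Refuse to start on nodes whose kernel lacks BTF (CONFIG_DEBUG_INFO_BTF).
package main

import (
	"log"
	"os"
)

func main() {
	if _, err := os.Stat("/sys/kernel/btf/vmlinux"); err != nil {
		log.Fatal("no kernel BTF at /sys/kernel/btf/vmlinux; CO-RE probes cannot relocate here")
	}
	log.Println("kernel BTF present; CO-RE probes can relocate at load time")
}
```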

## Conclusion

If I were starting today, I'd begin with just one probe: TCP retransmit tracking. Retransmits directly correlate to user-perceived latency spikes between services, the tracepoint is stable across kernel versions, and you can deploy it in an afternoon. It was the single probe that convinced our team this approach was worth investing in.

Use BPF CO-RE from the beginning — don't build kernel-version-specific probes. Target BTF-enabled kernels and compile once using `libbpf` or `bpf2go`, distributing as a container image. Keep OpenTelemetry for tracing and use eBPF for metrics. They solve different problems: eBPF handles aggregate network metrics with zero code changes; OTel handles request-scoped distributed traces. We run both and pay for neither.