Building a Production eBPF Observability & Security Stack for Kubernetes in 2026
Understanding what's happening inside a production Kubernetes cluster running thousands of containers remains one of the hardest operational challenges. Attaching sidecar proxies increases resource overhead, and embedding SDKs requires application code changes. eBPF (extended Berkeley Packet Filter) solves this problem at the kernel level. According to the 2026 CNCF Observability Technical Advisory Group (TAG) survey, 67% of teams running Kubernetes at scale have already adopted at least one eBPF-based observability tool in production.
This guide covers everything from the latest announcements at KubeCon EU 2026 to building a complete production observability and security stack with Cilium, Tetragon, Grafana Beyla, and the newly launched OpenTelemetry eBPF Instrumentation (OBI).
Why eBPF Became the Standard for Kubernetes Observability
Let's start by comparing traditional observability approaches with the eBPF-based approach. The key insight is: "Collect telemetry directly from the kernel without any code changes."
| Comparison | Traditional (SDK/Sidecar) | eBPF Approach |
|---|---|---|
| Instrumentation | Embed SDK in app code or deploy sidecar proxy | eBPF program in Linux kernel collects automatically |
| Code Changes | Required (import, init, span creation) | Not required (zero-code instrumentation) |
| CPU Overhead | 3-8% per Pod (sidecar) | Less than 1% per node |
| Memory Overhead | 50-120MB per sidecar | Single DaemonSet per node (40-80MB) |
| Restart Required | Pod redeployment needed for SDK changes | Applied immediately without Pod restart |
| Language Support | Separate SDK per language (Java, Python, Go, etc.) | Kernel-level — supports all languages |
| Collection Depth | Application-level spans/metrics | Network flows, syscalls, file I/O, process execution |
| Security Observability | Requires separate security agent | Unified security event collection via same eBPF programs |
An eBPF program runs just once per node and collects telemetry for every Pod on that node. Even with 100 Pods, you need a single DaemonSet instead of 100 sidecars. This is why eBPF has an overwhelming advantage at scale.
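To make the scale argument concrete, here is a back-of-envelope memory comparison using midpoints of the ranges from the table above (the 100-pod node and the midpoint figures are illustrative assumptions, not measurements):

```shell
# Illustrative per-node memory comparison for a node running 100 pods.
# 85MB = midpoint of the 50-120MB sidecar range; 60MB = midpoint of 40-80MB.
PODS_PER_NODE=100
SIDECAR_MB=85
DAEMONSET_MB=60

SIDECAR_TOTAL=$((PODS_PER_NODE * SIDECAR_MB))   # one sidecar per pod
echo "sidecars:  ${SIDECAR_TOTAL} MB/node"      # 8500 MB/node
echo "daemonset: ${DAEMONSET_MB} MB/node"       # 60 MB/node
```

Even with generous error bars on the midpoints, the gap is two orders of magnitude, which is the core of the scaling argument.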
2026 Production eBPF Stack Architecture
The most battle-tested eBPF production stack in 2026 consists of four core components, each with clearly separated responsibilities. All are CNCF projects or part of the CNCF ecosystem.
In this architecture, Cilium/Hubble handles the network layer, Tetragon handles the security layer, and OBI/Beyla handles application tracing; all telemetry is delivered to backends (Grafana, Prometheus, Tempo) through the OpenTelemetry Collector.
Kernel Version Requirements
Choosing the right kernel version is critical for deploying the eBPF stack in production. The minimum requirement is 5.10 LTS, and the recommended version for 2026 production use is 6.1+.
| Distribution / OS | Default Kernel | CO-RE Support | Production Readiness |
|---|---|---|---|
| Ubuntu 24.04 LTS | 6.8 | Yes | Excellent |
| Amazon Linux 2023 | 6.1 | Yes | Excellent |
| Bottlerocket (AWS) | 6.1 | Yes | Excellent |
| Container-Optimized OS (GKE) | 6.1 | Yes | Excellent |
| Ubuntu 22.04 LTS | 5.15 | Yes | Good |
| RHEL 9 / Rocky 9 | 5.14 | Yes | Good |
| Amazon Linux 2 | 5.10 | Partial | Fair |
Tip: Using kernel 6.1+ with CO-RE (Compile Once, Run Everywhere) support means you can deploy eBPF programs without recompiling on each node. For EKS, choose Amazon Linux 2023 or Bottlerocket AMI. For GKE, use Container-Optimized OS.
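A quick pre-flight check you can run on a node (or bake into a bootstrap script) to compare the running kernel against the 6.1 recommendation. The `sort -V` version comparison is a generic shell idiom, not a Cilium-provided tool:

```shell
# Compare the running kernel against the recommended minimum (6.1).
RECOMMENDED="6.1"
RUNNING=$(uname -r | cut -d- -f1)   # strip distro suffix: "6.8.0-45-generic" -> "6.8.0"

# sort -V orders version strings numerically; if RECOMMENDED sorts first,
# the running kernel is greater than or equal to it.
if [ "$(printf '%s\n' "$RECOMMENDED" "$RUNNING" | sort -V | head -n1)" = "$RECOMMENDED" ]; then
  STATUS="OK"
else
  STATUS="UPGRADE"
fi
echo "kernel ${RUNNING}: ${STATUS}"
```

Pair this with the BTF check from the troubleshooting section (`ls /sys/kernel/btf/vmlinux`) to confirm CO-RE support before rolling out any eBPF agent.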
Step 1: Cilium + Hubble — Network Observability
Cilium is an eBPF-based CNI (Container Network Interface) that replaces the traditional iptables dataplane and provides L3/L4/L7 network policies and observability. Hubble is the network observability layer built on top of Cilium, providing service map visualization and flow logs.
Installing Cilium (Helm)
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz
# Install Cilium + Hubble via Helm
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.16.5 \
--namespace kube-system \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set kubeProxyReplacement=true
Verifying Hubble Network Flow Observation
# Check Cilium status
cilium status --wait
# Observe real-time flows with Hubble CLI
hubble observe --namespace default --follow
# Filter traffic between specific services
hubble observe \
--from-namespace production \
--to-namespace production \
--to-label app=api-gateway \
--protocol TCP \
--verdict FORWARDED
# Filter by HTTP response code (L7 observation)
hubble observe --http-status 500 --namespace production
# Port-forward to Hubble UI for service map visualization
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
Once you access the Hubble UI, you can see real-time service maps of all inter-service traffic in the cluster. L7 protocol-level (HTTP, gRPC, Kafka) request/response metrics are also collected automatically.
Cilium Network Policy Example
# cilium-network-policy.yaml
# L7 HTTP policy from API Gateway to backend services
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: api-gateway-to-backend
namespace: production
spec:
endpointSelector:
matchLabels:
app: backend-api
ingress:
- fromEndpoints:
- matchLabels:
app: api-gateway
toPorts:
- ports:
- port: "8080"
protocol: TCP
rules:
http:
- method: "GET"
path: "/api/v1/.*"
- method: "POST"
path: "/api/v1/.*"
headers:
- 'Content-Type: application/json'
Step 2: Tetragon — Runtime Security Observability
Tetragon is a Cilium sub-project and CNCF project that uses eBPF to observe runtime security events directly from the kernel and enforce policies. It monitors process execution, file access, network connections, and system calls in real time, with the ability to block threats at the kernel level instantly.
Installing Tetragon
# Install Tetragon via Helm
helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon \
--namespace kube-system \
--set tetragon.grpc.address="localhost:54321" \
--set tetragon.exportFilename="/var/run/cilium/tetragon/tetragon.log" \
--set tetragon.enableProcessCred=true \
--set tetragon.enableProcessNs=true
# Install tetra CLI (for event observation)
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
rm tetra-${GOOS}-${GOARCH}.tar.gz
TracingPolicy: Detecting Sensitive File Access
# tracing-policy-sensitive-files.yaml
# Detect and block access to sensitive files like /etc/shadow, /etc/passwd
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: sensitive-file-access
spec:
kprobes:
- call: "fd_install"
syscall: false
args:
- index: 0
type: int
- index: 1
type: "file"
selectors:
- matchArgs:
- index: 1
operator: "Prefix"
values:
- "/etc/shadow"
- "/etc/passwd"
- "/etc/kubernetes/pki"
- "/var/run/secrets/kubernetes.io"
matchActions:
- action: Sigkill # Immediately terminate the process
- action: Post # Send event log
TracingPolicy: Detecting Privilege Escalation
# tracing-policy-privilege-escalation.yaml
# Detect privilege escalation attempts inside containers
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: privilege-escalation-detect
spec:
kprobes:
  - call: "sys_setuid"   # arch-neutral name; Tetragon resolves the per-arch symbol (e.g. __x64_sys_setuid)
syscall: true
args:
- index: 0
type: int
selectors:
- matchArgs:
- index: 0
operator: "Equal"
values:
- "0" # Attempt to change to UID 0 (root)
matchNamespaces:
- namespace: Pid # Container PID namespace
operator: NotIn
values:
      - "host_ns" # i.e. only match processes outside the host PID namespace (containers)
matchActions:
- action: Sigkill
- action: Post
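Beyond file access and privilege escalation, the same kprobe mechanism can observe outbound network activity. The sketch below follows the pattern of Tetragon's upstream examples; verify the hook name and argument types against the documentation for your Tetragon version:

```yaml
# tracing-policy-tcp-connect.yaml
# Log every outbound TCP connection attempt (observe-only; no matchActions)
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-tcp-connect
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"   # kernel socket struct; Tetragon extracts source/destination address and port
```

Because there are no `matchActions`, this policy only emits events; adding selectors and a `Sigkill` action would turn it into an egress lockdown, so start observe-only and tighten gradually.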
Real-time Security Event Observation
# Observe real-time process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
# Example output:
# process default/nginx-7d8b49557c-x2k9p /bin/sh -c "cat /etc/shadow"
# exit default/nginx-7d8b49557c-x2k9p /bin/sh -c "cat /etc/shadow" SIGKILL
# -> /etc/shadow access attempt blocked immediately
# Detailed event output in JSON format
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o json | jq '.process_exec | {pod: .process.pod.name, binary: .process.binary, args: .process.arguments}'
# Filter events by specific namespace
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --namespace production
Step 3: OBI / Grafana Beyla — Zero-Code APM Tracing
OpenTelemetry eBPF Instrumentation (OBI) is the zero-code observability solution that Splunk announced in beta at KubeCon EU 2026. Originally developed by Grafana Labs as Beyla, it was donated to the OpenTelemetry project and is now co-developed by Grafana Labs, Splunk, Coralogix, Odigos, and other vendors.
OBI monitors network traffic using eBPF to automatically generate distributed traces and RED metrics (Rate, Errors, Duration). It supports all major languages without code modifications, including Go, Java, Python, Node.js, .NET, Ruby, C/C++, and Rust.
Deploying Beyla DaemonSet
# beyla-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: beyla
namespace: observability
labels:
app: beyla
spec:
selector:
matchLabels:
app: beyla
template:
metadata:
labels:
app: beyla
spec:
serviceAccountName: beyla
hostPID: true # eBPF needs access to host processes
hostNetwork: true # Required for network traffic capture
containers:
- name: beyla
image: grafana/beyla:1.9
securityContext:
privileged: true # Required for loading eBPF programs
env:
- name: BEYLA_OPEN_PORT
value: "80,443,3000,8080,8443,9090" # Ports to observe
- name: BEYLA_SERVICE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector.observability:4318"
- name: OTEL_EXPORTER_OTLP_PROTOCOL
value: "http/protobuf"
- name: BEYLA_KUBE_METADATA_ENABLE
value: "true" # Auto-tag with Pod/Service metadata
- name: BEYLA_TRACE_PRINTER
value: "disabled" # Disable stdout output in production
volumeMounts:
- name: sys-kernel
mountPath: /sys/kernel
readOnly: true
volumes:
- name: sys-kernel
hostPath:
path: /sys/kernel
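The DaemonSet above references a `beyla` ServiceAccount that must exist for Kubernetes metadata decoration to work. A minimal RBAC sketch follows; the exact resources Beyla needs to watch can vary by version, so treat this as a starting point and check the Beyla documentation:

```yaml
# beyla-rbac.yaml (minimal sketch; verify against your Beyla version)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: beyla
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: beyla
rules:
- apiGroups: [""]
  resources: ["pods", "services", "nodes"]   # needed to map traffic to Pod/Service names
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["replicasets", "deployments"]  # needed to resolve Deployment ownership
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: beyla
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: beyla
subjects:
- kind: ServiceAccount
  name: beyla
  namespace: observability
```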
Advanced Beyla Configuration (ConfigMap)
# beyla-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: beyla-config
namespace: observability
data:
beyla-config.yml: |
# Network-based auto-discovery
discovery:
services:
- k8s_namespace: production
k8s_deployment_name: ".*"
- k8s_namespace: staging
k8s_deployment_name: ".*"
# RED metrics configuration
routes:
unmatched: heuristic # Automatic URL pattern grouping
patterns:
- /api/v1/users/{id}
- /api/v1/courses/{id}
- /api/v1/courses/{id}/lessons/{lessonId}
# Trace sampling (production cost savings)
sampler:
type: parentbased_traceidratio
ratio: 0.1 # 10% sampling
# Metrics export
otel_metrics_export:
endpoint: http://otel-collector.observability:4318
protocol: http/protobuf
interval: 15s
# Traces export
otel_traces_export:
endpoint: http://otel-collector.observability:4318
protocol: http/protobuf
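Before rolling out the 10% ratio, it helps to estimate the resulting trace volume against your backend's capacity. The request rate below is an illustrative assumption; substitute your own numbers:

```shell
# Estimate sampled trace volume for the parentbased_traceidratio sampler above.
REQ_PER_SEC=2000          # assumed cluster-wide request rate (illustrative)
RATIO_PCT=10              # matches ratio: 0.1 in the config
SAMPLED=$((REQ_PER_SEC * RATIO_PCT / 100))
RETAINED_PER_DAY=$((SAMPLED * 86400))
echo "sampled traces/sec: ${SAMPLED}"            # 200
echo "retained traces/day: ${RETAINED_PER_DAY}"  # 17280000
```

If the daily figure exceeds what Tempo (or your trace backend) is sized for, lower the ratio rather than the export interval; parent-based sampling keeps traces complete at any ratio.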
Step 4: OpenTelemetry Collector — Unified Telemetry Hub
All telemetry collected from eBPF components is delivered to final backends through the OpenTelemetry Collector. The Collector handles data filtering, transformation, and routing, so you don't need to change your collection infrastructure when swapping backends.
# otel-collector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
# Hubble metrics collection (Prometheus scrape)
prometheus:
config:
scrape_configs:
- job_name: hubble
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_k8s_app]
regex: cilium
action: keep
processors:
batch:
timeout: 10s
send_batch_size: 1024
memory_limiter:
check_interval: 5s
limit_mib: 512
spike_limit_mib: 128
k8sattributes:
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.pod.name
- k8s.node.name
pod_association:
- sources:
- from: resource_attribute
name: k8s.pod.ip
exporters:
# Prometheus (metrics)
prometheusremotewrite:
endpoint: "http://mimir.observability:9009/api/v1/push"
tls:
insecure: true
# Tempo (traces)
otlp/tempo:
endpoint: "tempo.observability:4317"
tls:
insecure: true
# Loki (logs)
loki:
endpoint: "http://loki.observability:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, k8sattributes, batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp, prometheus]
processors: [memory_limiter, k8sattributes, batch]
exporters: [prometheusremotewrite]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [loki]
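The ConfigMap alone does nothing until a Collector workload mounts it. A minimal Deployment sketch is below; the image tag is an assumption to pin against current releases, and the contrib distribution is required because `prometheusremotewrite` and `loki` are contrib exporters:

```yaml
# otel-collector-deployment.yaml (minimal sketch; pin the image tag yourself)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.115.0  # assumed tag; check releases
        args: ["--config=/etc/otelcol/config.yaml"]
        ports:
        - containerPort: 4317   # OTLP gRPC
        - containerPort: 4318   # OTLP HTTP
        volumeMounts:
        - name: config
          mountPath: /etc/otelcol
      volumes:
      - name: config
        configMap:
          name: otel-collector-config   # the ConfigMap defined above
```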
Troubleshooting: Common Issues and Solutions
Issue 1: eBPF Program Loading Failure
The most common issue is eBPF program loading failure due to kernel version or security settings.
# Symptom: Beyla/Tetragon Pod in CrashLoopBackOff
# Diagnose the cause
kubectl logs -n kube-system ds/tetragon -c tetragon | grep -i "bpf\|error\|failed"
# Check kernel BTF (BPF Type Format) support
ls /sys/kernel/btf/vmlinux
# If file doesn't exist -> BTF not supported -> Kernel upgrade needed
# Check eBPF capabilities
bpftool feature probe kernel | grep -E "map_type|prog_type|attach_type"
# Fix: Upgrade node kernel (EKS example)
# Change managed node group AMI to Amazon Linux 2023
aws eks update-nodegroup-version \
--cluster-name my-cluster \
--nodegroup-name my-nodegroup \
--launch-template name=my-template,version=2 # Use AL2023 AMI
Issue 2: Hubble Metrics Not Appearing in Prometheus
# Check Hubble metrics activation status
cilium config view | grep hubble
# Verify Hubble relay is healthy
cilium hubble port-forward &
hubble status
# Check Prometheus ServiceMonitor
kubectl get servicemonitor -n monitoring -l app=cilium
# Manually test metrics endpoint
kubectl exec -n kube-system ds/cilium -- \
curl -s http://localhost:9965/metrics | head -20
# Fix: Explicitly enable Prometheus integration in Helm values
helm upgrade cilium cilium/cilium -n kube-system \
--set hubble.metrics.enableOpenMetrics=true \
--set prometheus.enabled=true \
--set prometheus.serviceMonitor.enabled=true
Issue 3: Beyla Not Capturing Traffic for Specific Services
# Check Beyla discovery logs
kubectl logs -n observability ds/beyla | grep -i "discover\|instrument"
# Problem: TLS traffic cannot be decrypted by default
# Fix: Enable uprobe for SSL libraries (Go/Node.js, etc.)
# Add to beyla-config.yml:
# ssl:
# enabled: true
# Problem: Service using non-standard ports
# Fix: Add the ports to BEYLA_OPEN_PORT environment variable
kubectl set env ds/beyla -n observability \
BEYLA_OPEN_PORT="80,443,3000,5432,6379,8080,8443,9090,27017"
Conclusion: eBPF Stack Adoption Roadmap
The eBPF-based observability and security stack is no longer experimental technology in 2026. With Splunk officially announcing the OBI beta at KubeCon EU 2026 and Grafana Beyla being donated to the OpenTelemetry project, vendor-neutral standardization is advancing rapidly.
Here's the recommended roadmap for production adoption:
- Week 1-2: Verify and upgrade kernel version (6.1+). Install Cilium + Hubble on staging cluster. Migrate from existing CNI (Calico/Flannel).
- Week 3-4: Deploy Tetragon and apply basic TracingPolicies (sensitive file access, privilege escalation detection). Benchmark performance against Falco.
- Week 5-6: Deploy Beyla/OBI DaemonSet. Run in parallel with existing SDK-based instrumentation to verify data consistency. Adjust trace sampling ratio.
- Week 7-8: Configure OTel Collector pipeline. Build Grafana dashboards. Set up alerting rules. Gradually roll out to production.
Future Outlook: In the second half of 2026, AI-driven threat detection that applies ML to Hubble network flow data is on the roadmap. The new tetragon-python SDK is also expected to allow writing eBPF security policies in Python that transpile to bytecode. eBPF is breaking down the boundary between observability and security, evolving into a unified kernel-level observation platform.
