Building a Production eBPF Observability & Security Stack for Kubernetes in 2026
Understanding what's happening inside a production Kubernetes cluster running thousands of containers remains one of the hardest operational challenges. Attaching sidecar proxies increases resource overhead, and embedding SDKs requires application code changes. eBPF (extended Berkeley Packet Filter) solves this problem at the kernel level. According to the 2026 CNCF Observability Technical Advisory Group (TAG) survey, 67% of teams running Kubernetes at scale have already adopted at least one eBPF-based observability tool in production.
This guide covers everything from the latest announcements at KubeCon EU 2026 to building a complete production observability and security stack with Cilium, Tetragon, Grafana Beyla, and the newly launched OpenTelemetry eBPF Instrumentation (OBI).
Why eBPF Became the Standard for Kubernetes Observability
Let's start by comparing traditional observability approaches with the eBPF-based approach. The key insight is: "Collect telemetry directly from the kernel without any code changes."
| Comparison | Traditional (SDK/Sidecar) | eBPF Approach |
|---|---|---|
| Instrumentation | Embed SDK in app code or deploy sidecar proxy | eBPF program in Linux kernel collects automatically |
| Code Changes | Required (import, init, span creation) | Not required (zero-code instrumentation) |
| CPU Overhead | 3-8% per Pod (sidecar) | Less than 1% per node |
| Memory Overhead | 50-120MB per sidecar | Single DaemonSet per node (40-80MB) |
| Restart Required | Pod redeployment needed for SDK changes | Applied immediately without Pod restart |
| Language Support | Separate SDK per language (Java, Python, Go, etc.) | Kernel-level — supports all languages |
| Collection Depth | Application-level spans/metrics | Network flows, syscalls, file I/O, process execution |
| Security Observability | Requires separate security agent | Unified security event collection via same eBPF programs |
An eBPF program runs just once per node and collects telemetry for every Pod on that node. Even with 100 Pods, you need a single DaemonSet instead of 100 sidecars. This is why eBPF has an overwhelming advantage at scale.
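To make the scale argument concrete, here is a back-of-envelope memory comparison using midpoints of the ranges from the table above (the 100-pod node and the midpoint figures are illustrative assumptions, not measurements):

```shell
# Illustrative per-node memory comparison for a node running 100 pods.
# 85MB = midpoint of the 50-120MB sidecar range; 60MB = midpoint of 40-80MB.
PODS_PER_NODE=100
SIDECAR_MB=85
DAEMONSET_MB=60

SIDECAR_TOTAL=$((PODS_PER_NODE * SIDECAR_MB))   # one sidecar per pod
echo "sidecars:  ${SIDECAR_TOTAL} MB/node"      # 8500 MB/node
echo "daemonset: ${DAEMONSET_MB} MB/node"       # 60 MB/node
```

Even with generous error bars on the midpoints, the gap is two orders of magnitude, which is the core of the scaling argument.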
2026 Production eBPF Stack Architecture
The most battle-tested eBPF production stack in 2026 consists of four core components, each with clearly separated responsibilities. All are CNCF projects or part of the CNCF ecosystem.
In this architecture, Cilium/Hubble handles the network layer, Tetragon handles the security layer, and OBI/Beyla handles application tracing; all telemetry is delivered to backends (Grafana, Prometheus, Tempo) through the OpenTelemetry Collector.
Kernel Version Requirements
Choosing the right kernel version is critical for deploying the eBPF stack in production. The minimum requirement is 5.10 LTS, and the recommended version for 2026 production use is 6.1+.
| Distribution / OS | Default Kernel | CO-RE Support | Production Readiness |
|---|---|---|---|
| Ubuntu 24.04 LTS | 6.8 | Yes | Excellent |
| Amazon Linux 2023 | 6.1 | Yes | Excellent |
| Bottlerocket (AWS) | 6.1 | Yes | Excellent |
| Container-Optimized OS (GKE) | 6.1 | Yes | Excellent |
| Ubuntu 22.04 LTS | 5.15 | Yes | Good |
| RHEL 9 / Rocky 9 | 5.14 | Yes | Good |
| Amazon Linux 2 | 5.10 | Partial | Fair |
Tip: Using kernel 6.1+ with CO-RE (Compile Once, Run Everywhere) support means you can deploy eBPF programs without recompiling on each node. For EKS, choose Amazon Linux 2023 or Bottlerocket AMI. For GKE, use Container-Optimized OS.
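A quick pre-flight check you can run on a node (or bake into a bootstrap script) to compare the running kernel against the 6.1 recommendation. The `sort -V` version comparison is a generic shell idiom, not a Cilium-provided tool:

```shell
# Compare the running kernel against the recommended minimum (6.1).
RECOMMENDED="6.1"
RUNNING=$(uname -r | cut -d- -f1)   # strip distro suffix: "6.8.0-45-generic" -> "6.8.0"

# sort -V orders version strings numerically; if RECOMMENDED sorts first,
# the running kernel is greater than or equal to it.
if [ "$(printf '%s\n' "$RECOMMENDED" "$RUNNING" | sort -V | head -n1)" = "$RECOMMENDED" ]; then
  STATUS="OK"
else
  STATUS="UPGRADE"
fi
echo "kernel ${RUNNING}: ${STATUS}"
```

Pair this with the BTF check from the troubleshooting section (`ls /sys/kernel/btf/vmlinux`) to confirm CO-RE support before rolling out any eBPF agent.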
Step 1: Cilium + Hubble — Network Observability
Cilium is an eBPF-based CNI (Container Network Interface) that replaces the traditional iptables dataplane and provides L3/L4/L7 network policies and observability. Hubble is the network observability layer built on top of Cilium, providing service map visualization and flow logs.
Installing Cilium (Helm)
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --fail --remote-name-all \
https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz
# Install Cilium + Hubble via Helm
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.16.5 \
--namespace kube-system \
--set hubble.enabled=true \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.metrics.enableOpenMetrics=true \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\,source_namespace\,source_workload\,destination_ip\,destination_namespace\,destination_workload\,traffic_direction}" \
--set prometheus.enabled=true \
--set operator.prometheus.enabled=true \
--set kubeProxyReplacement=true
Verifying Hubble Network Flow Observation
# Check Cilium status
cilium status --wait
# Observe real-time flows with Hubble CLI
hubble observe --namespace default --follow
# Filter traffic between specific services
hubble observe \
--from-namespace production \
--to-namespace production \
--to-label app=api-gateway \
--protocol TCP \
--verdict FORWARDED
# Filter by HTTP response code (L7 observation)
hubble observe --http-status 500 --namespace production
# Port-forward to Hubble UI for service map visualization
kubectl port-forward -n kube-system svc/hubble-ui 12000:80
Once you access the Hubble UI, you can see real-time service maps of all inter-service traffic in the cluster. L7 protocol-level (HTTP, gRPC, Kafka) request/response metrics are also collected automatically.
Cilium Network Policy Example
# cilium-network-policy.yaml
# L7 HTTP policy from API Gateway to backend services
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: api-gateway-to-backend
namespace: production
spec:
endpointSelector:
matchLabels:
app: backend-api
ingress:
- fromEndpoints:
- matchLabels:
app: api-gateway
toPorts:
- ports:
- port: "8080"
protocol: TCP
rules:
http:
- method: "GET"
path: "/api/v1/.*"
- method: "POST"
path: "/api/v1/.*"
headers:
- 'Content-Type: application/json'
Step 2: Tetragon — Runtime Security Observability
Tetragon is a Cilium sub-project and CNCF project that uses eBPF to observe runtime security events directly from the kernel and enforce policies. It monitors process execution, file access, network connections, and system calls in real time, with the ability to block threats at the kernel level instantly.
Installing Tetragon
# Install Tetragon via Helm
helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon \
--namespace kube-system \
--set tetragon.grpc.address="localhost:54321" \
--set tetragon.exportFilename="/var/run/cilium/tetragon/tetragon.log" \
--set tetragon.enableProcessCred=true \
--set tetragon.enableProcessNs=true
# Install tetra CLI (for event observation)
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L --remote-name-all \
https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz
sudo tar -C /usr/local/bin -xzvf tetra-${GOOS}-${GOARCH}.tar.gz
rm tetra-${GOOS}-${GOARCH}.tar.gz
TracingPolicy: Detecting Sensitive File Access
# tracing-policy-sensitive-files.yaml
# Detect and block access to sensitive files like /etc/shadow, /etc/passwd
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: sensitive-file-access
spec:
kprobes:
- call: "fd_install"
syscall: false
args:
- index: 0
type: int
- index: 1
type: "file"
selectors:
- matchArgs:
- index: 1
operator: "Prefix"
values:
- "/etc/shadow"
- "/etc/passwd"
- "/etc/kubernetes/pki"
- "/var/run/secrets/kubernetes.io"
matchActions:
- action: Sigkill # Immediately terminate the process
- action: Post # Send event log
TracingPolicy: Detecting Privilege Escalation
# tracing-policy-privilege-escalation.yaml
# Detect privilege escalation attempts inside containers
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: privilege-escalation-detect
spec:
kprobes:
  - call: "sys_setuid"   # arch-neutral name; Tetragon resolves the per-arch symbol (e.g. __x64_sys_setuid)
syscall: true
args:
- index: 0
type: int
selectors:
- matchArgs:
- index: 0
operator: "Equal"
values:
- "0" # Attempt to change to UID 0 (root)
matchNamespaces:
- namespace: Pid # Container PID namespace
operator: NotIn
values:
      - "host_ns" # i.e. only match processes outside the host PID namespace (containers)
matchActions:
- action: Sigkill
- action: Post
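Beyond file access and privilege escalation, the same kprobe mechanism can observe outbound network activity. The sketch below follows the pattern of Tetragon's upstream examples; verify the hook name and argument types against the documentation for your Tetragon version:

```yaml
# tracing-policy-tcp-connect.yaml
# Log every outbound TCP connection attempt (observe-only; no matchActions)
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-tcp-connect
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"   # kernel socket struct; Tetragon extracts source/destination address and port
```

Because there are no `matchActions`, this policy only emits events; adding selectors and a `Sigkill` action would turn it into an egress lockdown, so start observe-only and tighten gradually.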
Real-time Security Event Observation
# Observe real-time process execution events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
# Example output:
# process default/nginx-7d8b49557c-x2k9p /bin/sh -c "cat /etc/shadow"
# exit default/nginx-7d8b49557c-x2k9p /bin/sh -c "cat /etc/shadow" SIGKILL
# -> /etc/shadow access attempt blocked immediately
# Detailed event output in JSON format
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o json | jq '.process_exec | {pod: .process.pod.name, binary: .process.binary, args: .process.arguments}'
# Filter events by specific namespace
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --namespace production
Step 3: OBI / Grafana Beyla — Zero-Code APM Tracing
OpenTelemetry eBPF Instrumentation (OBI) is the zero-code observability solution that Splunk announced in beta at KubeCon EU 2026. Originally developed by Grafana Labs as Beyla, it was donated to the OpenTelemetry project and is now co-developed by Grafana Labs, Splunk, Coralogix, Odigos, and other vendors.
OBI monitors network traffic using eBPF to automatically generate distributed traces and RED metrics (Rate, Errors, Duration). It supports all major languages without code modifications, including Go, Java, Python, Node.js, .NET, Ruby, C/C++, and Rust.
Deploying Beyla DaemonSet
# beyla-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: beyla
namespace: observability
labels:
app: beyla
spec:
selector:
matchLabels:
app: beyla
template:
metadata:
labels:
app: beyla
spec:
serviceAccountName: beyla
hostPID: true # eBPF needs access to host processes
hostNetwork: true # Required for network traffic capture
containers:
- name: beyla
image: grafana/beyla:1.9
securityContext:
privileged: true # Required for loading eBPF programs
env:
- name: BEYLA_OPEN_PORT
value: "80,443,3000,8080,8443,9090" # Ports to observe
- name: BEYLA_SERVICE_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector.observability:4318"
- name: OTEL_EXPORTER_OTLP_PROTOCOL
value: "http/protobuf"
- name: BEYLA_KUBE_METADATA_ENABLE
value: "true" # Auto-tag with Pod/Service metadata
- name: BEYLA_TRACE_PRINTER
value: "disabled" # Disable stdout output in production
volumeMounts:
- name: sys-kernel
mountPath: /sys/kernel
readOnly: true
volumes:
- name: sys-kernel
hostPath:
path: /sys/kernel
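The DaemonSet above references a `beyla` ServiceAccount that must exist for Kubernetes metadata decoration to work. A minimal RBAC sketch follows; the exact resources Beyla needs to watch can vary by version, so treat this as a starting point and check the Beyla documentation:

```yaml
# beyla-rbac.yaml (minimal sketch; verify against your Beyla version)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: beyla
  namespace: observability
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: beyla
rules:
- apiGroups: [""]
  resources: ["pods", "services", "nodes"]   # needed to map traffic to Pod/Service names
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["replicasets", "deployments"]  # needed to resolve Deployment ownership
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: beyla
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: beyla
subjects:
- kind: ServiceAccount
  name: beyla
  namespace: observability
```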
Advanced Beyla Configuration (ConfigMap)
# beyla-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: beyla-config
namespace: observability
data:
beyla-config.yml: |
# Network-based auto-discovery
discovery:
services:
- k8s_namespace: production
k8s_deployment_name: ".*"
- k8s_namespace: staging
k8s_deployment_name: ".*"
# RED metrics configuration
routes:
unmatched: heuristic # Automatic URL pattern grouping
patterns:
- /api/v1/users/{id}
- /api/v1/courses/{id}
- /api/v1/courses/{id}/lessons/{lessonId}
# Trace sampling (production cost savings)
sampler:
type: parentbased_traceidratio
ratio: 0.1 # 10% sampling
# Metrics export
otel_metrics_export:
endpoint: http://otel-collector.observability:4318
protocol: http/protobuf
interval: 15s
# Traces export
otel_traces_export:
endpoint: http://otel-collector.observability:4318
protocol: http/protobuf
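Before rolling out the 10% ratio, it helps to estimate the resulting trace volume against your backend's capacity. The request rate below is an illustrative assumption; substitute your own numbers:

```shell
# Estimate sampled trace volume for the parentbased_traceidratio sampler above.
REQ_PER_SEC=2000          # assumed cluster-wide request rate (illustrative)
RATIO_PCT=10              # matches ratio: 0.1 in the config
SAMPLED=$((REQ_PER_SEC * RATIO_PCT / 100))
RETAINED_PER_DAY=$((SAMPLED * 86400))
echo "sampled traces/sec: ${SAMPLED}"            # 200
echo "retained traces/day: ${RETAINED_PER_DAY}"  # 17280000
```

If the daily figure exceeds what Tempo (or your trace backend) is sized for, lower the ratio rather than the export interval; parent-based sampling keeps traces complete at any ratio.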
Step 4: OpenTelemetry Collector — Unified Telemetry Hub
All telemetry collected from eBPF components is delivered to final backends through the OpenTelemetry Collector. The Collector handles data filtering, transformation, and routing, so you don't need to change your collection infrastructure when swapping backends.
# otel-collector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: observability
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
# Hubble metrics collection (Prometheus scrape)
prometheus:
config:
scrape_configs:
- job_name: hubble
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_k8s_app]
regex: cilium
action: keep
processors:
batch:
timeout: 10s
send_batch_size: 1024
memory_limiter:
check_interval: 5s
limit_mib: 512
spike_limit_mib: 128
k8sattributes:
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.pod.name
- k8s.node.name
pod_association:
- sources:
- from: resource_attribute
name: k8s.pod.ip
exporters:
# Prometheus (metrics)
prometheusremotewrite:
endpoint: "http://mimir.observability:9009/api/v1/push"
tls:
insecure: true
# Tempo (traces)
otlp/tempo:
endpoint: "tempo.observability:4317"
tls:
insecure: true
# Loki (logs)
loki:
endpoint: "http://loki.observability:3100/loki/api/v1/push"
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, k8sattributes, batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp, prometheus]
processors: [memory_limiter, k8sattributes, batch]
exporters: [prometheusremotewrite]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [loki]
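The ConfigMap alone does nothing until a Collector workload mounts it. A minimal Deployment sketch is below; the image tag is an assumption to pin against current releases, and the contrib distribution is required because `prometheusremotewrite` and `loki` are contrib exporters:

```yaml
# otel-collector-deployment.yaml (minimal sketch; pin the image tag yourself)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: observability
spec:
  replicas: 2
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.115.0  # assumed tag; check releases
        args: ["--config=/etc/otelcol/config.yaml"]
        ports:
        - containerPort: 4317   # OTLP gRPC
        - containerPort: 4318   # OTLP HTTP
        volumeMounts:
        - name: config
          mountPath: /etc/otelcol
      volumes:
      - name: config
        configMap:
          name: otel-collector-config   # the ConfigMap defined above
```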
Troubleshooting: Common Issues and Solutions
Issue 1: eBPF Program Loading Failure
The most common issue is eBPF program loading failure due to kernel version or security settings.
# Symptom: Beyla/Tetragon Pod in CrashLoopBackOff
# Diagnose the cause
kubectl logs -n kube-system ds/tetragon -c tetragon | grep -i "bpf\|error\|failed"
# Check kernel BTF (BPF Type Format) support
ls /sys/kernel/btf/vmlinux
# If file doesn't exist -> BTF not supported -> Kernel upgrade needed
# Check eBPF capabilities
bpftool feature probe kernel | grep -E "map_type|prog_type|attach_type"
# Fix: Upgrade node kernel (EKS example)
# Change managed node group AMI to Amazon Linux 2023
aws eks update-nodegroup-version \
--cluster-name my-cluster \
--nodegroup-name my-nodegroup \
--launch-template name=my-template,version=2 # Use AL2023 AMI
Issue 2: Hubble Metrics Not Appearing in Prometheus
# Check Hubble metrics activation status
cilium config view | grep hubble
# Verify Hubble relay is healthy
cilium hubble port-forward &
hubble status
# Check Prometheus ServiceMonitor
kubectl get servicemonitor -n monitoring -l app=cilium
# Manually test metrics endpoint
kubectl exec -n kube-system ds/cilium -- \
curl -s http://localhost:9965/metrics | head -20
# Fix: Explicitly enable Prometheus integration in Helm values
helm upgrade cilium cilium/cilium -n kube-system \
--set hubble.metrics.enableOpenMetrics=true \
--set prometheus.enabled=true \
--set prometheus.serviceMonitor.enabled=true
Issue 3: Beyla Not Capturing Traffic for Specific Services
# Check Beyla discovery logs
kubectl logs -n observability ds/beyla | grep -i "discover\|instrument"
# Problem: TLS traffic cannot be decrypted by default
# Fix: Enable uprobe for SSL libraries (Go/Node.js, etc.)
# Add to beyla-config.yml:
# ssl:
# enabled: true
# Problem: Service using non-standard ports
# Fix: Add the ports to BEYLA_OPEN_PORT environment variable
kubectl set env ds/beyla -n observability \
BEYLA_OPEN_PORT="80,443,3000,5432,6379,8080,8443,9090,27017"
Conclusion: eBPF Stack Adoption Roadmap
The eBPF-based observability and security stack is no longer experimental technology in 2026. With Splunk officially announcing the OBI beta at KubeCon EU 2026 and Grafana Beyla being donated to the OpenTelemetry project, vendor-neutral standardization is advancing rapidly.
Here's the recommended roadmap for production adoption:
- Week 1-2: Verify and upgrade kernel version (6.1+). Install Cilium + Hubble on staging cluster. Migrate from existing CNI (Calico/Flannel).
- Week 3-4: Deploy Tetragon and apply basic TracingPolicies (sensitive file access, privilege escalation detection). Benchmark performance against Falco.
- Week 5-6: Deploy Beyla/OBI DaemonSet. Run in parallel with existing SDK-based instrumentation to verify data consistency. Adjust trace sampling ratio.
- Week 7-8: Configure OTel Collector pipeline. Build Grafana dashboards. Set up alerting rules. Gradually roll out to production.
Future Outlook: In the second half of 2026, AI-driven threat detection that applies ML to Hubble network flow data is on the roadmap. The new tetragon-python SDK is also expected to allow writing eBPF security policies in Python that transpile to bytecode. eBPF is breaking down the boundary between observability and security, evolving into a unified kernel-level observation platform.
