
Nabin Debnath

Zero-Code Observability: Using eBPF to Auto-Instrument Services with OpenTelemetry

Instrumenting services for observability often means sprinkling tracing code across hundreds of files, which is painful to maintain and easy to forget.
Enter eBPF + OpenTelemetry (OTel): a powerful combination that hooks into your running processes and emits traces, metrics, and logs without touching application code.

In this post, you’ll learn how to:

  • Use an eBPF agent to automatically instrument apps
  • Export telemetry data through OpenTelemetry Collector
  • Visualize it with Grafana
  • Control overhead and noise
  • Roll it out safely in production

Why observability shouldn’t require rewriting code

Modern apps are stitched together from dozens of microservices. We push features daily, yet visibility into performance often lags.

You’ve probably heard: “We’ll add tracing later.” …and then it never happens.

Manual instrumentation with OpenTelemetry SDKs gives fine-grained control, but it comes with:

  • Code changes across many repos,
  • Version mismatches between SDKs,
  • Extra CI/CD validation.

Wouldn’t it be nice if the system could observe itself, automatically?

That’s what eBPF (extended Berkeley Packet Filter) delivers. It hooks into the Linux kernel, captures runtime events (like syscalls, network, and process activity), and forwards them all with low overhead. Combine that with OpenTelemetry, and you get a zero-code observability pipeline.


eBPF + OpenTelemetry in plain English

eBPF: Think of eBPF as a programmable microscope for the Linux kernel. It lets you attach tiny programs to events such as network packets or function calls and safely collect data in real-time.

OpenTelemetry: OpenTelemetry (OTel) is a vendor-neutral standard for generating and exporting traces, metrics, and logs. It’s supported by almost every major observability backend (Grafana, Datadog, AWS X-Ray, etc.).

An eBPF agent can auto-discover and instrument running services (HTTP, gRPC, database calls, etc.) and emit OTel-formatted data to your collector.


No SDKs. No code injection. Everything happens at runtime.


Setting up your environment

For this demo, we’ll use a simple Node.js app instrumented by an eBPF agent (Grafana Beyla). You can adapt the same approach for Java, Python, Go, and other runtimes.

Step 1: Create a minimal service

```shell
mkdir ebpf-otel-demo && cd $_
npm init -y
npm install express
```

index.js

```javascript
const express = require("express");
const app = express();

app.get("/orders/:id", async (req, res) => {
  // Simulate variable backend latency (0–200 ms)
  await new Promise(r => setTimeout(r, Math.random() * 200));
  res.json({ orderId: req.params.id, status: "OK" });
});

app.listen(3000, () => console.log("Service running on port 3000"));
```

Dockerfile

```dockerfile
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["node", "index.js"]
```

Build and run

```shell
docker build -t ebpf-otel-demo .
docker run -p 3000:3000 ebpf-otel-demo
```

Your API is now live at http://localhost:3000/orders/123.
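Before attaching the agent, it helps to have steady traffic so spans appear immediately once instrumentation kicks in. Here’s a minimal load generator (plain Node 18+, no dependencies; the `TARGET` env var and request count are just illustrative choices):

```javascript
// load.js — generate traffic against the demo service so the eBPF agent
// has requests to trace. Node 18+ ships a global fetch, so no dependencies.

// Pure helper: summarize latencies so you can eyeball a baseline.
function summarize(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    count: sorted.length,
    avg: sorted.reduce((s, x) => s + x, 0) / sorted.length,
    p95: sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))],
  };
}

async function generateLoad(baseUrl, requests = 50) {
  const latencies = [];
  for (let i = 0; i < requests; i++) {
    const started = Date.now();
    await fetch(`${baseUrl}/orders/${Math.floor(Math.random() * 1000)}`);
    latencies.push(Date.now() - started);
  }
  console.log(summarize(latencies));
}

// Only hits the network when a target is given:
//   TARGET=http://localhost:3000 node load.js
if (process.env.TARGET) generateLoad(process.env.TARGET);
```

The latency summary also gives you a pre-instrumentation baseline to compare against later.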

Step 2: Install an eBPF agent
Install Beyla on the host or as a sidecar container. (Requires Linux kernel ≥ 5.8.)

```shell
sudo apt-get install linux-headers-$(uname -r)
curl -sSfL https://github.com/grafana/beyla/releases/latest/download/beyla-linux-amd64.tar.gz | tar xz
sudo mv beyla /usr/local/bin/
```

Step 3: Configure the agent
Create beyla-config.yml:

```yaml
listen:
  interfaces: [eth0]
otlp:
  endpoint: "localhost:4317"
service:
  name: "orders-service"
instrumentation:
  language: "nodejs"
```

Run it:

```shell
sudo beyla run --config beyla-config.yml
```

The agent now attaches to your running container, intercepts HTTP calls, and sends spans to your OTel Collector.


Connect OpenTelemetry Collector

The collector acts as a bridge between producers (Beyla) and your observability backend (Grafana, Tempo, or Jaeger).

Create otel-collector-config.yml:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  debug:  # replaces the "logging" exporter, removed in recent collector releases
  otlp:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug, otlp]
```

Run the collector (in Docker for simplicity):

```shell
docker run --rm -p 4317:4317 -v $(pwd)/otel-collector-config.yml:/etc/otel/config.yml \
  otel/opentelemetry-collector:latest --config /etc/otel/config.yml
```
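To sanity-check that the collector is listening before wiring up Beyla, you can hand-craft a single OTLP/JSON span and POST it to the collector’s HTTP receiver (port 4318 by default — you’d need to publish it too, e.g. `-p 4318:4318`). The trace and span IDs below are made up; this is a smoke test, not how telemetry is normally produced:

```javascript
// smoke-test.js — build a minimal OTLP/JSON trace payload and (optionally)
// POST it to the collector's OTLP/HTTP receiver.
function buildTestPayload(serviceName) {
  const now = BigInt(Date.now()) * 1_000_000n; // OTLP timestamps are unix nanos
  return {
    resourceSpans: [{
      resource: {
        attributes: [
          { key: "service.name", value: { stringValue: serviceName } },
        ],
      },
      scopeSpans: [{
        scope: { name: "manual-smoke-test" },
        spans: [{
          traceId: "5dfb0e7c16b6f9c15dfb0e7c16b6f9c1", // 16 bytes as hex
          spanId: "8aeb32afaa3e41d9",                   // 8 bytes as hex
          name: "GET /orders/:id",
          kind: 2, // SPAN_KIND_SERVER
          startTimeUnixNano: (now - 50_000_000n).toString(),
          endTimeUnixNano: now.toString(),
        }],
      }],
    }],
  };
}

// Only hits the network when a collector endpoint is given:
//   OTLP_HTTP=http://localhost:4318 node smoke-test.js
if (process.env.OTLP_HTTP) {
  fetch(`${process.env.OTLP_HTTP}/v1/traces`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTestPayload("orders-service")),
  }).then(r => console.log("collector responded:", r.status));
}
```

A 200 response means the receiver parsed the payload and the pipeline is live.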

Visualize traces in Grafana

If you’re using Grafana Tempo + Loki + Grafana OSS:

```shell
docker run -d --name=grafana -p 3001:3000 grafana/grafana
```

Add Tempo as a data source, pointing it at the Tempo instance your collector exports to. Within seconds, you’ll see spans like:

```json
{
  "traceId": "5dfb0e7c16b6f9c1",
  "spanId": "8aeb32afaa3e41d9",
  "name": "GET /orders/:id",
  "attributes": {
    "http.method": "GET",
    "http.status_code": 200,
    "service.name": "orders-service"
  },
  "duration_ms": 52.8
}
```

Behind the scenes: what eBPF is doing

eBPF attaches probes (kprobes/uprobes) to kernel and user-space events:

  • Socket reads/writes -> network latency
  • HTTP libraries -> method, route, status
  • Syscalls -> file I/O, DNS, etc.

The agent aggregates these into OTel spans, adds attributes (service, method, latency), and exports them asynchronously, typically consuming less than 1–2% CPU.

Here’s a simplified view:

(Diagram: simplified view of eBPF behind the scenes)
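Conceptually, the agent pairs low-level events (the socket read carrying a request with the socket write carrying its response) into one span. A toy version of that pairing, with a made-up event shape purely for illustration:

```javascript
// Toy sketch of agent-side span reconstruction: pair request/response
// events by connection id and emit span-like records.
function eventsToSpans(events) {
  const open = new Map(); // connId -> pending request event
  const spans = [];
  for (const ev of events) {
    if (ev.type === "request") {
      open.set(ev.connId, ev);
    } else if (ev.type === "response" && open.has(ev.connId)) {
      const req = open.get(ev.connId);
      open.delete(ev.connId);
      spans.push({
        name: `${req.method} ${req.path}`,
        "http.status_code": ev.status,
        duration_ms: ev.ts - req.ts, // response time minus request time
      });
    }
  }
  return spans;
}
```

The real agent does this with kernel-captured data and far more bookkeeping (TLS, HTTP/2 streams, connection reuse), but the core idea is the same: correlate raw events into request-scoped spans.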


Controlling overhead and noise

Auto-instrumentation is powerful, but it can produce a lot of data. Here’s how to keep it efficient:

Sampling
In beyla-config.yml:

```yaml
sampling:
  probability: 0.2   # capture 20% of requests
```
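Head sampling like this is just a per-request coin flip. The agent handles it internally; this sketch only makes the math concrete:

```javascript
// Probabilistic head sampling: keep each trace with probability p.
function makeSampler(probability) {
  return () => Math.random() < probability;
}

// At p = 0.2 roughly 1 in 5 traces survives; p = 1.0 keeps everything
// and p = 0.0 drops everything.
const sampleOrder = makeSampler(0.2);
```

Because the decision is made at the start of a request, downstream spans of dropped traces never cost you storage, only the (cheap) coin flip.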

Filtering
Capture only interesting routes:

```yaml
filters:
  include_paths: ["/orders/*"]
```
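Path filters like `/orders/*` are simple globs. Roughly, the agent compiles them to a pattern and checks each request path; here’s a sketch that treats `*` as a single path segment (the real agent’s glob semantics may differ, so check its docs):

```javascript
// Translate a simple glob like "/orders/*" into a RegExp.
// "*" matches within one path segment (no slashes).
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// A path is captured if any include glob matches it.
function isIncluded(path, includeGlobs) {
  return includeGlobs.some(g => globToRegExp(g).test(path));
}
```

So `/orders/123` would be captured while `/health` or `/metrics` noise is dropped before it ever reaches your backend.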

Resource limits
Run the agent with limited CPU/memory:

```shell
sudo systemd-run --property=CPUQuota=20% beyla run ...
```

Security considerations

  • eBPF programs run with kernel privileges.
  • Always use signed binaries or build from source.
  • Test in staging first. Avoid root unless required.

Production Rollout Checklist

  • Test in staging with representative traffic.
  • Enable sampling (≤ 20%) before full rollout.
  • Run the agent in restricted mode (non-root if possible).
  • Compare baseline latency before/after attach.
  • Use dashboards to monitor agent CPU/memory usage.
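The "compare baseline latency" item boils down to comparing two latency samples. A small sketch of that comparison (using the median for robustness to outliers; the acceptable threshold is a judgment call for your service):

```javascript
// Compare request latencies captured before and after attaching the agent.
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Percentage change in median latency after attach; positive = slower.
function overheadPct(beforeMs, afterMs) {
  const b = median(beforeMs);
  return ((median(afterMs) - b) / b) * 100;
}
```

If the result stays within the 1–2% range discussed earlier, the rollout is on track; a larger jump is a signal to revisit sampling and filters before going wider.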

Why this approach matters

You can onboard dozens of services instantly, a huge win for teams with legacy stacks or microservice sprawl.


What’s next?

  • Combine with Service Mesh: Use eBPF telemetry to enrich service-mesh metrics (Istio, Linkerd).
  • Join Logs + Traces: Since OTel supports logs too, you can correlate application logs with eBPF spans via trace IDs.
  • Build Compliance Dashboards: In regulated industries (finance, healthcare), eBPF traces create immutable audit trails of service interactions without leaking business data.
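The logs-plus-traces idea above comes down to stamping each log line with the active trace ID so the backend can join logs to spans. A minimal structured-logging sketch (in practice the trace ID comes from OTel context propagation rather than being passed by hand):

```javascript
// Emit JSON log lines carrying the current trace id so a backend
// (e.g. Loki + Tempo) can jump from a log line to its trace.
function logWithTrace(traceId, level, message, extra = {}) {
  const line = JSON.stringify({
    ts: new Date().toISOString(),
    level,
    msg: message,
    trace_id: traceId, // join key shared with the eBPF-generated span
    ...extra,
  });
  console.log(line);
  return line;
}
```

With `trace_id` present on both sides, a single click in Grafana takes you from an error log to the exact request that produced it.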

Common problems you may face

  • Kernel version too old: upgrade or use COS/Ubuntu 22+.
  • Container visibility: run agent on host or enable --privileged if sidecar fails to attach.
  • Over-collection: fine-tune filters.
  • Trace backend mismatch: ensure the OTel Collector exporter matches your backend format (Tempo, Jaeger, Zipkin).

Wrapping up

You’ve now built an observability stack that requires zero code changes yet delivers full visibility.

Key takeaways:
✅ eBPF captures runtime events safely and efficiently.
✅ OpenTelemetry unifies data into a portable format.
✅ Together they let developers focus on features.

Start small: pick one service, attach an agent, visualize the traces, and scale gradually.
Once you see that first automatic trace appear in Grafana, you’ll realize: observability doesn’t need to slow you down.


Further reading
  • Grafana Beyla Docs
  • OpenTelemetry Collector
  • eBPF.io Guide
  • CNCF Observability Landscape
