OpenTelemetry in 2026: The Complete Guide to Observability for Modern Backends
If you're still debugging production issues with console.log, this guide is for you.
OpenTelemetry (OTel) has become the de facto standard for observability in 2026. It's vendor-neutral, open source, and supported by every major cloud provider and APM tool. Here's everything you need to know to implement it properly.
What is OpenTelemetry?
OpenTelemetry is a CNCF project that provides:
- Traces — distributed request flows across services
- Metrics — numeric measurements over time (latency, error rate, throughput)
- Logs — structured event records (now with context correlation)
The key insight: with OTel, all three signals share the same trace context, so you can jump from a slow metric → the trace → the log that explains why.
Why OpenTelemetry Matters in 2026
The observability landscape has shifted:
Before OTel: proprietary SDK → vendor lock-in → $$$
After OTel: OTel SDK → OTel Collector → any backend
You instrument once and can export to Grafana Cloud, Datadog, Honeycomb, Jaeger, or your own Prometheus stack — without changing your application code.
The Three Pillars (With Real Examples)
1. Traces
A trace represents the lifecycle of a single request:
HTTP Request
├── Auth Middleware (12ms)
└── Route Handler (245ms)
    ├── DB Query: SELECT users (180ms)
    ├── Redis Cache SET (8ms)
    └── Email Service Call (45ms) [FAILED]
Without tracing, you'd see "request took 257ms" and have no idea where the time went.
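To see why that breakdown matters, here is a toy sketch (plain JavaScript, not the OTel API) of what a trace buys you: per-operation durations you can rank, instead of one opaque total. The span names and timings mirror the example tree above.

```javascript
// Hypothetical span data mirroring the trace tree above.
const spans = [
  { name: 'Auth Middleware', ms: 12 },
  { name: 'DB Query: SELECT users', ms: 180 },
  { name: 'Redis Cache SET', ms: 8 },
  { name: 'Email Service Call', ms: 45 },
];

// With a trace, ranking operations by time spent is trivial.
const slowest = spans.reduce((a, b) => (a.ms > b.ms ? a : b));
console.log(`${slowest.name} accounts for ${slowest.ms}ms`);
```

In a real backend this ranking is what the trace waterfall view in Jaeger or Grafana shows you at a glance.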
2. Metrics
Metrics are aggregated over time:
- http_request_duration_seconds (histogram)
- http_requests_total (counter)
- database_pool_connections_active (gauge)
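To make the three instrument types concrete, here is a toy in-memory sketch (plain JavaScript, not the OTel metrics API) of how each one aggregates: a counter only goes up, a gauge holds the latest value, and a histogram counts observations into buckets.

```javascript
// Counter: monotonically increasing total.
let requestsTotal = 0;
const countRequest = () => { requestsTotal += 1; };

// Gauge: last-written value (e.g. active DB pool connections).
let activeConnections = 0;
const setActiveConnections = (n) => { activeConnections = n; };

// Histogram: observations counted into buckets by upper bound (seconds).
const bounds = [0.1, 0.5, 1];
const bucketCounts = [0, 0, 0, 0]; // one extra bucket for +Inf
function observeDuration(seconds) {
  const i = bounds.findIndex((b) => seconds <= b);
  bucketCounts[i === -1 ? bounds.length : i] += 1;
}

countRequest();
countRequest();
setActiveConnections(7);
observeDuration(0.3); // lands in the <= 0.5 bucket
observeDuration(2.0); // lands in the +Inf bucket
```

The real SDK adds labels, exemplars, and export plumbing on top, but the aggregation semantics are exactly these.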
3. Logs (Correlated)
The new superpower: logs linked to traces. When you see a slow trace, click → see all logs from that exact request context. No more searching by timestamp.
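As a toy illustration (plain JavaScript, hypothetical data): once every log line carries the trace_id of the request that produced it, pulling up the logs for a slow trace becomes a simple filter rather than a timestamp hunt.

```javascript
// Hypothetical structured logs, each stamped with its request's trace_id.
const logs = [
  { trace_id: 'abc123', level: 'info',  msg: 'validating order' },
  { trace_id: 'def456', level: 'info',  msg: 'health check ok' },
  { trace_id: 'abc123', level: 'error', msg: 'payment gateway timeout' },
];

// Given a slow trace's id, fetch exactly its log lines.
const logsForTrace = (traceId) => logs.filter((l) => l.trace_id === traceId);
const slowRequestLogs = logsForTrace('abc123');
```

In practice the OTel log SDK (or a logger integration) injects the trace_id automatically, and your backend does this filter for you when you click through from a trace.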
Getting Started: Node.js in 5 Minutes
npm install @opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-http \
@opentelemetry/exporter-prometheus
Create tracing.js (load this before your app):
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');
const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'my-api',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PrometheusExporter({ port: 9464 }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
console.log('OpenTelemetry initialized');

// Flush pending telemetry before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
Start your app with:
node --require ./tracing.js server.js
That's it. You now have automatic instrumentation for Express, HTTP calls, database queries, and more — zero code changes to your app.
Manual Spans: When Auto-Instrumentation Isn't Enough
For business logic, add custom spans:
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service', '1.0.0');

async function processOrder(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      // Add business context to the span
      span.setAttributes({
        'order.id': orderId,
        'order.source': 'web',
      });

      const order = await db.getOrder(orderId);
      span.setAttribute('order.total', order.total);
      span.setAttribute('order.items_count', order.items.length);

      await validateInventory(order); // creates a child span automatically
      await chargePayment(order);     // same here

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}
The OTel Collector: Your Observability Router
Never export directly from your app to a vendor. Use the Collector:
App → OTel Collector → Jaeger (traces)
                     → Prometheus (metrics)
                     → Loki (logs)
                     → Datadog (all three, if you want)
collector-config.yaml:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_mib: 256

exporters:
  # The legacy `jaeger` exporter was removed from the Collector;
  # modern Jaeger ingests OTLP directly.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  # The `logging` exporter was replaced by `debug`.
  debug:
    verbosity: normal

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
Free Local Stack with Docker Compose
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./collector-config.yaml:/etc/otel/config.yaml
    command: ["--config=/etc/otel/config.yaml"]
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "16686:16686" # UI
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
Visit localhost:16686 for traces, localhost:3001 for Grafana dashboards.
Python Example (FastAPI)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

def configure_telemetry(app):
    provider = TracerProvider()
    exporter = OTLPSpanExporter(endpoint="http://otel-collector:4318/v1/traces")
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    FastAPIInstrumentor.instrument_app(app)
    HTTPXClientInstrumentor().instrument()
    SQLAlchemyInstrumentor().instrument()
Key Metrics to Track (SRE Golden Signals)
| Signal | Metric | Alert Threshold |
|---|---|---|
| Latency | http_request_duration_p99 | > 500ms |
| Traffic | http_requests_total rate | Drop > 20% |
| Errors | http_errors_total / http_requests_total | > 1% |
| Saturation | system_cpu_utilization | > 80% |
Sampling Strategies: Don't Trace Everything
At scale, tracing 100% of requests is expensive:
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

// Sample 10% of root traces in production; child spans follow the
// parent's decision. (Head sampling can't "always keep errors" —
// you don't know yet whether a trace will fail. For that, use
// tail sampling in the Collector.)
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});
For critical paths (payments, auth), use an AlwaysOnSampler; for noise like health checks, an AlwaysOffSampler.
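Under the hood, TraceIdRatioBasedSampler makes a deterministic decision from the trace ID itself, so every service that sees the same trace agrees without any coordination. A simplified sketch of that idea (plain JavaScript, not the real implementation):

```javascript
// Simplified ratio sampling: treat the leading bytes of the hex trace ID
// as a uniform number in [0, 1) and compare it against the ratio.
function shouldSample(traceId, ratio) {
  const leading = traceId.slice(0, 8);               // first 8 hex chars
  const value = parseInt(leading, 16) / 0x100000000; // normalize to [0, 1)
  return value < ratio;
}

// Same trace ID → same decision, on every service in the request path.
const keep = shouldSample('00000000aaaaaaaaaaaaaaaaaaaaaaaa', 0.1); // tiny leading bytes → kept
const drop = shouldSample('ffffffffaaaaaaaaaaaaaaaaaaaaaaaa', 0.1); // large leading bytes → dropped
```

This determinism is why ratio sampling composes with ParentBasedSampler: the root's decision and any service's independent recomputation always match.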
Common Pitfalls
1. Forgetting context propagation
When making HTTP calls between services, propagate the trace context:
// With auto-instrumentation this is automatic. For manual HTTP clients,
// run the call inside the span's context so the HTTP instrumentation
// can inject the traceparent header:
context.with(trace.setSpan(context.active(), span), () => {
  fetch('http://other-service/api/data');
});
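What actually travels between services is the W3C Trace Context traceparent header, with the shape version-traceid-spanid-flags. A sketch of how that value is assembled (plain JavaScript; real instrumentations build and parse this for you):

```javascript
// Build a W3C traceparent header value: version "00", a 32-hex-char
// trace ID, a 16-hex-char span ID, and trace flags ("01" = sampled).
function buildTraceparent(traceId, spanId, sampled) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

// IDs below are the example values from the W3C Trace Context spec.
const header = buildTraceparent(
  '4bf92f3577b34da6a3ce929d0e0e4736',
  '00f067aa0ba902b7',
  true
);
// → '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
```

If you ever need to debug propagation, this is the header to look for on outgoing requests.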
2. Over-instrumenting
Don't add a span for every function call. Instrument at meaningful boundaries: HTTP requests, DB queries, cache operations, external API calls.
3. Missing resource attributes
Always set service.name, service.version, and deployment.environment. These are crucial for filtering in production.
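One way to keep those attributes consistent across services is a tiny shared helper. A sketch (plain JavaScript; the attribute keys are OTel semantic conventions, the helper itself is hypothetical):

```javascript
// Returns the attribute map to pass into the SDK's Resource, so every
// service sets the same three keys the same way.
function buildResourceAttributes({ name, version, env }) {
  return {
    'service.name': name,
    'service.version': version || '0.0.0',
    'deployment.environment': env || 'development',
  };
}

const attrs = buildResourceAttributes({
  name: 'my-api',
  version: '1.4.2',
  env: 'production',
});
```

Dropping this into a shared internal package means a missing environment label is a code-review catch, not a 2 a.m. discovery.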
OTel in 2026: What's New
- Profiles signal is now stable — CPU/memory profiling correlated with traces
- OTel Arrow — 80% compression for high-volume telemetry export
- eBPF auto-instrumentation — zero-code instrumentation at kernel level
- AI observability — LLM token tracking, model latency, prompt/response logging built into the spec
Conclusion
OpenTelemetry is no longer optional for production backends. It's the plumbing that lets you:
- Debug production issues in minutes instead of hours
- Understand the true cost of every feature
- Run SRE practices without a dedicated team
Start with auto-instrumentation, add manual spans for business logic, run the Collector locally. You'll never go back to console.log debugging.
Building production-ready backends? Check out our Node.js REST API Boilerplate Pack — 5 production templates with observability pre-configured. And our Freelancer OS for managing your dev projects.