OpenTelemetry in 2026: The Complete Guide to Observability for Modern Backends


If you're still debugging production issues with console.log, this guide is for you.

OpenTelemetry (OTel) has become the de facto standard for observability in 2026. It's vendor-neutral, open-source, and supported by every major cloud provider and APM tool. Here's everything you need to know to implement it properly.

What is OpenTelemetry?

OpenTelemetry is a CNCF project that provides:

  • Traces — distributed request flows across services
  • Metrics — numeric measurements over time (latency, error rate, throughput)
  • Logs — structured event records (now with context correlation)

The key insight: with OTel, all three signals share the same trace context, so you can jump from a slow metric → the trace → the log that explains why.
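
That shared context is concrete: every request carries a W3C traceparent header, and every span, metric exemplar, and log records the same trace ID. As a sketch of what's inside that header (plain JavaScript for illustration — the SDK parses and propagates this for you):

```javascript
// W3C traceparent format: version-traceId-spanId-flags
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  return {
    version,
    traceId,  // shared by every signal in the request — this is the correlation key
    spanId,   // the immediate parent span
    sampled: (parseInt(flags, 16) & 0x01) === 1,
  };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // the ID you'd search for in Jaeger, Prometheus exemplars, or Loki
```

The traceId is what lets a backend stitch the three signals together.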

Why OpenTelemetry Matters in 2026

The observability landscape has shifted:

Before OTel:     proprietary SDK → vendor lock-in → $$$
After OTel:      OTel SDK → OTel Collector → any backend

You instrument once and can export to Grafana Cloud, Datadog, Honeycomb, Jaeger, or your own Prometheus stack — without changing your application code.

The Three Pillars (With Real Examples)

1. Traces

A trace represents the lifecycle of a single request:

HTTP Request
  └── Auth Middleware (12ms)
  └── Route Handler (245ms)
        └── DB Query: SELECT users (180ms)
        └── Redis Cache SET (8ms)
        └── Email Service Call (45ms) [FAILED]

Without tracing, you'd see "request took 257ms" and have no idea where the time went. With a trace, it's obvious the database query dominates — and that the email call failed.

2. Metrics

Metrics are aggregated over time:

  • http_request_duration_seconds (histogram)
  • http_requests_total (counter)
  • database_pool_connections_active (gauge)
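
Each instrument type aggregates differently. As a conceptual sketch of how a Prometheus-style explicit-bucket histogram aggregates durations (plain JavaScript, not the SDK API):

```javascript
// Explicit-bucket histogram: each observation increments every bucket whose
// upper bound it fits under (buckets are cumulative), plus a running sum/count.
function makeHistogram(bounds) {
  const state = { buckets: new Array(bounds.length + 1).fill(0), sum: 0, count: 0 };
  return {
    record(value) {
      state.sum += value;
      state.count += 1;
      for (let i = 0; i < bounds.length; i++) {
        if (value <= bounds[i]) state.buckets[i] += 1;
      }
      state.buckets[bounds.length] += 1; // +Inf bucket counts everything
    },
    snapshot: () => ({ buckets: [...state.buckets], sum: state.sum, count: state.count }),
  };
}

const h = makeHistogram([100, 500, 1000]); // bounds in ms
[80, 300, 700, 2400].forEach((ms) => h.record(ms));
console.log(h.snapshot()); // buckets: [1, 2, 3, 4], sum: 3480, count: 4
```

This is why histograms are cheap to store yet still let the backend estimate percentiles like p99.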

3. Logs (Correlated)

The new superpower: logs linked to traces. When you see a slow trace, click → see all logs from that exact request context. No more searching by timestamp.

Getting Started: Node.js in 5 Minutes

npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-prometheus

Create tracing.js (load this before your app):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'my-api',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    // Note: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is the signal-specific env var
    // and must be a full URL including the /v1/traces path
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PrometheusExporter({ port: 9464 }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
console.log('OpenTelemetry initialized');

// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

Start your app with:

node --require ./tracing.js server.js

That's it. You now have automatic instrumentation for Express, HTTP calls, database queries, and more — zero code changes to your app.

Manual Spans: When Auto-Instrumentation Isn't Enough

For business logic, add custom spans:

const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service', '1.0.0');

async function processOrder(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      // Add context to the span
      span.setAttributes({
        'order.id': orderId,
        'order.source': 'web',
      });

      const order = await db.getOrder(orderId);
      span.setAttribute('order.total', order.total);
      span.setAttribute('order.items_count', order.items.length);

      await validateInventory(order); // This creates a child span automatically
      await chargePayment(order);    // Same here

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

The OTel Collector: Your Observability Router

Never export directly from your app to a vendor. Use the Collector:

App → OTel Collector → Jaeger (traces)
                     → Prometheus (metrics)
                     → Loki (logs)
                     → Datadog (all three, if you want)

collector-config.yaml:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_mib: 256

exporters:
  # The dedicated `jaeger` exporter was removed from the Collector;
  # modern Jaeger ingests OTLP natively, so export over OTLP instead.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  # The `logging` exporter was renamed to `debug`
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Free Local Stack with Docker Compose

version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./collector-config.yaml:/etc/otel/config.yaml
    command: ["--config=/etc/otel/config.yaml"]
    ports:
      - "4317:4317"  # gRPC
      - "4318:4318"  # HTTP

  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true  # accept OTLP directly from the Collector
    ports:
      - "16686:16686"  # UI

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

Visit localhost:16686 for traces, localhost:3001 for Grafana dashboards.
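
The compose file mounts a prometheus.yml that hasn't been shown yet. A minimal version (job names are arbitrary; host.docker.internal assumes Docker Desktop — on Linux use your host IP) that scrapes the Collector's Prometheus exporter on 8889 and the Node app's exporter on 9464:

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 10s
    static_configs:
      - targets: ['otel-collector:8889']
  - job_name: node-app
    static_configs:
      - targets: ['host.docker.internal:9464']
```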

Python Example (FastAPI)

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

def configure_telemetry(app):
    # Name the service so its traces are filterable in the backend
    resource = Resource.create({"service.name": "my-api"})
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(endpoint="http://otel-collector:4318/v1/traces")
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    FastAPIInstrumentor.instrument_app(app)
    HTTPXClientInstrumentor().instrument()
    SQLAlchemyInstrumentor().instrument()

Key Metrics to Track (SRE Golden Signals)

Signal       Metric                                     Alert threshold
-----------  -----------------------------------------  ---------------
Latency      http_request_duration_p99                  > 500ms
Traffic      http_requests_total (rate)                 drop > 20%
Errors       http_errors_total / http_requests_total    > 1%
Saturation   system_cpu_utilization                     > 80%
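
These thresholds are starting points, not gospel — tune them to your traffic. As a toy illustration of the error-rate check (plain JavaScript; the counter names mirror the table):

```javascript
// Alert when errors / requests over a window exceeds the threshold (1% default).
function errorRateAlert(errorsTotal, requestsTotal, threshold = 0.01) {
  if (requestsTotal === 0) return false; // no traffic, nothing to alert on
  return errorsTotal / requestsTotal > threshold;
}

console.log(errorRateAlert(12, 1000)); // true  (1.2% > 1%)
console.log(errorRateAlert(5, 1000));  // false (0.5%)
```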

Sampling Strategies: Don't Trace Everything

At scale, tracing 100% of requests is expensive:

const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

// Sample 10% of new (root) traces; child spans follow their parent's decision.
// Note: this does NOT guarantee errors are kept — for that, use the
// tail_sampling processor in the Collector.
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});

// Wire it into the SDK: new NodeSDK({ sampler, ... })

For critical paths (payments, auth), use an always-on sampler; for health checks, always-off.
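
Ratio sampling is deterministic per trace ID, which is why every service in a request agrees on the decision without coordinating. As a conceptual sketch (plain JavaScript; equivalent in spirit to the real TraceIdRatioBasedSampler, not line-for-line):

```javascript
// Hash the 32-hex-char trace ID down to a 32-bit value and compare it against
// ratio * 2^32: the same trace ID always yields the same decision everywhere.
function shouldSample(traceId, ratio) {
  let acc = 0;
  for (let i = 0; i < traceId.length; i += 8) {
    acc = (acc ^ parseInt(traceId.slice(i, i + 8), 16)) >>> 0;
  }
  return acc < Math.floor(ratio * 0xffffffff);
}

console.log(shouldSample('00000000000000000000000000000000', 0.1)); // true
console.log(shouldSample('ffffffff000000000000000000000000', 0.1)); // false
```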

Common Pitfalls

1. Forgetting context propagation
When making HTTP calls between services, propagate the trace context:

const { context, propagation } = require('@opentelemetry/api');

// With auto-instrumentation, propagation is automatic for supported clients.
// For a manual HTTP client, inject the active context into the headers:
const headers = {};
propagation.inject(context.active(), headers); // adds the traceparent header
fetch('http://other-service/api/data', { headers });

2. Over-instrumenting
Don't add a span for every function call. Instrument at meaningful boundaries: HTTP requests, DB queries, cache operations, external API calls.

3. Missing resource attributes
Always set service.name, service.version, and deployment.environment. These are crucial for filtering in production.
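
You can set these without touching code — the SDK reads standard environment variables (the values here are examples):

```shell
export OTEL_SERVICE_NAME=my-api
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.4.2,deployment.environment=production"
```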

OTel in 2026: What's New

  • Profiles signal is now stable — CPU/memory profiling correlated with traces
  • OTel Arrow — 80% compression for high-volume telemetry export
  • eBPF auto-instrumentation — zero-code instrumentation at kernel level
  • AI observability — LLM token tracking, model latency, prompt/response logging built into the spec

Conclusion

OpenTelemetry is no longer optional for production backends. It's the plumbing that lets you:

  • Debug production issues in minutes instead of hours
  • Understand the true cost of every feature
  • Run SRE practices without a dedicated team

Start with auto-instrumentation, add manual spans for business logic, run the Collector locally. You'll never go back to console.log debugging.


Building production-ready backends? Check out our Node.js REST API Boilerplate Pack — 5 production templates with observability pre-configured. And our Freelancer OS for managing your dev projects.
