OpenTelemetry in 2026: The Complete Guide to Observability for Modern Backends


If you're still debugging production issues with console.log, this guide is for you.

OpenTelemetry (OTel) has become the de facto standard for observability in 2026. It's vendor-neutral, open-source, and supported by every major cloud provider and APM tool. Here's everything you need to know to implement it properly.

What is OpenTelemetry?

OpenTelemetry is a CNCF project that provides:

  • Traces — distributed request flows across services
  • Metrics — numeric measurements over time (latency, error rate, throughput)
  • Logs — structured event records (now with context correlation)

The key insight: with OTel, all three signals share the same trace context, so you can jump from a slow metric → the trace → the log that explains why.
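
That shared context is concrete: every request carries a W3C traceparent header, and every span, metric exemplar, and log records the same trace ID. As a sketch of what's inside that header (plain JavaScript for illustration — the SDK parses and propagates this for you):

```javascript
// W3C traceparent format: version-traceId-spanId-flags
// e.g. 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  return {
    version,
    traceId,  // shared by every signal in the request — this is the correlation key
    spanId,   // the immediate parent span
    sampled: (parseInt(flags, 16) & 0x01) === 1,
  };
}

const ctx = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx.traceId); // the ID you'd search for in Jaeger, Prometheus exemplars, or Loki
```

The traceId is what lets a backend stitch the three signals together.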

Why OpenTelemetry Matters in 2026

The observability landscape has shifted:

Before OTel:     proprietary SDK → vendor lock-in → $$$
After OTel:      OTel SDK → OTel Collector → any backend

You instrument once and can export to Grafana Cloud, Datadog, Honeycomb, Jaeger, or your own Prometheus stack — without changing your application code.

The Three Pillars (With Real Examples)

1. Traces

A trace represents the lifecycle of a single request:

HTTP Request
  └── Auth Middleware (12ms)
  └── Route Handler (245ms)
        └── DB Query: SELECT users (180ms)
        └── Redis Cache SET (8ms)
        └── Email Service Call (45ms) [FAILED]

Without tracing, you'd see "request took 257ms" and have no idea where the time went. With a trace, it's obvious the database query dominates — and that the email call failed.

2. Metrics

Metrics are aggregated over time:

  • http_request_duration_seconds (histogram)
  • http_requests_total (counter)
  • database_pool_connections_active (gauge)
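
Each instrument type aggregates differently. As a conceptual sketch of how a Prometheus-style explicit-bucket histogram aggregates durations (plain JavaScript, not the SDK API):

```javascript
// Explicit-bucket histogram: each observation increments every bucket whose
// upper bound it fits under (buckets are cumulative), plus a running sum/count.
function makeHistogram(bounds) {
  const state = { buckets: new Array(bounds.length + 1).fill(0), sum: 0, count: 0 };
  return {
    record(value) {
      state.sum += value;
      state.count += 1;
      for (let i = 0; i < bounds.length; i++) {
        if (value <= bounds[i]) state.buckets[i] += 1;
      }
      state.buckets[bounds.length] += 1; // +Inf bucket counts everything
    },
    snapshot: () => ({ buckets: [...state.buckets], sum: state.sum, count: state.count }),
  };
}

const h = makeHistogram([100, 500, 1000]); // bounds in ms
[80, 300, 700, 2400].forEach((ms) => h.record(ms));
console.log(h.snapshot()); // buckets: [1, 2, 3, 4], sum: 3480, count: 4
```

This is why histograms are cheap to store yet still let the backend estimate percentiles like p99.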

3. Logs (Correlated)

The new superpower: logs linked to traces. When you see a slow trace, click → see all logs from that exact request context. No more searching by timestamp.

Getting Started: Node.js in 5 Minutes

npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-prometheus

Create tracing.js (load this before your app):

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { Resource } = require('@opentelemetry/resources');
const { ATTR_SERVICE_NAME } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'my-api',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    // Note: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is the signal-specific env var
    // and must be a full URL including the /v1/traces path
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  metricReader: new PrometheusExporter({ port: 9464 }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
console.log('OpenTelemetry initialized');

// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

Start your app with:

node --require ./tracing.js server.js

That's it. You now have automatic instrumentation for Express, HTTP calls, database queries, and more — zero code changes to your app.

Manual Spans: When Auto-Instrumentation Isn't Enough

For business logic, add custom spans:

const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service', '1.0.0');

async function processOrder(orderId) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      // Add context to the span
      span.setAttributes({
        'order.id': orderId,
        'order.source': 'web',
      });

      const order = await db.getOrder(orderId);
      span.setAttribute('order.total', order.total);
      span.setAttribute('order.items_count', order.items.length);

      await validateInventory(order); // This creates a child span automatically
      await chargePayment(order);    // Same here

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}

The OTel Collector: Your Observability Router

Never export directly from your app to a vendor. Use the Collector:

App → OTel Collector → Jaeger (traces)
                     → Prometheus (metrics)
                     → Loki (logs)
                     → Datadog (all three, if you want)

collector-config.yaml:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_mib: 256

exporters:
  # The dedicated `jaeger` exporter was removed from the Collector;
  # modern Jaeger ingests OTLP natively, so export over OTLP instead.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
  # The `logging` exporter was renamed to `debug`
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Free Local Stack with Docker Compose

version: '3.8'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./collector-config.yaml:/etc/otel/config.yaml
    command: ["--config=/etc/otel/config.yaml"]
    ports:
      - "4317:4317"  # gRPC
      - "4318:4318"  # HTTP

  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true  # accept OTLP directly from the Collector
    ports:
      - "16686:16686"  # UI

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin

Visit localhost:16686 for traces, localhost:3001 for Grafana dashboards.
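
The compose file mounts a prometheus.yml that hasn't been shown yet. A minimal version (job names are arbitrary; host.docker.internal assumes Docker Desktop — on Linux use your host IP) that scrapes the Collector's Prometheus exporter on 8889 and the Node app's exporter on 9464:

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 10s
    static_configs:
      - targets: ['otel-collector:8889']
  - job_name: node-app
    static_configs:
      - targets: ['host.docker.internal:9464']
```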

Python Example (FastAPI)

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

def configure_telemetry(app):
    # Name the service so its traces are filterable in the backend
    resource = Resource.create({"service.name": "my-api"})
    provider = TracerProvider(resource=resource)
    exporter = OTLPSpanExporter(endpoint="http://otel-collector:4318/v1/traces")
    provider.add_span_processor(BatchSpanProcessor(exporter))
    trace.set_tracer_provider(provider)

    FastAPIInstrumentor.instrument_app(app)
    HTTPXClientInstrumentor().instrument()
    SQLAlchemyInstrumentor().instrument()

Key Metrics to Track (SRE Golden Signals)

Signal       Metric                                     Alert threshold
-----------  -----------------------------------------  ---------------
Latency      http_request_duration_p99                  > 500ms
Traffic      http_requests_total (rate)                 drop > 20%
Errors       http_errors_total / http_requests_total    > 1%
Saturation   system_cpu_utilization                     > 80%
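
These thresholds are starting points, not gospel — tune them to your traffic. As a toy illustration of the error-rate check (plain JavaScript; the counter names mirror the table):

```javascript
// Alert when errors / requests over a window exceeds the threshold (1% default).
function errorRateAlert(errorsTotal, requestsTotal, threshold = 0.01) {
  if (requestsTotal === 0) return false; // no traffic, nothing to alert on
  return errorsTotal / requestsTotal > threshold;
}

console.log(errorRateAlert(12, 1000)); // true  (1.2% > 1%)
console.log(errorRateAlert(5, 1000));  // false (0.5%)
```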

Sampling Strategies: Don't Trace Everything

At scale, tracing 100% of requests is expensive:

const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

// Sample 10% of new (root) traces; child spans follow their parent's decision.
// Note: this does NOT guarantee errors are kept — for that, use the
// tail_sampling processor in the Collector.
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});

// Wire it into the SDK: new NodeSDK({ sampler, ... })

For critical paths (payments, auth), use an always-on sampler; for health checks, always-off.
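
Ratio sampling is deterministic per trace ID, which is why every service in a request agrees on the decision without coordinating. As a conceptual sketch (plain JavaScript; equivalent in spirit to the real TraceIdRatioBasedSampler, not line-for-line):

```javascript
// Hash the 32-hex-char trace ID down to a 32-bit value and compare it against
// ratio * 2^32: the same trace ID always yields the same decision everywhere.
function shouldSample(traceId, ratio) {
  let acc = 0;
  for (let i = 0; i < traceId.length; i += 8) {
    acc = (acc ^ parseInt(traceId.slice(i, i + 8), 16)) >>> 0;
  }
  return acc < Math.floor(ratio * 0xffffffff);
}

console.log(shouldSample('00000000000000000000000000000000', 0.1)); // true
console.log(shouldSample('ffffffff000000000000000000000000', 0.1)); // false
```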

Common Pitfalls

1. Forgetting context propagation
When making HTTP calls between services, propagate the trace context:

const { context, propagation } = require('@opentelemetry/api');

// With auto-instrumentation, propagation is automatic for supported clients.
// For a manual HTTP client, inject the active context into the headers:
const headers = {};
propagation.inject(context.active(), headers); // adds the traceparent header
fetch('http://other-service/api/data', { headers });

2. Over-instrumenting
Don't add a span for every function call. Instrument at meaningful boundaries: HTTP requests, DB queries, cache operations, external API calls.

3. Missing resource attributes
Always set service.name, service.version, and deployment.environment. These are crucial for filtering in production.
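
You can set these without touching code — the SDK reads standard environment variables (the values here are examples):

```shell
export OTEL_SERVICE_NAME=my-api
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.4.2,deployment.environment=production"
```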

OTel in 2026: What's New

  • Profiles signal is now stable — CPU/memory profiling correlated with traces
  • OTel Arrow — 80% compression for high-volume telemetry export
  • eBPF auto-instrumentation — zero-code instrumentation at kernel level
  • AI observability — LLM token tracking, model latency, prompt/response logging built into the spec

Conclusion

OpenTelemetry is no longer optional for production backends. It's the plumbing that lets you:

  • Debug production issues in minutes instead of hours
  • Understand the true cost of every feature
  • Run SRE practices without a dedicated team

Start with auto-instrumentation, add manual spans for business logic, run the Collector locally. You'll never go back to console.log debugging.


Building production-ready backends? Check out our Node.js REST API Boilerplate Pack — 5 production templates with observability pre-configured. And our Freelancer OS for managing your dev projects.
