Atlas Whoff

OpenTelemetry for Node.js: Distributed Tracing Without Vendor Lock-in

The Problem with Observability Silos

You have logs in CloudWatch, metrics in Datadog, and traces in a different tool. Correlating a slow request across three systems is a manual nightmare.

OpenTelemetry (OTel) standardizes how you emit telemetry—then send it anywhere.

Core Concepts

  • Traces: The journey of a request through your system
  • Spans: Individual operations within a trace
  • Metrics: Numerical measurements over time
  • Logs: Timestamped events (OTel correlates these with traces)
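That last bullet deserves a concrete sketch: while a span is active, its `trace_id` and `span_id` can be stamped onto every log record, so a log line can later be joined to the exact trace that produced it. `withTraceContext` below is a hypothetical helper, not part of the OTel API; only `trace.getActiveSpan()` in the commented usage is the real API.

```typescript
// Sketch: attach the active trace context to a log record so logs and traces
// can be correlated in the backend. withTraceContext is a hypothetical helper.
interface SpanContextLike {
  traceId: string;
  spanId: string;
}

function withTraceContext(
  record: Record<string, unknown>,
  ctx: SpanContextLike | undefined,
): Record<string, unknown> {
  if (!ctx) return record; // no active span — log the record as-is
  return { ...record, trace_id: ctx.traceId, span_id: ctx.spanId };
}

// Usage with the real API (assumes @opentelemetry/api is installed):
// import { trace } from '@opentelemetry/api';
// console.log(JSON.stringify(
//   withTraceContext({ msg: 'order processed' }, trace.getActiveSpan()?.spanContext()),
// ));
```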

Setup

npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-metrics-otlp-http
// instrumentation.ts — load BEFORE everything else
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SEMRESATTRS_SERVICE_NAME } from '@opentelemetry/semantic-conventions';

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: 'my-api',
  }),
  traceExporter: new OTLPTraceExporter({
    // With no url set, the exporter honors OTEL_EXPORTER_OTLP_ENDPOINT
    // (appending /v1/traces) and defaults to http://localhost:4318/v1/traces.
    // Passing that env var directly as `url` would drop the /v1/traces path.
  }),
  metricReader: new PeriodicExportingMetricReader({
    // Likewise honors OTEL_EXPORTER_OTLP_ENDPOINT, appending /v1/metrics.
    exporter: new OTLPMetricExporter({}),
    exportIntervalMillis: 30000,
  }),
  instrumentations: [getNodeAutoInstrumentations({
    '@opentelemetry/instrumentation-fs': { enabled: false }, // too noisy
  })],
});

sdk.start();

// Flush pending telemetry before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
// server.ts
import './instrumentation'; // Must be the first import. Don't mix in require()
                            // here: ESM imports are hoisted and would run first.
import express from 'express';
// ...

Auto-Instrumentation

getNodeAutoInstrumentations automatically instruments:

  • HTTP/HTTPS: every incoming and outgoing request
  • Express: middleware, routes
  • Prisma/pg: database queries
  • Redis: cache operations
  • gRPC: service calls

Zero code changes needed for these.

Custom Spans

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttributes({
      'order.id': orderId,
      'order.source': 'api',
    });

    try {
      const order = await db.orders.findUnique({ where: { id: orderId } });
      span.setAttributes({ 'order.total': order.total, 'order.items': order.items.length });

      await tracer.startActiveSpan('validateInventory', async (childSpan) => {
        try {
          await checkInventory(order.items);
        } finally {
          childSpan.end(); // end even if the check throws, or the span leaks
        }
      });

      await tracer.startActiveSpan('chargePayment', async (childSpan) => {
        try {
          await chargeStripe(order);
        } finally {
          childSpan.end();
        }
      });

      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      throw error;
    } finally {
      span.end();
    }
  });
}
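Ending a span on every code path is easy to get wrong, as the child spans above show. One option is to centralize the end/status/exception bookkeeping in a small wrapper. `withSpan` below is a hypothetical helper, not part of the OTel API; the interfaces mirror the shape of the real API structurally so the sketch is self-contained, and the numeric constants correspond to `SpanStatusCode.OK` and `SpanStatusCode.ERROR`.

```typescript
// Sketch: a hypothetical withSpan() helper that ends the span and records
// errors on every code path, so call sites can't leak spans.
const OK = 1;    // SpanStatusCode.OK
const ERROR = 2; // SpanStatusCode.ERROR

interface SpanLike {
  setStatus(status: { code: number; message?: string }): void;
  recordException(err: Error): void;
  end(): void;
}

interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => Promise<T>): Promise<T>;
}

async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      const result = await fn(span);
      span.setStatus({ code: OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: ERROR, message: (err as Error).message });
      throw err; // still propagate to the caller
    } finally {
      span.end(); // runs on success and failure alike
    }
  });
}
```

With the real API you would pass `trace.getTracer('my-service')` as the tracer, e.g. `await withSpan(tracer, 'chargePayment', () => chargeStripe(order))`.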

Custom Metrics

import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('my-service');

// Counter: monotonically increasing
const requestCounter = meter.createCounter('http.requests.total', {
  description: 'Total HTTP requests',
});

// Histogram: distribution of values
const requestDuration = meter.createHistogram('http.request.duration', {
  description: 'HTTP request duration in ms',
  unit: 'ms',
});

// Observable gauge: current value
const activeConnections = meter.createObservableGauge('db.connections.active', {
  description: 'Active database connections',
});
activeConnections.addCallback((result) => {
  // Assumes a pg Pool instance named `pool` is in scope
  result.observe(pool.totalCount - pool.idleCount);
});

// Middleware to record metrics
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    // req.route is only populated after a route matches,
    // so read it on 'finish', not when the middleware runs
    const route = req.route?.path ?? 'unknown';
    requestCounter.add(1, { method: req.method, route });
    requestDuration.record(Date.now() - start, {
      method: req.method,
      status_code: res.statusCode,
      route,
    });
  });

  next();
});
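One caveat with metric attributes like `route`: they must stay low-cardinality, or every distinct value becomes its own time series in the backend. When no matched route template is available (404s, proxied paths), raw URLs like `/users/123` will explode cardinality. A common mitigation is to collapse variable path segments into placeholders; `normalizeRoute` below is a hypothetical helper illustrating the idea, not part of any OTel package.

```typescript
// Sketch: collapse high-cardinality path segments (numeric ids, UUIDs) into
// placeholders so metric labels stay bounded. normalizeRoute is hypothetical.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function normalizeRoute(path: string): string {
  return path
    .split('/')
    .map((seg) => {
      if (/^\d+$/.test(seg)) return ':id';   // numeric ids
      if (UUID_RE.test(seg)) return ':uuid'; // UUIDs
      return seg;
    })
    .join('/');
}

// normalizeRoute('/users/123/orders/550e8400-e29b-41d4-a716-446655440000')
// → '/users/:id/orders/:uuid'
```

In the middleware above, this could serve as the fallback: `route: req.route?.path ?? normalizeRoute(req.path)`.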

Backends You Can Send To

# Jaeger (open source, self-hosted)
docker run -p 16686:16686 -p 4318:4318 jaegertracing/all-in-one

# Grafana Tempo + Loki + Prometheus (full stack)
# See grafana/otel-lgtm docker image

# Commercial: Datadog, Honeycomb, New Relic, Lightstep
# Just change the OTLP endpoint URL
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io/
OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=YOUR_API_KEY

That's the point: instrument once, switch backends by changing an env var.

What You Get

After setup, for every request you automatically see:

  • Full trace with all DB queries and their duration
  • Which query is the bottleneck
  • Downstream HTTP calls and their latency
  • Error details with stack traces linked to traces
  • P50/P95/P99 latency per endpoint

No more guessing where time is spent.


OpenTelemetry instrumentation pre-configured with Jaeger for local dev and OTLP for production: Whoff Agents AI SaaS Starter Kit.
