DEV Community


🔭 Observability Practices: The 3 Pillars with a Node.js + OpenTelemetry Example

🚀 Demystifying Observability: A Practical Guide with Node.js, OpenTelemetry, Prometheus, and Grafana

🧐 Link to the practice repository: Wsalas651/observability-demo


In the modern era of distributed systems and microservices, relying solely on traditional monitoring is like trying to diagnose a complex illness with just a thermometer. Monitoring tells you when a problem exists (e.g., "CPU usage is high"), but Observability gives you the tools to understand why the problem is happening.

What is Observability?

Observability is a property of a system that allows an operator to infer its internal state by examining its external outputs. It's the ability to ask arbitrary questions about your system without having to release new code to answer them.

It is commonly described in terms of three pillars:

  1. Metrics (The Numeric Story): These are aggregated numerical measurements collected over time. They tell you how much or how often something is happening. They are best for time-series analysis and spotting trends.
    • Examples: Request count, latency percentiles, CPU usage, memory consumption.
  2. Logs (The Discrete Events): These are immutable, timestamped text records of discrete events that occurred at a specific point in time. They are the "what happened when" data.
    • Examples: User X logged in, a database query failed, a transaction was processed.
  3. Traces (The Request's Journey): A trace represents the end-to-end path of a single request or transaction as it flows through a distributed system. They are crucial for debugging latency issues in microservice architectures.
    • Examples: A user clicking a button initiates calls across Service A, Service B, and a Database, showing the time spent in each.
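To make the three signals concrete, here is a minimal sketch in plain Node.js, with no observability library involved, of what a single request might emit. The function and field names (`handleRequest`, `traceId`, and so on) are illustrative assumptions, not part of any real API.

```javascript
const { randomBytes } = require('node:crypto');

// In-memory stand-ins for real backends (all names are hypothetical).
const metrics = { http_requests_total: 0 }; // metric: an aggregated number
const logs = [];                            // logs: discrete, timestamped events
const spans = [];                           // trace: timed steps sharing one trace ID

function handleRequest(userId) {
  const traceId = randomBytes(16).toString('hex'); // 32 hex characters
  const start = Date.now();

  // Metric: only the running count survives, not the individual request.
  metrics.http_requests_total += 1;

  // Log: one record describing what happened and when.
  logs.push({
    ts: new Date(start).toISOString(),
    level: 'info',
    msg: 'user logged in',
    userId,
    traceId
  });

  // Trace: one span per unit of work, linked to the others by the trace ID.
  spans.push({ traceId, name: 'GET /login', startMs: start, durationMs: Date.now() - start });

  return traceId;
}

const id = handleRequest('user-42');
console.log(metrics, logs[0], spans[0]);
```

Attaching the same `traceId` to the log record and the span is what lets you jump from a suspicious log line to the full request path.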

Why Observability Matters

In a monolithic application, debugging might be challenging, but the data is centralized. In a microservices environment, a single user click can trigger a chain of calls across dozens of services written in different languages.

Observability allows you to:

  • Reduce Mean Time To Resolution (MTTR): Spend less time guessing and more time fixing.
  • Handle Unknown-Unknowns: Debug issues you never anticipated.
  • Improve System Health: Use the data to refactor bottlenecks and optimize resource usage.

🛠️ Practical Example: Observability with Node.js and OpenTelemetry

We will demonstrate how to instrument a simple Node.js Express API using OpenTelemetry for traces and prom-client for metrics, and then visualize that data using Prometheus, Grafana, and Jaeger.

The Observability Stack

  • Application: A simple Node.js Express API.
  • Instrumentation: OpenTelemetry (OTEL). This is a vendor-neutral standard for instrumenting code.
  • Metrics Store: Prometheus, which scrapes metrics from the app.
  • Trace Backend: Jaeger, which receives and displays the traces exported by the app.
  • Visualization: Grafana, for building dashboards on the Prometheus data.
  • Orchestration: Docker Compose to run the full stack.

Step 1: Instrumenting the Node.js API

Our application needs to expose its internal state. We use OpenTelemetry to set up tracing and prom-client to define custom Prometheus metrics.

1. OpenTelemetry Tracing Setup (app/src/tracer.js)

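The OpenTelemetry SDK handles the heavy lifting of creating and exporting trace data. This file initializes the SDK with auto-instrumentation and a Jaeger exporter, so spans for each incoming and outgoing request are recorded, linked into a single trace, and sent to Jaeger.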

// app/src/tracer.js
// tracer.js - sets up OpenTelemetry to send traces to Jaeger
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

function setupTracing(serviceName = 'observability-demo-app') {
  const exporter = new JaegerExporter({
    endpoint: 'http://jaeger:14268/api/traces'
    // For UDP agent: { host: 'jaeger', port: 6832 }
  });

  const sdk = new NodeSDK({
    traceExporter: exporter,
    instrumentations: [getNodeAutoInstrumentations()],
    serviceName
  });

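  // Depending on the SDK version, start() returns a Promise or nothing.
  // Calling it inside a Promise executor handles both and catches sync errors.
  new Promise((resolve) => resolve(sdk.start()))
    .then(() => {
      console.log('OpenTelemetry initialized');
    })
    .catch((err) => {
      console.error('Error starting OpenTelemetry SDK', err);
    });

  // Flush pending spans on SIGTERM, then exit
  // (a SIGTERM handler that never exits would keep the process alive)
  process.on('SIGTERM', () => {
    sdk.shutdown()
      .then(() => console.log('Tracing terminated'))
      .catch((e) => console.error('Error terminating tracing', e))
      .finally(() => process.exit(0));
  });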
}

module.exports = { setupTracing };
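Requests are connected across services through context propagation: the calling service forwards the trace ID in an HTTP header, and the receiving service continues the same trace. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, and the auto-instrumentations read and write it for you. The sketch below only illustrates the header's format for version `00`; `parseTraceparent` and `childTraceparent` are hypothetical helper names, not OpenTelemetry APIs.

```javascript
const { randomBytes } = require('node:crypto');

// traceparent = version "-" trace-id "-" parent-id "-" trace-flags
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Parse an incoming header; returns null if it is malformed.
function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version "ff" and all-zero IDs are invalid per the specification.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Build the header for an outgoing call: same trace ID, fresh span ID.
function childTraceparent(parent) {
  const spanId = randomBytes(8).toString('hex'); // 16 hex characters
  const flags = parent.sampled ? '01' : '00';
  return `00-${parent.traceId}-${spanId}-${flags}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const ctx = parseTraceparent(incoming);
const outgoing = childTraceparent(ctx);
console.log(ctx.traceId, outgoing);
```

Because the trace ID is carried unchanged from hop to hop while each hop gets its own span ID, the backend can reassemble the spans from every service into one trace.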

2. Custom Metrics (app/src/metrics.js)

While OpenTelemetry has its own metrics API, this demo uses the widely adopted prom-client library to define and track two core metrics, a request counter and a request-duration histogram, both recorded by an Express middleware:

// metrics.js - registers Prometheus metrics
const client = require('prom-client');

// Default metrics (CPU, memory etc.); recent prom-client versions collect
// these on each scrape, so no timeout option is needed
client.collectDefaultMetrics();

// Use prom-client's default global registry
const register = client.register;

// HTTP request metrics
const httpRequestCounter = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status']
});

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.005, 0.01, 0.05, 0.1, 0.3, 1, 3, 5]
});

function metricsMiddleware(req, res, next) {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    const route = req.route ? req.route.path : req.path;
    httpRequestCounter.inc({ method: req.method, route, status: res.statusCode });
    end({ method: req.method, route, status: res.statusCode });
  });
  next();
}

module.exports = { register, metricsMiddleware };
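Prometheus can only scrape the application if it serves these metrics over HTTP, which the snippet above does not show. A minimal sketch of such an endpoint follows; `createMetricsHandler` is a hypothetical helper name, and the `/metrics` path mentioned afterwards is an assumption (it is Prometheus's default scrape path). The registry is passed in as an argument so the handler can be exercised with a stub.

```javascript
// Returns an Express-style handler that serves the registry's metrics
// in the Prometheus text exposition format.
function createMetricsHandler(register) {
  return async function metricsHandler(req, res) {
    try {
      // register.metrics() returns a Promise in recent prom-client versions
      // and a string in older ones; await handles both.
      const body = await register.metrics();
      res.setHeader('Content-Type', register.contentType);
      res.end(body);
    } catch (err) {
      res.statusCode = 500;
      res.end(String(err));
    }
  };
}
```

In the Express app this might be wired up as `app.use(metricsMiddleware)` followed by `app.get('/metrics', createMetricsHandler(register))`, with `register` and `metricsMiddleware` imported from metrics.js.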

Step 2: Orchestration with Docker Compose

Your docker-compose.yml ties the entire stack together, defining the application, Prometheus, Grafana, and Jaeger services.

version: '3.8'
services:
  app:
    build: ./app
    image: observability-demo-app:latest
    container_name: observability-demo-app
    ports:
      - "3000:3000"
    environment:
      - PORT=3000
    depends_on:
      - prometheus
      - jaeger

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    depends_on:
      - prometheus
    ports:
      - "3001:3000"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  jaeger:
    image: jaegertracing/all-in-one:1.39
    container_name: jaeger
    ports:
      - "16686:16686"   # Jaeger UI
      - "14268:14268"   # Jaeger collector (HTTP)

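The Compose file mounts `./prometheus/prometheus.yml`, but its contents are not shown here. A minimal scrape configuration might look like the following sketch; the job name, the 5-second interval, and the assumption that the app serves metrics at `/metrics` on port 3000 are illustrative rather than taken from the repository.

```yaml
# prometheus/prometheus.yml (illustrative sketch, not the repository's file)
global:
  scrape_interval: 5s                    # assumed value; Prometheus defaults to 1m

scrape_configs:
  - job_name: 'observability-demo-app'   # hypothetical job name
    metrics_path: /metrics               # Prometheus default, stated for clarity
    static_configs:
      - targets: ['app:3000']            # Compose service name and container port
```

Inside the Compose network, Prometheus reaches the application by its service name (`app`) rather than `localhost`.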

Step 3: See the README.md in the repository.

Conclusion

Observability is a critical capability, not just a toolset. By adopting a standard like OpenTelemetry and leveraging powerful open-source tools like Prometheus and Grafana, you gain the ability to move beyond basic monitoring and truly understand the internal behavior of your distributed applications.

Stop guessing, start observing! Happy coding! 🚀
