Jhon Thomas Ticona Chambi

Posted on Jul 5

Observability Practices: A Complete Guide with Node.js Implementation

#node #observability #prometheus #monitoring

Introduction

In distributed systems a single request can cross many processes, so observability (the ability to infer what a system is doing from the signals it emits) is essential for diagnosing failures. This article walks through core observability practices using a Node.js API instrumented with Prometheus metrics and visualized in Grafana.

The Three Pillars of Observability

1. Metrics

Numerical measurements providing quantitative insights:

Business Metrics: User registrations, transactions, revenue
Application Metrics: Response times, error rates, throughput
Infrastructure Metrics: CPU usage, memory consumption, disk I/O

2. Logs

Time-stamped records of discrete events:

Structured Logging: JSON format for better parsing
Contextual Information: Request IDs, user context, transaction details
Different Log Levels: DEBUG, INFO, WARN, ERROR, FATAL
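As a minimal sketch of the logging practices listed above, the snippet below writes one JSON object per line with a level, a timestamp, and a bound request ID. The `createLogger` helper, its field names, and the `requestId` key are hypothetical and not taken from the article's repository; in a real service a library such as pino or winston would usually fill this role.

```javascript
// Hypothetical helper (not from the article's repository): a tiny structured logger.
const { randomUUID } = require('node:crypto');

const LEVELS = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];

function createLogger(context = {}, minLevel = 'INFO', write = (line) => console.log(line)) {
  const threshold = LEVELS.indexOf(minLevel);
  const logger = {};
  for (const level of LEVELS) {
    logger[level.toLowerCase()] = (message, fields = {}) => {
      // Skip entries below the configured minimum level
      if (LEVELS.indexOf(level) < threshold) return null;
      const entry = {
        timestamp: new Date().toISOString(),
        level,
        message,
        ...context, // contextual fields shared by every line, e.g. the request ID
        ...fields,  // fields specific to this event
      };
      const line = JSON.stringify(entry);
      write(line);
      return line;
    };
  }
  return logger;
}

// Usage: bind a request ID once so every line for that request carries it.
const requestLogger = createLogger({ requestId: randomUUID() });
requestLogger.info('user lookup', { userId: 42, durationMs: 12 });
```

Binding the context once, rather than passing the request ID to every call, is what makes it practical to correlate all log lines that belong to one request.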

3. Traces

Track requests across multiple services:

Distributed Tracing: Follow requests through microservices
Performance Analysis: Identify slow components
Dependency Mapping: Understand service relationships
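To illustrate how a request can be followed across services, here is a rough sketch of trace context propagation using the W3C Trace Context `traceparent` header layout (version, trace ID, parent span ID, flags). The helper names are hypothetical, and the article's project does not implement tracing; a production setup would normally rely on an OpenTelemetry SDK rather than hand-rolled code.

```javascript
const { randomBytes } = require('node:crypto');

// Build a traceparent value: version-traceId-spanId-flags
function newTraceparent(traceId = randomBytes(16).toString('hex')) {
  const spanId = randomBytes(8).toString('hex');
  return `00-${traceId}-${spanId}-01`; // flags 01 = sampled
}

// Parse a traceparent header; returns null when the value is malformed
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header || '');
  if (!match) return null;
  return { version: match[1], traceId: match[2], parentId: match[3], flags: match[4] };
}

// Continue an incoming trace (same trace ID, new span ID) or start a new one
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  return parent ? newTraceparent(parent.traceId) : newTraceparent();
}

const incoming = newTraceparent();           // header received from an upstream caller
const outgoing = childTraceparent(incoming); // header sent to a downstream service
console.log({ incoming, outgoing });
```

Every hop keeps the trace ID and mints a new span ID, which is what lets a tracing backend stitch the spans into one request timeline and derive the service dependency map.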

Real-World Implementation with Node.js

Our demonstration project implements a RESTful API with comprehensive observability features.

Architecture Overview

Node.js API (Port 3000) → Prometheus (Port 9090) → Grafana (Port 3000)
                ↓
    Traffic Generator (Test Script)

Core Metrics Implementation

HTTP Request Duration Histogram

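const promClient = require('prom-client');

// Histogram of request durations; observations are recorded in milliseconds
const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status'],
  buckets: [1, 5, 15, 50, 100, 500, 1000] // bucket upper bounds in ms
});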
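The following dependency-free sketch shows how cumulative histogram buckets behave, using the same bucket bounds as the histogram above. It is an illustration of the data model only, not prom-client's internals, and `createHistogram` is a hypothetical name.

```javascript
const BUCKETS = [1, 5, 15, 50, 100, 500, 1000]; // upper bounds in ms, as in the article

function createHistogram(bounds) {
  const counts = new Map(bounds.map((bound) => [bound, 0]));
  counts.set(Infinity, 0); // the implicit +Inf bucket counts every observation
  let sum = 0;
  return {
    observe(value) {
      sum += value;
      // Buckets are cumulative: increment every bucket whose bound is >= value
      for (const upper of counts.keys()) {
        if (value <= upper) counts.set(upper, counts.get(upper) + 1);
      }
    },
    snapshot() {
      return { buckets: Object.fromEntries(counts), sum, count: counts.get(Infinity) };
    },
  };
}

const histogram = createHistogram(BUCKETS);
[3, 40, 2500].forEach((ms) => histogram.observe(ms));
console.log(histogram.snapshot());
```

In this run the 2500 ms observation lands only in the +Inf bucket. With these bounds, responses from an endpoint that takes 2-5 s would all fall above the largest finite bucket, so adding bounds such as 2500 and 5000 may be worth considering if those latencies matter.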

Request Counter

// Counter: only ever increases; one time series per method/route/status combination
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

Active Connections Gauge

// Gauge: a value that can go up and down, here the number of active connections
const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Active connections'
});
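The article does not show where the gauge is updated, so the following is one possible sketch: an Express-style middleware that increments on request start and decrements once when the response finishes or the connection closes. `trackActive` is a hypothetical name, and the fake gauge and response below exist only so the snippet runs without dependencies. Note that this counts in-flight requests rather than open TCP sockets.

```javascript
const { EventEmitter } = require('node:events');

// Middleware factory; `gauge` is any object exposing inc() and dec()
function trackActive(gauge) {
  return (req, res, next) => {
    gauge.inc();
    let released = false;
    const release = () => {
      if (released) return; // 'finish' and 'close' can both fire; decrement once
      released = true;
      gauge.dec();
    };
    res.once('finish', release);
    res.once('close', release);
    next();
  };
}

// Stand-ins for demonstration only
const fakeGauge = { value: 0, inc() { this.value += 1; }, dec() { this.value -= 1; } };
const fakeRes = new EventEmitter();

trackActive(fakeGauge)({}, fakeRes, () => {});
console.log(fakeGauge.value); // 1 while the request is in flight
fakeRes.emit('finish');
fakeRes.emit('close');
console.log(fakeGauge.value); // 0 after the response completes
```

Because prom-client gauges expose `inc()` and `dec()`, the `activeConnections` gauge could likely be passed in directly, for example `app.use(trackActive(activeConnections))`.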

API Endpoints for Testing

GET / - Basic health endpoint
GET /users - List users with variable latency
GET /users/:id - Specific user lookup with error cases
GET /slow - Intentionally slow endpoint (2-5s response time)
GET /error - Random error generation for testing
GET /metrics - Prometheus metrics endpoint
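For orientation, this is roughly what the `/metrics` endpoint returns: the Prometheus text exposition format, with `# HELP` and `# TYPE` lines followed by one sample per label combination. In the real project prom-client generates this output; the `renderCounter` function below is a hypothetical, simplified stand-in that handles counters only.

```javascript
// Escape backslashes, double quotes, and newlines inside label values
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render one counter metric with its labelled samples
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelText = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
      .join(',');
    lines.push(`${name}{${labelText}} ${value}`);
  }
  return lines.join('\n') + '\n';
}

const body = renderCounter('http_requests_total', 'Total HTTP requests', [
  { labels: { method: 'GET', route: '/users', status: 200 }, value: 3 },
  { labels: { method: 'GET', route: '/error', status: 500 }, value: 1 },
]);
console.log(body);
// # HELP http_requests_total Total HTTP requests
// # TYPE http_requests_total counter
// http_requests_total{method="GET",route="/users",status="200"} 3
// http_requests_total{method="GET",route="/error",status="500"} 1
```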

Middleware Implementation

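const responseTime = require('response-time');

// Assumed definition: the original snippet uses httpErrors without declaring it
const httpErrors = new promClient.Counter({
  name: 'http_errors_total',
  help: 'Total HTTP error responses',
  labelNames: ['method', 'route', 'status']
});

// Response time tracking: response-time passes the elapsed time in milliseconds
app.use(responseTime((req, res, time) => {
  // Prefer the matched route pattern (e.g. /users/:id) to keep label cardinality low
  const route = req.route ? req.route.path : req.path;

  httpDuration.labels(req.method, route, res.statusCode).observe(time);
  httpRequests.labels(req.method, route, res.statusCode).inc();

  if (res.statusCode >= 400) {
    httpErrors.labels(req.method, route, res.statusCode).inc();
  }
}));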

Essential Prometheus Queries

Request Rate (RPS)

rate(http_requests_total[1m])
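To make the query above more concrete, here is a simplified sketch of what a per-second rate over counter samples means, including the handling of counter resets (for example after a process restart). The `simpleRate` function is hypothetical; Prometheus's actual `rate()` additionally extrapolates to the edges of the time window, which this sketch omits.

```javascript
// samples: [{ t: secondsSinceStart, value: counterValue }], ordered by time
function simpleRate(samples) {
  if (samples.length < 2) return null; // at least two samples are needed
  let increase = 0;
  for (let i = 1; i < samples.length; i += 1) {
    const delta = samples[i].value - samples[i - 1].value;
    // A drop means the counter was reset; count the new value as growth from zero
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const seconds = samples[samples.length - 1].t - samples[0].t;
  return seconds > 0 ? increase / seconds : null;
}

// Four scrapes, 15 s apart, inside a 1-minute window
const steady = [
  { t: 0, value: 100 }, { t: 15, value: 130 }, { t: 30, value: 160 }, { t: 45, value: 190 },
];
console.log(simpleRate(steady)); // 2 requests per second

// Same traffic, but the process restarted between the second and third scrape
const withReset = [
  { t: 0, value: 100 }, { t: 15, value: 130 }, { t: 30, value: 10 }, { t: 45, value: 40 },
];
console.log(simpleRate(withReset));
```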

Error Rate Percentage

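sum(rate(http_errors_total[1m])) / sum(rate(http_requests_total[1m])) * 100

Both sides are aggregated with sum() because the two counters share the same labels. Dividing the raw series would match each error status only against requests with that same status, so every result would be 100.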

95th Percentile Response Time

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_ms_bucket[1m])))
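The sketch below approximates how a quantile is estimated from cumulative buckets: find the bucket containing the target rank, then interpolate linearly inside it. This follows the behavior documented for `histogram_quantile`, but it is a simplified illustration; the function name and the sample counts are made up for the example.

```javascript
// buckets: [{ le, count }] with cumulative counts, sorted ascending, last le === Infinity
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const index = buckets.findIndex((bucket) => bucket.count >= rank);
  // Rank falls in the +Inf bucket: report the largest finite bound instead
  if (buckets[index].le === Infinity) return buckets[buckets.length - 2].le;
  const lower = index === 0 ? 0 : buckets[index - 1].le;    // lowest bucket starts at 0
  const below = index === 0 ? 0 : buckets[index - 1].count; // observations under this bucket
  const inBucket = buckets[index].count - below;
  if (inBucket === 0) return lower;
  // Assume observations are spread evenly within the bucket
  return lower + (buckets[index].le - lower) * ((rank - below) / inBucket);
}

// Hypothetical cumulative counts over the article's bucket bounds (ms)
const sampleBuckets = [
  { le: 1, count: 0 }, { le: 5, count: 10 }, { le: 15, count: 40 }, { le: 50, count: 80 },
  { le: 100, count: 95 }, { le: 500, count: 99 }, { le: 1000, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.5, sampleBuckets));  // 23.75
console.log(histogramQuantile(0.95, sampleBuckets)); // 100
```

Because the estimate is interpolated, its accuracy depends on bucket placement: the result can never exceed the largest finite bound, so slow outliers beyond it are reported as that bound.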

Best Practices

1. Metric Design Principles

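Use Standard Suffixes: _total for counters, plus base units such as _seconds and _bytes (Prometheus convention favors seconds, so the http_request_duration_ms histogram above would conventionally be named http_request_duration_seconds)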
Consistent Labeling: Standardize label names across services
Avoid High Cardinality: Limit unique label combinations
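One common source of high cardinality is using raw URL paths as label values: the middleware shown earlier falls back to `req.path` when no route matched, so each distinct path would create its own time series. A possible mitigation, sketched here with a hypothetical `normalizeRoute` helper, is to replace variable segments with placeholders before using the path as a label.

```javascript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Collapse variable path segments so the label has a bounded set of values
function normalizeRoute(path) {
  return path
    .split('?')[0] // drop any query string
    .split('/')
    .map((segment) => {
      if (/^\d+$/.test(segment)) return ':id';
      if (UUID_PATTERN.test(segment)) return ':uuid';
      return segment;
    })
    .join('/');
}

console.log(normalizeRoute('/users/42'));        // /users/:id
console.log(normalizeRoute('/users/42?full=1')); // /users/:id
console.log(normalizeRoute('/users'));           // /users
```

Arbitrary unknown paths (for example from scanners hitting random URLs) would still pass through unchanged, so mapping all unmatched routes to a single value such as `unknown` may be a safer default.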

2. Golden Signals Implementation

Latency: Time to process requests
Traffic: Demand on your system
Errors: Rate of failed requests
Saturation: Resource utilization
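The metrics defined earlier cover latency, traffic, and errors; saturation is not shown in the article. As a rough sketch, a few saturation indicators can be read from Node.js built-ins. The `saturationSnapshot` function and its field names are hypothetical, and the values would still need to be exported as gauges to be scraped.

```javascript
const os = require('node:os');

// Collect a few coarse saturation indicators for the current process and host
function saturationSnapshot() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  const cpuCount = Math.max(1, os.cpus().length); // guard against an empty CPU list
  return {
    heapUsedRatio: heapUsed / heapTotal,    // share of the allocated V8 heap in use
    rssBytes: rss,                          // resident memory of the process
    loadPerCpu: os.loadavg()[0] / cpuCount, // 1-minute load average per core (0 on Windows)
  };
}

console.log(saturationSnapshot());
```

prom-client also ships a `collectDefaultMetrics()` function that exports process-level metrics such as memory and event loop lag, which may make a hand-written collector unnecessary.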

3. Effective Alerting

Alert on Symptoms: Focus on user impact, not causes
Meaningful Thresholds: Avoid alert fatigue
Runbook Integration: Provide clear remediation steps
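One way to reduce alert fatigue is to fire only when a symptom persists across several consecutive evaluations, similar in spirit to the inactive, pending, and firing states of Prometheus alerting rules. The sketch below is a simplified illustration with hypothetical names and thresholds, not a replacement for Prometheus alerting or Alertmanager.

```javascript
// Returns an evaluator that tracks how long a value has stayed above the threshold
function createAlert({ threshold, forEvaluations }) {
  let breaches = 0;
  return (value) => {
    breaches = value > threshold ? breaches + 1 : 0; // any recovery resets the streak
    if (breaches >= forEvaluations) return 'firing';
    return breaches > 0 ? 'pending' : 'inactive';
  };
}

// Alert when the error rate stays above 5% for three evaluations in a row
const evaluate = createAlert({ threshold: 5, forEvaluations: 3 });
const errorRates = [2, 7, 8, 3, 6, 9, 12];
const states = errorRates.map((rate) => evaluate(rate));
console.log(states);
// [ 'inactive', 'pending', 'pending', 'inactive', 'pending', 'pending', 'firing' ]
```

The brief spike at the second and third evaluations never fires because the rate recovers, while the sustained breach at the end does.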

GitHub Repository

The complete implementation with automated setup scripts is available:

🔗 GitHub Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git

Repository Features

✅ Complete source code with all implementation files
✅ Automated setup scripts for Windows/Mac/Linux
✅ Comprehensive documentation and troubleshooting guides
✅ CI/CD ready with GitHub Actions integration
✅ Traffic generator for realistic testing scenarios

Conclusion

Observability transforms raw metrics into actionable insights that drive better system reliability and user experience. This implementation demonstrates:

Holistic Approach: Metrics implemented in depth, with logs and traces as the complementary pillars
Practical Implementation: Real-world Node.js example
Automation First: Scripted setup reduces barriers
Best Practices: Following established patterns

Next Steps

Extend metrics with business-specific measurements
Implement meaningful alerting
Add distributed tracing with Jaeger
Apply patterns to production systems

Author: Jhon Ticona Chambi

Technologies: Node.js, Prometheus, Grafana, Express.js

