DEV Community

Jhon Thomas Ticona Chambi

Observability Practices: A Complete Guide with Node.js Implementation

Introduction

In distributed systems, observability is what lets you explain why a complex application behaves the way it does. This article reviews the three pillars of observability and then focuses on metrics, implemented in a real-world Node.js API integrated with Prometheus and Grafana.

The Three Pillars of Observability

1. Metrics

Numerical measurements providing quantitative insights:

  • Business Metrics: User registrations, transactions, revenue
  • Application Metrics: Response times, error rates, throughput
  • Infrastructure Metrics: CPU usage, memory consumption, disk I/O

2. Logs

Time-stamped records of discrete events:

  • Structured Logging: JSON format for better parsing
  • Contextual Information: Request IDs, user context, transaction details
  • Different Log Levels: DEBUG, INFO, WARN, ERROR, FATAL
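Here is a minimal sketch of structured logging in plain Node.js, assuming no logging library. Each event is written as one JSON line containing a timestamp, level, message and context. A child logger carries a request ID so that every line for a request can be correlated. createLogger, the service name and the field names are hypothetical; production code would more likely use an established JSON logger.

```javascript
// Hypothetical minimal JSON logger: one JSON object per line, level filtering,
// and child loggers that inherit context such as a request ID.
const LEVELS = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40, FATAL: 50 };

function createLogger(minLevel = 'INFO', context = {}, write = (line) => console.log(line)) {
  const threshold = LEVELS[minLevel];
  const log = (level, message, fields = {}) => {
    // Drop events below the configured level.
    if (LEVELS[level] < threshold) return null;
    const entry = { timestamp: new Date().toISOString(), level, message, ...context, ...fields };
    const line = JSON.stringify(entry);
    write(line);
    return line;
  };
  return {
    debug: (msg, f) => log('DEBUG', msg, f),
    info: (msg, f) => log('INFO', msg, f),
    warn: (msg, f) => log('WARN', msg, f),
    error: (msg, f) => log('ERROR', msg, f),
    fatal: (msg, f) => log('FATAL', msg, f),
    // A child logger adds context (e.g. a per-request ID) to every line it writes.
    child: (extra) => createLogger(minLevel, { ...context, ...extra }, write),
  };
}

// Usage: attach the request ID once; every subsequent line carries it.
const reqLog = createLogger('INFO', { service: 'users-api' }).child({ requestId: 'req-123' });
reqLog.debug('cache miss'); // filtered out (below INFO)
reqLog.warn('slow lookup', { route: '/users/:id', durationMs: 812 });
```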

3. Traces

Track requests across multiple services:

  • Distributed Tracing: Follow requests through microservices
  • Performance Analysis: Identify slow components
  • Dependency Mapping: Understand service relationships
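Distributed tracing depends on passing a trace ID from one service to the next. The W3C Trace Context standard does this with a traceparent header of the form version-traceid-parentid-flags, and OpenTelemetry uses it by default. The sketch below is hand-rolled: it parses that header and builds the header for the next hop. parseTraceparent and childTraceparent are hypothetical helpers, and only the version-00 layout is handled. Real services would use a tracing SDK instead.

```javascript
const { randomBytes } = require('node:crypto');

// traceparent (version 00): 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const ALL_ZERO = /^0+$/;

function parseTraceparent(header) {
  const m = TRACEPARENT_RE.exec(header || '');
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // Version ff and all-zero IDs are invalid.
  if (version === 'ff' || ALL_ZERO.test(traceId) || ALL_ZERO.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// Random lowercase hex ID that is never all zeros.
function randomId(bytes) {
  let id;
  do {
    id = randomBytes(bytes).toString('hex');
  } while (ALL_ZERO.test(id));
  return id;
}

// Header for the next hop: keep the caller's trace ID (or start a new trace)
// and put this service's new span ID in the parent-id position.
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  const traceId = parent ? parent.traceId : randomId(16);
  const flags = parent ? parent.flags : '01'; // 01 = sampled
  return `00-${traceId}-${randomId(8)}-${flags}`;
}

// Usage: an upstream service sent this header with its request.
console.log(childTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'));
```

Every service on the path keeps the same trace ID and contributes its own span ID. A tracing backend such as Jaeger uses those IDs to stitch the spans back into one request timeline.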

Real-World Implementation with Node.js

The demo project is a RESTful Express.js API instrumented with the prom-client library. It exposes a /metrics endpoint that Prometheus scrapes.

Architecture Overview

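Traffic Generator (Test Script)
        ↓ HTTP requests
Node.js API (Port 3000) → Prometheus (Port 9090) → Grafana (Port 3000)

Prometheus pulls ("scrapes") the API's /metrics endpoint, and Grafana queries Prometheus to draw dashboards. Grafana also listens on port 3000 by default. If it runs on the same host as the API, one of the two has to be moved to a different port.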

Core Metrics Implementation

HTTP Request Duration Histogram

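const promClient = require('prom-client');

// Latency histogram in milliseconds (the response-time middleware reports ms).
// Buckets extend past 1000 ms so the 2-5 s /slow endpoint stays measurable.
const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status'],
  buckets: [1, 5, 15, 50, 100, 500, 1000, 2500, 5000, 10000]
});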
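Histogram buckets are cumulative. Each "le" bucket counts every observation less than or equal to its bound, and an implicit +Inf bucket counts everything. The dependency-free sketch below models that bookkeeping. makeHistogram is a hypothetical helper, not prom-client's code. It shows what happens to a request from the 2-5 s /slow endpoint when the largest finite bucket is 1000 ms.

```javascript
// Hypothetical model of Prometheus histogram storage: cumulative "le" buckets
// plus an implicit +Inf bucket, a running sum and (via +Inf) a count.
function makeHistogram(buckets) {
  const bounds = [...buckets].sort((a, b) => a - b);
  const counts = new Array(bounds.length + 1).fill(0); // last slot = +Inf
  let sum = 0;
  return {
    observe(value) {
      sum += value;
      // Every bucket whose upper bound is >= value is incremented.
      bounds.forEach((le, i) => { if (value <= le) counts[i] += 1; });
      counts[bounds.length] += 1; // +Inf always increments
    },
    snapshot() {
      const out = bounds.map((le, i) => ({ le, count: counts[i] }));
      out.push({ le: Infinity, count: counts[bounds.length] });
      return { buckets: out, sum, count: counts[bounds.length] };
    },
  };
}

// With buckets that stop at 1000 ms, a 3000 ms request only reaches +Inf,
// so nothing above 1 s can be resolved from the bucket counts.
const h = makeHistogram([1, 5, 15, 50, 100, 500, 1000]);
[12, 80, 3000].forEach((ms) => h.observe(ms));
console.log(h.snapshot().buckets);
```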

Request Counter

// Counts every request; same labels as the histogram so the two can be joined
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});

Active Connections Gauge

// In-flight connections: the value goes up and down, so a gauge, not a counter
const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Active connections'
});
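A gauge like activeConnections only becomes meaningful once something updates it. The usual pattern is to increment it when a request starts and decrement it when the response finishes; prom-client gauges expose inc() and dec() for this. The dependency-free sketch below shows counter versus gauge semantics. SimpleCounter, SimpleGauge and handleRequest are hypothetical stand-ins, not prom-client itself.

```javascript
// Hypothetical stand-ins illustrating counter vs gauge semantics.
class SimpleCounter {
  constructor() { this.value = 0; }
  inc(by = 1) {
    // Counters are monotonic: a negative increment is a programming error.
    if (by < 0) throw new RangeError('counter can only increase');
    this.value += by;
  }
}

class SimpleGauge {
  constructor() { this.value = 0; }
  inc(by = 1) { this.value += by; }
  dec(by = 1) { this.value -= by; }
  set(v) { this.value = v; }
}

const requestsTotal = new SimpleCounter();
const active = new SimpleGauge();

// +1 when a request starts, -1 when it finishes (even if it throws),
// so the gauge always holds the number of requests currently in flight.
function handleRequest(work) {
  requestsTotal.inc();
  active.inc();
  try {
    return work();
  } finally {
    active.dec();
  }
}

handleRequest(() => console.log('in flight:', active.value));
console.log('after:', active.value, 'total:', requestsTotal.value);
```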

API Endpoints for Testing

  1. GET / - Basic health endpoint
  2. GET /users - List users with variable latency
  3. GET /users/:id - Specific user lookup with error cases
  4. GET /slow - Intentionally slow endpoint (2-5s response time)
  5. GET /error - Random error generation for testing
  6. GET /metrics - Prometheus metrics endpoint

Middleware Implementation

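const responseTime = require('response-time');

// Error counter referenced below (assumed definition; not shown in the original)
const httpErrors = new promClient.Counter({
  name: 'http_errors_total',
  help: 'Total HTTP responses with status >= 400',
  labelNames: ['method', 'route', 'status']
});

// Response time tracking; `time` is in milliseconds
app.use(responseTime((req, res, time) => {
  // Label with the route template (e.g. /users/:id), not the raw path,
  // so unmatched URLs cannot blow up label cardinality
  const route = req.route ? req.route.path : 'unmatched';

  httpDuration.labels(req.method, route, res.statusCode).observe(time);
  httpRequests.labels(req.method, route, res.statusCode).inc();

  if (res.statusCode >= 400) {
    httpErrors.labels(req.method, route, res.statusCode).inc();
  }
}));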
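For matched routes, Express exposes the route template as req.route.path. Sometimes only the raw URL is available. In that case, a hypothetical helper can map paths onto the known templates so that the route label stays small and fixed. The route list below simply mirrors the endpoints listed earlier and is an assumption, not code from the repository.

```javascript
// Hypothetical route normalizer: raw path -> route template, else 'unmatched'.
const KNOWN_ROUTES = ['/', '/users', '/users/:id', '/slow', '/error', '/metrics'];

function toMatcher(template) {
  // '/users/:id' -> /^\/users\/[^/]+$/ (":param" matches one path segment)
  const pattern = template
    .split('/')
    .map((seg) => (seg.startsWith(':') ? '[^/]+' : seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')))
    .join('\\/');
  return { template, re: new RegExp(`^${pattern}$`) };
}

const MATCHERS = KNOWN_ROUTES.map(toMatcher);

function routeLabel(path) {
  const hit = MATCHERS.find((m) => m.re.test(path));
  // 404s, scanners and typos all share one label value instead of one each.
  return hit ? hit.template : 'unmatched';
}

console.log(routeLabel('/users/42'));
console.log(routeLabel('/wp-admin/setup.php'));
```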

Essential Prometheus Queries

Request Rate (RPS)

sum(rate(http_requests_total[1m]))
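rate() turns an ever-increasing counter into a per-second value. It treats any drop in the counter as a reset, for example after the Node.js process restarts. Below is a simplified JavaScript model of that calculation. Unlike Prometheus, it does not extrapolate to the edges of the window, and the sample values are invented.

```javascript
// Simplified model of PromQL rate(): per-second increase of a counter over the
// samples in a window, treating any decrease as a counter reset.
// (Real Prometheus also extrapolates to the window boundaries; this does not.)
function simpleRate(samples) {
  // samples: [{ t: unixSeconds, v: counterValue }], sorted by t
  if (samples.length < 2) return null;
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].v - samples[i - 1].v;
    // After a reset the counter restarts at 0, so the new value is the increase.
    increase += delta >= 0 ? delta : samples[i].v;
  }
  const seconds = samples[samples.length - 1].t - samples[0].t;
  return increase / seconds;
}

// Four scrapes 15 s apart; the process restarted between t=30 and t=45.
const scrapes = [
  { t: 0, v: 100 }, { t: 15, v: 130 }, { t: 30, v: 160 }, { t: 45, v: 20 },
];
console.log(simpleRate(scrapes)); // ≈ 1.78 req/s (80 requests over 45 s)
```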

Error Rate Percentage

sum(rate(http_errors_total[1m])) / sum(rate(http_requests_total[1m])) * 100
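http_errors_total carries the same method, route and status labels as http_requests_total. Dividing the two rates without aggregation therefore matches series one to one, so each error series is divided by itself and reads 100%. The dependency-free sketch below uses made-up rates and drops the method label for brevity. It compares that per-series division with dividing the sums.

```javascript
// Made-up per-second rates per {route, status}, as rate(...[1m]) might return.
const requestRates = [
  { route: '/users', status: '200', value: 9.0 },
  { route: '/users/:id', status: '404', value: 0.5 },
  { route: '/error', status: '500', value: 0.5 },
];
// The error counter only has series for status >= 400, with identical labels.
const errorRates = requestRates.filter((s) => Number(s.status) >= 400);

const key = (s) => `${s.route}|${s.status}`;

// Without aggregation, PromQL pairs series whose label sets are identical.
function perSeriesRatio(errors, requests) {
  const byKey = new Map(requests.map((s) => [key(s), s.value]));
  return errors.map((e) => ({ ...e, value: (e.value / byKey.get(key(e))) * 100 }));
}

// sum(...) / sum(...) drops all labels first and yields one overall percentage.
function overallRatio(errors, requests) {
  const total = (xs) => xs.reduce((acc, s) => acc + s.value, 0);
  return (total(errors) / total(requests)) * 100;
}

console.log(perSeriesRatio(errorRates, requestRates)); // every series reads 100
console.log(overallRatio(errorRates, requestRates));   // 1 error/s out of 10 req/s
```

To keep a per-endpoint breakdown, aggregate both sides with the same grouping, for example sum by (route), rather than leaving the status label in place.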

95th Percentile Response Time

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_ms_bucket[1m])))
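histogram_quantile() estimates a quantile from cumulative bucket counts. It locates the bucket that contains the target rank and interpolates linearly inside it. If the rank lands in the +Inf bucket, it returns the highest finite bound. The JavaScript sketch below reimplements that idea for illustration only. The bucket counts are invented, and it assumes the lowest bound is positive, as with the millisecond buckets used here.

```javascript
// Sketch of PromQL histogram_quantile() over cumulative buckets.
function histogramQuantile(q, buckets) {
  // buckets: [{ le, count }] cumulative, sorted by le, last le = Infinity
  if (!(q > 0 && q <= 1)) throw new RangeError('q must be in (0, 1]');
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  if (buckets[i].le === Infinity) {
    // Rank falls in +Inf: the best available answer is the highest finite bound.
    return buckets[buckets.length - 2].le;
  }
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;
  // Linear interpolation inside the bucket that contains the rank.
  return lower + (buckets[i].le - lower) * ((rank - countBelow) / inBucket);
}

// 100 requests: 50 took <= 100 ms, 90 <= 500 ms, 98 <= 1000 ms.
const sample = [
  { le: 100, count: 50 }, { le: 500, count: 90 },
  { le: 1000, count: 98 }, { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, sample)); // ≈ 812.5: rank 95 is 5/8 into 500-1000
```

p99 here comes out as 1000 even if the slowest requests took 5 s. This is why the bucket layout has to cover the slowest latencies you care about.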

Best Practices

1. Metric Design Principles

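  • Use Base Units and Standard Suffixes: _total for counters, _seconds for durations, _bytes for sizes (the demo's http_request_duration_ms, in milliseconds, departs from this convention)
  • Consistent Labeling: Use the same label names (for example method, route, status) across services
  • Avoid High Cardinality: Each unique label combination is its own time series, so keep label values bounded and never use user IDs or raw URLs as labels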
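These naming and cardinality rules can be checked mechanically. Below is a hypothetical lint helper. The name pattern [a-zA-Z_:][a-zA-Z0-9_:]* is Prometheus's documented rule for metric names, while the suffix checks encode the conventions above. estimateSeries gives the worst-case series count as the product of distinct values per label.

```javascript
// Hypothetical metric lint helper; the regex is Prometheus's metric-name rule,
// the suffix checks are naming conventions rather than hard requirements.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function lintMetric({ name, type }) {
  const problems = [];
  if (!METRIC_NAME_RE.test(name)) problems.push('invalid metric name');
  if (type === 'counter' && !name.endsWith('_total')) problems.push('counter should end in _total');
  if (/_(ms|milliseconds)$/.test(name)) problems.push('prefer base unit _seconds');
  return problems;
}

// Worst-case number of time series for one metric: product of the number of
// distinct values each label can take.
function estimateSeries(valuesPerLabel) {
  return Object.values(valuesPerLabel).reduce((n, k) => n * k, 1);
}

console.log(lintMetric({ name: 'http_request_duration_ms', type: 'histogram' }));
console.log(estimateSeries({ method: 4, route: 6, status: 5 }));
```

With 4 methods, 6 routes and 5 status codes, the request counter has at most 120 series. Labeling by raw path instead multiplies the route factor by every distinct URL ever requested.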

2. Golden Signals Implementation

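  • Latency: Time taken to serve requests, ideally tracked separately for successful and failed requests
  • Traffic: Demand on your system, such as HTTP requests per second
  • Errors: Rate of failed requests
  • Saturation: How "full" the service is, such as CPU, memory, or queue utilization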
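To illustrate what each signal measures, here is a hypothetical in-process summary over a window of request records. It uses a nearest-rank p95. By default it treats only 5xx responses as errors, whereas the middleware above also counts 4xx. It assumes a known capacity for saturation. In the demo itself, these signals would come from Prometheus queries instead.

```javascript
// Hypothetical golden-signals summary over one time window of request records.
function goldenSignals(records, windowSeconds, inFlight, capacity, isError = (r) => r.status >= 500) {
  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b);
  // Latency: nearest-rank p95 (no interpolation)
  const p95 = durations.length ? durations[Math.ceil(0.95 * durations.length) - 1] : NaN;
  return {
    latencyP95Ms: p95,
    // Traffic: requests per second over the window
    trafficRps: records.length / windowSeconds,
    // Errors: share of requests classified as failed
    errorRatio: records.length ? records.filter(isError).length / records.length : 0,
    // Saturation: in-flight requests relative to an assumed capacity
    saturation: inFlight / capacity,
  };
}

const lastMinute = [{ durationMs: 120, status: 200 }, { durationMs: 900, status: 503 }];
console.log(goldenSignals(lastMinute, 60, 4, 100));
```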

3. Effective Alerting

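  • Alert on Symptoms: Page on user-visible impact (error rate, latency), not on underlying causes
  • Meaningful Thresholds: Choose thresholds and durations that reflect real impact, to avoid alert fatigue
  • Runbook Integration: Link every alert to clear remediation steps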
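One common way to avoid alert fatigue is to notify only after the condition has held for a while. This is what the for: clause of a Prometheus alerting rule does. Below is a simplified JavaScript model of the resulting inactive/pending/firing behaviour. The 5% threshold and 120 s duration are arbitrary example values.

```javascript
// Simplified model of an alerting rule with a `for:` duration: the condition
// must hold on consecutive evaluations for forSeconds before the alert fires.
function evaluateAlert(series, { threshold, forSeconds }) {
  // series: [{ t: unixSeconds, value }] sorted by t, e.g. error ratio in %
  let pendingSince = null;
  const states = [];
  for (const { t, value } of series) {
    if (value > threshold) {
      if (pendingSince === null) pendingSince = t;
      states.push({ t, state: t - pendingSince >= forSeconds ? 'firing' : 'pending' });
    } else {
      pendingSince = null; // one healthy evaluation resets the timer
      states.push({ t, state: 'inactive' });
    }
  }
  return states;
}

// Error ratio (%) evaluated every 60 s; a brief spike would only reach "pending".
const errorRatioPct = [1, 7, 8, 9, 2].map((value, i) => ({ t: i * 60, value }));
console.log(evaluateAlert(errorRatioPct, { threshold: 5, forSeconds: 120 }).map((s) => s.state));
// pending at t=60 and t=120, firing at t=180
```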

GitHub Repository

The complete implementation with automated setup scripts is available:

🔗 GitHub Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git

Repository Features

  • ✅ Complete source code with all implementation files
  • ✅ Automated setup scripts for Windows/Mac/Linux
  • ✅ Comprehensive documentation and troubleshooting guides
  • ✅ CI/CD ready with GitHub Actions integration
  • ✅ Traffic generator for realistic testing scenarios

Conclusion

Observability transforms raw metrics into actionable insights that drive better system reliability and user experience. This implementation demonstrates:

  1. Holistic Approach: Framing metrics, logs, and traces as complementary signals, with metrics implemented end to end
  2. Practical Implementation: Real-world Node.js example
  3. Automation First: Scripted setup reduces barriers
  4. Best Practices: Following established patterns

Next Steps

  1. Extend metrics with business-specific measurements
  2. Implement meaningful alerting
  3. Add distributed tracing with Jaeger
  4. Apply patterns to production systems

Author: Jhon Ticona Chambi

Technologies: Node.js, Prometheus, Grafana, Express.js
