Introduction
Observability is essential for understanding how distributed applications behave in production. This article walks through observability practices using a demonstration Node.js API that exposes metrics to Prometheus and visualizes them in Grafana.
The Three Pillars of Observability
1. Metrics
Numerical measurements providing quantitative insights:
- Business Metrics: User registrations, transactions, revenue
- Application Metrics: Response times, error rates, throughput
- Infrastructure Metrics: CPU usage, memory consumption, disk I/O
2. Logs
Time-stamped records of discrete events:
- Structured Logging: JSON format for better parsing
- Contextual Information: Request IDs, user context, transaction details
- Different Log Levels: DEBUG, INFO, WARN, ERROR, FATAL
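The logging points above can be illustrated with a minimal sketch that emits one JSON object per line. The function names `formatLog` and `log` and the context fields `requestId` and `userId` are hypothetical, chosen for illustration; a production service would more likely use an established logging library.

```javascript
// Minimal structured logger sketch: one JSON object per log line.
const LEVELS = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];

// Builds the log line. Kept separate from output so it is easy to test.
function formatLog(level, message, context = {}) {
  if (!LEVELS.includes(level)) {
    throw new Error(`Unknown log level: ${level}`);
  }
  return JSON.stringify({
    ...context, // contextual fields, e.g. requestId, userId (hypothetical names)
    timestamp: new Date().toISOString(),
    level,
    message,
  });
}

// Writes the formatted entry to stdout.
function log(level, message, context) {
  console.log(formatLog(level, message, context));
}

log('INFO', 'User fetched', { requestId: 'req-123', userId: 42 });
```

Because every entry is a single JSON object, a log pipeline can filter by `level` or `requestId` without parsing free-form text.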
3. Traces
Track requests across multiple services:
- Distributed Tracing: Follow requests through microservices
- Performance Analysis: Identify slow components
- Dependency Mapping: Understand service relationships
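To make the idea of following a request across services concrete, here is a small sketch of trace context propagation using the W3C Trace Context `traceparent` header format (version, trace ID, parent span ID, flags). The helper names are hypothetical, and the sketch skips details a real tracer handles, such as sampling decisions and span export.

```javascript
const crypto = require('node:crypto');

// Creates a `traceparent` value for a new trace:
// version "00", 16-byte trace ID, 8-byte span ID, flags "01" (sampled).
function newTraceparent() {
  const traceId = crypto.randomBytes(16).toString('hex');
  const spanId = crypto.randomBytes(8).toString('hex');
  return `00-${traceId}-${spanId}-01`;
}

// Parses an incoming header value; returns null if it is malformed.
function parseTraceparent(header) {
  const pattern = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
  const match = pattern.exec(header || '');
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  return { version, traceId, parentId, flags };
}

// Continues a trace for an outgoing call: same trace ID, new span ID.
// Starts a fresh trace when the incoming header is missing or invalid.
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  if (!parent) return newTraceparent();
  const spanId = crypto.randomBytes(8).toString('hex');
  return `00-${parent.traceId}-${spanId}-${parent.flags}`;
}

const incoming = newTraceparent();
console.log(incoming, '->', childTraceparent(incoming));
```

Each service keeps the trace ID and replaces only the span ID, which is what lets a tracing backend stitch the hops into one request timeline.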
Real-World Implementation with Node.js
Our demonstration project implements a RESTful API with comprehensive observability features.
Architecture Overview
Node.js API (Port 3000) → Prometheus (Port 9090) → Grafana (Port 3000)
↓
Traffic Generator (Test Script)
Core Metrics Implementation
HTTP Request Duration Histogram
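// prom-client is the Prometheus client library used for all metrics below
const promClient = require('prom-client');
const httpDuration = new promClient.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status'],
  // Bucket upper bounds in milliseconds
  buckets: [1, 5, 15, 50, 100, 500, 1000]
});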
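It may help to see what the bucket list means in practice. Prometheus histograms are cumulative: each bucket counts every observation less than or equal to its upper bound, and an implicit `+Inf` bucket counts everything. The following is a plain JavaScript sketch of that bookkeeping, not prom-client's internal code; `createHistogram` and `observe` are hypothetical names.

```javascript
// Same upper bounds (in ms) as the histogram above.
const buckets = [1, 5, 15, 50, 100, 500, 1000];

// Creates an empty histogram with an extra +Inf bucket at the end.
function createHistogram(upperBounds) {
  const bounds = [...upperBounds, Infinity];
  return { bounds, counts: bounds.map(() => 0), sum: 0, count: 0 };
}

// Records one observation in every bucket whose bound is >= the value.
function observe(hist, value) {
  hist.bounds.forEach((le, i) => {
    if (value <= le) hist.counts[i] += 1;
  });
  hist.sum += value;
  hist.count += 1;
}

const hist = createHistogram(buckets);
[3, 12, 40, 700, 2500].forEach((ms) => observe(hist, ms));

console.log(hist.counts); // [0, 1, 2, 3, 3, 3, 4, 5]
console.log(hist.sum, hist.count); // 3255 5
```

Note that the 2500 ms observation lands only in the `+Inf` bucket. With a largest finite bound of 1000 ms, requests slower than one second cannot be distinguished from each other, which is worth keeping in mind for an endpoint that takes several seconds to respond.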
Request Counter
const httpRequests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status']
});
Active Connections Gauge
const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Active connections'
});
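The article does not show how the gauge is updated, so the following is only one possible approach: raise the gauge when work starts and lower it when work ends, even on failure. The `withGauge` wrapper and the stand-in `createFakeGauge` are hypothetical; the sketch relies only on the `inc()` and `dec()` methods that a prom-client Gauge provides.

```javascript
// Wraps an async handler so the gauge rises while it runs and always falls after.
// `gauge` is any object with inc() and dec(), such as a prom-client Gauge.
async function withGauge(gauge, handler) {
  gauge.inc();
  try {
    return await handler();
  } finally {
    gauge.dec(); // runs on success and on error
  }
}

// Stand-in gauge so the sketch runs without prom-client installed.
function createFakeGauge() {
  let value = 0;
  return {
    inc: () => { value += 1; },
    dec: () => { value -= 1; },
    get: () => value,
  };
}

const gauge = createFakeGauge();
withGauge(gauge, async () => 'done').then((result) => {
  console.log(result, gauge.get()); // the gauge is back to 0 once the handler finishes
});
```

Putting the decrement in `finally` matters: a gauge that is only decremented on the success path drifts upward every time a handler throws.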
API Endpoints for Testing
- GET /: Basic health endpoint
- GET /users: List users with variable latency
- GET /users/:id: Specific user lookup with error cases
- GET /slow: Intentionally slow endpoint (2-5s response time)
- GET /error: Random error generation for testing
- GET /metrics: Prometheus metrics endpoint
Middleware Implementation
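// Assumption: responseTime comes from the 'response-time' package
const responseTime = require('response-time');
// Assumption: error counter definition, named to match the http_errors_total query below
const httpErrors = new promClient.Counter({
  name: 'http_errors_total',
  help: 'Total HTTP error responses (status >= 400)',
  labelNames: ['method', 'route', 'status']
});
// Response time tracking: `time` is the response time in milliseconds
app.use(responseTime((req, res, time) => {
  // Prefer the matched route pattern (e.g. /users/:id) over the raw path
  const route = req.route ? req.route.path : req.path;
  httpDuration.labels(req.method, route, res.statusCode).observe(time);
  httpRequests.labels(req.method, route, res.statusCode).inc();
  if (res.statusCode >= 400) {
    httpErrors.labels(req.method, route, res.statusCode).inc();
  }
}));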
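One caveat with the middleware above: when no route matches (typically a 404), the fallback to `req.path` records the raw URL path as a label value, so arbitrary requests can create an unbounded number of time series. A small sketch of one way to bound the label follows. The `routeLabel` helper and the `'unmatched'` value are hypothetical, and `req` is assumed to be an Express-style request object.

```javascript
// Returns a bounded value for the `route` label.
function routeLabel(req) {
  if (req.route && req.route.path) {
    // Matched route pattern, e.g. "/users/:id" rather than "/users/42".
    // baseUrl is included so routes mounted on a sub-router stay distinct.
    return (req.baseUrl || '') + req.route.path;
  }
  // No route matched: collapse every such request into one fixed value.
  return 'unmatched';
}

console.log(routeLabel({ route: { path: '/users/:id' }, path: '/users/42' })); // "/users/:id"
console.log(routeLabel({ path: '/some/random/path' })); // "unmatched"
```

The trade-off is that unmatched paths are no longer visible in metrics. Logs are usually the better place to record the exact path of a 404.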
Essential Prometheus Queries
Request Rate (RPS)
rate(http_requests_total[1m])
Error Rate Percentage
sum(rate(http_errors_total[1m])) / sum(rate(http_requests_total[1m])) * 100
95th Percentile Response Time
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_ms_bucket[1m])))
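The percentile query returns an estimate, not an exact value: `histogram_quantile` locates the bucket that contains the requested rank and interpolates linearly inside it. The sketch below reproduces that estimation in plain JavaScript to show where the number comes from. It assumes cumulative bucket counts sorted by upper bound, at least one finite bucket, and `q` in the range (0, 1]; the sample counts are made up for illustration.

```javascript
// buckets: [{ le, count }] sorted by le, counts cumulative, last le is Infinity.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  // First bucket whose cumulative count reaches the rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // Rank falls in the +Inf bucket: report the largest finite upper bound.
  if (buckets[i].le === Infinity) {
    return buckets[buckets.length - 2].le;
  }

  // The first bucket is treated as starting at 0.
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;

  // Linear interpolation within the bucket.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / inBucket);
}

// Illustrative data: 100 requests, latencies in ms.
const sample = [
  { le: 50, count: 60 },
  { le: 100, count: 90 },
  { le: 500, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.95, sample)); // 300
```

In this example the 95th request falls halfway through the 100 to 500 ms bucket, so the estimate is 300 ms regardless of where those ten requests actually landed. Narrower buckets around the latencies you care about give tighter estimates.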
Best Practices
1. Metric Design Principles
- Use Standard Suffixes: _total, _duration_seconds, _bytes
- Consistent Labeling: Standardize label names across services
- Avoid High Cardinality: Limit unique label combinations
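The duration histogram shown earlier records milliseconds under an `_ms` name, while the suffix convention above favors base units such as seconds. As a sketch of what aligning with the convention could involve, the bucket bounds and the observed values are both divided by 1000. The helper `msToSeconds` is hypothetical, and renaming the metric (for example to `http_request_duration_seconds`) would also require updating any query that references the old name.

```javascript
// Bucket bounds from the millisecond histogram.
const msBuckets = [1, 5, 15, 50, 100, 500, 1000];

// Converts a millisecond value to seconds.
const msToSeconds = (ms) => ms / 1000;

// Equivalent bucket bounds expressed in seconds.
const secondsBuckets = msBuckets.map(msToSeconds);

console.log(secondsBuckets); // [0.001, 0.005, 0.015, 0.05, 0.1, 0.5, 1]

// An observed response time of 250 ms would be recorded as 0.25.
console.log(msToSeconds(250)); // 0.25
```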
2. Golden Signals Implementation
- Latency: Time to process requests
- Traffic: Demand on your system
- Errors: Rate of failed requests
- Saturation: Resource utilization
3. Effective Alerting
- Alert on Symptoms: Focus on user impact, not causes
- Meaningful Thresholds: Avoid alert fatigue
- Runbook Integration: Provide clear remediation steps
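One common way to reduce alert fatigue is to require that a threshold be exceeded continuously for some time before an alert fires, which is the idea behind the `for` clause in Prometheus alerting rules. The following sketch models that behavior in plain JavaScript. The `createAlert` helper, the 5% threshold, and the 5-minute window are illustrative assumptions, not values from the project.

```javascript
// Returns an evaluate(value, nowMs) function that reports the alert state.
// The alert fires only after the value has stayed above the threshold for forMs.
function createAlert({ threshold, forMs }) {
  let pendingSince = null;
  return function evaluate(value, nowMs) {
    if (value <= threshold) {
      pendingSince = null; // condition cleared: reset the timer
      return 'inactive';
    }
    if (pendingSince === null) pendingSince = nowMs;
    return nowMs - pendingSince >= forMs ? 'firing' : 'pending';
  };
}

// Hypothetical rule: error rate above 5% for 5 minutes.
const errorRateAlert = createAlert({ threshold: 5, forMs: 5 * 60 * 1000 });

console.log(errorRateAlert(10, 0));       // "pending"
console.log(errorRateAlert(10, 300000));  // "firing"
console.log(errorRateAlert(2, 360000));   // "inactive"
```

A brief spike that recovers before the window elapses never leaves the pending state, so only sustained, user-visible problems page someone.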
GitHub Repository
The complete implementation with automated setup scripts is available:
🔗 GitHub Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git
Repository Features
- ✅ Complete source code with all implementation files
- ✅ Automated setup scripts for Windows/Mac/Linux
- ✅ Comprehensive documentation and troubleshooting guides
- ✅ CI/CD ready with GitHub Actions integration
- ✅ Traffic generator for realistic testing scenarios
Conclusion
Observability transforms raw metrics into actionable insights that drive better system reliability and user experience. This implementation demonstrates:
- Holistic Approach: Combining metrics, logs, and traces
- Practical Implementation: Real-world Node.js example
- Automation First: Scripted setup reduces barriers
- Best Practices: Following established patterns
Next Steps
- Extend metrics with business-specific measurements
- Implement meaningful alerting
- Add distributed tracing with Jaeger
- Apply patterns to production systems
Author: Jhon TiCona Chambi
Technologies: Node.js, Prometheus, Grafana, Express.js
Repository: https://github.com/jhonticonachambi/observability-practices-nodejs.git