Prometheus Metrics Collection: Avoiding the Cardinality Trap

#prometheus #monitoring #devops #sre

Day 5 -- Prometheus metrics collection.

The most impactful thing you can learn about Prometheus is not how to write queries. It is how to instrument your code so the queries are possible and the server stays alive.

Quick Node.js Instrumentation

const client = require('prom-client');
const express = require('express');

const app = express();
const register = new client.Registry();
client.collectDefaultMetrics({ register });

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request duration by route',
  labelNames: ['route', 'method', 'status_class'],
  buckets: [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0],
  registers: [register],
});

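// Time every request; labels are attached once the response has been sent.
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    end({
      // Use the matched route template (e.g. '/users/:id'), never the raw URL, so the label stays bounded.
      route: req.route?.path || 'unknown',
      method: req.method,
      status_class: `${Math.floor(res.statusCode / 100)}xx`,
    });
  });
  next();
});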

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);

The Cardinality Rule

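Every unique combination of label values is its own time series in memory. A histogram multiplies that again: each combination gets one series per bucket (including the implicit +Inf bucket) plus _sum and _count, so the histogram above costs 10 series per combination. Use only bounded values as labels. Never use user IDs, IPs, or request IDs.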
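To make the rule concrete, here is a rough back-of-the-envelope sketch. The function name and the label cardinalities (20 routes, 4 methods, 5 status classes, 10,000 users) are assumptions for illustration, and the result is an upper bound, since a series only exists once its label combination has actually been observed.

```javascript
// Estimate the maximum number of time series one histogram can produce.
// labelCardinalities: how many distinct values each label can take (assumed figures).
// bucketCount: number of configured bucket bounds, not counting +Inf.
function estimateHistogramSeries(labelCardinalities, bucketCount) {
  const combinations = Object.values(labelCardinalities)
    .reduce((product, n) => product * n, 1);
  // Per combination: one series per bucket, one for +Inf, plus _sum and _count.
  const seriesPerCombination = bucketCount + 1 + 2;
  return combinations * seriesPerCombination;
}

const bounded = estimateHistogramSeries(
  { route: 20, method: 4, status_class: 5 }, 7);
const withUserId = estimateHistogramSeries(
  { route: 20, method: 4, status_class: 5, user_id: 10000 }, 7);

console.log(bounded);    // 4000
console.log(withUserId); // 40000000
```

One unbounded label turns a few thousand series into tens of millions, which is why the label set has to be decided at instrumentation time, not at query time.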

Monitor your own cardinality:

prometheus_tsdb_head_series -- total series currently in the TSDB head (active series)
scrape_series_added -- approximate number of new series in each scrape, per target

Alert when either spikes after a deploy.
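Those two metrics are reported by the Prometheus server. As a complement, the application can check its own output before it ships. The sketch below counts series per metric name in Prometheus text exposition output; the helper name and the sample text are hypothetical, and in the app above the input would presumably be the string returned by `await register.metrics()`.

```javascript
// Count samples (one per series) for each metric name in text exposition output.
function countSeriesByMetric(expositionText) {
  const counts = new Map();
  for (const line of expositionText.split('\n')) {
    const trimmed = line.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    // The metric name ends at the first '{' (labels) or space (value).
    const name = trimmed.split(/[{ ]/, 1)[0];
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample output, for illustration only.
const sample = [
  '# HELP http_requests_total Total requests',
  '# TYPE http_requests_total counter',
  'http_requests_total{route="/a",method="GET"} 3',
  'http_requests_total{route="/b",method="GET"} 1',
  'process_cpu_seconds_total 0.42',
].join('\n');

const counts = countSeriesByMetric(sample);
console.log(counts.get('http_requests_total'));       // 2
console.log(counts.get('process_cpu_seconds_total')); // 1
```

A check like this could run in a test after exercising the app, failing the build if any metric name exceeds a series budget you choose.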

Bucket Selection

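Align histogram buckets to your SLO boundaries. The default client-library buckets jump from 100ms to 250ms -- useless for a 200ms SLO target, because Prometheus cannot tell whether a request in that bucket took 110ms or 240ms. histogram_quantile interpolates linearly inside a bucket, so dense coverage near your threshold gives more accurate percentile estimates, and a boundary exactly at the threshold lets you count the requests that met the SLO.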
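The effect of bucket layout can be sketched with a simplified version of the linear interpolation that histogram_quantile performs. This is an illustration, not the server's implementation: it ignores the +Inf bucket, and the request counts are made up (900 requests under 100ms, 100 more that all finished under 200ms).

```javascript
// buckets: [{ le, count }] sorted by upper bound, with cumulative counts,
// mirroring how Prometheus stores histogram buckets.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  const rank = q * total;
  let lowerBound = 0;
  let previousCount = 0;
  for (const { le, count } of buckets) {
    if (count >= rank) {
      // Assume observations are spread evenly inside the bucket.
      const fraction = (rank - previousCount) / (count - previousCount);
      return lowerBound + (le - lowerBound) * fraction;
    }
    lowerBound = le;
    previousCount = count;
  }
  return lowerBound;
}

// Same traffic, two bucket layouts (assumed data).
const defaultLike = [{ le: 0.1, count: 900 }, { le: 0.25, count: 1000 }];
const sloAligned = [
  { le: 0.1, count: 900 },
  { le: 0.2, count: 1000 },
  { le: 0.3, count: 1000 },
];

console.log(estimateQuantile(0.99, defaultLike).toFixed(3)); // 0.235
console.log(estimateQuantile(0.99, sloAligned).toFixed(3));  // 0.190
```

With the 100ms-to-250ms gap, the estimated p99 lands above 200ms even though every request met the target. With a boundary at 0.2 the estimate stays under the threshold, and the cumulative count at that bucket shows directly that all 1000 requests were within the SLO.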
