DEV Community

Naveen Karasu


Prometheus Metrics Collection: Avoiding the Cardinality Trap

Day 5 -- Prometheus metrics collection.

The most impactful thing you can learn about Prometheus is not how to write queries. It is how to instrument your code so that the queries you need are possible and the Prometheus server stays alive.

Quick Node.js Instrumentation

const client = require('prom-client');
const express = require('express');

const app = express();
const register = new client.Registry();
client.collectDefaultMetrics({ register });

const httpDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Request duration by route',
  labelNames: ['route', 'method', 'status_class'],
  buckets: [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0],
  registers: [register],
});

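// Time every request; labels are attached only once the response has finished.
app.use((req, res, next) => {
  const end = httpDuration.startTimer();
  res.on('finish', () => {
    // Label by route template (e.g. /users/:id), never the raw URL, so the value set stays bounded.
    // req.baseUrl keeps the mount prefix of nested routers; unmatched requests collapse to 'unknown'.
    const route = req.route ? `${req.baseUrl}${req.route.path}` : 'unknown';
    end({ route, method: req.method, status_class: `${Math.floor(res.statusCode / 100)}xx` });
  });
  next();
});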

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);
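The label logic in the middleware can be pulled into plain functions so it can be unit-tested without starting a server. This is a minimal sketch: statusClass and routeLabel are hypothetical helper names, and the request is modeled as a plain object carrying only Express's baseUrl and route.path fields. Prefixing req.baseUrl keeps identical paths in different mounted routers apart.

```javascript
// Collapse a standard HTTP status code (100-599) into one of five values: 1xx..5xx.
function statusClass(statusCode) {
  return `${Math.floor(statusCode / 100)}xx`;
}

// Label by route template, never by raw URL. `req` needs only Express's
// baseUrl and route fields; every request that matched no route shares 'unknown'.
function routeLabel(req) {
  if (!req.route) return 'unknown';
  return `${req.baseUrl || ''}${req.route.path}`;
}

// Three raw URLs that hit the same route (request objects mocked for illustration).
const sample = [1, 2, 3].map((id) => ({
  originalUrl: `/api/users/${id}`,
  baseUrl: '/api',
  route: { path: '/users/:id' },
}));

console.log(new Set(sample.map((r) => r.originalUrl)).size); // 3 distinct raw URLs
console.log(new Set(sample.map(routeLabel)).size);           // 1 label value: /api/users/:id
```

In the middleware, end({ route: routeLabel(req), method: req.method, status_class: statusClass(res.statusCode) }) would then replace the inline expressions.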

The Cardinality Rule

Every unique combination of label values on a metric is a separate time series that Prometheus keeps in memory. Use only labels whose values come from a small, bounded set. Never use user IDs, IP addresses, or request IDs.
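It helps to do the multiplication. A Prometheus histogram exposes, for every label combination, one _bucket series per boundary plus le="+Inf", along with _sum and _count. The sketch below multiplies that out for the histogram above; the label value counts (20 routes, 5 methods, 5 status classes, 10,000 users) are hypothetical.

```javascript
// Series one histogram exposes per label combination:
// one _bucket series per boundary, plus le="+Inf", _sum, and _count.
function seriesPerCombination(bucketCount) {
  return bucketCount + 3;
}

// Worst case: every value of every label co-occurs with every other.
function worstCaseSeries(labelValueCounts, bucketCount) {
  const combinations = Object.values(labelValueCounts).reduce((acc, n) => acc * n, 1);
  return combinations * seriesPerCombination(bucketCount);
}

const buckets = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0]; // same as the histogram above
// Hypothetical service: 20 routes, 5 methods, 5 status classes.
const bounded = { route: 20, method: 5, status_class: 5 };
// The same histogram with a hypothetical user_id label and 10,000 active users.
const withUserId = { ...bounded, user_id: 10000 };

console.log(worstCaseSeries(bounded, buckets.length));    // 5000
console.log(worstCaseSeries(withUserId, buckets.length)); // 50000000
```

Real counts are lower because not every combination occurs, but the shape holds: one unbounded label multiplies everything else.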

Monitor your own cardinality:

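  • prometheus_tsdb_head_series -- series currently held in the Prometheus server's in-memory head block (total active series)
  • scrape_series_added -- approximate number of new series created by each scrape, reported per target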

Alert when either spikes after a deploy.
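One way to script a post-deploy check is against the Prometheus HTTP API (GET /api/v1/query). This is a minimal sketch, assuming Prometheus is served at the root of baseUrl and Node 18+ for the global fetch; the helper names and the 20% growth threshold are hypothetical, and in production the same comparison more often lives in an alerting rule.

```javascript
// Build an instant-query URL for the Prometheus HTTP API.
// Assumes Prometheus is served at the root of baseUrl (no path prefix).
function instantQueryUrl(baseUrl, promql) {
  const url = new URL('/api/v1/query', baseUrl);
  url.searchParams.set('query', promql);
  return url.toString();
}

// Read the first sample from an instant-vector response:
// { status: 'success', data: { resultType: 'vector', result: [{ metric, value: [ts, "123"] }] } }
function firstSampleValue(body) {
  if (body.status !== 'success' || body.data.resultType !== 'vector') {
    throw new Error(`unexpected query response: ${JSON.stringify(body)}`);
  }
  const [sample] = body.data.result;
  return sample ? Number(sample.value[1]) : 0;
}

// Hypothetical helper: fetch one number (requires Node 18+ for global fetch).
async function queryNumber(baseUrl, promql) {
  const res = await fetch(instantQueryUrl(baseUrl, promql));
  return firstSampleValue(await res.json());
}

// Flag growth above maxGrowth (0.2 = 20%) between a pre- and post-deploy reading.
function isCardinalitySpike(before, after, maxGrowth = 0.2) {
  return before > 0 && (after - before) / before > maxGrowth;
}
```

For example, read queryNumber(promUrl, 'prometheus_tsdb_head_series') before the rollout and again a few scrapes after it, then pass both readings to isCardinalitySpike.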

Bucket Selection

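Align histogram buckets to your SLO boundaries. The default buckets in prom-client and the Go client library (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, and 10 seconds) jump from 100ms to 250ms, so with a 200ms SLO target you cannot tell how many requests finished under 200ms. Prometheus's histogram_quantile() interpolates linearly inside a bucket, so its error grows with bucket width. Dense boundaries near your threshold, including one exactly at it, give accurate percentile calculations.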
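The effect is easy to reproduce offline. The sketch below buckets the same synthetic latencies two ways and applies a simplified version of the linear interpolation histogram_quantile() performs (it skips edge cases such as negative bucket bounds). The latency distribution is invented for illustration, its true p95 is 180ms, and the SLO is assumed to be stated as "p95 under 200ms".

```javascript
// Client-library default buckets (prom-client and the Go client use these).
const DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
// SLO-aligned buckets from the instrumentation example, with a boundary at 0.2 s.
const SLO_BUCKETS = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0];

// Cumulative counts as a Prometheus histogram exposes them:
// counts[i] = observations <= bounds[i]; the last entry is the le="+Inf" bucket.
function cumulativeCounts(bounds, observations) {
  const counts = bounds.map((le) => observations.filter((v) => v <= le).length);
  counts.push(observations.length);
  return counts;
}

// Simplified histogram_quantile(): find the bucket holding the target rank,
// then interpolate linearly between that bucket's lower and upper bounds.
function estimateQuantile(q, bounds, counts) {
  const total = counts[counts.length - 1];
  if (total === 0) return NaN;
  const rank = q * total;
  const i = counts.findIndex((c) => c >= rank);
  if (i === bounds.length) return bounds[bounds.length - 1]; // rank is in the +Inf bucket
  const lower = i === 0 ? 0 : bounds[i - 1];
  const below = i === 0 ? 0 : counts[i - 1];
  const inBucket = counts[i] - below;
  if (inBucket === 0) return lower;
  return lower + (bounds[i] - lower) * ((rank - below) / inBucket);
}

// Synthetic latencies (seconds): 50 at 50ms, 45 at 180ms, 5 at 240ms. True p95 = 180ms.
const latencies = [
  ...Array(50).fill(0.05),
  ...Array(45).fill(0.18),
  ...Array(5).fill(0.24),
];

const sloCounts = cumulativeCounts(SLO_BUCKETS, latencies);
const coarse = estimateQuantile(0.95, DEFAULT_BUCKETS, cumulativeCounts(DEFAULT_BUCKETS, latencies));
const dense = estimateQuantile(0.95, SLO_BUCKETS, sloCounts);
console.log(`p95 with default buckets: ${coarse.toFixed(3)} s`); // 0.235 s: looks like an SLO breach
console.log(`p95 with SLO buckets:     ${dense.toFixed(3)} s`);  // 0.200 s

// With a boundary at 0.2 s, the share of requests under the SLO is exact.
console.log(sloCounts[SLO_BUCKETS.indexOf(0.2)] / sloCounts[sloCounts.length - 1]); // 0.95
```

The default layout spreads the 100ms to 250ms range across one bucket, so the estimate lands 55ms above the true value; the SLO-aligned layout both tightens the estimate and answers "what fraction finished under 200ms" directly from the le="0.2" bucket.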
