S, Sanjay
Prometheus + Grafana: The Monitoring Stack That Replaced Our $40K/Year Tool

We were paying $40K/year for a SaaS monitoring tool. It ingested metrics, showed dashboards, and sent alerts. It also had a 45-second query latency, a 200-metric cardinality limit per service, and a sales team that called every quarter to upsell.

We replaced it with Prometheus + Grafana in 3 weeks. Our query latency dropped to under 2 seconds. We now track 500+ metrics. Total cost: the compute to run it — roughly $200/month on AKS.

Here's the complete setup.


Why Prometheus Wins for Kubernetes

Prometheus was built at SoundCloud in 2012 specifically for monitoring dynamic, containerized environments. It's not a general-purpose database — it's a time-series database optimized for operational metrics.

Three things make it ideal for Kubernetes:

1. Pull-based model. Prometheus scrapes targets at regular intervals. In Kubernetes, it discovers targets automatically through service discovery. When a new pod starts, Prometheus finds it. When it dies, Prometheus stops scraping. No agent installation required.

2. PromQL. The query language is purpose-built for metrics. You can calculate rates, percentiles, ratios, and predictions in a single expression. SQL can't do this efficiently on time-series data.

3. Kubernetes-native service discovery. Prometheus natively understands Kubernetes objects — pods, services, endpoints, nodes, ingresses. Add an annotation to a pod, and Prometheus starts scraping it.


Architecture Overview

┌──────────────────────────────────────────────────┐
│                Kubernetes Cluster                │
│                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│  │ App Pod  │   │ App Pod  │   │ App Pod  │      │
│  │ :8080    │   │ :8080    │   │ :8080    │      │
│  │ /metrics │   │ /metrics │   │ /metrics │      │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘      │
│       │              │              │            │
│       └──────────────┼──────────────┘            │
│                      │ scrape                    │
│              ┌───────┴────────┐                  │
│              │   Prometheus   │                  │
│              │   (TSDB)       │                  │
│              │   Port: 9090   │                  │
│              └───────┬────────┘                  │
│                      │                           │
│        ┌─────────────┼─────────────┐             │
│        │             │             │             │
│  ┌─────┴─────┐ ┌─────┴──────┐ ┌────┴─────────┐   │
│  │  Grafana  │ │Alertmanager│ │ Thanos/Cortex│   │
│  │  (UI)     │ │ (Alerts)   │ │ (Long-term)  │   │
│  │  :3000    │ │ :9093      │ │ (Optional)   │   │
│  └───────────┘ └────────────┘ └──────────────┘   │
└──────────────────────────────────────────────────┘

Installation with kube-prometheus-stack

Don't install Prometheus manually. Use the kube-prometheus-stack Helm chart — it bundles Prometheus, Grafana, Alertmanager, node-exporter, kube-state-metrics, and pre-built dashboards.

# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the full monitoring stack
helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values monitoring-values.yaml \
  --version 56.6.2

The values file that matters:

# monitoring-values.yaml

# Prometheus configuration
prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: "40GB"

    # Resource allocation — critical for stability
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"

    # Persistent storage — never lose metrics on pod restart
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-premium
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

    # Scrape interval (15s is standard, 30s for large clusters)
    scrapeInterval: "15s"
    evaluationInterval: "15s"

# Grafana configuration
grafana:
  adminPassword: "use-a-secret-in-production"

  persistence:
    enabled: true
    size: 10Gi

  # Pre-install useful dashboards
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          folder: ''
          type: file
          options:
            path: /var/lib/grafana/dashboards/default

# Alertmanager configuration
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: managed-premium
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

# Node exporter — collects OS-level metrics from every node
nodeExporter:
  enabled: true

# kube-state-metrics — translates K8s object states into metrics
kubeStateMetrics:
  enabled: true
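About that adminPassword placeholder: the Grafana subchart can read the admin credentials from a Kubernetes Secret instead of a plain-text value. A sketch using the Grafana Helm chart's admin.existingSecret keys, assuming a Secret named grafana-admin already exists in the monitoring namespace:

```yaml
# monitoring-values.yaml (fragment) -- pull Grafana admin credentials
# from an existing Secret instead of hard-coding them
grafana:
  admin:
    existingSecret: grafana-admin   # assumed Secret name
    userKey: admin-user             # Secret key holding the username
    passwordKey: admin-password     # Secret key holding the password
```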

After installation, you get:

  • Prometheus at monitoring-kube-prometheus-prometheus:9090
  • Grafana at monitoring-grafana:3000
  • Alertmanager at monitoring-kube-prometheus-alertmanager:9093
  • 40+ pre-built dashboards (node health, pod resources, API server, etcd, etc.)

Instrumenting Your Applications

Prometheus uses a pull model — your application exposes a /metrics endpoint, and Prometheus scrapes it. Client libraries exist for every language.

Node.js (Express)

// npm install prom-client
const client = require('prom-client');
const express = require('express');
const app = express();

// Collect default metrics (CPU, memory, event loop lag)
client.collectDefaultMetrics({ prefix: 'app_' });

// Custom business metrics
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});

const ordersProcessed = new client.Counter({
  name: 'orders_processed_total',
  help: 'Total number of orders processed',
  labelNames: ['status']    // 'success' or 'failed'
});

// Middleware to measure request duration
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.route?.path || req.path, status_code: res.statusCode });
  });
  next();
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.send(await client.register.metrics());
});

app.listen(8080); // serve the app and /metrics on the same port

Python (Flask)

# pip install prometheus-client
from prometheus_client import Counter, Histogram, generate_latest
from flask import Flask, Response, request
import time

app = Flask(__name__)

REQUEST_DURATION = Histogram(
    'http_request_duration_seconds',
    'Request duration in seconds',
    ['method', 'endpoint', 'status_code']
)

REQUESTS_TOTAL = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status_code']
)

@app.before_request
def start_timer():
    request.start_time = time.time()

@app.after_request
def record_metrics(response):
    duration = time.time() - request.start_time
    REQUEST_DURATION.labels(
        method=request.method,
        endpoint=request.path,
        status_code=response.status_code
    ).observe(duration)
    REQUESTS_TOTAL.labels(
        method=request.method,
        endpoint=request.path,
        status_code=response.status_code
    ).inc()
    return response
    return response

@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype='text/plain')
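Under the hood, both client libraries render the same plain-text exposition format on every scrape. A hand-rolled sketch of what that payload looks like, built here without the client library (the metric name and sample values are illustrative):

```python
# Minimal sketch of the Prometheus text exposition format -- the
# payload a /metrics endpoint returns and Prometheus parses on each
# scrape. The client libraries above produce this for you.
def render_counter(name, help_text, samples):
    """samples: list of (label_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

payload = render_counter(
    "orders_processed_total",
    "Total number of orders processed",
    [({"status": "success"}, 42), ({"status": "failed"}, 3)],
)
print(payload)
```

Every line is just metric name, label set, and value; that simplicity is why nearly every language has a client library.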

Kubernetes annotations for auto-discovery:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: order-service
          image: order-service:v1.0
          ports:
            - containerPort: 8080

Add those three annotations, and any annotation-based scrape job picks the pod up automatically, with no changes to Prometheus itself. One caveat: kube-prometheus-stack does not ship such a job by default. The operator discovers targets through ServiceMonitor and PodMonitor resources, so to use annotations you need to add a classic kubernetes-pods scrape job under additionalScrapeConfigs in the Helm values.
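If you go with the operator's CRDs instead of annotations, a PodMonitor does the same job declaratively. A minimal sketch for the order-service Deployment above, with assumptions flagged: it presumes containerPort 8080 is named metrics in the pod spec, the pods carry an app: order-service label, and the chart's default selector matches the release: monitoring label.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: order-service
  namespace: monitoring
  labels:
    release: monitoring        # assumed: matches the chart's default selector
spec:
  namespaceSelector:
    matchNames: ["default"]    # assumed: namespace where order-service runs
  selector:
    matchLabels:
      app: order-service       # assumed pod label
  podMetricsEndpoints:
    - port: metrics            # assumed name of containerPort 8080
      path: /metrics
      interval: 15s
```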


The Four Golden Signals

Google's SRE book defines four signals that matter for every service. Here's how to measure each with PromQL:

1. Latency — How long requests take

# P50 (median) request duration
histogram_quantile(0.50, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

# P99 request duration — the tail latency users feel
histogram_quantile(0.99, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

# P99 per service
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)
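It helps to know what histogram_quantile actually does with those _bucket series: it finds the bucket containing the target rank and interpolates linearly inside it. A simplified sketch in Python, with made-up cumulative bucket counts (the real implementation also handles the +Inf bucket and other edge cases):

```python
# Simplified sketch of histogram_quantile(): buckets are cumulative
# counts keyed by upper bound (le), as in Prometheus *_bucket series.
def histogram_quantile(q, buckets):
    """buckets: sorted list of (upper_bound, cumulative_count)."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound
            # Linear interpolation within the containing bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# 100 requests: 50 under 100ms, 80 under 250ms, 95 under 500ms, all under 1s
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100)]
print(histogram_quantile(0.99, buckets))  # p99 lands at ~0.9s
```

This is also why bucket boundaries matter: the estimate is only as precise as the bucket the quantile falls into, which is what the explicit buckets list in the Node.js example is tuning.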

2. Traffic — How many requests per second

# Requests per second (total)
sum(rate(http_requests_total[5m]))

# Requests per second per service
sum(rate(http_requests_total[5m])) by (service)

# Top 5 busiest endpoints
topk(5, sum(rate(http_requests_total[5m])) by (route))
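rate() is more subtle than "delta divided by time": it is reset-aware, so a counter that drops to zero when a pod restarts doesn't produce a huge negative rate. A simplified sketch with made-up samples (the real rate() additionally extrapolates to the edges of the window):

```python
# Simplified sketch of how rate() computes a per-second rate while
# handling counter resets. Samples are (timestamp_seconds, value)
# pairs from a single counter series.
def counter_increase(samples):
    total = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            total += value          # counter reset: it restarted from zero
        else:
            total += value - prev
        prev = value
    return total

def rate(samples):
    window = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / window

# The drop from 130 to 10 is a restart, not a negative rate
samples = [(0, 100), (15, 130), (30, 10), (45, 40)]
print(rate(samples))  # 70 units of increase over 45s
```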

3. Errors — Percentage of failed requests

# Error rate (5xx responses / total responses)
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
* 100

# Error rate per service
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)
/
sum(rate(http_requests_total[5m])) by (service)
* 100

4. Saturation — How full your resources are

# CPU usage per pod (% of limit)
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
/
sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod)
* 100

# Memory usage per pod (% of limit)
sum(container_memory_working_set_bytes) by (pod)
/
sum(kube_pod_container_resource_limits{resource="memory"}) by (pod)
* 100

# Disk usage per PVC
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100
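Saturation is also where PromQL's prediction functions earn their keep: instead of alerting when a disk is already 85% full, you can alert when the current trend says it will be full soon. A sketch using predict_linear (the 6h lookback and 24h horizon are arbitrary starting points to tune):

```promql
# Fires if the PVC's free space, extrapolated from the last 6h trend,
# hits zero within the next 24 hours
predict_linear(kubelet_volume_stats_available_bytes[6h], 24 * 3600) < 0
```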

Alerting That Doesn't Wake You Up at 3AM

The biggest mistake in monitoring: alerting on every metric threshold. The result is alert fatigue — your team ignores alerts, and when a real incident happens, nobody notices.

Alert on symptoms, not causes

# ❌ BAD: Alerting on cause (CPU is high)
- alert: HighCPU
  expr: node_cpu_usage > 80
  for: 5m
  # Problem: CPU can be 90% and everything works fine.
  # This alert fires constantly and gets ignored.

# ✅ GOOD: Alerting on symptom (error rate is high)
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)
    /
    sum(rate(http_requests_total[5m])) by (service)
    > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "{{ $labels.service }} error rate above 1%"
    description: "Error rate is {{ $value | humanizePercentage }}"

Production alert rules:

# Prometheus alert rules
groups:
  - name: application
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)
          > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.service }} error rate above 1%"

      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.service }} p99 latency above 2 seconds"

      # Pod crash looping
      - alert: PodCrashLooping
        expr: |
          increase(kube_pod_container_status_restarts_total[1h]) > 3
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} restarting frequently"

  - name: infrastructure
    rules:
      # Node disk running out
      - alert: NodeDiskPressure
        expr: |
          (node_filesystem_avail_bytes{mountpoint="/"} 
          / node_filesystem_size_bytes{mountpoint="/"}) < 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} disk <10% free"

      # PVC almost full
      - alert: PVCAlmostFull
        expr: |
          kubelet_volume_stats_used_bytes 
          / kubelet_volume_stats_capacity_bytes > 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.persistentvolumeclaim }} is >85% full"

Alertmanager routing (send alerts to the right channel):

# alertmanager-config.yaml
route:
  receiver: 'default-slack'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty-critical'
    - match:
        severity: warning
      receiver: 'slack-warnings'

receivers:
  - name: 'pagerduty-critical'
    pagerduty_configs:
      - service_key: '<your-pagerduty-key>'

  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts-warnings'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ .CommonAnnotations.summary }}'

  - name: 'default-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts-default'

Critical alerts → PagerDuty (pages the on-call). Warnings → Slack. Everything else → default channel. Nobody gets woken up for a warning.
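One addition worth considering in this routing setup: inhibition rules, so a firing critical alert mutes its related warnings instead of double-notifying. A sketch, with assumptions flagged: the matchers syntax needs Alertmanager 0.22 or newer, and it assumes your alerts carry a service label.

```yaml
# alertmanager-config.yaml (fragment)
inhibit_rules:
  - source_matchers: ['severity = critical']
    target_matchers: ['severity = warning']
    equal: ['service']   # only mute warnings for the same service
```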


Grafana Dashboards That Teams Actually Use

The pre-installed dashboards from kube-prometheus-stack are great for infrastructure. For application teams, build service-specific dashboards following the RED method:

  • Rate — requests per second
  • Errors — error percentage
  • Duration — latency distribution

Each service gets one dashboard with these panels:

┌──────────────────────────────────────────────┐
│           Order Service Dashboard            │
├──────────────────┬───────────────────────────┤
│  Request Rate    │  Error Rate               │
│  [line chart]    │  [line chart + threshold] │
│  52 req/s        │  0.3% ✅                   │
├──────────────────┼───────────────────────────┤
│  P50 Latency     │  P99 Latency              │
│  [gauge]         │  [gauge + alert line]     │
│  45ms            │  380ms                    │
├──────────────────┴───────────────────────────┤
│  Request Duration Distribution (heatmap)     │
│  [shows latency patterns over time]          │
├──────────────────┬───────────────────────────┤
│  Pod CPU Usage   │  Pod Memory Usage         │
│  [per pod]       │  [per pod vs limits]      │
├──────────────────┼───────────────────────────┤
│  Active Pods     │  Pod Restarts (last 24h)  │
│  3/3 healthy     │  0                        │
└──────────────────┴───────────────────────────┘

Key Lessons

1. Start with kube-prometheus-stack. Don't build from scratch. The Helm chart gives you everything needed for production in 10 minutes.

2. Instrument your code, not just infrastructure. Kubernetes metrics tell you pods are healthy. Application metrics tell you users are happy. You need both.

3. Use recording rules for expensive queries. If a PromQL query is used in dashboards AND alerts, pre-compute it as a recording rule to avoid running it multiple times.

4. Set retention based on need. 15 days of high-resolution data in Prometheus is usually enough. For long-term storage (months/years), ship data to Thanos or Cortex.

5. Alert on symptoms, route by severity. Your on-call engineer should be paged for user-impacting issues, not CPU spikes.
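Point 3 in practice: the error-ratio expression used by both the dashboards and the HighErrorRate alert can be precomputed once per evaluation interval. A sketch of such a recording rule (the name follows the common level:metric:operations convention but is otherwise up to you):

```yaml
groups:
  - name: recording-rules
    rules:
      - record: service:http_error_ratio:rate5m
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)
```

Dashboards and alerts then query service:http_error_ratio:rate5m as an ordinary series, and the division runs once per interval instead of on every panel refresh.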


Monitoring isn't about collecting data. It's about reducing the time between "something broke" and "we know what broke." Prometheus + Grafana gives you that — without the $40K invoice.


What's your monitoring stack? Still on a SaaS tool or running your own? Share your experience in the comments.

Follow me for more DevOps infrastructure content.
