The Complete Guide to Prometheus Metric Types: PromQL, Alerting and Troubleshooting

Reading Time: 15 minutes

Table of Contents

  1. The 3 AM Call
  2. Quick Reference Card
  3. Which Metric Type Should I Use
  4. Meet the Four Metric Types
  5. Comparison Matrix
  6. PromQL Functions by Metric Type
  7. Alerting Strategies
  8. Troubleshooting Quick Reference
  9. The Cardinality Monster
  10. Best Practices
  11. References
  12. Conclusion

The 3 AM Call

It's 3:17 AM. Your phone buzzes violently on the nightstand.

You grab it with one eye open. PagerDuty. Of course.

"CRITICAL: API latency exceeds threshold"

You stumble to your laptop, coffee-less and bleary-eyed. Grafana loads. The dashboard is a mess of red lines spiking upward. Your mind races: Is this a traffic spike? A memory leak? Did someone deploy something?

You stare at the metrics. http_requests_total is climbing. process_resident_memory_bytes looks normal. But wait... what does that histogram actually mean? Why is the p99 showing NaN? And why on earth did someone create a metric with user_id as a label?

Sound familiar?

This guide exists because I've been there. We've all been there. And the truth is, most Prometheus pain comes down to one thing: not fully understanding the four metric types.

Let me introduce you to them. Think of them as four tools in your observability toolkit. Each has a job. Each has rules. Use the wrong one, and you'll be back at 3 AM wondering why your alerts are lying to you.

Let's fix that.


Quick Reference Card

Need a quick answer? Start here.

| Metric Type | Best For | Key Function | Suffix | Can Aggregate? |
|---|---|---|---|---|
| Counter | Totals (requests, errors, bytes) | rate() | _total | ✅ Yes |
| Gauge | Current state (memory, CPU) | Raw value | None | ✅ Yes |
| Histogram | Latency distributions | histogram_quantile() | _seconds | ✅ Yes |
| Summary | Per-instance percentiles | Direct read | _seconds | ⚠️ Only sum/count |

The Essential Queries You'll Use Every Day

# Counter: "How many requests per second are we getting?"
rate(http_requests_total[5m])

# Gauge: "How much memory are we using right now?"
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Histogram: "What's our p99 latency across all pods?"
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# Summary: "What's the average latency?" (works across instances, unlike quantiles)
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))

Which Metric Type Should I Use

Before diving into the details, let me save you some time. Here's a decision flowchart that I wish someone had shown me years ago:

(Diagram: metric type decision flowchart)
The Quick Decision Table

| If you would say... | Use |
|---|---|
| "How many X happened?" | Counter |
| "What is the current X?" | Gauge |
| "What's the p99 latency across all pods?" | Histogram |
| "What's the p99 on this specific pod?" | Summary |

Now let me tell you the stories behind each of these tools.

Meet the Four Metric Types

Counter: The Tireless Bookkeeper

Picture a diligent accountant who sits at the entrance of your application. Every time a request comes in, she makes a tally mark. Every error? Another tally. Bytes transferred? She counts them all.

The Counter never forgets. She never erases. Her numbers only go up. The only time they reset is when she goes home for the night (your process restarts).
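
If you want to see how often that overnight reset actually happens, PromQL's resets() function counts it directly, and rate() already compensates for resets when computing per-second rates. A quick sketch, using the same request counter that appears throughout this guide:

# How many times did the counter reset (process restart) over the last day?
resets(http_requests_total[1d])

# rate() detects resets and corrects for them, so this stays accurate across restarts
rate(http_requests_total[5m])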

The Counter's Personality

A Counter is a cumulative metric that only increases. Think of it as an odometer in your car. The number only goes up. You don't care about the current number per se; you care about how fast it's changing.

This is the crucial insight: raw counter values are almost useless. What you want is the rate.

When to Use a Counter

Counters thrive when tracking:

  • Total HTTP requests received
  • Bytes sent over the network
  • Errors encountered
  • Background jobs completed
  • Messages processed from a queue

Counter Characteristics

| Property | Value |
|---|---|
| Direction | Only goes up (monotonically increasing) |
| Reset Behavior | Resets to 0 when the process restarts |
| Typical Suffix | _total |
| Raw Value Usefulness | Low (always use rate() or increase()) |

Talking to the Counter: PromQL Patterns

# The WRONG way: Raw value tells you nothing useful
http_requests_total

# The RIGHT way: Rate of requests per second over 5 minutes
rate(http_requests_total[5m])

# Filter by label (e.g., only 500 errors)
rate(http_requests_total{status="500"}[5m])

# Total increase over the last hour
increase(http_requests_total[1h])

# Sum rates across all instances
sum(rate(http_requests_total[5m]))

# Group by HTTP method
sum by (method) (rate(http_requests_total[5m]))

# The money query: Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m])) 
/ 
sum(rate(http_requests_total[5m])) * 100

Counter Alerts That Actually Work

# "Our error rate is too high"
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m])) 
    / 
    sum(rate(http_requests_total[5m])) > 0.05
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Error rate exceeds 5%"

# "Traffic dropped suddenly - possible outage"
- alert: TrafficDrop
  expr: |
    sum(rate(http_requests_total[5m])) 
    < 
    sum(rate(http_requests_total[5m] offset 1h)) * 0.5
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Traffic dropped by more than 50% compared to 1 hour ago"

# "We're getting zero requests - something is very wrong"
- alert: NoTraffic
  expr: sum(rate(http_requests_total[5m])) == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "No HTTP requests received in the last 5 minutes"

Gauge: The Live Reporter

If the Counter is an accountant tallying historical records, the Gauge is a live news reporter telling you what's happening right now.

"Memory usage is at 78%!" she reports. A moment later: "It dropped to 72%!" Unlike the Counter, the Gauge's numbers go up and down. She reflects the current state of the world.

The Gauge's Personality

A Gauge represents a single numerical value that can arbitrarily go up and down. It's a snapshot of reality at any moment. Think of a thermometer, a fuel gauge, or your current queue depth.

The beautiful thing about gauges? The raw value is immediately meaningful. When someone asks "How much memory are we using?", the gauge has the answer.

When to Use a Gauge

Gauges excel at:

  • Current memory or CPU usage
  • Number of active connections
  • Queue depth
  • Temperature readings
  • Number of goroutines running
  • Disk space remaining

Gauge Characteristics

| Property | Value |
|---|---|
| Direction | Can increase or decrease |
| Reset Behavior | Not applicable (always reflects current state) |
| Typical Suffix | None specific |
| Raw Value Usefulness | High (the current value is what you want) |

Talking to the Gauge: PromQL Patterns

# Direct reading - totally valid and useful
node_memory_MemAvailable_bytes

# Calculate percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Average, min, max over time
avg_over_time(node_load1[1h])
max_over_time(node_load1[1h])
min_over_time(node_load1[1h])

# Predict the future: "When will we run out of disk?"
predict_linear(node_filesystem_avail_bytes[6h], 3600 * 24)

# Rate of change (unusual for gauges, but useful for capacity planning)
deriv(node_memory_MemAvailable_bytes[5m])

# Find the top consumers
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)

Gauge Alerts That Actually Work

# "Memory is running low"
- alert: HighMemoryUsage
  expr: |
    (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Memory usage above 90% on {{ $labels.instance }}"

# "Disk will fill up in 24 hours" - this is the kind of proactive alert that makes SREs heroes
- alert: DiskFillingUp
  expr: |
    predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24 * 3600) < 0
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Disk {{ $labels.mountpoint }} will fill within 24 hours"

# "Connection pool is almost exhausted"
- alert: ConnectionPoolNearExhaustion
  expr: db_pool_active_connections / db_pool_max_connections > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Connection pool is 80% utilized"

Histogram: The Distribution Detective

Now we get to the interesting ones. The Histogram is a detective who doesn't just count crimes; she categorizes them by severity and gives you the full picture.

"Out of 1000 requests," she reports, "150 completed in under 100ms, 700 completed in under 500ms, and 950 completed in under 1 second. The remaining 50 took longer."

This is the power of the Histogram. It doesn't just tell you the average. It shows you the distribution.

When to Use a Histogram

Histograms are perfect for:

  • Request latency (how long did API calls take?)
  • Response sizes
  • Any measurement where you need percentiles
  • When you need to aggregate percentiles across multiple pods (this is the killer feature)

(Diagram: how histogram buckets work)

Histogram Characteristics

| Property | Value |
|---|---|
| Components | One _bucket series per bucket, plus _sum and _count |
| Aggregation | Fully aggregatable across instances (this is huge!) |
| Configuration | Bucket boundaries must be defined upfront |
| Typical Suffix | _seconds, _bytes |

The Histogram's Secret: Buckets

Here's what a histogram actually creates behind the scenes:

http_request_duration_seconds_bucket{le="0.1"}   --> 150 requests were <= 100ms
http_request_duration_seconds_bucket{le="0.5"}   --> 700 requests were <= 500ms
http_request_duration_seconds_bucket{le="1"}     --> 950 requests were <= 1s
http_request_duration_seconds_bucket{le="+Inf"}  --> 1000 requests total
http_request_duration_seconds_sum                --> Total time spent (e.g., 423.7 seconds)
http_request_duration_seconds_count              --> Total count (1000)

The le label means "less than or equal to," and buckets are cumulative: each bucket counts every observation at or below its boundary, which is why the le="0.5" bucket above already includes the 150 requests counted under le="0.1".
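
Because the buckets are cumulative, you can carve out a latency band by subtracting one bucket's rate from another's. A minimal sketch, reusing the bucket boundaries from the example above:

# Requests per second that took between 100ms and 500ms,
# obtained by subtracting one cumulative bucket from the next
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
-
sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))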

Talking to the Histogram: PromQL Patterns

# Calculate the 50th percentile (median)
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))

# Calculate p99 latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

# P99 latency per endpoint (aggregated correctly!)
histogram_quantile(0.99, 
  sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
)

# Average request duration (simpler alternative)
rate(http_request_duration_seconds_sum[5m]) 
/ 
rate(http_request_duration_seconds_count[5m])

# "What percentage of requests complete in under 500ms?" (Apdex-style)
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
/
sum(rate(http_request_duration_seconds_count[5m])) * 100

Histogram Alerts That Actually Work

# "P99 latency is too high"
- alert: HighP99Latency
  expr: |
    histogram_quantile(0.99, 
      sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
    ) > 2
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "P99 latency exceeds 2 seconds for {{ $labels.service }}"

# "Latency doubled compared to an hour ago"
- alert: LatencyDegradation
  expr: |
    histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
    >
    histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m] offset 1h))) * 2
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency is 2x higher than 1 hour ago"

# SLO violation: "Less than 99% of requests are fast"
- alert: SLOViolation
  expr: |
    sum(rate(http_request_duration_seconds_bucket{le="0.5"}[30m]))
    /
    sum(rate(http_request_duration_seconds_count[30m])) < 0.99
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "SLO Violation: Less than 99% of requests complete within 500ms"

Summary: The Solo Performer

The Summary is the Histogram's cousin. She can also give you percentiles, but with one crucial difference: she calculates them herself, on the client side.

This makes her fast and precise for a single instance. But here's the catch: she can't collaborate. If you have 10 pods running, you cannot simply combine their percentiles to get a global percentile. Averaging p99s does not give you the true p99. It's mathematically wrong.

⚠️ The Summary Trap: I've seen teams spend hours debugging "wrong" percentiles, only to discover they were accidentally averaging Summary quantiles across instances. Don't be that team. If you need to aggregate, use Histograms.

When to Use a Summary

Summaries are appropriate when:

  • You genuinely only care about a single instance
  • You don't know bucket boundaries ahead of time
  • You're maintaining legacy code (most new projects should use Histograms)

Summary Characteristics

| Property | Value |
|---|---|
| Components | Pre-calculated quantiles, plus _sum and _count |
| Aggregation | Cannot aggregate quantiles (only sum/count) |
| Percentile Calculation | Done on the client side |
| Typical Suffix | _seconds, _bytes |

Talking to the Summary: PromQL Patterns

# Read quantiles directly (only meaningful per-instance)
http_request_duration_seconds{quantile="0.99"}

# Average latency - this DOES work across instances!
sum(rate(http_request_duration_seconds_sum[5m])) 
/ 
sum(rate(http_request_duration_seconds_count[5m]))

# DON'T DO THIS - averaging quantiles is mathematically wrong
# avg(http_request_duration_seconds{quantile="0.99"})

# If you must look at quantiles, do it per-instance
http_request_duration_seconds{quantile="0.99", instance="pod-1:8080"}
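
If you're stuck with Summaries and still want a cross-instance signal, one hedged workaround is to take the worst per-instance quantile. It is not the true global p99, but (up to the summary's configured quantile error) the fleet-wide p99 can't exceed it, so it works as a conservative ceiling:

# Conservative ceiling: the worst per-instance p99 across the fleet
max(http_request_duration_seconds{quantile="0.99"})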

Comparison Matrix

| Feature | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| Direction | Only up ⬆️ | Up and down ↕️ | N/A | N/A |
| Raw value useful | ❌ No | ✅ Yes | ❌ No | Partial |
| Use rate() | Required | Rare | On buckets | On sum/count |
| Aggregatable | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Only sum/count |
| Percentiles | ❌ No | ❌ No | ✅ Server-side | ✅ Client-side |
| Storage cost | Low | Low | Higher | Medium |

PromQL Functions by Metric Type

| Function | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| rate() | ✅ Primary | ❌ No | ✅ On buckets | ✅ On sum/count |
| irate() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| increase() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| deriv() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| delta() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| predict_linear() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| histogram_quantile() | ❌ No | ❌ No | ✅ Required | ❌ No |

Alerting Strategies

The Golden Signals

Google's SRE book teaches us to monitor four things. Here's how metric types map to them:

# 1. LATENCY (Histogram) - "How long do things take?"
- alert: HighLatency
  expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1

# 2. TRAFFIC (Counter) - "How much are we doing?"
- alert: TrafficAnomaly
  expr: |
    abs(sum(rate(http_requests_total[5m])) - sum(rate(http_requests_total[5m] offset 1w)))
    / sum(rate(http_requests_total[5m] offset 1w)) > 0.5

# 3. ERRORS (Counter) - "How often do things fail?"
- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01

# 4. SATURATION (Gauge) - "How full is our system?"
- alert: HighSaturation
  expr: avg by (instance) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9

SLO-Based Multi-Burn Rate Alerts

For the more advanced: multi-window burn rate alerts that catch both fast and slow burns of your error budget. The thresholds below assume a 99.9% SLO, so the error budget is 0.1% (the 0.001 in the expressions).

# Fast burn: 2% of monthly error budget consumed in 1 hour
- alert: SLOFastBurn
  expr: |
    (sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 14.4 * 0.001)
    and
    (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 14.4 * 0.001)
  labels:
    severity: critical

# Slow burn: Steady consumption over days
- alert: SLOSlowBurn
  expr: |
    (sum(rate(http_requests_total{status=~"5.."}[6h])) / sum(rate(http_requests_total[6h])) > 1 * 0.001)
    and
    (sum(rate(http_requests_total{status=~"5.."}[3h])) / sum(rate(http_requests_total[3h])) > 1 * 0.001)
  labels:
    severity: warning

Troubleshooting Quick Reference

When things go wrong at 3 AM, use this table:

General Issues (All Metric Types)

| Symptom | Likely Cause | Fix | Debug Query |
|---|---|---|---|
| No data at all | Target not scraped | Check target status | up{job="my-service"} |
| Gaps in graph | Scrape failures | Check scrape duration | scrape_duration_seconds{job="..."} |
| Too many series | High cardinality | Add label filters | topk(10, count by (__name__) ({__name__!=""})) |

Counter Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Flat line | No events occurring | Check application logic |
| Sudden drops | Counter reset | Use rate() (it handles resets) |
| Negative rate | Label churn | Check for recreated series |

Gauge Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Value unchanged | Stale metric | Check scrape status |
| Noisy graph | High variance | Use avg_over_time() |
| Wrong scale | Unit mismatch | Check metric units |

Histogram Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong percentile | Bad bucket boundaries | Add more buckets |
| Most values in +Inf | Buckets too small | Increase upper bounds |
| NaN result | No samples | Increase time window |
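
When most observations land in the +Inf bucket, as in the table above, you can measure how bad it is before redefining your buckets. A rough sketch, where le="1" stands in for whatever your largest finite bucket currently is:

# Fraction of observations that fall beyond the largest finite bucket
# (le="1" is an assumption; substitute your own top bucket boundary)
1 - (
  sum(rate(http_request_duration_seconds_bucket{le="1"}[5m]))
  /
  sum(rate(http_request_duration_seconds_bucket{le="+Inf"}[5m]))
)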

Summary Issues

| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong global p99 | Averaged quantiles | Switch to Histogram |

The Cardinality Monster

Let me tell you about the monster that has brought down more Prometheus instances than any other: cardinality.

Cardinality is the number of unique time series in your system. And it can explode faster than you think.

(Diagram: cardinality explosion)

How Cardinality Explodes

Every unique combination of labels creates a new time series:

1 metric × 5 methods × 10 status codes × 100 endpoints × 50 instances
= 250,000 time series from ONE metric

Labels That Will Destroy Your Prometheus

Never use these as labels:

| Label Type | Example | Why It's Bad |
|---|---|---|
| User IDs | user_id="12345" | Millions of values |
| Request IDs | request_id="abc-123" | One per request |
| Timestamps | timestamp="2024-01-01" | Infinite growth |
| IP addresses | client_ip="192.168.1.1" | Thousands of values |
| Session tokens | session="..." | One per session |
| Error messages | error="Connection refused..." | Unbounded strings |

Detecting the Monster

# How bad is it? Count all series.
count({__name__!=""})

# Find the offenders
topk(10, count by (__name__) ({__name__!=""}))

# Check per-label cardinality
count by (endpoint) (http_requests_total)
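
The last query shows how the series split across endpoint values; to get a single number of distinct values for any suspect label, wrap it in another count():

# How many distinct values does a label have? (swap endpoint for the label you suspect)
count(count by (endpoint) (http_requests_total))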

Cardinality Guidelines

| Level | Series Count | Action |
|---|---|---|
| 🟢 Low | Under 1,000 | You're fine |
| 🟡 Moderate | 1K - 10K | Monitor it |
| 🟠 High | 10K - 100K | Investigate |
| 🔴 Critical | Over 100K | Fix immediately |

Best Practices

Do These Things

  1. Always use rate() with counters - Raw values are useless
  2. Set your rate() window to at least 4x the scrape interval - Ensures enough data points even if a scrape is missed
  3. Include le in your by clause before histogram_quantile() - Otherwise the quantile comes back as NaN (see the example after this list)
  4. Use histograms for percentiles - They aggregate correctly
  5. Add for duration to alerts - Prevents flapping
  6. Define bucket boundaries based on SLOs - Know what matters
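
Here's the example promised in item 3: dropping le while aggregating is the single most common reason histogram_quantile returns NaN at 3 AM.

# WRONG: summing away le leaves histogram_quantile nothing to work with (NaN)
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])))

# RIGHT: keep le (plus any other labels you want to group by)
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))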

Avoid These Mistakes

  1. Averaging summary quantiles - Mathematically wrong
  2. Using irate() for alerting - Too volatile
  3. Alerting on raw gauge spikes - Use for duration
  4. High cardinality labels - They'll kill your Prometheus
  5. avg_over_time(rate(...)) - Just use a larger rate window instead (see the sketch below)
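
And for item 5, a sketch of the anti-pattern next to the simpler alternative; the longer rate() window gives you roughly the same smoothing without the cost of a subquery:

# Anti-pattern: smoothing a short-window rate with a subquery
avg_over_time(rate(http_requests_total[5m])[1h:1m])

# Simpler: just widen the rate window
rate(http_requests_total[1h])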

References

  1. Prometheus Official Documentation: Metric Types
  2. Prometheus Official Documentation: Querying Basics
  3. Prometheus Official Documentation: Querying Functions
  4. Prometheus Official Documentation: Alerting Rules
  5. Google SRE Book: Monitoring Distributed Systems
  6. Google SRE Workbook: Alerting on SLOs
  7. Robust Perception: How does a Prometheus Histogram work?
  8. Robust Perception: How does a Prometheus Summary work?
  9. Dash0: Understanding the Prometheus Metric Types
  10. Better Stack: Prometheus Metrics Explained

Conclusion

So here we are. It's 4:15 AM, but you're no longer panicking.

You know that the Counter is your reliable bookkeeper, always tallying but never forgetting. You query her with rate().

You know that the Gauge is your live reporter, giving you the current state. Her raw values make sense.

You know that the Histogram is your distribution detective, revealing the patterns in your latency. She aggregates correctly across all your pods.

And you know to be careful with the Summary, the solo performer who can't collaborate across instances.

Most importantly, you've learned to respect the Cardinality Monster and keep him caged.

The pager may buzz again. But next time, you'll know exactly what you're looking at.

Now go get some sleep. You've earned it.
