The Complete Guide to Prometheus Metric Types: PromQL, Alerting and Troubleshooting
Reading Time: 15 minutes
Table of Contents
- The 3 AM Call
- Quick Reference Card
- Which Metric Type Should I Use
- Meet the Four Metric Types
- Comparison Matrix
- PromQL Functions by Metric Type
- Alerting Strategies
- Troubleshooting Quick Reference
- The Cardinality Monster
- Best Practices
- References
- Conclusion
The 3 AM Call
It's 3:17 AM. Your phone buzzes violently on the nightstand.
You grab it with one eye open. PagerDuty. Of course.
"CRITICAL: API latency exceeds threshold"
You stumble to your laptop, coffee-less and bleary-eyed. Grafana loads. The dashboard is a mess of red lines spiking upward. Your mind races: Is this a traffic spike? A memory leak? Did someone deploy something?
You stare at the metrics. http_requests_total is climbing. process_resident_memory_bytes looks normal. But wait... what does that histogram actually mean? Why is the p99 showing NaN? And why on earth did someone create a metric with user_id as a label?
Sound familiar?
This guide exists because I've been there. We've all been there. And the truth is, most Prometheus pain comes down to one thing: not fully understanding the four metric types.
Let me introduce you to them. Think of them as four tools in your observability toolkit. Each has a job. Each has rules. Use the wrong one, and you'll be back at 3 AM wondering why your alerts are lying to you.
Let's fix that.
Quick Reference Card
Need a quick answer? Start here.
| Metric Type | Best For | Key Function | Suffix | Can Aggregate? |
|---|---|---|---|---|
| Counter | Totals (requests, errors, bytes) | rate() | _total | ✅ Yes |
| Gauge | Current state (memory, CPU) | Raw value | None | ✅ Yes |
| Histogram | Latency distributions | histogram_quantile() | _seconds | ✅ Yes |
| Summary | Per-instance percentiles | Direct read | _seconds | ⚠️ Only sum/count |
The Essential Queries You'll Use Every Day
# Counter: "How many requests per second are we getting?"
rate(http_requests_total[5m])
# Gauge: "How much memory are we using right now?"
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Histogram: "What's our p99 latency across all pods?"
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Summary: "What's the average latency?" (works across instances, unlike quantiles)
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))
Which Metric Type Should I Use
Before diving into the details, let me save you some time. Here's the decision table I wish someone had shown me years ago:
| If you would say... | Use |
|---|---|
| "How many X happened?" | Counter |
| "What is the current X?" | Gauge |
| "What's the p99 latency across all pods?" | Histogram |
| "What's the p99 on this specific pod?" | Summary |
Now let me tell you the stories behind each of these tools.
Meet the Four Metric Types
Counter: The Tireless Bookkeeper
Picture a diligent accountant who sits at the entrance of your application. Every time a request comes in, she makes a tally mark. Every error? Another tally. Bytes transferred? She counts them all.
The Counter never forgets. She never erases. Her numbers only go up. The only time they reset is when she goes home for the night (your process restarts).
The Counter's Personality
A Counter is a cumulative metric that only increases. Think of it as an odometer in your car. The number only goes up. You don't care about the current number per se; you care about how fast it's changing.
This is the crucial insight: raw counter values are almost useless. What you want is the rate.
When to Use a Counter
Counters thrive when tracking:
- Total HTTP requests received
- Bytes sent over the network
- Errors encountered
- Background jobs completed
- Messages processed from a queue
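If you're instrumenting this yourself, here's a minimal sketch using the official Python client (prometheus_client). The metric name, labels, and port are illustrative assumptions, not taken from any particular service:

```python
# A minimal Counter sketch with the official Python client (prometheus_client).
# The metric name, labels, and port are illustrative assumptions.
from prometheus_client import Counter, start_http_server

REQUESTS = Counter(
    "http_requests",                 # exposed as http_requests_total (the client adds the _total suffix)
    "Total HTTP requests handled",
    ["method", "status"],
)

def handle_request(method: str, status: str) -> None:
    # Counters are only ever incremented - never set, never decremented.
    REQUESTS.labels(method=method, status=status).inc()

start_http_server(8000)              # serve /metrics for Prometheus to scrape
handle_request("GET", "200")
```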
Counter Characteristics
| Property | Value |
|---|---|
| Direction | Only goes up (monotonically increasing) |
| Reset Behavior | Resets to 0 when the process restarts |
| Typical Suffix | _total |
| Raw Value Usefulness | Low (always use rate() or increase()) |
Talking to the Counter: PromQL Patterns
# The WRONG way: Raw value tells you nothing useful
http_requests_total
# The RIGHT way: Rate of requests per second over 5 minutes
rate(http_requests_total[5m])
# Filter by label (e.g., only 500 errors)
rate(http_requests_total{status="500"}[5m])
# Total increase over the last hour
increase(http_requests_total[1h])
# Sum rates across all instances
sum(rate(http_requests_total[5m]))
# Group by HTTP method
sum by (method) (rate(http_requests_total[5m]))
# The money query: Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
Counter Alerts That Actually Work
# "Our error rate is too high"
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "Error rate exceeds 5%"
# "Traffic dropped suddenly - possible outage"
- alert: TrafficDrop
expr: |
sum(rate(http_requests_total[5m]))
<
sum(rate(http_requests_total[5m] offset 1h)) * 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "Traffic dropped by more than 50% compared to 1 hour ago"
# "We're getting zero requests - something is very wrong"
- alert: NoTraffic
expr: sum(rate(http_requests_total[5m])) == 0
for: 5m
labels:
severity: critical
annotations:
summary: "No HTTP requests received in the last 5 minutes"
Gauge: The Live Reporter
If the Counter is an accountant tallying historical records, the Gauge is a live news reporter telling you what's happening right now.
"Memory usage is at 78%!" she reports. A moment later: "It dropped to 72%!" Unlike the Counter, the Gauge's numbers go up and down. She reflects the current state of the world.
The Gauge's Personality
A Gauge represents a single numerical value that can arbitrarily go up and down. It's a snapshot of reality at any moment. Think of a thermometer, a fuel gauge, or your current queue depth.
The beautiful thing about gauges? The raw value is immediately meaningful. When someone asks "How much memory are we using?", the gauge has the answer.
When to Use a Gauge
Gauges excel at:
- Current memory or CPU usage
- Number of active connections
- Queue depth
- Temperature readings
- Number of goroutines running
- Disk space remaining
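Instrumenting a gauge is just as simple - you can move it up, move it down, or set it outright. A minimal sketch with the Python client; the metric names and values are illustrative:

```python
# A minimal Gauge sketch (prometheus_client); names and values are illustrative.
from prometheus_client import Gauge

QUEUE_DEPTH = Gauge("job_queue_depth", "Number of jobs waiting in the queue")
ACTIVE_CONNS = Gauge("active_connections", "Currently open client connections")

def on_job_enqueued() -> None:
    QUEUE_DEPTH.inc()                # gauges can go up...

def on_job_done() -> None:
    QUEUE_DEPTH.dec()                # ...and back down

def refresh_connection_count(current: int) -> None:
    ACTIVE_CONNS.set(current)        # or just set the absolute value directly
```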
Gauge Characteristics
| Property | Value |
|---|---|
| Direction | Can increase or decrease |
| Reset Behavior | Not applicable (always reflects current state) |
| Typical Suffix | None specific |
| Raw Value Usefulness | High (the current value is what you want) |
Talking to the Gauge: PromQL Patterns
# Direct reading - totally valid and useful
node_memory_MemAvailable_bytes
# Calculate percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Average, min, max over time
avg_over_time(node_load1[1h])
max_over_time(node_load1[1h])
min_over_time(node_load1[1h])
# Predict the future: "When will we run out of disk?"
predict_linear(node_filesystem_avail_bytes[6h], 3600 * 24)
# Rate of change (unusual for gauges, but useful for capacity planning)
deriv(node_memory_MemAvailable_bytes[5m])
# Find the top consumers
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
Gauge Alerts That Actually Work
# "Memory is running low"
- alert: HighMemoryUsage
expr: |
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
for: 5m
labels:
severity: warning
annotations:
summary: "Memory usage above 90% on {{ $labels.instance }}"
# "Disk will fill up in 24 hours" - this is the kind of proactive alert that makes SREs heroes
- alert: DiskFillingUp
expr: |
predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24 * 3600) < 0
for: 1h
labels:
severity: warning
annotations:
summary: "Disk {{ $labels.mountpoint }} will fill within 24 hours"
# "Connection pool is almost exhausted"
- alert: ConnectionPoolNearExhaustion
expr: db_pool_active_connections / db_pool_max_connections > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "Connection pool is 80% utilized"
Histogram: The Distribution Detective
Now we get to the interesting ones. The Histogram is a detective who doesn't just count crimes; she categorizes them by severity and gives you the full picture.
"Out of 1000 requests," she reports, "150 completed in under 100ms, 700 completed in under 500ms, and 950 completed in under 1 second. The remaining 50 took longer."
This is the power of the Histogram. It doesn't just tell you the average. It shows you the distribution.
When to Use a Histogram
Histograms are perfect for:
- Request latency (how long did API calls take?)
- Response sizes
- Any measurement where you need percentiles
- When you need to aggregate percentiles across multiple pods (this is the killer feature)
Histogram Characteristics
| Property | Value |
|---|---|
| Components | Three time series: _bucket, _sum, _count |
| Aggregation | Fully aggregatable across instances (this is huge!) |
| Configuration | Bucket boundaries must be defined upfront |
| Typical Suffix | _seconds, _bytes |
The Histogram's Secret: Buckets
Here's what a histogram actually creates behind the scenes:
http_request_duration_seconds_bucket{le="0.1"} --> 150 requests were <= 100ms
http_request_duration_seconds_bucket{le="0.5"} --> 700 requests were <= 500ms
http_request_duration_seconds_bucket{le="1"} --> 950 requests were <= 1s
http_request_duration_seconds_bucket{le="+Inf"} --> 1000 requests total
http_request_duration_seconds_sum --> Total time spent (e.g., 423.7 seconds)
http_request_duration_seconds_count --> Total count (1000)
The le label means "less than or equal to." Buckets are cumulative.
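Those boundaries aren't something Prometheus works out for you - they're baked in at instrumentation time. Here's a hedged sketch with the Python client; the bucket values are illustrative and should really be derived from your own SLOs:

```python
# Bucket boundaries are fixed at instrumentation time (prometheus_client sketch).
# The bucket values here are illustrative - align yours with your SLOs.
import time
from prometheus_client import Histogram

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
    buckets=(0.1, 0.25, 0.5, 1, 2.5, 5),    # the +Inf bucket is always added for you
)

def handle_request() -> None:
    start = time.perf_counter()
    ...                                      # the actual request handling goes here
    REQUEST_DURATION.observe(time.perf_counter() - start)
```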
Talking to the Histogram: PromQL Patterns
# Calculate the 50th percentile (median)
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))
# Calculate p99 latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# P99 latency per endpoint (aggregated correctly!)
histogram_quantile(0.99,
sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
)
# Average request duration (simpler alternative)
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])
# "What percentage of requests complete in under 500ms?" (Apdex-style)
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
/
sum(rate(http_request_duration_seconds_count[5m])) * 100
Histogram Alerts That Actually Work
# "P99 latency is too high"
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "P99 latency exceeds 2 seconds for {{ $labels.service }}"
# "Latency doubled compared to an hour ago"
- alert: LatencyDegradation
expr: |
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
>
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m] offset 1h))) * 2
for: 10m
labels:
severity: warning
annotations:
summary: "P95 latency is 2x higher than 1 hour ago"
# SLO violation: "Less than 99% of requests are fast"
- alert: SLOViolation
expr: |
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[30m]))
/
sum(rate(http_request_duration_seconds_count[30m])) < 0.99
for: 5m
labels:
severity: critical
annotations:
summary: "SLO Violation: Less than 99% of requests complete within 500ms"
Summary: The Solo Performer
The Summary is the Histogram's cousin. She can also give you percentiles, but with one crucial difference: she calculates them herself, on the client side.
This makes her fast and precise for a single instance. But here's the catch: she can't collaborate. If you have 10 pods running, you cannot simply combine their percentiles to get a global percentile. Averaging p99s does not give you the true p99. It's mathematically wrong.
⚠️ The Summary Trap: I've seen teams spend hours debugging "wrong" percentiles, only to discover they were accidentally averaging Summary quantiles across instances. Don't be that team. If you need to aggregate, use Histograms.
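If you ever need to convince a teammate, the quickest way is a tiny simulation with made-up latencies. The numbers below are invented purely to show the gap:

```python
# Averaging per-instance p99s is not the true global p99 (made-up data).
import numpy as np

np.random.seed(42)
pod_a = np.random.exponential(0.05, 10_000)   # a fast, busy pod: ~50ms typical
pod_b = np.random.exponential(0.50, 500)      # a slow, low-traffic pod: ~500ms typical

avg_of_p99s = (np.percentile(pod_a, 99) + np.percentile(pod_b, 99)) / 2
true_p99 = np.percentile(np.concatenate([pod_a, pod_b]), 99)

print(f"average of per-pod p99s: {avg_of_p99s:.3f}s")
print(f"true global p99:         {true_p99:.3f}s")
# The two numbers disagree - and the gap grows as traffic gets more uneven.
```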
When to Use a Summary
Summaries are appropriate when:
- You genuinely only care about a single instance
- You don't know bucket boundaries ahead of time
- You're maintaining legacy code (most new projects should use Histograms)
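Instrumenting a Summary looks almost identical to a Histogram. One caveat worth knowing: quantile support varies by client library - the official Python client, for example, exports only _sum and _count - which is one more nudge toward Histograms for new code. A minimal sketch, with illustrative names:

```python
# A minimal Summary sketch (prometheus_client); names are illustrative.
# Note: the Python client's Summary exports only _sum and _count;
# configurable quantiles are a feature of some other client libraries.
from prometheus_client import Summary

REQUEST_DURATION = Summary(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
)

def handle_request() -> None:
    with REQUEST_DURATION.time():    # observes the elapsed seconds on exit
        ...                          # the actual request handling goes here
```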
Summary Characteristics
| Property | Value |
|---|---|
| Components | Pre-calculated quantiles, plus _sum and _count |
| Aggregation | Cannot aggregate quantiles (only sum/count) |
| Percentile Calculation | Done on the client side |
| Typical Suffix | _seconds, _bytes |
Talking to the Summary: PromQL Patterns
# Read quantiles directly (only meaningful per-instance)
http_request_duration_seconds{quantile="0.99"}
# Average latency - this DOES work across instances!
sum(rate(http_request_duration_seconds_sum[5m]))
/
sum(rate(http_request_duration_seconds_count[5m]))
# DON'T DO THIS - averaging quantiles is mathematically wrong
# avg(http_request_duration_seconds{quantile="0.99"})
# If you must look at quantiles, do it per-instance
http_request_duration_seconds{quantile="0.99", instance="pod-1:8080"}
Comparison Matrix
| Feature | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| Direction | Only up ⬆️ | Up and down ↕️ | N/A | N/A |
| Raw value useful | ❌ No | ✅ Yes | ❌ No | Partial |
| Use rate() | Required | Rare | On buckets | On sum/count |
| Aggregatable | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Only sum/count |
| Percentiles | ❌ No | ❌ No | ✅ Server-side | ✅ Client-side |
| Storage cost | Low | Low | Higher | Medium |
PromQL Functions by Metric Type
| Function | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| rate() | ✅ Primary | ❌ No | ✅ On buckets | ✅ On sum/count |
| irate() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| increase() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| deriv() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| delta() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| predict_linear() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| histogram_quantile() | ❌ No | ❌ No | ✅ Required | ❌ No |
Alerting Strategies
The Golden Signals
Google's SRE book teaches us to monitor four things. Here's how metric types map to them:
# 1. LATENCY (Histogram) - "How long do things take?"
- alert: HighLatency
expr: histogram_quantile(0.99, sum by (le) (rate(http_duration_seconds_bucket[5m]))) > 1
# 2. TRAFFIC (Counter) - "How much are we doing?"
- alert: TrafficAnomaly
expr: |
abs(sum(rate(http_requests_total[5m])) - sum(rate(http_requests_total[5m] offset 1w)))
/ sum(rate(http_requests_total[5m] offset 1w)) > 0.5
# 3. ERRORS (Counter) - "How often do things fail?"
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
# 4. SATURATION (Gauge) - "How full is our system?"
- alert: HighSaturation
expr: avg by (instance) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
SLO-Based Multi-Burn Rate Alerts
For the more advanced: burn rate alerts that catch both fast and slow burns of your error budget.
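If the 14.4 below looks like a magic number: it's the burn-rate multiplier for spending 2% of a 30-day error budget in one hour, against an assumed 99.9% SLO (the SLO value is my assumption for illustration). A quick back-of-the-envelope check:

```python
# Where does 14.4 come from?
# burn rate = fraction of budget consumed / fraction of the period elapsed
slo = 0.999                      # assumed 99.9% availability SLO
error_budget = 1 - slo           # 0.1% of requests may fail over the 30-day window

budget_fraction = 0.02           # page when 2% of the monthly budget burns...
window_hours = 1                 # ...within one hour
burn_rate = budget_fraction * (30 * 24) / window_hours

print(f"burn rate: {burn_rate:g}")                       # 14.4
print(f"alert threshold: {burn_rate * error_budget:g}")  # 0.0144 -> the 14.4 * 0.001 in the alert below
```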
# Fast burn: 2% of monthly error budget consumed in 1 hour
- alert: SLOFastBurn
expr: |
(sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 14.4 * 0.001)
and
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 14.4 * 0.001)
labels:
severity: critical
# Slow burn: Steady consumption over days
- alert: SLOSlowBurn
expr: |
(sum(rate(http_requests_total{status=~"5.."}[6h])) / sum(rate(http_requests_total[6h])) > 1 * 0.001)
and
(sum(rate(http_requests_total{status=~"5.."}[3h])) / sum(rate(http_requests_total[3h])) > 1 * 0.001)
labels:
severity: warning
Troubleshooting Quick Reference
When things go wrong at 3 AM, use this table:
General Issues (All Metric Types)
| Symptom | Likely Cause | Fix | Debug Query |
|---|---|---|---|
| No data at all | Target not scraped | Check target status | up{job="my-service"} |
| Gaps in graph | Scrape failures | Check scrape duration | scrape_duration_seconds{job="..."} |
| Too many series | High cardinality | Add label filters | topk(10, count by (__name__)({__name__!=""})) |
Counter Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Flat line | No events occurring | Check application logic |
| Sudden drops | Counter reset | Use rate() (it handles resets) |
| Negative rate | Label churn | Check for recreated series |
Gauge Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Value unchanged | Stale metric | Check scrape status |
| Noisy graph | High variance | Use avg_over_time() |
| Wrong scale | Unit mismatch | Check metric units |
Histogram Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong percentile | Bad bucket boundaries | Add more buckets |
| Most values in +Inf | Buckets too small | Increase upper bounds |
| NaN result | No samples | Increase time window |
Summary Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong global p99 | Averaged quantiles | Switch to Histogram |
The Cardinality Monster
Let me tell you about the monster that has brought down more Prometheus instances than any other: cardinality.
Cardinality is the number of unique time series in your system. And it can explode faster than you think.
How Cardinality Explodes
Every unique combination of labels creates a new time series:
1 metric × 5 methods × 10 status codes × 100 endpoints × 50 instances
= 250,000 time series from ONE metric
Labels That Will Destroy Your Prometheus
Never use these as labels:
| Label Type | Example | Why It's Bad |
|---|---|---|
| User IDs | user_id="12345" | Millions of values |
| Request IDs | request_id="abc-123" | One per request |
| Timestamps | timestamp="2024-01-01" | Infinite growth |
| IP addresses | client_ip="192.168.1.1" | Thousands of values |
| Session tokens | session="..." | One per session |
| Error messages | error="Connection refused..." | Unbounded strings |
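The usual fix is to collapse unbounded values into a small, fixed set before they ever become label values - for example, recording a route template instead of the raw URL path. A minimal sketch (the route table and metric name are assumptions, not from any real service):

```python
# Keep label values bounded: map raw request paths to a fixed set of route templates.
import re
from prometheus_client import Counter

REQUESTS = Counter("http_requests", "HTTP requests by route", ["route", "method"])

# A small, fixed allow-list of route templates (illustrative).
ROUTES = [
    (re.compile(r"^/users/\d+$"), "/users/:id"),
    (re.compile(r"^/orders/\d+/items$"), "/orders/:id/items"),
]

def route_label(path: str) -> str:
    for pattern, template in ROUTES:
        if pattern.match(path):
            return template
    return "other"               # never let arbitrary paths leak into a label

def record(path: str, method: str) -> None:
    REQUESTS.labels(route=route_label(path), method=method).inc()
```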
Detecting the Monster
# How bad is it? Count all series.
count({__name__!=""})
# Find the offenders
topk(10, count by (__name__) ({__name__!=""}))
# Check per-label cardinality
count by (endpoint) (http_requests_total)
Cardinality Guidelines
| Level | Series Count | Action |
|---|---|---|
| 🟢 Low | Under 1,000 | You're fine |
| 🟡 Moderate | 1K - 10K | Monitor it |
| 🟠 High | 10K - 100K | Investigate |
| 🔴 Critical | Over 100K | Fix immediately |
Best Practices
Do These Things
- Always use rate() with counters - Raw values are useless
- Set the rate window to 2-4x the scrape interval - Ensures enough data points
- Include le in your by clause before histogram_quantile()
- Use histograms for percentiles - They aggregate correctly
- Add a for duration to alerts - Prevents flapping
- Define bucket boundaries based on SLOs - Know what matters
Avoid These Mistakes
- Averaging summary quantiles - Mathematically wrong
- Using irate() for alerting - Too volatile
- Alerting on raw gauge spikes - Use a for duration
- High cardinality labels - They'll kill your Prometheus
- avg_over_time(rate(...)) - Just use a larger rate window
References
- Prometheus Official Documentation: Metric Types
- Prometheus Official Documentation: Querying Basics
- Prometheus Official Documentation: Querying Functions
- Prometheus Official Documentation: Alerting Rules
- Google SRE Book: Monitoring Distributed Systems
- Google SRE Workbook: Alerting on SLOs
- Robust Perception: How does a Prometheus Histogram work?
- Robust Perception: How does a Prometheus Summary work?
- Dash0: Understanding the Prometheus Metric Types
- Better Stack: Prometheus Metrics Explained
Conclusion
So here we are. It's 4:15 AM, but you're no longer panicking.
You know that the Counter is your reliable bookkeeper, always tallying but never forgetting. You query her with rate().
You know that the Gauge is your live reporter, giving you the current state. Her raw values make sense.
You know that the Histogram is your distribution detective, revealing the patterns in your latency. She aggregates correctly across all your pods.
And you know to be careful with the Summary, the solo performer who can't collaborate across instances.
Most importantly, you've learned to respect the Cardinality Monster and keep him caged.
The pager may buzz again. But next time, you'll know exactly what you're looking at.
Now go get some sleep. You've earned it.



