The Complete Guide to Prometheus Metric Types: PromQL, Alerting and Troubleshooting
Reading Time: 15 minutes
Table of Contents
- The 3 AM Call
- Quick Reference Card
- Which Metric Type Should I Use
- Meet the Four Metric Types
- Comparison Matrix
- PromQL Functions by Metric Type
- Alerting Strategies
- Troubleshooting Quick Reference
- The Cardinality Monster
- Best Practices
- References
- Conclusion
The 3 AM Call
It's 3:17 AM. Your phone buzzes violently on the nightstand.
You grab it with one eye open. PagerDuty. Of course.
"CRITICAL: API latency exceeds threshold"
You stumble to your laptop, coffee-less and bleary-eyed. Grafana loads. The dashboard is a mess of red lines spiking upward. Your mind races: Is this a traffic spike? A memory leak? Did someone deploy something?
You stare at the metrics. http_requests_total is climbing. process_resident_memory_bytes looks normal. But wait... what does that histogram actually mean? Why is the p99 showing NaN? And why on earth did someone create a metric with user_id as a label?
Sound familiar?
This guide exists because I've been there. We've all been there. And the truth is, most Prometheus pain comes down to one thing: not fully understanding the four metric types.
Let me introduce you to them. Think of them as four tools in your observability toolkit. Each has a job. Each has rules. Use the wrong one, and you'll be back at 3 AM wondering why your alerts are lying to you.
Let's fix that.
Quick Reference Card
Need a quick answer? Start here.
| Metric Type | Best For | Key Function | Suffix | Can Aggregate? |
|---|---|---|---|---|
| Counter | Totals (requests, errors, bytes) | rate() | _total | ✅ Yes |
| Gauge | Current state (memory, CPU) | Raw value | None | ✅ Yes |
| Histogram | Latency distributions | histogram_quantile() | _seconds | ✅ Yes |
| Summary | Per-instance percentiles | Direct read | _seconds | ⚠️ Only sum/count |
The Essential Queries You'll Use Every Day
# Counter: "How many requests per second are we getting?"
rate(http_requests_total[5m])
# Gauge: "How much memory are we using right now?"
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Histogram: "What's our p99 latency across all pods?"
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
# Summary: "What's the average latency?" (works across instances, unlike quantiles)
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))
Which Metric Type Should I Use
Before diving into the details, let me save you some time. Here's the decision table I wish someone had shown me years ago:
| If you would say... | Use |
|---|---|
| "How many X happened?" | Counter |
| "What is the current X?" | Gauge |
| "What's the p99 latency across all pods?" | Histogram |
| "What's the p99 on this specific pod?" | Summary |
Now let me tell you the stories behind each of these tools.
Meet the Four Metric Types
Counter: The Tireless Bookkeeper
Picture a diligent accountant who sits at the entrance of your application. Every time a request comes in, she makes a tally mark. Every error? Another tally. Bytes transferred? She counts them all.
The Counter never forgets. She never erases. Her numbers only go up. The only time they reset is when she goes home for the night (your process restarts).
The Counter's Personality
A Counter is a cumulative metric that only increases. Think of it as an odometer in your car. The number only goes up. You don't care about the current number per se; you care about how fast it's changing.
This is the crucial insight: raw counter values are almost useless. What you want is the rate.
When to Use a Counter
Counters thrive when tracking:
- Total HTTP requests received
- Bytes sent over the network
- Errors encountered
- Background jobs completed
- Messages processed from a queue
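If you're instrumenting this yourself, here's a minimal sketch using the official Python client (prometheus_client). The metric name, labels, and port are illustrative assumptions, not taken from any particular service:

```python
# A minimal Counter sketch with the official Python client (prometheus_client).
# The metric name, labels, and port are illustrative assumptions.
from prometheus_client import Counter, start_http_server

REQUESTS = Counter(
    "http_requests",                 # exposed as http_requests_total (the client adds the _total suffix)
    "Total HTTP requests handled",
    ["method", "status"],
)

def handle_request(method: str, status: str) -> None:
    # Counters are only ever incremented - never set, never decremented.
    REQUESTS.labels(method=method, status=status).inc()

start_http_server(8000)              # serve /metrics for Prometheus to scrape
handle_request("GET", "200")
```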
Counter Characteristics
| Property | Value |
|---|---|
| Direction | Only goes up (monotonically increasing) |
| Reset Behavior | Resets to 0 when the process restarts |
| Typical Suffix | _total |
| Raw Value Usefulness | Low (always use rate() or increase()) |
Talking to the Counter: PromQL Patterns
# The WRONG way: Raw value tells you nothing useful
http_requests_total
# The RIGHT way: Rate of requests per second over 5 minutes
rate(http_requests_total[5m])
# Filter by label (e.g., only 500 errors)
rate(http_requests_total{status="500"}[5m])
# Total increase over the last hour
increase(http_requests_total[1h])
# Sum rates across all instances
sum(rate(http_requests_total[5m]))
# Group by HTTP method
sum by (method) (rate(http_requests_total[5m]))
# The money query: Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) * 100
Counter Alerts That Actually Work
# "Our error rate is too high"
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "Error rate exceeds 5%"
# "Traffic dropped suddenly - possible outage"
- alert: TrafficDrop
expr: |
sum(rate(http_requests_total[5m]))
<
sum(rate(http_requests_total[5m] offset 1h)) * 0.5
for: 10m
labels:
severity: warning
annotations:
summary: "Traffic dropped by more than 50% compared to 1 hour ago"
# "We're getting zero requests - something is very wrong"
- alert: NoTraffic
expr: sum(rate(http_requests_total[5m])) == 0
for: 5m
labels:
severity: critical
annotations:
summary: "No HTTP requests received in the last 5 minutes"
Gauge: The Live Reporter
If the Counter is an accountant tallying historical records, the Gauge is a live news reporter telling you what's happening right now.
"Memory usage is at 78%!" she reports. A moment later: "It dropped to 72%!" Unlike the Counter, the Gauge's numbers go up and down. She reflects the current state of the world.
The Gauge's Personality
A Gauge represents a single numerical value that can arbitrarily go up and down. It's a snapshot of reality at any moment. Think of a thermometer, a fuel gauge, or your current queue depth.
The beautiful thing about gauges? The raw value is immediately meaningful. When someone asks "How much memory are we using?", the gauge has the answer.
When to Use a Gauge
Gauges excel at:
- Current memory or CPU usage
- Number of active connections
- Queue depth
- Temperature readings
- Number of goroutines running
- Disk space remaining
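Instrumenting a gauge is just as simple - you can move it up, move it down, or set it outright. A minimal sketch with the Python client; the metric names and values are illustrative:

```python
# A minimal Gauge sketch (prometheus_client); names and values are illustrative.
from prometheus_client import Gauge

QUEUE_DEPTH = Gauge("job_queue_depth", "Number of jobs waiting in the queue")
ACTIVE_CONNS = Gauge("active_connections", "Currently open client connections")

def on_job_enqueued() -> None:
    QUEUE_DEPTH.inc()                # gauges can go up...

def on_job_done() -> None:
    QUEUE_DEPTH.dec()                # ...and back down

def refresh_connection_count(current: int) -> None:
    ACTIVE_CONNS.set(current)        # or just set the absolute value directly
```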
Gauge Characteristics
| Property | Value |
|---|---|
| Direction | Can increase or decrease |
| Reset Behavior | Not applicable (always reflects current state) |
| Typical Suffix | None specific |
| Raw Value Usefulness | High (the current value is what you want) |
Talking to the Gauge: PromQL Patterns
# Direct reading - totally valid and useful
node_memory_MemAvailable_bytes
# Calculate percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Average, min, max over time
avg_over_time(node_load1[1h])
max_over_time(node_load1[1h])
min_over_time(node_load1[1h])
# Predict the future: "When will we run out of disk?"
predict_linear(node_filesystem_avail_bytes[6h], 3600 * 24)
# Rate of change (unusual for gauges, but useful for capacity planning)
deriv(node_memory_MemAvailable_bytes[5m])
# Find the top consumers
topk(5, node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
Gauge Alerts That Actually Work
# "Memory is running low"
- alert: HighMemoryUsage
expr: |
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
for: 5m
labels:
severity: warning
annotations:
summary: "Memory usage above 90% on {{ $labels.instance }}"
# "Disk will fill up in 24 hours" - this is the kind of proactive alert that makes SREs heroes
- alert: DiskFillingUp
expr: |
predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24 * 3600) < 0
for: 1h
labels:
severity: warning
annotations:
summary: "Disk {{ $labels.mountpoint }} will fill within 24 hours"
# "Connection pool is almost exhausted"
- alert: ConnectionPoolNearExhaustion
expr: db_pool_active_connections / db_pool_max_connections > 0.8
for: 5m
labels:
severity: warning
annotations:
summary: "Connection pool is 80% utilized"
Histogram: The Distribution Detective
Now we get to the interesting ones. The Histogram is a detective who doesn't just count crimes; she categorizes them by severity and gives you the full picture.
"Out of 1000 requests," she reports, "150 completed in under 100ms, 700 completed in under 500ms, and 950 completed in under 1 second. The remaining 50 took longer."
This is the power of the Histogram. It doesn't just tell you the average. It shows you the distribution.
When to Use a Histogram
Histograms are perfect for:
- Request latency (how long did API calls take?)
- Response sizes
- Any measurement where you need percentiles
- When you need to aggregate percentiles across multiple pods (this is the killer feature)
Histogram Characteristics
| Property | Value |
|---|---|
| Components | Three time series: _bucket, _sum, _count |
| Aggregation | Fully aggregatable across instances (this is huge!) |
| Configuration | Bucket boundaries must be defined upfront |
| Typical Suffix | _seconds, _bytes |
The Histogram's Secret: Buckets
Here's what a histogram actually creates behind the scenes:
http_request_duration_seconds_bucket{le="0.1"} --> 150 requests were <= 100ms
http_request_duration_seconds_bucket{le="0.5"} --> 700 requests were <= 500ms
http_request_duration_seconds_bucket{le="1"} --> 950 requests were <= 1s
http_request_duration_seconds_bucket{le="+Inf"} --> 1000 requests total
http_request_duration_seconds_sum --> Total time spent (e.g., 423.7 seconds)
http_request_duration_seconds_count --> Total count (1000)
The le label means "less than or equal to." Buckets are cumulative.
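Those boundaries aren't something Prometheus works out for you - they're baked in at instrumentation time. Here's a hedged sketch with the Python client; the bucket values are illustrative and should really be derived from your own SLOs:

```python
# Bucket boundaries are fixed at instrumentation time (prometheus_client sketch).
# The bucket values here are illustrative - align yours with your SLOs.
import time
from prometheus_client import Histogram

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
    buckets=(0.1, 0.25, 0.5, 1, 2.5, 5),    # the +Inf bucket is always added for you
)

def handle_request() -> None:
    start = time.perf_counter()
    ...                                      # the actual request handling goes here
    REQUEST_DURATION.observe(time.perf_counter() - start)
```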
Talking to the Histogram: PromQL Patterns
# Calculate the 50th percentile (median)
histogram_quantile(0.5, rate(http_request_duration_seconds_bucket[5m]))
# Calculate p99 latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# P99 latency per endpoint (aggregated correctly!)
histogram_quantile(0.99,
sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
)
# Average request duration (simpler alternative)
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])
# "What percentage of requests complete in under 500ms?" (Apdex-style)
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
/
sum(rate(http_request_duration_seconds_count[5m])) * 100
Histogram Alerts That Actually Work
# "P99 latency is too high"
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "P99 latency exceeds 2 seconds for {{ $labels.service }}"
# "Latency doubled compared to an hour ago"
- alert: LatencyDegradation
expr: |
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
>
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m] offset 1h))) * 2
for: 10m
labels:
severity: warning
annotations:
summary: "P95 latency is 2x higher than 1 hour ago"
# SLO violation: "Less than 99% of requests are fast"
- alert: SLOViolation
expr: |
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[30m]))
/
sum(rate(http_request_duration_seconds_count[30m])) < 0.99
for: 5m
labels:
severity: critical
annotations:
summary: "SLO Violation: Less than 99% of requests complete within 500ms"
Summary: The Solo Performer
The Summary is the Histogram's cousin. She can also give you percentiles, but with one crucial difference: she calculates them herself, on the client side.
This makes her fast and precise for a single instance. But here's the catch: she can't collaborate. If you have 10 pods running, you cannot simply combine their percentiles to get a global percentile. Averaging p99s does not give you the true p99. It's mathematically wrong.
⚠️ The Summary Trap: I've seen teams spend hours debugging "wrong" percentiles, only to discover they were accidentally averaging Summary quantiles across instances. Don't be that team. If you need to aggregate, use Histograms.
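If you ever need to convince a teammate, the quickest way is a tiny simulation with made-up latencies. The numbers below are invented purely to show the gap:

```python
# Averaging per-instance p99s is not the true global p99 (made-up data).
import numpy as np

np.random.seed(42)
pod_a = np.random.exponential(0.05, 10_000)   # a fast, busy pod: ~50ms typical
pod_b = np.random.exponential(0.50, 500)      # a slow, low-traffic pod: ~500ms typical

avg_of_p99s = (np.percentile(pod_a, 99) + np.percentile(pod_b, 99)) / 2
true_p99 = np.percentile(np.concatenate([pod_a, pod_b]), 99)

print(f"average of per-pod p99s: {avg_of_p99s:.3f}s")
print(f"true global p99:         {true_p99:.3f}s")
# The two numbers disagree - and the gap grows as traffic gets more uneven.
```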
When to Use a Summary
Summaries are appropriate when:
- You genuinely only care about a single instance
- You don't know bucket boundaries ahead of time
- You're maintaining legacy code (most new projects should use Histograms)
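Instrumenting a Summary looks almost identical to a Histogram. One caveat worth knowing: quantile support varies by client library - the official Python client, for example, exports only _sum and _count - which is one more nudge toward Histograms for new code. A minimal sketch, with illustrative names:

```python
# A minimal Summary sketch (prometheus_client); names are illustrative.
# Note: the Python client's Summary exports only _sum and _count;
# configurable quantiles are a feature of some other client libraries.
from prometheus_client import Summary

REQUEST_DURATION = Summary(
    "http_request_duration_seconds",
    "HTTP request latency in seconds",
)

def handle_request() -> None:
    with REQUEST_DURATION.time():    # observes the elapsed seconds on exit
        ...                          # the actual request handling goes here
```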
Summary Characteristics
| Property | Value |
|---|---|
| Components | Pre-calculated quantiles, plus _sum and _count |
| Aggregation | Cannot aggregate quantiles (only sum/count) |
| Percentile Calculation | Done on the client side |
| Typical Suffix | _seconds, _bytes |
Talking to the Summary: PromQL Patterns
# Read quantiles directly (only meaningful per-instance)
http_request_duration_seconds{quantile="0.99"}
# Average latency - this DOES work across instances!
sum(rate(http_request_duration_seconds_sum[5m]))
/
sum(rate(http_request_duration_seconds_count[5m]))
# DON'T DO THIS - averaging quantiles is mathematically wrong
# avg(http_request_duration_seconds{quantile="0.99"})
# If you must look at quantiles, do it per-instance
http_request_duration_seconds{quantile="0.99", instance="pod-1:8080"}
Comparison Matrix
| Feature | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| Direction | Only up ⬆️ | Up and down ↕️ | N/A | N/A |
| Raw value useful | ❌ No | ✅ Yes | ❌ No | Partial |
| Use rate() | Required | Rare | On buckets | On sum/count |
| Aggregatable | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Only sum/count |
| Percentiles | ❌ No | ❌ No | ✅ Server-side | ✅ Client-side |
| Storage cost | Low | Low | Higher | Medium |
PromQL Functions by Metric Type
| Function | Counter | Gauge | Histogram | Summary |
|---|---|---|---|---|
| rate() | ✅ Primary | ❌ No | ✅ On buckets | ✅ On sum/count |
| irate() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| increase() | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| deriv() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| delta() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| predict_linear() | ❌ No | ✅ Yes | ❌ No | ❌ No |
| histogram_quantile() | ❌ No | ❌ No | ✅ Required | ❌ No |
Alerting Strategies
The Golden Signals
Google's SRE book teaches us to monitor four things. Here's how metric types map to them:
# 1. LATENCY (Histogram) - "How long do things take?"
- alert: HighLatency
expr: histogram_quantile(0.99, sum by (le) (rate(http_duration_seconds_bucket[5m]))) > 1
# 2. TRAFFIC (Counter) - "How much are we doing?"
- alert: TrafficAnomaly
expr: |
abs(sum(rate(http_requests_total[5m])) - sum(rate(http_requests_total[5m] offset 1w)))
/ sum(rate(http_requests_total[5m] offset 1w)) > 0.5
# 3. ERRORS (Counter) - "How often do things fail?"
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
# 4. SATURATION (Gauge) - "How full is our system?"
- alert: HighSaturation
expr: avg by (instance) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
SLO-Based Multi-Burn Rate Alerts
For the more advanced: burn rate alerts that catch both fast and slow burns of your error budget.
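If the 14.4 below looks like a magic number: it's the burn-rate multiplier for spending 2% of a 30-day error budget in one hour, against an assumed 99.9% SLO (the SLO value is my assumption for illustration). A quick back-of-the-envelope check:

```python
# Where does 14.4 come from?
# burn rate = fraction of budget consumed / fraction of the period elapsed
slo = 0.999                      # assumed 99.9% availability SLO
error_budget = 1 - slo           # 0.1% of requests may fail over the 30-day window

budget_fraction = 0.02           # page when 2% of the monthly budget burns...
window_hours = 1                 # ...within one hour
burn_rate = budget_fraction * (30 * 24) / window_hours

print(f"burn rate: {burn_rate:g}")                       # 14.4
print(f"alert threshold: {burn_rate * error_budget:g}")  # 0.0144 -> the 14.4 * 0.001 in the alert below
```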
# Fast burn: 2% of monthly error budget consumed in 1 hour
- alert: SLOFastBurn
expr: |
(sum(rate(http_requests_total{status=~"5.."}[1h])) / sum(rate(http_requests_total[1h])) > 14.4 * 0.001)
and
(sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 14.4 * 0.001)
labels:
severity: critical
# Slow burn: Steady consumption over days
- alert: SLOSlowBurn
expr: |
(sum(rate(http_requests_total{status=~"5.."}[6h])) / sum(rate(http_requests_total[6h])) > 1 * 0.001)
and
(sum(rate(http_requests_total{status=~"5.."}[3h])) / sum(rate(http_requests_total[3h])) > 1 * 0.001)
labels:
severity: warning
Troubleshooting Quick Reference
When things go wrong at 3 AM, use this table:
General Issues (All Metric Types)
| Symptom | Likely Cause | Fix | Debug Query |
|---|---|---|---|
| No data at all | Target not scraped | Check target status | up{job="my-service"} |
| Gaps in graph | Scrape failures | Check scrape duration | scrape_duration_seconds{job="..."} |
| Too many series | High cardinality | Add label filters | topk(10, count by (__name__)({__name__!=""})) |
Counter Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Flat line | No events occurring | Check application logic |
| Sudden drops | Counter reset | Use rate() (it handles resets) |
| Negative rate | Label churn | Check for recreated series |
Gauge Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Value unchanged | Stale metric | Check scrape status |
| Noisy graph | High variance | Use avg_over_time() |
| Wrong scale | Unit mismatch | Check metric units |
Histogram Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong percentile | Bad bucket boundaries | Add more buckets |
| Most values in +Inf | Buckets too small | Increase upper bounds |
| NaN result | No samples | Increase time window |
Summary Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong global p99 | Averaged quantiles | Switch to Histogram |
The Cardinality Monster
Let me tell you about the monster that has brought down more Prometheus instances than any other: cardinality.
Cardinality is the number of unique time series in your system. And it can explode faster than you think.
How Cardinality Explodes
Every unique combination of labels creates a new time series:
1 metric × 5 methods × 10 status codes × 100 endpoints × 50 instances
= 250,000 time series from ONE metric
Labels That Will Destroy Your Prometheus
Never use these as labels:
| Label Type | Example | Why It's Bad |
|---|---|---|
| User IDs | user_id="12345" | Millions of values |
| Request IDs | request_id="abc-123" | One per request |
| Timestamps | timestamp="2024-01-01" | Infinite growth |
| IP addresses | client_ip="192.168.1.1" | Thousands of values |
| Session tokens | session="..." | One per session |
| Error messages | error="Connection refused..." | Unbounded strings |
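The usual fix is to collapse unbounded values into a small, fixed set before they ever become label values - for example, recording a route template instead of the raw URL path. A minimal sketch (the route table and metric name are assumptions, not from any real service):

```python
# Keep label values bounded: map raw request paths to a fixed set of route templates.
import re
from prometheus_client import Counter

REQUESTS = Counter("http_requests", "HTTP requests by route", ["route", "method"])

# A small, fixed allow-list of route templates (illustrative).
ROUTES = [
    (re.compile(r"^/users/\d+$"), "/users/:id"),
    (re.compile(r"^/orders/\d+/items$"), "/orders/:id/items"),
]

def route_label(path: str) -> str:
    for pattern, template in ROUTES:
        if pattern.match(path):
            return template
    return "other"               # never let arbitrary paths leak into a label

def record(path: str, method: str) -> None:
    REQUESTS.labels(route=route_label(path), method=method).inc()
```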
Detecting the Monster
# How bad is it? Count all series.
count({__name__!=""})
# Find the offenders
topk(10, count by (__name__) ({__name__!=""}))
# Check per-label cardinality
count by (endpoint) (http_requests_total)
Cardinality Guidelines
| Level | Series Count | Action |
|---|---|---|
| 🟢 Low | Under 1,000 | You're fine |
| 🟡 Moderate | 1K - 10K | Monitor it |
| 🟠 High | 10K - 100K | Investigate |
| 🔴 Critical | Over 100K | Fix immediately |
Best Practices
Do These Things
- Always use rate() with counters - Raw values are useless
- Set the rate window to 2-4x the scrape interval - Ensures enough data points
- Include le in your by clause before histogram_quantile()
- Use histograms for percentiles - They aggregate correctly
- Add a for duration to alerts - Prevents flapping
- Define bucket boundaries based on SLOs - Know what matters
Avoid These Mistakes
- Averaging summary quantiles - Mathematically wrong
- Using irate() for alerting - Too volatile
- Alerting on raw gauge spikes - Use a for duration
- High cardinality labels - They'll kill your Prometheus
- avg_over_time(rate(...)) - Just use a larger rate window
References
- Prometheus Official Documentation: Metric Types
- Prometheus Official Documentation: Querying Basics
- Prometheus Official Documentation: Querying Functions
- Prometheus Official Documentation: Alerting Rules
- Google SRE Book: Monitoring Distributed Systems
- Google SRE Workbook: Alerting on SLOs
- Robust Perception: How does a Prometheus Histogram work?
- Robust Perception: How does a Prometheus Summary work?
- Dash0: Understanding the Prometheus Metric Types
- Better Stack: Prometheus Metrics Explained
Conclusion
So here we are. It's 4:15 AM, but you're no longer panicking.
You know that the Counter is your reliable bookkeeper, always tallying but never forgetting. You query her with rate().
You know that the Gauge is your live reporter, giving you the current state. Her raw values make sense.
You know that the Histogram is your distribution detective, revealing the patterns in your latency. She aggregates correctly across all your pods.
And you know to be careful with the Summary, the solo performer who can't collaborate across instances.
Most importantly, you've learned to respect the Cardinality Monster and keep him caged.
The pager may buzz again. But next time, you'll know exactly what you're looking at.
Now go get some sleep. You've earned it.



