Satyaki

CPU Humbled Me — A Kubernetes Throttling Story Hidden Between Prometheus Scrapes

Memory is easy. CPU humbled me.

With memory, the rule is brutal but clear — cross the limit, the pod gets OOMKilled. Done.

CPU? CPU is sneaky. And I ignored it for the longest time… until it broke production.

Here's what happened 👇

We had an app running peacefully in-house. Then it went client-facing. Traffic surged, and suddenly ~15% of requests started timing out — most of them on DB calls.

I opened Grafana expecting a smoking gun. Nothing. CPU usage looked "fine." No throttling alerts screaming at me. Just confused timeouts.

The trap? Throttling happens in milliseconds. Prometheus scrapes every 15 seconds. Every bit of evidence was hiding between the scrapes.

Here was the setup:

resources:
  requests:
    cpu: 200m
    memory: 512Mi
  limits:
    cpu: 800m
    memory: 1.5Gi

Numbers from the incident (rough, but directionally honest):

  • Normal: 300 req/min → avg CPU ~180m
  • Surge: 1200 req/min → avg CPU ~650m, ~15% timeouts

So I sat down and actually did the math instead of guessing.

How CPU actually works

CPU is compressible. Memory isn't. When CPU runs out, your process doesn't die — it gets throttled. The Linux CFS scheduler slices time into periods (default: 100ms). Within each period, your container gets a quota based on its limit. Cross the quota mid-period? You wait for the next one. That wait is the latency you're seeing.
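
You don't have to take the scheduler's word for it. On the node, the limit becomes a plain CFS quota, and the kernel counts every throttled period in the container's cgroup. Roughly what that looks like (cgroup v2 layout; the exact path under /sys/fs/cgroup depends on your runtime and cgroup driver, and the field descriptions are mine, not incident data):

# 800m limit → 80000µs of CPU time allowed per 100000µs period
cat /sys/fs/cgroup/kubepods.slice/.../cpu.max
80000 100000

# The kernel counts throttling per period, with no 15s averaging
cat /sys/fs/cgroup/kubepods.slice/.../cpu.stat
nr_periods      <how many 100ms periods the container was runnable>
nr_throttled    <how many of those hit the quota and had to wait>
throttled_usec  <total time spent waiting, in microseconds>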

Walking through the numbers

Normal load:

300 req/min = 5 req/sec = 0.5 requests per 100ms
Avg CPU 180m = 18ms of CPU work per 100ms period
→ 18ms ÷ 0.5 req = ~36ms of CPU work per request

Surge load:

1200 req/min = 20 req/sec = 2 requests per 100ms
2 × 36ms = 72ms of CPU work needed per 100ms

But the limit was 800m → 80ms quota per 100ms. Looks fine on paper, right?

Here's the catch: avg CPU was 650m (65ms). The average hides the bursts. Some periods sat well below quota; others blew past the 80ms ceiling and got throttled. Average everything out across 15s scrapes and the dashboard whispers "all good" while users get timeouts.
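
To make the burst concrete, here's an illustrative worst-case window, reusing the ~36ms-per-request cost from above:

5 requests land in the same 100ms window (instead of the average 2)
5 × 36ms = 180ms of CPU work wanted in that period
quota    =  80ms (the 800m limit)
→ ~100ms of work spills into the next periods, which already have
  their own requests arriving
→ a few windows like that back to back and the queueing delay alone
  is enough to blow a client timeout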

That's the lesson. Average CPU is a liar in bursty workloads. Throttling lives in the gaps your monitoring can't see.

What to actually look at

Stop staring at container_cpu_usage_seconds_total. Look at:

  • container_cpu_cfs_throttled_periods_total
  • container_cpu_cfs_throttled_seconds_total
  • container_cpu_cfs_periods_total (the total-periods denominator)

The ratio of throttled periods to total periods tells you the truth.
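
A starting point for that ratio in PromQL (the namespace selector is a placeholder; swap in whatever identifies your workload):

sum by (pod) (rate(container_cpu_cfs_throttled_periods_total{namespace="prod"}[5m]))
/
sum by (pod) (rate(container_cpu_cfs_periods_total{namespace="prod"}[5m]))

I've seen roughly 25% quoted as a rule-of-thumb alert threshold, but pick whatever lines up with your latency SLO. The point is to watch this ratio, not the average usage.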

Remediations (in order of maturity, not just "increase the limit")

  1. Right-size first. Requests and limits should reflect real workload behavior, not guesses copy-pasted from a template.
  2. Load test before going client-facing. Running an app in-house ≠ serving real traffic.
  3. VPA recommendations to understand what the app actually wants.
  4. HPA so bursts get distributed across replicas instead of crushing one pod (a minimal sketch follows this list).
  5. Then, if needed, raise the limit — with intent, not panic.
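
For step 4, a minimal HPA sketch; the names and targets here are placeholders, not the incident's actual config:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app               # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # placeholder
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # % of the CPU request, not the limit

Note that the utilization target is measured against the CPU request (the 200m in the setup above), not the limit, which is one more reason step 1 has to come first.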

Bumping the limit is the easiest fix and the most expensive habit. Every patch carries a hidden cost — node capacity, bin-packing, cluster bills, blast radius. Understand the why before you reach for the YAML.


This one incident taught me more about Kubernetes resource management than months of reading docs. If you're running anything client-facing, please don't wait for a production incident to learn this.

CPU isn't just a number on a dashboard. It's a time budget — and your users feel every millisecond you overspend.

Have you been burned by CFS throttling? What metric finally gave it away for you? Drop it in the comments — I'd love to compare notes.
