CPU limits in Kubernetes are often treated as a mandatory best practice. Define requests, define limits, move on.
Over time, however, many teams discover that CPU limits—especially for application workloads—introduce more problems than they solve.
In this article, I’ll explain why CPU limits are frequently counterproductive, and then describe a real production incident where the absence of CPU limits on a specific type of workload led to a node failure. The takeaway is not a reversal of the original idea, but a clearer understanding of where it applies—and where it absolutely does not.
Why CPU Limits Often Hurt More Than They Help
CPU Limits Introduce Artificial Throttling
In Kubernetes, CPU limits are enforced by the Linux kernel's CFS bandwidth controller via cgroups. Each container receives a quota of CPU time per scheduling period (100 ms by default); once a container exhausts its quota, it is forcibly throttled for the remainder of that period—even if the node has idle CPU capacity available.
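As a sketch of how this maps to the kernel, consider a pod with a 500m CPU limit (names and values here are illustrative, not from the incident described later):

```yaml
# Illustrative pod spec: a 500m CPU limit becomes a CFS quota.
apiVersion: v1
kind: Pod
metadata:
  name: throttled-example   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27     # any image; illustrative
      resources:
        requests:
          cpu: "250m"       # used for scheduling decisions
        limits:
          cpu: "500m"       # enforced as cfs_quota_us = 50000 per
                            # cfs_period_us = 100000 (100 ms default):
                            # the container may consume at most 50 ms of
                            # CPU time per period, then it is throttled
                            # until the next period starts.
```

A burst that needs, say, 80 ms of CPU within one period will be cut off at 50 ms and forced to wait, which is exactly the added tail latency discussed below.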
For workloads with bursty CPU patterns, this behavior is harmful. Many services occasionally need short-lived CPU spikes to complete work efficiently: handling request bursts, warming caches, or performing runtime maintenance tasks. When limits are set, those spikes turn into throttling events, increasing request latency and amplifying tail delays.
In practice, this often means worse performance on an otherwise healthy and underutilized node.
Removing CPU Limits Can Improve Real-World Stability
In several production environments, removing CPU limits from application workloads has led to measurable improvements:
- Lower latency under load
- Faster recovery from traffic spikes
- Elimination of CFS throttling events without adding capacity
Autoscaling mechanisms such as the Horizontal Pod Autoscaler (HPA) work best when containers can fully utilize available CPU: HPA measures utilization relative to the CPU request, and artificial caps clip the very signal that should trigger scale-out, delaying it exactly when it’s most needed.
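A minimal sketch of this pattern—CPU requests set, CPU limits omitted, HPA scaling on utilization relative to the request (all names and numbers are illustrative):

```yaml
# Sketch: HPA targeting a Deployment whose containers define CPU
# requests but no CPU limits. Names are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU *request*, not of any limit
```

Because there is no limit, a pod under load can report well above 100% of its request, so the HPA sees real demand and scales out sooner instead of the pod being silently throttled at the cap.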
For many application services, CPU limits end up solving a problem that doesn’t exist, while creating one that does.
When This Approach Becomes Dangerous
The guidance above assumes one critical condition:
The workload must fully respect Kubernetes resource isolation.
Not all workloads do.
We encountered this firsthand in a Kubernetes management cluster running build agents.
Incident: Node CPU Saturation and NotReady State
A worker node suddenly reached near-constant 100% CPU utilization and remained there for several minutes. Shortly after:
- The kubelet stopped reporting heartbeats
- The node transitioned to NotReady
At the same time:
- Pod-level CPU metrics looked normal
- No throttling was visible at the pod level
- Nothing appeared obviously misconfigured
The build agent pod running on the node did not have CPU limits configured by design, following the “no CPU limits” philosophy.
So why did a single pod manage to destabilize the entire node?
Root Cause: Privileged Build Workloads Bypass Assumptions
The build agent was running as a privileged pod and started its own container runtime internally to execute jobs.
This distinction matters.
What Actually Happened
- The pod itself was scheduled normally and respected its CPU request
- Inside the pod, a container runtime launched additional processes
- Those processes were not constrained by pod-level CPU isolation
- Under heavy workload, they consumed all available node CPU
- The kubelet was starved of CPU time
- Node health checks failed, and the node became NotReady
This was not a bug in Kubernetes. It was a mismatch between assumed isolation and actual workload behavior.
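To make the failure mode concrete, here is a sketch of the kind of manifest involved—a Docker-in-Docker style build agent. This is an illustrative reconstruction, not the actual incident manifest:

```yaml
# Sketch: privileged build agent running a nested container runtime
# (image, names, and values are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: build-agent
spec:
  containers:
    - name: dind
      image: docker:27-dind        # nested runtime inside the pod
      securityContext:
        privileged: true           # required for DinD; weakens isolation
      resources:
        requests:
          cpu: "2"                 # honored by the scheduler at placement time
        # No CPU limit: under a heavy build, the nested runtime's child
        # processes can consume every core on the node, leaving the
        # kubelet too starved to report heartbeats.
```

The scheduler's view (the request) and the kernel's view (actual unbounded CPU consumption by nested processes) diverge, which is the mismatch described above.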
Revisiting the Question: Should CPU Limits Be Used?
The correct answer is neither “always” nor “never”.
CPU Limits Are Often Unnecessary For:
- Stateless application services
- Non-privileged containers
- Workloads without nested runtimes
- Large nodes with sufficient headroom
- Services managed by HPA
CPU Limits or Strong Isolation Are Required For:
- Build agents and CI runners
- Privileged pods
- Workloads executing untrusted or user-defined code
- Nested container runtimes
- Small or mixed-purpose nodes
In these cases, assuming that “CPU limits are harmful” without additional isolation is a mistake.
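For these workload classes, a deliberately capped resources block is a reasonable default—a sketch with hypothetical values:

```yaml
# Sketch: container resources for a CI runner where a CPU limit is
# intentionally kept (values illustrative).
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"        # cap bursts so a runaway build cannot starve the kubelet
    memory: "4Gi"   # memory limit matching the request avoids OOM surprises
```

The CPU limit here trades some build throughput for a guarantee that the node's control-plane agents keep running.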
Practical Recommendations
For build and CI workloads:
- Use dedicated node groups
- Apply taints and tolerations
- Avoid colocating them with application workloads
- Enforce resource boundaries at the node level
- Prefer architectures that avoid nested runtimes
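The first three recommendations can be sketched as a taint on dedicated build nodes plus a matching toleration and selector in the build pod spec (label and taint keys are illustrative):

```yaml
# Sketch: pin build workloads to a dedicated, tainted node group.
# First taint the build nodes, e.g.:
#   kubectl taint nodes build-node-1 workload=build:NoSchedule
# Then, in the build pod spec:
spec:
  nodeSelector:
    node-role/build: "true"      # hypothetical label on build nodes
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "build"
      effect: "NoSchedule"
```

The taint keeps application pods off the build nodes, and the toleration plus selector keeps build pods off the application nodes—so a saturated build node can only take out other builds.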
Final Thoughts
Removing CPU limits can significantly improve performance—but only when workloads behave as expected and respect Kubernetes isolation boundaries.
Privileged workloads and build systems operate under different rules. Applying application-level best practices to them without adjustment can destabilize the entire cluster.
The real lesson is simple:
Optimize trusted workloads. Isolate the rest.
