CPU limits in Kubernetes are often treated as a mandatory best practice. Define requests, define limits, move on.
Over time, however, many teams discover that CPU limits—especially for application workloads—introduce more problems than they solve.
In this article, I’ll explain why CPU limits are frequently counterproductive, and then describe a real production incident where the absence of CPU limits on a specific type of workload led to a node failure. The takeaway is not a reversal of the original idea, but a clearer understanding of where it applies—and where it absolutely does not.
Why CPU Limits Often Hurt More Than They Help
CPU Limits Introduce Artificial Throttling
In Kubernetes, CPU limits are enforced by the Linux kernel's CFS bandwidth controller via cgroups. Each container receives a quota of CPU time per scheduling period (100 ms by default); once a container exhausts its quota, it is forcibly throttled for the remainder of that period—even if the node has idle CPU capacity available.
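As a sketch of how this maps to the kernel, consider a pod with a 500m CPU limit (names and values here are illustrative, not from the incident described later):

```yaml
# Illustrative pod spec: a 500m CPU limit becomes a CFS quota.
apiVersion: v1
kind: Pod
metadata:
  name: throttled-example   # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27     # any image; illustrative
      resources:
        requests:
          cpu: "250m"       # used for scheduling decisions
        limits:
          cpu: "500m"       # enforced as cfs_quota_us = 50000 per
                            # cfs_period_us = 100000 (100 ms default):
                            # the container may consume at most 50 ms of
                            # CPU time per period, then it is throttled
                            # until the next period starts.
```

A burst that needs, say, 80 ms of CPU within one period will be cut off at 50 ms and forced to wait, which is exactly the added tail latency discussed below.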
For workloads with bursty CPU patterns, this behavior is harmful. Many services occasionally need short-lived CPU spikes to complete work efficiently: handling request bursts, warming caches, or performing runtime maintenance tasks. When limits are set, those spikes turn into throttling events, increasing request latency and amplifying tail delays.
In practice, this often means worse performance on an otherwise healthy and underutilized node.
Removing CPU Limits Can Improve Real-World Stability
In several production environments, removing CPU limits from application workloads has led to measurable improvements:
- Lower latency under load
- Faster recovery from traffic spikes
- Elimination of CFS throttling events without adding capacity
Autoscaling mechanisms such as the Horizontal Pod Autoscaler (HPA) work best when containers can fully utilize available CPU: HPA measures utilization relative to the CPU request, and artificial caps clip the very signal that should trigger scale-out, delaying it exactly when it’s most needed.
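A minimal sketch of this pattern—CPU requests set, CPU limits omitted, HPA scaling on utilization relative to the request (all names and numbers are illustrative):

```yaml
# Sketch: HPA targeting a Deployment whose containers define CPU
# requests but no CPU limits. Names are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU *request*, not of any limit
```

Because there is no limit, a pod under load can report well above 100% of its request, so the HPA sees real demand and scales out sooner instead of the pod being silently throttled at the cap.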
For many application services, CPU limits end up solving a problem that doesn’t exist, while creating one that does.
When This Approach Becomes Dangerous
The guidance above assumes one critical condition:
The workload must fully respect Kubernetes resource isolation.
Not all workloads do.
We encountered this firsthand in a Kubernetes management cluster running build agents.
Incident: Node CPU Saturation and NotReady State
A worker node suddenly reached near-constant 100% CPU utilization and remained there for several minutes. Shortly after:
- The kubelet stopped reporting heartbeats
- The node transitioned to NotReady
At the same time:
- Pod-level CPU metrics looked normal
- No throttling was visible at the pod level
- Nothing appeared obviously misconfigured
The build agent pod running on the node did not have CPU limits configured by design, following the “no CPU limits” philosophy.
So why did a single pod manage to destabilize the entire node?
Root Cause: Privileged Build Workloads Bypass Assumptions
The build agent was running as a privileged pod and started its own container runtime internally to execute jobs.
This distinction matters.
What Actually Happened
- The pod itself was scheduled normally and respected its CPU request
- Inside the pod, a container runtime launched additional processes
- Those processes were not constrained by pod-level CPU isolation
- Under heavy workload, they consumed all available node CPU
- The kubelet was starved of CPU time
- Node health checks failed, and the node became NotReady
This was not a bug in Kubernetes. It was a mismatch between assumed isolation and actual workload behavior.
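To make the failure mode concrete, here is a sketch of the kind of manifest involved—a Docker-in-Docker style build agent. This is an illustrative reconstruction, not the actual incident manifest:

```yaml
# Sketch: privileged build agent running a nested container runtime
# (image, names, and values are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: build-agent
spec:
  containers:
    - name: dind
      image: docker:27-dind        # nested runtime inside the pod
      securityContext:
        privileged: true           # required for DinD; weakens isolation
      resources:
        requests:
          cpu: "2"                 # honored by the scheduler at placement time
        # No CPU limit: under a heavy build, the nested runtime's child
        # processes can consume every core on the node, leaving the
        # kubelet too starved to report heartbeats.
```

The scheduler's view (the request) and the kernel's view (actual unbounded CPU consumption by nested processes) diverge, which is the mismatch described above.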
Revisiting the Question: Should CPU Limits Be Used?
The correct answer is neither “always” nor “never”.
CPU Limits Are Often Unnecessary For:
- Stateless application services
- Non-privileged containers
- Workloads without nested runtimes
- Large nodes with sufficient headroom
- Services managed by HPA
CPU Limits or Strong Isolation Are Required For:
- Build agents and CI runners
- Privileged pods
- Workloads executing untrusted or user-defined code
- Nested container runtimes
- Small or mixed-purpose nodes
In these cases, assuming that “CPU limits are harmful” without additional isolation is a mistake.
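For these workload classes, a deliberately capped resources block is a reasonable default—a sketch with hypothetical values:

```yaml
# Sketch: container resources for a CI runner where a CPU limit is
# intentionally kept (values illustrative).
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"        # cap bursts so a runaway build cannot starve the kubelet
    memory: "4Gi"   # memory limit matching the request avoids OOM surprises
```

The CPU limit here trades some build throughput for a guarantee that the node's control-plane agents keep running.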
Practical Recommendations
For build and CI workloads:
- Use dedicated node groups
- Apply taints and tolerations
- Avoid colocating them with application workloads
- Enforce resource boundaries at the node level
- Prefer architectures that avoid nested runtimes
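The first three recommendations can be sketched as a taint on dedicated build nodes plus a matching toleration and selector in the build pod spec (label and taint keys are illustrative):

```yaml
# Sketch: pin build workloads to a dedicated, tainted node group.
# First taint the build nodes, e.g.:
#   kubectl taint nodes build-node-1 workload=build:NoSchedule
# Then, in the build pod spec:
spec:
  nodeSelector:
    node-role/build: "true"      # hypothetical label on build nodes
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "build"
      effect: "NoSchedule"
```

The taint keeps application pods off the build nodes, and the toleration plus selector keeps build pods off the application nodes—so a saturated build node can only take out other builds.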
Final Thoughts
Removing CPU limits can significantly improve performance—but only when workloads behave as expected and respect Kubernetes isolation boundaries.
Privileged workloads and build systems operate under different rules. Applying application-level best practices to them without adjustment can destabilize the entire cluster.
The real lesson is simple:
Optimize trusted workloads. Isolate the rest.
