You set requests. You set limits. The pod still gets throttled — or killed.
Not because Kubernetes is broken. Because requests and limits operate at two completely different layers of the stack — and most teams treat them as a single resource configuration.
Here's what's actually happening:
The Scheduler Uses Requests Only. It Ignores Limits Entirely.
When a pod is created, the scheduler evaluates node capacity against resource requests and makes a placement decision. After that — it's done. It doesn't monitor the pod. It doesn't know what limits are set. It guarantees placement, not performance.
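A minimal pod spec makes the split concrete. In the sketch below (names, image, and values are illustrative), the scheduler reads only the `requests` block; the `limits` block is for the runtime layer:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.27     # illustrative image
      resources:
        requests:           # read once by the scheduler at placement time
          cpu: "250m"
          memory: "256Mi"
        limits:             # ignored by the scheduler; enforced later by kubelet + cgroups
          cpu: "500m"
          memory: "512Mi"
```

Both blocks sit side by side in the same stanza, which is exactly why teams treat them as one setting — but each block is consumed by a different layer.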
The Kubelet + Kernel Enforce Limits Only. At Runtime. Under Pressure.
The kubelet continuously monitors container usage against configured limits and enforces them via cgroups. It doesn't know what the scheduler decided. It watches usage and reacts when thresholds are crossed.
These two systems share no state. A pod can be perfectly placed and still get throttled or killed at runtime — because the limit configuration doesn't match the workload's actual behavior.
The CPU vs Memory Distinction Matters More Than Most Docs Make Clear
CPU is compressible — hit the limit and the kernel throttles the container via the cgroup CPU quota. The container keeps running, just slower. No log entry. No event. No OOMKilled status.
Memory is non-compressible — hit the limit and the kernel's OOM killer terminates the process. No degradation warning. No grace period. Status: OOMKilled.
CPU fails slowly. Memory fails instantly.
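The asymmetry lives in a single stanza — same field shape, two very different enforcement paths (values are illustrative):

```yaml
resources:
  limits:
    cpu: "500m"      # exceed this -> CFS throttling: container runs, latency climbs, no event
    memory: "512Mi"  # exceed this -> OOM killer: process terminated, status OOMKilled
```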
QoS Class Is a Failure Sequencing System, Not Just a Label
- Guaranteed (requests == limits) — last to be evicted under pressure
- Burstable (requests < limits) — evicted before Guaranteed
- BestEffort (no requests or limits) — first to die under pressure
Skipping requests doesn't simplify configuration. It places your pods at maximum eviction risk and removes the scheduler's ability to make informed placement decisions.
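All three classes fall out of the resources stanza alone. A sketch of each (values illustrative):

```yaml
# Guaranteed: every container sets requests == limits for both CPU and memory
resources:
  requests: { cpu: "500m", memory: "512Mi" }
  limits:   { cpu: "500m", memory: "512Mi" }
---
# Burstable: requests set, limits higher (or only some fields set)
resources:
  requests: { cpu: "250m", memory: "256Mi" }
  limits:   { cpu: "1",    memory: "512Mi" }
---
# BestEffort: no requests or limits at all -- first evicted under node pressure
resources: {}
```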
The Four Failure Patterns That Follow From Getting This Wrong
[01] OOMKilled — memory limit too low for peak behavior
[02] CPU Throttling — limit too low, producing silent latency degradation
[03] Node Pressure Eviction — requests set too low relative to real usage, so actual consumption overcommits the node
[04] Scheduler Fragmentation — no requests set, placement becomes unpredictable
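One common sizing pattern that sidesteps all four failure modes: requests near observed typical usage, a memory limit with headroom above observed peak, and no CPU limit so bursts spill into idle CPU instead of being throttled. A sketch, assuming the numbers come from your own usage metrics:

```yaml
resources:
  requests:
    cpu: "250m"      # ~ observed typical usage: honest input for the scheduler (avoids [03], [04])
    memory: "400Mi"
  limits:
    memory: "768Mi"  # observed peak plus headroom (avoids [01])
    # no CPU limit: bursts are absorbed rather than silently throttled (avoids [02])
```

Note this yields a Burstable pod by design — a deliberate trade against Guaranteed's eviction protection.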
Most Kubernetes resource failures aren't bugs. They're configuration decisions made without a clear model of how the two layers actually work.
Full breakdown with diagrams, QoS decision framework, and practical sizing guidance on rack2cloud.com — https://www.rack2cloud.com/kubernetes-resource-requests-vs-limits/