When every Pod screams for CPU and memory, who decides who lives, who waits, and who gets evicted?
Kubernetes isn't just a scheduler — it's a negotiator of fairness and efficiency.
Every second, it balances hundreds of workloads, deciding what runs, what waits, and what gets terminated — while maintaining reliability and cost efficiency.
This article unpacks how Quality of Service (QoS), Priority Classes, Preemption, and Bin-Packing Scoring come together to keep your cluster stable and fair.
⚙️ The Challenge: Competing Workloads in Shared Clusters
When multiple workloads share cluster resources, conflicts are inevitable:
- High-traffic apps starve lower-priority workloads.
- Batch jobs hog memory.
- Pods without requests or limits make evictions unpredictable.
Kubernetes addresses this by applying a layered decision-making model — QoS, Priority, Preemption, and Scoring.
🧭 QoS (Quality of Service): Who Gets Evicted First
Kubernetes assigns each Pod a QoS class based on the CPU and memory requests and limits of its containers:
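| QoS Class | Criteria | Eviction Order Under Node Pressure |
|---|---|---|
| Guaranteed | Every container sets CPU and memory limits, with requests equal to limits | Evicted last |
| Burstable | At least one container sets a CPU or memory request or limit, but the Pod doesn't meet the Guaranteed criteria | Evicted after BestEffort |
| BestEffort | No container sets any CPU or memory request or limit | Evicted first |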
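As a sketch, the two Pods below differ only in their `resources` block. The names and the `nginx:1.27` image are illustrative placeholders. The first Pod lands in Guaranteed and the second in Burstable; removing the `resources` block entirely would make a Pod BestEffort.

```yaml
# Guaranteed: every container sets CPU and memory requests equal to its limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo      # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27          # illustrative image
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
---
# Burstable: requests are set but are lower than limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable-demo       # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27          # illustrative image
      resources:
        requests:
          cpu: "250m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
```

You can confirm the assigned class with `kubectl get pod qos-guaranteed-demo -o jsonpath='{.status.qosClass}'`. One subtlety: if a container sets only limits, Kubernetes copies them into the requests (unless an admission controller applies a default request first), so a Pod that sets only CPU and memory limits can still be Guaranteed.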
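Strictly speaking, the kubelet ranks eviction candidates by whether a Pod's usage exceeds its requests, then by Pod priority, then by usage relative to requests. The order in the table is the practical result, since BestEffort Pods have no requests to stay under. QoS also sets each container's OOM score, so the kernel OOM killer tends to follow the same order.
💡 Lesson: Always define requests and limits; QoS largely decides who survives under node pressure.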
🧱 Priority Classes: Who Runs First
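QoS influences who stays under pressure, while PriorityClasses decide who gets scheduled first and who may displace whom.
A PriorityClass maps a name to an integer value. Pods reference it through `spec.priorityClassName`, and pending Pods with higher values are tried ahead of lower ones in the scheduling queue. User-defined values can go up to 1,000,000,000; larger values are reserved for the built-in `system-cluster-critical` and `system-node-critical` classes.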
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-services
# Higher value = higher priority.
value: 100000
globalDefault: false
description: "Critical platform workloads"
```
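💡 Lesson: Reserve high priorities for mission-critical services.
If everything is "high" priority, nothing is: a Pod can only preempt Pods of strictly lower priority, so inflating every class strips critical workloads of that advantage. That's chaos, not resilience.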
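A minimal sketch of how a class is consumed, assuming the `critical-services` class defined earlier exists. The Pod name, image, and the `batch-low` class are hypothetical. The second object uses `preemptionPolicy: Never`: Pods in that class are still ordered by priority in the scheduling queue and can still be preempted, but they never preempt other Pods themselves.

```yaml
# A Pod opts into a PriorityClass by name.
apiVersion: v1
kind: Pod
metadata:
  name: payments-api             # hypothetical workload
spec:
  priorityClassName: critical-services
  containers:
    - name: api
      image: nginx:1.27          # illustrative image
      resources:
        requests: { cpu: "500m", memory: "256Mi" }
        limits:   { cpu: "500m", memory: "256Mi" }
---
# Hypothetical low-priority class for batch work that waits instead of preempting.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 1000
preemptionPolicy: Never
globalDefault: false
description: "Batch jobs: queue behind higher priorities, never preempt"
```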
⚔️ Preemption: Controlled Sacrifice, Not Chaos
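When a high-priority Pod can't be scheduled because no node has room:
- The scheduler looks for a node where evicting one or more lower-priority Pods would let the pending Pod fit.
- It evicts those victims, which terminate gracefully, and records the chosen node in the pending Pod's `status.nominatedNodeName`.
- Once the victims are gone, the pending Pod goes through a normal scheduling cycle again. It usually lands on the nominated node, but that isn't guaranteed.
When choosing victims, the scheduler tries to honor PodDisruptionBudgets (PDBs) to limit collateral damage. This is best-effort: if no other option exists, it can still preempt Pods whose PDB would be violated.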
💡 Lesson: Preemption is controlled resilience — ensuring important workloads run while maintaining order.
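To give preemption (and voluntary disruptions such as node drains) a budget to respect, a PDB like the sketch below can be attached to a lower-priority workload. The name and the `app: batch-worker` label are hypothetical. Drains through the Eviction API enforce the budget strictly, while the scheduler treats it as best-effort during preemption.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-worker-pdb         # hypothetical name
spec:
  # Keep at least two matching Pods running during disruptions.
  minAvailable: 2
  selector:
    matchLabels:
      app: batch-worker          # hypothetical label
```

While the victims shut down, `kubectl get pod <pending-pod> -o jsonpath='{.status.nominatedNodeName}'` shows which node the scheduler nominated.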
⚖️ Scoring & Bin-Packing: Finding the Right Home
Once eligible nodes are filtered, Kubernetes enters the scoring phase to find the best fit.
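Common scoring plugins (current plugin names, with the legacy priority names in parentheses where they differ):
- NodeResourcesFit with its default LeastAllocated strategy (LeastRequestedPriority) → favors underutilized nodes, which spreads load.
- NodeResourcesBalancedAllocation (BalancedResourceAllocation) → favors nodes where CPU and memory utilization stay balanced.
- ImageLocality (ImageLocalityPriority) → prefers nodes that already have the container images cached.
- NodeAffinity (NodeAffinityPriority) → honors preferred node affinity rules.
- PodTopologySpread → favors placements that satisfy soft topology spread constraints, for example across zones.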
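Least-requested scoring spreads Pods out, which is the opposite of bin-packing. To pack Pods onto fewer nodes, the `NodeResourcesFit` plugin's scoring strategy can be switched to `MostAllocated`. This is a minimal sketch of a scheduler configuration file, assuming kube-scheduler v1.25 or later (which serves the `kubescheduler.config.k8s.io/v1` API); the equal CPU and memory weights are an illustrative choice.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # MostAllocated favors nodes that are already busy (bin-packing);
            # the default, LeastAllocated, favors emptier nodes (spreading).
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

The file is passed to the scheduler with `kube-scheduler --config=<path>`. `RequestedToCapacityRatio` is a third strategy that lets you shape the scoring curve yourself.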
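Each scoring plugin rates every feasible node from 0 to 100.
The scheduler multiplies each plugin's score by that plugin's configured weight and sums the results. The node with the highest total wins, with ties broken at random:
final_score(node) = (w1*s1) + (w2*s2) + ... + (wn*sn)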
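To make the weighted sum concrete, here is a worked example with hypothetical numbers: two nodes and two scoring plugins (resource fit and image locality), each with weight 1. Node A has plenty of free capacity but no cached image; node B is busier but already has the image.

$$
\text{final\_score}(n) = \sum_{i=1}^{k} w_i \, s_i(n), \qquad 0 \le s_i(n) \le 100
$$

$$
\begin{aligned}
\text{final\_score}(A) &= 1 \cdot 80 + 1 \cdot 0 = 80 \\
\text{final\_score}(B) &= 1 \cdot 60 + 1 \cdot 50 = 110
\end{aligned}
$$

Node B wins even though it has less free capacity, because at these weights image locality outweighs the resource-fit difference.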
QoS defines survivability.
Priority defines importance.
Scoring defines placement.
Together, they shape a stable and efficient cluster.
🧩 Visual Flow: Kubernetes Scheduling & Bin-Packing
🧠 Key Lessons for SREs & Platform Teams
✅ Always define CPU/memory requests & limits.
✅ Use PriorityClasses sparingly.
✅ Test evictions under simulated stress.
✅ Combine QoS + PDB + Priority for controlled resilience.
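✅ Observe scheduling metrics regularly, e.g. `kube_pod_status_phase` from kube-state-metrics and the scheduler's own `scheduler_pending_pods`, `scheduler_preemption_attempts_total`, and `scheduler_preemption_victims`.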
🚀 Takeaway
Kubernetes doesn't just schedule Pods — it negotiates priorities.
Reliability doesn't come from overprovisioning, but from predictable, fair, and disciplined scheduling.
Resilience = Consistency in scheduling decisions.