Guptaji Teegela

Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

When every Pod screams for CPU and memory, who decides who lives, who waits, and who gets evicted?

Kubernetes isn't just a scheduler — it's a negotiator of fairness and efficiency.
Every second, it balances hundreds of workloads, deciding what runs, what waits, and what gets terminated — while maintaining reliability and cost efficiency.

This article unpacks how Quality of Service (QoS), Priority Classes, Preemption, and Bin-Packing Scoring come together to keep your cluster stable and fair.


⚙️ The Challenge: Competing Workloads in Shared Clusters

When multiple workloads share cluster resources, conflicts are inevitable:

  1. High-traffic apps starve lower-priority workloads.
  2. Batch jobs hog memory.
  3. Pods without limits cause unpredictable evictions.

Kubernetes addresses this by applying a layered decision-making model — QoS, Priority, Preemption, and Scoring.


🧭 QoS (Quality of Service): Who Gets Evicted First

Each Pod belongs to a QoS class based on CPU and memory configuration:

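| QoS Class | Description | Eviction Priority |
| --- | --- | --- |
| Guaranteed | Every container sets CPU and memory requests and limits, with requests = limits | Evicted last |
| Burstable | At least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria | Evicted after BestEffort |
| BestEffort | No container sets any requests or limits | Evicted first |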

💡 Lesson: Always define requests and limits — QoS decides who survives under node pressure.
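
The classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not the kubelet's implementation: the dict layout for containers and the name `qos_class` are assumptions made for the example, and quantities are compared as plain strings (so "500m" and "0.5" would not be treated as equal here).

```python
from typing import Dict, List

# Hypothetical container model: {"requests": {...}, "limits": {...}}.
# This layout is an assumption for illustration, not the Kubernetes API types.
Container = Dict[str, Dict[str, str]]

RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    all_guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for res in RESOURCES:
            limit = limits.get(res)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(res, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

Note that a single container without limits is enough to drop the whole Pod from Guaranteed to Burstable, which is why sidecars deserve the same attention as the main container.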


🧱 Priority Classes: Who Runs First

QoS defines who stays, while Priority Classes define who starts.
A PriorityClass maps a name to an integer value; Pods reference it through priorityClassName, and higher values are placed ahead of lower ones in the scheduling queue.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-services
value: 100000
description: Critical platform workloads

💡 Lesson: Reserve high priorities for mission-critical services.
Overusing "high" priority leads to chaos — not resilience.
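
To make the "who starts first" idea concrete, here is a small sketch of how pending Pods could be ordered: higher priority first, and among equal priorities, whichever has waited longest. The `PendingPod` class and `scheduling_order` function are hypothetical names for this example; the real scheduler queue also handles backoff and unschedulable Pods, which this ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingPod:
    name: str
    priority: int      # value resolved from the Pod's PriorityClass
    queued_at: float   # when the Pod entered the queue (seconds)


def scheduling_order(pods: List[PendingPod]) -> List[PendingPod]:
    """Sort by priority (highest first), then by time in queue (oldest first)."""
    return sorted(pods, key=lambda p: (-p.priority, p.queued_at))
```

If every team assigns its workloads the top value, the sort degenerates into plain first-come-first-served, which is the practical reason to keep high priorities rare.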


⚔️ Preemption: Controlled Sacrifice, Not Chaos

When a high-priority Pod can't be scheduled:

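  1. The scheduler looks for a node where removing lower-priority Pods would free enough resources.
  2. It records that node in the pending Pod's status.nominatedNodeName and deletes the chosen victims, which still get their graceful termination period.
  3. Once the resources are free, the high-priority Pod goes through scheduling again and usually lands on the nominated node.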

The scheduler tries to pick victims whose removal doesn't violate a PodDisruptionBudget (PDB), but this is best effort: if no other victims are available, it preempts them anyway.

💡 Lesson: Preemption is controlled resilience — ensuring important workloads run while maintaining order.
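
The victim-selection idea can be sketched for a single node as below. This is a deliberately simplified model, not the scheduler's actual algorithm: it ignores PDBs, affinity, and graceful termination, and simply removes the lowest-priority Pods until the incoming Pod fits. `RunningPod` and `pick_victims` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningPod:
    name: str
    priority: int
    cpu_m: int    # requested CPU in millicores
    mem_mi: int   # requested memory in MiB


def pick_victims(free_cpu_m: int, free_mem_mi: int,
                 need_cpu_m: int, need_mem_mi: int,
                 preemptor_priority: int,
                 running: List[RunningPod]) -> Optional[List[str]]:
    """Return Pod names to evict so the preemptor fits, or None if it cannot fit."""
    # Only Pods with strictly lower priority are candidates; lowest go first.
    candidates = sorted(
        (p for p in running if p.priority < preemptor_priority),
        key=lambda p: p.priority,
    )
    victims: List[str] = []
    for pod in candidates:
        if free_cpu_m >= need_cpu_m and free_mem_mi >= need_mem_mi:
            break
        victims.append(pod.name)
        free_cpu_m += pod.cpu_m
        free_mem_mi += pod.mem_mi
    if free_cpu_m >= need_cpu_m and free_mem_mi >= need_mem_mi:
        return victims
    return None
```

An empty list means the Pod already fits and nothing needs to be evicted; `None` means preemption on this node would not help, so the Pod stays pending.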


⚖️ Scoring & Bin-Packing: Finding the Right Home

Once eligible nodes are filtered, Kubernetes enters the scoring phase to find the best fit.

Plugins involved:

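  • NodeResourcesFit with the LeastAllocated strategy (formerly LeastRequestedPriority) → favors underutilized nodes; the MostAllocated strategy does the opposite and bin-packs.
  • NodeResourcesBalancedAllocation (formerly BalancedResourceAllocation) → balances CPU & memory use.
  • ImageLocality (formerly ImageLocalityPriority) → prefers nodes with cached images.
  • NodeAffinity (formerly NodeAffinityPriority) → honors preferred affinity rules.
  • PodTopologySpread → favors nodes that keep Pods evenly spread across zones and other topology domains.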

Each node receives a score (0–100) from multiple plugins.
Weighted scores are combined:

final_score = (w1*s1) + (w2*s2) + ...
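
As a rough sketch of the weighted sum above, the snippet below scores one resource in the "least allocated" style (more free capacity means a higher score), combines per-plugin scores with weights, and picks the best node. The plugin labels used as dict keys, the default weight of 1, and the function names are assumptions for illustration; the real scheduler also normalizes scores and breaks ties, which is omitted here.

```python
from typing import Dict

MAX_NODE_SCORE = 100


def least_allocated_score(requested: int, capacity: int) -> int:
    """Score one resource: the more capacity left free, the higher the score."""
    if capacity == 0 or requested > capacity:
        return 0
    return (capacity - requested) * MAX_NODE_SCORE // capacity


def final_score(plugin_scores: Dict[str, int], weights: Dict[str, int]) -> int:
    """final_score = (w1*s1) + (w2*s2) + ...; unlisted plugins get weight 1."""
    return sum(weights.get(name, 1) * score for name, score in plugin_scores.items())


def best_node(node_scores: Dict[str, Dict[str, int]], weights: Dict[str, int]) -> str:
    """Return the node name with the highest combined score."""
    return max(node_scores, key=lambda n: final_score(node_scores[n], weights))
```

Raising a plugin's weight is how an operator would tilt placement, for example giving image locality more influence on clusters with very large images.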

QoS defines survivability.
Priority defines importance.
Scoring defines placement.

Together, they shape a stable and efficient cluster.


🧠 Key Lessons for SREs & Platform Teams

✅ Always define CPU/memory requests & limits.
✅ Use PriorityClasses sparingly.
✅ Test evictions under simulated stress.
✅ Combine QoS + PDB + Priority for controlled resilience.
✅ Observe scheduling metrics regularly (kube_pod_status_phase from kube-state-metrics; scheduler_pending_pods and scheduler_preemption_attempts_total from kube-scheduler).


🚀 Takeaway

Kubernetes doesn't just schedule Pods — it negotiates priorities.
Reliability doesn't come from overprovisioning, but from predictable, fair, and disciplined scheduling.

Resilience = Consistency in scheduling decisions.