Guptaji Teegela

Posted on Nov 20

Beyond Scheduling: How Kubernetes Uses QoS, Priority, and Scoring to Keep Your Cluster Balanced

#devops #sre #microservices #platformengineering

When every Pod screams for CPU and memory, who decides who lives, who waits, and who gets evicted?

Kubernetes isn't just a scheduler — it's a negotiator of fairness and efficiency.
Every second, it balances hundreds of workloads, deciding what runs, what waits, and what gets terminated — while maintaining reliability and cost efficiency.

This article unpacks how Quality of Service (QoS), Priority Classes, Preemption, and Bin-Packing Scoring come together to keep your cluster stable and fair.

⚙️ The Challenge: Competing Workloads in Shared Clusters

When multiple workloads share cluster resources, conflicts are inevitable:

High-traffic apps starve lower-priority workloads.
Batch jobs hog memory.
Pods without limits cause unpredictable evictions.

Kubernetes addresses this by applying a layered decision-making model — QoS, Priority, Preemption, and Scoring.

🧭 QoS (Quality of Service): Who Gets Evicted First

Kubernetes assigns each Pod a QoS class based on the CPU and memory requests and limits of its containers:

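QoS Class	Description	Eviction Order
Guaranteed	Every container sets CPU and memory requests equal to its limits	Evicted last
Burstable	At least one container sets a request or limit, but the Pod does not meet the Guaranteed criteria	Evicted after BestEffort
BestEffort	No container sets any requests or limits	Evicted first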

💡 Lesson: Always define requests and limits. Under node pressure the kubelet first evicts Pods whose usage exceeds their requests, so the QoS class largely determines who survives.
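The classification rules in the table can be sketched in a few lines of Python. This is a minimal illustration, not the kubelet's code: the container layout (plain dicts with `requests` and `limits`) and the `qos_class` name are assumptions made for the example, and quantities are compared as raw strings, so "500m" and "0.5" would not be treated as equal here.

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}.
# This is a simplified stand-in for the real Pod spec.
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

For example, a Pod whose only container sets `cpu: 500m` and `memory: 256Mi` for both requests and limits is classified as Guaranteed, while a Pod that only sets a CPU request is Burstable.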

🧱 Priority Classes: Who Runs First

QoS defines who stays, while Priority Classes define who starts.
A PriorityClass maps a name to an integer value. Pods reference it through spec.priorityClassName, and the scheduler places higher-value Pods ahead of lower-value ones in its queue.

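apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-services
value: 100000          # higher value means higher priority
globalDefault: false   # applies only to Pods that reference this class
description: "Critical platform workloads"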

💡 Lesson: Reserve high priorities for mission-critical services.
Overusing "high" priority leads to chaos — not resilience.
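To see what "who starts" means in practice, the sketch below orders a scheduling queue the way the default queue sort is documented to behave: higher priority first, then whichever Pod has waited longest. The `PendingPod` type, its fields, and the sample values are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingPod:
    name: str
    priority: int     # integer value resolved from the Pod's PriorityClass
    queued_at: float  # time the Pod entered the queue, in seconds


def scheduling_order(pods: List[PendingPod]) -> List[str]:
    """Return Pod names in the order the scheduler would try them."""
    # Negate priority so that larger values sort first; ties go to the
    # Pod with the earliest queue timestamp.
    ordered = sorted(pods, key=lambda p: (-p.priority, p.queued_at))
    return [p.name for p in ordered]


pods = [
    PendingPod("batch", priority=0, queued_at=1.0),
    PendingPod("api", priority=100000, queued_at=5.0),
    PendingPod("batch-2", priority=0, queued_at=0.5),
]
print(scheduling_order(pods))  # ['api', 'batch-2', 'batch']
```

If every workload is given the same high value, the ordering collapses to arrival time, which is one reason overused priorities stop helping.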

⚔️ Preemption: Controlled Sacrifice, Not Chaos

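When a high-priority Pod can't be scheduled on any node:

The scheduler looks for a node where evicting lower-priority Pods would free enough room.
It deletes those victim Pods, which still get their graceful termination period.
It records the node in the pending Pod's status.nominatedNodeName and schedules the Pod once the resources are free.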

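The scheduler tries to respect PodDisruptionBudgets (PDBs) when choosing victims, but only on a best-effort basis: if no other option exists, it can still preempt Pods in violation of a PDB.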

💡 Lesson: Preemption is controlled resilience — ensuring important workloads run while maintaining order.
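The victim selection on a single node can be approximated as follows. This is a deliberately simplified sketch, assuming CPU is the only resource and ignoring PDBs, affinity, and graceful termination; the `RunningPod` type and `select_victims` function are hypothetical names. It follows the general idea of treating every lower-priority Pod as a candidate and then sparing as many as possible, starting with the highest priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningPod:
    name: str
    priority: int
    cpu_millis: int


def select_victims(node_cpu_millis: int, running: List[RunningPod],
                   incoming_priority: int,
                   incoming_cpu_millis: int) -> Optional[List[str]]:
    """Return the Pods to evict, or None if preemption cannot help."""
    # Pods with equal or higher priority are never preempted.
    used = sum(p.cpu_millis for p in running
               if p.priority >= incoming_priority)
    if used + incoming_cpu_millis > node_cpu_millis:
        return None  # evicting every lower-priority Pod is still not enough
    candidates = [p for p in running if p.priority < incoming_priority]
    victims = []
    # Spare candidates from highest to lowest priority while the
    # incoming Pod still fits.
    for pod in sorted(candidates, key=lambda p: -p.priority):
        if used + pod.cpu_millis + incoming_cpu_millis <= node_cpu_millis:
            used += pod.cpu_millis
        else:
            victims.append(pod.name)
    return victims


running = [
    RunningPod("platform", priority=1000, cpu_millis=1000),
    RunningPod("web", priority=100, cpu_millis=1000),
    RunningPod("batch", priority=10, cpu_millis=1000),
]
print(select_victims(4000, running, incoming_priority=500,
                     incoming_cpu_millis=2000))  # ['batch']
```

Here the 4000m node can keep `platform` and `web` alongside the incoming 2000m Pod, so only `batch` is evicted.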

⚖️ Scoring & Bin-Packing: Finding the Right Home

Once eligible nodes are filtered, Kubernetes enters the scoring phase to find the best fit.

Plugins involved:

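NodeResourcesFit with the LeastAllocated strategy (formerly LeastRequestedPriority) → favors underutilized nodes and spreads load; its MostAllocated strategy does the opposite and bin-packs.
NodeResourcesBalancedAllocation (formerly BalancedResourceAllocation) → favors nodes where CPU and memory use stay balanced.
ImageLocality (formerly ImageLocalityPriority) → prefers nodes that already have the container images cached.
NodeAffinity (formerly NodeAffinityPriority) → honors preferred node affinity rules.
PodTopologySpread → spreads Pods across zones, nodes, or other topology domains according to the Pod's topologySpreadConstraints.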
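Whether resource scoring spreads Pods out or packs them together depends on the strategy: the scheduler's NodeResourcesFit plugin supports both LeastAllocated and MostAllocated. A minimal sketch of the two, for a single resource, is shown below. The function names are illustrative, `requested` is assumed to already include the incoming Pod's request, and the real plugin additionally weights and averages across several resources.

```python
def least_allocated_score(requested: float, capacity: float) -> float:
    """Score 0-100; emptier nodes score higher, which spreads Pods out."""
    return (capacity - requested) * 100 / capacity


def most_allocated_score(requested: float, capacity: float) -> float:
    """Score 0-100; fuller nodes score higher, which bin-packs Pods."""
    return requested * 100 / capacity


# A node with 4000m CPU capacity and 1000m requested after placement:
print(least_allocated_score(1000, 4000))  # 75.0
print(most_allocated_score(1000, 4000))   # 25.0
```

The same node looks attractive under one strategy and unattractive under the other, so the choice between them is what decides whether a cluster spreads or consolidates.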

Each plugin gives every node a score, normalized to 0–100.
Each score is multiplied by its plugin's weight, and the results are summed:

final_score = (w1*s1) + (w2*s2) + ...
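The weighted sum is easy to work through by hand. The sketch below computes it for two nodes and picks the winner; the node names, per-plugin scores, and weights are made-up values for illustration, and `final_score` and `pick_node` are hypothetical helper names rather than scheduler APIs.

```python
from typing import Dict


def final_score(plugin_scores: Dict[str, float],
                weights: Dict[str, int]) -> float:
    """Weighted sum of per-plugin scores; a missing weight counts as 1."""
    return sum(weights.get(name, 1) * score
               for name, score in plugin_scores.items())


def pick_node(scores_by_node: Dict[str, Dict[str, float]],
              weights: Dict[str, int]) -> str:
    """Return the node with the highest combined score."""
    return max(scores_by_node,
               key=lambda node: final_score(scores_by_node[node], weights))


weights = {"NodeResourcesFit": 1, "ImageLocality": 1, "PodTopologySpread": 2}
scores_by_node = {
    "node-a": {"NodeResourcesFit": 80, "ImageLocality": 0, "PodTopologySpread": 50},
    "node-b": {"NodeResourcesFit": 40, "ImageLocality": 100, "PodTopologySpread": 60},
}
print(final_score(scores_by_node["node-a"], weights))  # 180
print(final_score(scores_by_node["node-b"], weights))  # 260
print(pick_node(scores_by_node, weights))              # node-b
```

Even though node-a has more free resources, node-b wins because the cached image and the doubled topology-spread weight outweigh it.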

QoS defines survivability.
Priority defines importance.
Scoring defines placement.

Together, they shape a stable and efficient cluster.

[Figure: Visual flow of Kubernetes scheduling and bin-packing]

🧠 Key Lessons for SREs & Platform Teams

✅ Always define CPU/memory requests & limits.
✅ Use PriorityClasses sparingly.
✅ Test evictions under simulated stress.
✅ Combine QoS + PDB + Priority for controlled resilience.
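✅ Observe scheduling metrics regularly (kube_pod_status_phase from kube-state-metrics; scheduler_pending_pods, scheduler_schedule_attempts_total, and scheduler_preemption_attempts_total from kube-scheduler).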

🚀 Takeaway

Kubernetes doesn't just schedule Pods — it negotiates priorities.
Reliability doesn't come from overprovisioning, but from predictable, fair, and disciplined scheduling.

Resilience = Consistency in scheduling decisions.
