KAI Scheduler

CNCF Dashboard

Features

  1. Batch Scheduling
    • Bin Packing: min # of nodes used (min fragmentation)
    • Spread Scheduling: max # of nodes used (HA, load balancing)
  2. Workload Priority
  3. Hierarchical Queues: 2-level queues (parent & child); see the queue sketch after this list
  4. Fairness
    • Dominant Resource Fairness (DRF) scheduling with quota enforcement and reclaim across queues
  5. Elastic Workloads
    • Dynamically scale workloads within defined minimum and maximum pod counts.
  6. DRA
    • Dynamic Resource Allocation
    • Supports multiple vendors (NVIDIA, AMD, ...)
  7. GPU Sharing: share a single GPU or multiple GPUs, maximizing resource utilization.
  8. Cloud & On-premise
    • Supports auto-scalers like Karpenter
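
To make the hierarchical queues and quota settings concrete, here is a minimal sketch of a parent/child queue pair. The scheduling.run.ai/v2 Queue resource and its quota/limit/overQuotaWeight fields follow the project's quickstart examples, but treat the exact API version and field names as assumptions that may differ in your release; the child queue test is the one referenced by the pods later in this post.

# queues.yaml (sketch; API group and fields assumed from the project's quickstart)
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: default             # top-level (parent) queue
spec:
  resources:
    gpu:
      quota: -1             # -1 = no quota enforced (assumed default)
      limit: -1
      overQuotaWeight: 1
---
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: test                # child queue used via kai.scheduler/queue: test
spec:
  parentQueue: default      # second level of the hierarchy
  resources:
    gpu:
      quota: -1
      limit: -1
      overQuotaWeight: 1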

Concepts and Principles

docs

The Scheduler's primary responsibility is to allocate workloads to the most suitable node or nodes based on:

  1. resource requirements
  2. fairness
  3. quota management (a minimal example follows this list)
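
The pod below is a minimal example of how those inputs are expressed: the workload is handed to kai-scheduler via schedulerName, attributed to a queue (used for fairness and quota accounting) through the kai.scheduler/queue label, and declares its resource requirements in resources.requests. The pod name and request values are illustrative.

# cpu-workload.yaml (illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: cpu-workload
  labels:
    kai.scheduler/queue: test      # queue used for fairness / quota accounting
spec:
  schedulerName: kai-scheduler     # hand the pod to KAI Scheduler
  containers:
    - name: ubuntu
      image: ubuntu
      args: ["sleep", "infinity"]
      resources:
        requests:                  # resource requirements the scheduler packs against
          cpu: "500m"
          memory: 512Mi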

Workloads

Introduction to Workloads

  • 1:N = Pod:Node
    • a single pod running on an individual node
  • N:1 = Pod:Node
    • multiple distributed pods, each running on its own node
  • N:M = Pod:Node
    • multiple pods distributed across multiple nodes (see the sketch below)
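
As a sketch of the multi-pod case, a plain Deployment can be pointed at kai-scheduler so that every replica is scheduled by it; depending on the strategy (bin packing vs. spread), the pods may share nodes or be spread across them. All names below are hypothetical.

# distributed-workload.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: distributed-workload
spec:
  replicas: 4                      # N pods
  selector:
    matchLabels:
      app: distributed-workload
  template:
    metadata:
      labels:
        app: distributed-workload
        kai.scheduler/queue: test  # same queue label as the pod examples below
    spec:
      schedulerName: kai-scheduler # all replicas go through KAI Scheduler
      containers:
        - name: ubuntu
          image: ubuntu
          args: ["sleep", "infinity"]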

GPU sharing


A GPU device can be allocated to multiple pods by either:

  1. requesting a specific amount of GPU memory (e.g. 2000 MiB), or
  2. requesting a fraction of a GPU device's memory
# gpu-memory.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing
  labels:
    kai.scheduler/queue: test
  annotations:
    gpu-memory: "2000" # in Mib
spec:
  schedulerName: kai-scheduler
  containers:
    - name: ubuntu
      image: ubuntu
      args: ["sleep", "infinity"]
# gpu-sharing.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-sharing
  labels:
    kai.scheduler/queue: test
  annotations:
    gpu-fraction: "0.5"
spec:
  schedulerName: kai-scheduler
  containers:
    - name: ubuntu
      image: ubuntu
      args: ["sleep", "infinity"]

GPU sharing autoscaling

Problem: GPU requests in annotations -> Cluster Autoscaler can't detect them

Solution: node-scale-adjuster

  • Watches unschedulable GPU-sharing pods
  • Launches utility pod requesting full GPU → triggers autoscaler

The utility pod is needed because the Kubernetes Cluster Autoscaler cannot detect GPU-sharing pods.

Why can't the Autoscaler detect GPU-sharing pods?
Because it only looks at resources declared in the pod spec (resources.requests), while GPU-sharing pods place their GPU requests in annotations rather than in the standard resources field. Since the autoscaler ignores annotations, it sees those pods as having no GPU demand, so it doesn't trigger scaling.

To solve this, the node-scale-adjuster launches a temporary utility pod that requests a full GPU in its spec. This makes the autoscaler recognize a resource shortage and triggers it to scale up the node pool.
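
For contrast, this is roughly what the Cluster Autoscaler can act on: a GPU request declared in the standard resources field (the nvidia.com/gpu extended resource) instead of an annotation. The utility pod launched by node-scale-adjuster plays this role; the manifest below is only an illustration, not the actual utility pod spec.

# full-gpu-request.yaml (illustrative; visible to the Cluster Autoscaler)
apiVersion: v1
kind: Pod
metadata:
  name: full-gpu-request
  labels:
    kai.scheduler/queue: test
spec:
  schedulerName: kai-scheduler
  containers:
    - name: ubuntu
      image: ubuntu
      args: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # declared in the pod spec, so the autoscaler counts it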

💡 I think this workaround using utility pods is problematic, since it introduces scheduling noise and can lead to inaccurate scaling. A better solution would be to expose GPU sharing as a proper custom resource and use an operator to manage and report GPU demand declaratively to the autoscaler.

Calculation

  • Sums GPU fractions (e.g., 2 pods × 0.5 GPU -> 1 utility pod)
    • If the new node only fits one of them -> spawns an additional utility pod

GPU Memory Requests

  • Assumes 0.1 GPU per pod
    • i.e., it creates 1 utility pod per 10 memory-based pods
  • Configurable via --gpu-memory-to-fraction-ratio
# how to enable? pass this flag when installing/upgrading the Helm chart
--set "global.clusterAutoscaling=true"

GPU sharing with MPS

Refs
