A Quick Take on K8s 1.34 GA DRA: 7 Questions You Probably Have

The 7 questions

  1. What problem does DRA solve?
  2. Does “dynamic” mean hot-plugging a GPU into a running Pod, or in-place GPU memory resizing?
  3. What real-world use cases (and “fun” possibilities) does DRA enable?
  4. How does DRA relate to the DevicePlugin? Can they coexist?
  5. What’s the status of GPU virtualization under DRA? What about HAMi?
  6. Which alpha/beta features around DRA are worth watching?
  7. When will this be production-ready at scale?

Before we dive in, here’s a mental model that helps a lot:

Know HAMi + know PV/PVC ≈ know DRA.

More precisely: DRA borrows the dynamic provisioning idea from PV/PVC and adds a structured, standardized abstraction for device requests. The core insight is simple:

Previously, the DevicePlugin didn’t surface enough structured information for the scheduler to make good decisions. DRA fixes that by richly describing devices and requests in a way the scheduler (and autoscaler) can reason about.

In plain English: report more facts, and make the scheduler aware of them. That’s DRA’s “structured parameters” in a nutshell.

If you’re familiar with HAMi’s Node & Pod annotation–based mechanism for conveying device constraints to the scheduler, DRA elevates the same idea into first-class, structured API objects that the native scheduler and Cluster Autoscaler can reason about directly.
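To make “structured” concrete, here’s a rough sketch of the kind of inventory a DRA driver publishes as a ResourceSlice. Everything here (the gpu.example.com driver, node name, attributes, capacities) is made up for illustration, and the exact field layout can differ a bit across resource.k8s.io API versions:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node-a-gpu.example.com
spec:
  driver: gpu.example.com          # placeholder driver name
  nodeName: node-a
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-0
    attributes:
      productName:
        string: "Example-GPU-80G"  # made-up attribute values
      driverVersion:
        version: "1.0.0"
    capacity:
      memory:
        value: 80Gi
```

Because these facts live in typed API objects instead of opaque annotations, the scheduler and Cluster Autoscaler can evaluate them directly.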

A bit of history (why structured parameters won)

The earliest DRA design wasn’t structured. Vendors proposed opaque, driver-owned CRDs. The scheduler couldn’t see global availability or interpret those fields, so it had to orchestrate a multi-round “dance” with the vendor controller:

  • Scheduler writes a candidate node list into a temp object
  • Driver controller removes unfit nodes
  • Scheduler picks a node
  • Driver tries to allocate
  • Allocation status is written back
  • Only then does the scheduler try to bind the Pod

Early unstructured DRA design: the scheduler and driver had to do a multi-round “dance”.

Every step risked races, stale state, retries—hot spots on the API server, pressure on drivers, and long-tail scheduling latency. Cluster Autoscaler (CA) also had poor predictive power because the scheduler itself didn’t understand the resource constraints.

That approach was dropped in favor of structured parameters, so scheduler and CA can reason directly and participate in the decision upfront.


Now the Q&A

1) What problem does DRA actually solve?

It solves this: “DevicePlugin’s reported info isn’t enough, and if you report it elsewhere the scheduler can’t see it.”

DRA introduces structured, declarative descriptions of device needs and inventory so the native scheduler can decide intelligently.

2) Does “dynamic” mean hot-plugging GPUs into a running Pod, or in-place VRAM up/down?

Neither. Here, dynamic primarily means flexible, declarative device selection at scheduling time, plus the ability for drivers to prepare/cleanup around bind and unbind. Think of it as flexible resource allocation, not live GPU hot-plugging or in-place VRAM resizing.
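To make that concrete, here’s a minimal sketch of a ResourceClaim that declares “two devices of this class”, leaving it to the scheduler to pick which physical devices at scheduling time (the driver’s prepare/cleanup hooks then run around container start and stop). The example-gpu class is a placeholder, and the field names follow the resource.k8s.io/v1 API as I understand it:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: two-gpus
spec:
  devices:
    requests:
    - name: gpus
      exactly:
        deviceClassName: example-gpu   # placeholder class (see the sketch in Q3)
        allocationMode: ExactCount
        count: 2
```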

3) What new toys does DRA bring? Where does it shine?

DRA adds four key concepts:

  • DeviceClass → think StorageClass
  • ResourceClaim → think PVC
  • ResourceClaimTemplate → think VolumeClaimTemplate (flavor or “SKU” you’d expose on a platform)
  • ResourceSlice → a richer, extensible inventory record, i.e., a supercharged version of what DevicePlugin used to advertise

This makes inventory and SKU management feel native. A lot of the real “fun” lands with features that are α/β today (see below), but even at GA the information model is the big unlock.
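To see the PV/PVC analogy end to end, here’s a hedged sketch of the three user-facing objects wired together. All names (gpu.example.com, example-gpu, the image) are placeholders, and exact fields may vary slightly between API versions:

```yaml
# “StorageClass” analogue: which devices belong to this class.
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"   # placeholder driver
---
# “VolumeClaimTemplate” analogue: a reusable “SKU” that stamps out one claim per Pod.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: one-example-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: example-gpu
---
# The Pod references the template; the scheduler allocates a concrete device
# before binding, much like dynamic PV provisioning.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: one-example-gpu
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest   # placeholder image
    resources:
      claims:
      - name: gpu
```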

4) What’s the relationship with DevicePlugin? Can they coexist?

DRA is meant to replace the legacy DevicePlugin path over time. To make migration smoother, there’s KEP-5004 (DRA Extended Resource Mapping), which lets a DRA driver map devices to extended resources (e.g., nvidia.com/gpu) during a transition (see the sketch after the list below).

Practically:

  • You can run both in the same cluster during migration.
  • A single node cannot expose the same-named extended resource from both mechanisms at once.
  • You can migrate apps and nodes gradually.
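The nice part of KEP-5004 for migration is that workload manifests can stay untouched: a Pod keeps requesting the familiar extended resource, and once the (alpha) feature and a participating DRA driver are in place, DRA can satisfy that request behind the scenes. A sketch of such an unchanged Pod (image is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-style-gpu-pod
spec:
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest
    resources:
      limits:
        nvidia.com/gpu: 1   # whether DevicePlugin or a DRA driver backs this
                            # on a given node is a per-node migration decision
```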

5) What about GPU virtualization? And HAMi?

  • Template-style (MIG-like) partitioning: see KEP-4815 – DRA Partitionable Devices (rough sketch at the end of this answer).
  • Flexible (capacity-style) sharing like HAMi: the community is building on KEP-5075 – DRA Consumable Capacity (think “share by capacity” such as VRAM or bandwidth).

DRA extensions for GPUs — partitionable devices (MIG-like templates) and consumable capacity (HAMi-style flexible sharing).

HAMi’s DRA driver (demo branch) lives here:

https://github.com/Project-HAMi/k8s-dra-driver/tree/demo
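Under the partitionable-devices direction (KEP-4815), each MIG-like slice is advertised as its own device in a ResourceSlice, so a claim can pick a partition of the right size, for example with a CEL selector over its capacity. A rough sketch with made-up driver and capacity names (the KEP-5075 consumable-capacity fields are still alpha, so they’re not shown here):

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: small-gpu-slice
spec:
  devices:
    requests:
    - name: slice
      exactly:
        deviceClassName: example-gpu   # placeholder class
        selectors:
        - cel:
            # Match any advertised partition with at least ~10Gi of memory;
            # the driver name and capacity key are placeholders.
            expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("10Gi")) >= 0
```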

6) What α/β features look exciting?

Already mentioned, but here’s the short list:

  • KEP-5004 – DRA Extended Resource Mapping: smoother migration from DevicePlugin
  • KEP-4815 – Partitionable Devices: MIG-like templated splits
  • KEP-5075 – Consumable Capacity: share by capacity (VRAM, bandwidth, etc.)

And more I’m watching:

  • KEP-4816 – Prioritized Alternatives in Device Requests

    Lets a request specify ordered fallbacks: prefer “A”, accept “B”, or even prioritize allocating “lower-end” devices first to keep “higher-end” ones free (see the sketch after this list).

  • KEP-4680 – Resource Health in Pod Status

    Device health surfaces directly in PodStatus for faster detection and response.

  • KEP-5055 – Device Taints/Tolerations

    Taint devices (via the driver or by hand), e.g. “nearing decommission” or “needs maintenance”, and control placement with tolerations.
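For KEP-4816, here’s what an ordered-fallback request might look like with the firstAvailable list. This is an alpha/beta surface, so treat it strictly as a sketch: class names are placeholders and the field shape may still change:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: big-gpu-or-fallback
spec:
  devices:
    requests:
    - name: gpu
      firstAvailable:                        # ordered alternatives, first satisfiable wins
      - name: big
        deviceClassName: example-gpu-80g     # preferred (placeholder class)
      - name: small
        deviceClassName: example-gpu-40g     # acceptable fallback (placeholder class)
```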

7) When will this be broadly production-ready?

For wide, low-friction production use, you typically want the surrounding features to reach beta maturity and the ecosystem of vendor drivers to catch up. A rough expectation: ~8–16 months for most shops, depending on vendors and your risk posture.
