Migrating from DAS to DRA in OpenShift: The Pragmatic Guide
If you are running high-density AI/ML workloads on OpenShift 4.20 or later, it's time to have a serious talk about your GPU partitioning strategy.
For a long time, we relied on the Dynamic Accelerator Slicer (DAS)—usually via the InstaSlice operator—to dynamically carve out NVIDIA MIG partitions. It worked, but it was essentially a set of webhooks and custom scheduler plugins hacking the legacy device plugin model.
With the introduction of Dynamic Resource Allocation (DRA) as a native Kubernetes standard, Red Hat is officially deprecating DAS. DRA isn't just an upgrade; it is a total "cut-over" replacement that moves away from treating GPUs as dumb integer counts (nvidia.com/gpu: 1) to treating them as structured objects you can query using CEL (Common Expression Language).
Here is the pragmatic, step-by-step guide to ripping out DAS and implementing the NVIDIA DRA Driver in OpenShift.
Prerequisites for the DRA Stack
Before you start tearing things down, verify your software stack. DRA is a complex orchestration of the container runtime and hardware:
- Kubernetes Version: v1.34.2+ (Foundational support for stable DRA APIs. Note: OpenShift 4.20 aligns with Kubernetes ~1.33, but full DRA support matures in the 4.21/1.34+ timeframe.)
- NVIDIA Driver: v580+ (Required for CDI and dynamic reconfiguration)
- GPU Operator: v25.10.0+
- Runtime: CRI-O with Container Device Interface (CDI) enabled.
Phase 1: Scorched Earth (Decommissioning DAS)
Because DAS and DRA use fundamentally different scheduling logic, they cannot co-exist. You must completely nuke the DAS environment. This isn't just deleting the operator pod; you have to clean the cluster state.
```bash
# File: scripts/cleanup-das.sh
#!/bin/bash
# 1. Stop existing workloads (find and terminate pods requesting DAS MIG resources).
#    Put an echo in front of "oc delete" in production to verify targets first.
#    Note: this checks primary containers only; adjust the jq path if your
#    initContainers also request MIG slices.
oc get pods --all-namespaces -o json | \
  jq -r '.items[]
         | select([.spec.containers[].resources.requests // {} | keys[]]
                  | any(startswith("mig.das.com")))
         | "\(.metadata.namespace) \(.metadata.name)"' | \
  while read -r ns pod; do oc delete pod -n "$ns" "$pod"; done

# 2. Delete all legacy AllocationClaims (verify the exact CRD name for your InstaSlice version)
oc delete allocationclaims --all -n das-operator

# 3. Verify the blast radius is clear
oc get crd | grep allocationclaim

# 4. (Manual step) Remove the DAS Subscription, OperatorGroup, and the das-operator namespace.
```
Phase 2: Deploying the DRA Driver
Once the nodes are clean, you need to prepare the workers and deploy the NVIDIA GPU Operator differently than you used to.
First, label your DRA-targeted nodes to prevent the driver manager from evicting critical plugins during reconfiguration:
oc label node <node-name> nvidia.com/dra-kubelet-plugin=true
When installing the GPU Operator, you must disable the legacy device plugin. If you don't, you'll end up with resource advertisement conflicts.
```yaml
# File: values/gpu-operator-values.yaml
# GPU Operator Helm values for DRA
devicePlugin:
  enabled: false  # Critical: hands control over to DRA

# When installing the separate DRA Driver chart (conceptual illustration):
nvidiaDriverRoot: /run/nvidia/driver
gpuResourcesEnabledOverride: true  # Required for full GPU & MIG allocation support via DRA
```
Gotcha warning: If you are running A100s, changing the MIG partition layout via DRA currently requires a manual restart of the DRA kubelet plugin pod (e.g., oc delete pod -l app.kubernetes.io/name=nvidia-dra-driver-kubelet-plugin -n gpu-operator). The manager doesn't auto-evict the plugin during reconfiguration yet.
Phase 3: Rewriting Your Pod Manifests
This is the biggest change for developers. GPUs no longer appear under resources.requests or resources.limits at all. Device access moves to resources.claims, which reference entries in the pod-level resourceClaims list.
The Old Way (DAS):
```yaml
# File: legacy-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gemma-inference-legacy
spec:
  containers:
  - name: llm
    resources:
      requests:
        mig.das.com/1g.5gb: 1
```
The New Way (DRA):
You now reference a ResourceClaimTemplate. Instead of just a single snippet, let's look at what a full, multi-container pod manifest actually looks like when interacting with DRA:
```yaml
# File: dra-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gemma-inference-dra
spec:
  containers:
  - name: llm-worker
    image: my-registry/vllm:latest
    resources:
      claims:
      - name: gpu-primary
  - name: monitoring-sidecar
    image: my-registry/gpu-telemetry:latest
    resources:
      # The sidecar doesn't need the GPU claim; it runs alongside the workload
      requests:
        cpu: "100m"
        memory: "128Mi"
  resourceClaims:
  - name: gpu-primary
    resourceClaimTemplateName: standard-mig-template
```

(Note: the alpha-era `source:` wrapper under `resourceClaims` was dropped in the newer resource.k8s.io API versions; `resourceClaimTemplateName` now sits directly on the claim entry.)
This structural shift ensures the hardware is partitioned, reserved, and healthy before the pod is even scheduled.
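For completeness, here is roughly what the `standard-mig-template` referenced above could look like. This is a sketch, not the canonical manifest: the API version served by your cluster (v1beta1 vs. v1, whose request schema differs slightly) and the device class name are assumptions you should verify against the DeviceClasses your driver actually publishes.

```yaml
# File: standard-mig-template.yaml (illustrative sketch)
apiVersion: resource.k8s.io/v1beta1   # newer clusters may serve resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: standard-mig-template
  namespace: my-ai-namespace
spec:
  spec:
    devices:
      requests:
      - name: mig-slice
        deviceClassName: mig.nvidia.com   # assumed class name; list your cluster's DeviceClasses to confirm
```

Each pod that references the template gets its own ResourceClaim generated from it, so two replicas never fight over the same slice.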
Operator Dependencies
When managing this lifecycle, you must ensure the NVIDIA GPU Operator (v25.10.0+) is orchestrating the DRA Driver components correctly. If you've been managing GPU operators through the Red Hat OpenShift OperatorHub via OLM (Operator Lifecycle Manager), be aware that the transition requires explicit Subscription and CSV (ClusterServiceVersion) awareness. You can't just apply the new driver and hope OLM understands the state change.
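In practice that means pinning the channel in the Subscription rather than trusting an automatic upgrade to carry you across the DAS/DRA boundary. A minimal sketch, with the caveat that the channel and catalog source names here are assumptions you should confirm against what OperatorHub actually lists for v25.10.0:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: gpu-operator-certified
  namespace: nvidia-gpu-operator
spec:
  channel: v25.10               # assumed channel name for the v25.10.x stream
  name: gpu-operator-certified
  source: certified-operators    # assumed catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual    # review the CSV yourself before approving the cut-over
```

Setting `installPlanApproval: Manual` gives you a checkpoint to verify the CSV before OLM swaps components underneath a running fleet.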
Debugging DRA Allocations
What happens when your pod is pending and you aren't sure why? In the old DAS days, you checked the AllocationClaim objects. In DRA, the nodes advertise their capacity via the ResourceSlice API.
Here is the exact workflow for figuring out why your GPU hasn't attached:
```bash
# 1. Check if the cluster even sees your node's physical GPUs
oc get resourceslices

# 2. Check the specific status of your pod's claim
oc get resourceclaim -n my-ai-namespace
# Look for STATUS: Pending or Allocated

# 3. If Pending, describe the claim to see the K8s scheduler's CEL evaluation
oc describe resourceclaim my-gpu-claim-abc12 -n my-ai-namespace
```
If the scheduler cannot match your pod's hardware request (e.g., you asked for device.attributes.vram >= 80GB but only have 40GB A100s), the describe output will explicitly tell you the CEL evaluation failed on all available nodes.
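To make that concrete, here is roughly what a CEL selector looks like in a DeviceClass. The attribute names below follow the NVIDIA DRA driver's conventions but are assumptions; inspect your own ResourceSlices (`oc get resourceslices -o yaml`) to see exactly which attributes your driver version advertises.

```yaml
# Hypothetical DeviceClass selecting only 80GB-class GPUs via CEL.
apiVersion: resource.k8s.io/v1beta1   # or resource.k8s.io/v1 on newer clusters
kind: DeviceClass
metadata:
  name: gpu-80gb
spec:
  selectors:
  - cel:
      # device.driver and device.attributes are the standard DRA CEL variables;
      # the "gpu.nvidia.com" attribute domain and productName key are assumed.
      expression: |
        device.driver == "gpu.nvidia.com" &&
        device.attributes["gpu.nvidia.com"].productName.matches(".*80GB.*")
```

When a claim against this class goes unallocated, the scheduler's describe output shows exactly this expression failing against each advertised device, which makes "why is my pod Pending" a five-minute question instead of a support ticket.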
The NVLink Bonus: ComputeDomains
While dynamic MIG slicing is the primary DAS replacement, DRA brings a massive upgrade for folks running NVIDIA GB200 or HGX systems: ComputeDomains.
Instead of just dividing a single GPU, ComputeDomains allow you to securely share GPU memory across multiple nodes via Multi-Node NVLink (MNNVL). By specifying a computeDomainName in your pod claims, the DRA driver handles all the heavy lifting to establish connectivity among pods in that domain. This keeps them isolated from other namespaces while operating at full NVLink speeds. For large-scale distributed training, this alone is worth the migration effort.
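As a sketch of what this looks like in practice: the NVIDIA DRA driver ships a ComputeDomain CRD, and pods join the domain by referencing the claim template it generates. The field names below approximate the driver's published schema; verify them against the CRD actually installed in your cluster before relying on them.

```yaml
# Illustrative ComputeDomain for a 4-node MNNVL training job (field names approximate).
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: training-domain
  namespace: my-ai-namespace
spec:
  numNodes: 4
  channel:
    resourceClaimTemplate:
      name: training-domain-channel   # worker pods claim this template to join the domain
```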
Summary
Moving from DAS to DRA is a paradigm shift. It requires coordination between platform teams and developers to rewrite manifests, but the payoff is a native, stable, and highly expressive hardware scheduling API. Query your ResourceSlices, watch your devices get cleanly allocated, and enjoy the deprecation of hacky mutating webhooks.