daniel jeong

Posted on Apr 15 • Originally published at manoit.co.kr

Complete Guide to Kubernetes 1.36: DRA GA, OCI VolumeSource, MutatingAdmissionPolicy, and Production Upgrade Checklist

#kubernetes #docker #cloud #devops

Complete Guide to Kubernetes 1.36 — DRA GA, OCI VolumeSource, and MutatingAdmissionPolicy Usher in a New Era for AI Workloads and Security

Kubernetes v1.36 ships on April 22, 2026. This release brings Dynamic Resource Allocation (DRA) to GA, OCI VolumeSource to Stable, and MutatingAdmissionPolicy to GA: major stabilizations that directly affect production environments. DRA's graduation in particular lets GPUs and FPGAs be scheduled natively by attribute rather than by opaque counts, which changes how AI/ML workloads are operated. At the same time, security cleanups such as the permanent removal of gitRepo volumes and the deletion of the Portworx in-tree driver shrink the cluster attack surface. This guide covers the key changes in 1.36 with code examples and an upgrade checklist.

1. Release Overview and Change Matrix

v1.36 includes over 60 enhancements, with more than 10 graduating to GA. Workload-aware scheduling, introduced in 1.35 Timbernetes, has matured further, and meaningful progress spans security, storage, and networking.

Category	Feature	Status	Key Change
AI/GPU	Dynamic Resource Allocation (DRA)	GA	Native GPU/FPGA scheduling, ~50% lower scheduler Filter-stage latency
Storage	OCI VolumeSource	GA	Mount OCI registry artifacts directly as volumes
Security/Policy	MutatingAdmissionPolicy (MAP)	GA	Declarative CEL-based mutation, no webhook server
Security	Fine-grained kubelet API Authorization	GA	Per-endpoint RBAC on kubelet access
Storage	Volume Group Snapshot	GA	Atomic multi-volume snapshots via CSI drivers
Storage	SELinux Mount Relabeling	GA	Mount-time labeling replaces recursive relabeling
DX	kubectl kuberc	Beta	Separate user preferences and aliases into a dedicated file
Removal	gitRepo Volume	Removed	Permanent disable — root code execution vulnerability
Removal	Portworx In-tree Driver	Removed	Must migrate to CSI driver
Deprecation	Service .spec.externalIPs	Deprecated	CVE-2020-8554 MITM risk, full removal in v1.43

2. Dynamic Resource Allocation (DRA) GA — A Game Changer for AI Workload Scheduling

DRA was first introduced as alpha in Kubernetes 1.26, underwent a complete redesign in 1.31, and has now graduated to GA in 1.36. It goes beyond the limitations of the existing Device Plugin interface, enabling declarative, attribute-based scheduling of hardware resources like GPUs, FPGAs, and network adapters through native Kubernetes APIs.

2.1 Core DRA Architecture

DRA uses three key API resources: ResourceClaim declares the resources a Pod needs, ResourceSlice advertises resources available on a node, and DeviceClass defines the type and attributes of resources. In 1.36, the scheduler plugin now splits ResourceSlice entries into shared and per-node categories, reducing Filter stage latency by approximately 50%.

# ResourceClaim — Request 2 GPUs
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-claim
  namespace: ml-training
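spec:
  devices:
    requests:
    - name: gpu
      # In resource.k8s.io/v1, a request for one device class is nested under "exactly"
      exactly:
        deviceClassName: nvidia-gpu
        allocationMode: ExactCount
        count: 2
        selectors:
        - cel:
            # Quantities such as memory are published under capacity, not attributes
            expression: "device.capacity['gpu.nvidia.com'].memory.isGreaterThan(quantity('40Gi'))"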

# DeviceClass — NVIDIA GPU class definition
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: nvidia-gpu
spec:
  selectors:
  - cel:
      expression: "device.driver == 'gpu.nvidia.com'"
  config:
  - opaque:
      driver: gpu.nvidia.com
      parameters:
        apiVersion: gpu.nvidia.com/v1
        kind: GpuConfig
        sharing:
          strategy: TimeSlicing
          timeSlicingConfig:
            replicas: 4
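
To make attribute-based matching concrete, here is a minimal Python sketch of the idea. It is not the scheduler's actual algorithm. The `Device` class, the `allocate` function, and the `memory_gib` field are hypothetical stand-ins for what a driver would publish in ResourceSlices, and the two lambdas play the roles of the DeviceClass and ResourceClaim CEL selectors shown above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for one device advertised in a ResourceSlice."""
    name: str
    driver: str
    node: str
    memory_gib: int


Selector = Callable[[Device], bool]


def allocate(devices: list[Device], class_selector: Selector,
             claim_selector: Selector, count: int) -> Optional[list[Device]]:
    """Pick `count` devices from a single node that pass both selectors, else None."""
    candidates: dict[str, list[Device]] = {}
    for dev in devices:
        if class_selector(dev) and claim_selector(dev):
            candidates.setdefault(dev.node, []).append(dev)
    for node in sorted(candidates):  # deterministic order for the example
        if len(candidates[node]) >= count:
            return candidates[node][:count]
    return None


inventory = [
    Device("gpu-0", "gpu.nvidia.com", "node-a", 24),
    Device("gpu-1", "gpu.nvidia.com", "node-a", 80),
    Device("gpu-0", "gpu.nvidia.com", "node-b", 80),
    Device("gpu-1", "gpu.nvidia.com", "node-b", 80),
]

picked = allocate(
    inventory,
    class_selector=lambda d: d.driver == "gpu.nvidia.com",  # DeviceClass selector
    claim_selector=lambda d: d.memory_gib > 40,             # ResourceClaim selector
    count=2,
)
print([(d.node, d.name) for d in picked])
# prints [('node-b', 'gpu-0'), ('node-b', 'gpu-1')]
```

node-a is skipped because only one of its GPUs has more than 40 GiB. A Device Plugin's integer count would see two GPUs on either node and could not express that distinction.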

2.2 Using DRA in Pods

# AI inference Pod using DRA
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
  namespace: ml-training
spec:
  containers:
  - name: inference
    image: vllm/vllm-openai:latest
    command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
    args: ["--model", "meta-llama/Llama-3.3-70B-Instruct"]
    resources:
      claims:
      - name: gpu
        request: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim

2.3 New DRA Features in 1.36

Beyond DRA GA, several related improvements ship in 1.36. Device Taints and Tolerations (Beta) lets you taint specific devices, for example to exclude GPUs under maintenance from scheduling. The ResourcePoolStatusRequest API (Alpha) is a new API for querying the devices available in each pool before submitting workloads. DRA Admin Access also graduates to GA; it lets authorized administrators create ResourceClaims that access devices already in use by other workloads, typically for monitoring or diagnostics.

3. OCI VolumeSource GA — Beyond Container Images to OCI Artifacts as Volumes

OCI VolumeSource was introduced as alpha in 1.31 and has graduated to GA in 1.36. This feature mounts artifacts stored in OCI registries (config files, ML model weights, static data) directly as Pod volumes. Previously, such data had to be baked into container images or downloaded via initContainers — now it can be declaratively mounted.

# Mounting OCI artifacts as volumes
apiVersion: v1
kind: Pod
metadata:
  name: ml-model-server
spec:
  containers:
  - name: server
    image: myregistry.io/inference-server:v2.1
    volumeMounts:
    - name: model-weights
      mountPath: /models/llama-3.3
      readOnly: true
    - name: config
      mountPath: /etc/app-config
      readOnly: true
  volumes:
  - name: model-weights
    image:
      reference: myregistry.io/ml-models/llama-3.3-70b:latest
      pullPolicy: IfNotPresent
  - name: config
    image:
      reference: myregistry.io/configs/inference-config:v1.2
      pullPolicy: Always

This is especially powerful for AI/ML workloads. Managing model weights as OCI artifacts means model updates no longer require rebuilding the entire container image — just swap the model artifact. The existing OCI registry infrastructure (caching, mirroring, access control) can be leveraged as-is.
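
One caveat when swapping artifacts: a mutable tag such as `:latest` combined with `pullPolicy: IfNotPresent` means a node that already cached the artifact will keep serving the old copy. The following Python sketch flags that combination in a Pod manifest. The function name `lint_image_volumes` and the warning text are illustrative assumptions, not part of any Kubernetes tooling.

```python
def lint_image_volumes(pod: dict) -> list[str]:
    """Warn about image volumes that use a mutable 'latest' tag without re-pulling."""
    warnings = []
    for vol in pod.get("spec", {}).get("volumes", []):
        image = vol.get("image")
        if image is None:
            continue  # not an OCI image volume
        ref = image.get("reference", "")
        if "@sha256:" in ref:
            continue  # digest-pinned references are immutable
        # The tag, if any, follows the last ':' in the final path segment.
        last_segment = ref.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
        if tag == "latest" and image.get("pullPolicy") in ("IfNotPresent", "Never"):
            warnings.append(
                f"volume {vol['name']}: ':latest' with pullPolicy "
                f"{image['pullPolicy']} may serve a stale cached artifact"
            )
    return warnings


pod = {
    "spec": {
        "volumes": [
            {"name": "model-weights",
             "image": {"reference": "myregistry.io/ml-models/llama-3.3-70b:latest",
                       "pullPolicy": "IfNotPresent"}},
            {"name": "config",
             "image": {"reference": "myregistry.io/configs/inference-config:v1.2",
                       "pullPolicy": "Always"}},
        ]
    }
}

for warning in lint_image_volumes(pod):
    print(warning)
```

Run against the manifest above, it reports only the `model-weights` volume. Pinning the reference to a version tag or digest is the usual way to make model rollouts explicit.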

4. MutatingAdmissionPolicy GA — Declarative Resource Mutation Without Webhooks

MutatingAdmissionPolicy (MAP) is the mutation counterpart to ValidatingAdmissionPolicy. It modifies resources using CEL (Common Expression Language) expressions without external webhook servers. Existing MutatingWebhookConfiguration required maintaining separate servers and could impact cluster operations during network failures. MAP runs inside kube-apiserver, eliminating these issues entirely.

# MutatingAdmissionPolicy — Inject default resource limits on all Pods
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: inject-default-resources
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE"]
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      expression: |
        Object{
          spec: Object.spec{
            containers: object.spec.containers.map(c,
              Object.spec.containers{
                // name is the merge key that ties each patch entry to its container
                name: c.name,
                resources: Object.spec.containers.resources{
                  // existing values are re-applied unchanged; missing ones get defaults
                  limits: {
                    "memory": c.?resources.?limits.?memory.orValue("512Mi"),
                    "cpu": c.?resources.?limits.?cpu.orValue("500m")
                  }
                }
              }
            )
          }
        }

MAP supports two mutation modes: ApplyConfiguration, which uses Server-Side Apply merge semantics, and JSONPatch. With ApplyConfiguration, the expression returns only the fields to set, and these are merged into the object so that fields the policy does not mention are left untouched. Simple mutation logic previously handled by external solutions like OPA Gatekeeper or Kyverno can now be moved to native Kubernetes.
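
The intended effect of the policy above can be modeled in a few lines of plain Python. This is only a sketch of the outcome ("fill in limits that are missing, keep the ones the user set"), not of how kube-apiserver evaluates CEL; `inject_default_limits` and `DEFAULT_LIMITS` are hypothetical names.

```python
import copy

# Defaults taken from the policy example above.
DEFAULT_LIMITS = {"memory": "512Mi", "cpu": "500m"}


def inject_default_limits(pod: dict) -> dict:
    """Return a copy of `pod` where every container has memory and cpu limits."""
    mutated = copy.deepcopy(pod)  # admission never edits the caller's object in place
    for container in mutated["spec"]["containers"]:
        limits = container.setdefault("resources", {}).setdefault("limits", {})
        for resource, default in DEFAULT_LIMITS.items():
            limits.setdefault(resource, default)  # user-specified values win
    return mutated


incoming = {
    "spec": {
        "containers": [
            {"name": "api", "resources": {"limits": {"memory": "1Gi"}}},
            {"name": "sidecar"},
        ]
    }
}

admitted = inject_default_limits(incoming)
for c in admitted["spec"]["containers"]:
    print(c["name"], c["resources"]["limits"])
# prints:
# api {'memory': '1Gi', 'cpu': '500m'}
# sidecar {'memory': '512Mi', 'cpu': '500m'}
```

A quick model like this is handy as a test oracle: feed the same Pods through a staging cluster with the policy enabled and compare the admitted objects.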

5. Security Hardening — gitRepo Removal, kubelet Authorization, SELinux Mount

5.1 gitRepo Volume Permanent Removal (KEP-5040)

The gitRepo volume, deprecated since v1.11, has been permanently disabled in 1.36. This volume type cloned Git repositories into Pods but carried a critical security vulnerability that allowed arbitrary code execution as root on nodes. It can no longer be re-enabled via Feature Gate — any workloads using it must be migrated before upgrading.

# gitRepo replacement: initContainer running git clone into a shared emptyDir
# (git-sync is an alternative when the checkout must stay up to date)
apiVersion: v1
kind: Pod
metadata:
  name: app-with-git
spec:
  initContainers:
  - name: git-clone
    image: alpine/git:latest
    command: ['git', 'clone', '--depth=1', 'https://github.com/org/repo.git', '/repo']
    volumeMounts:
    - name: git-data
      mountPath: /repo
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: git-data
      mountPath: /app/data
      readOnly: true
  volumes:
  - name: git-data
    emptyDir: {}
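
For clusters with many affected manifests, the rewrite can be scripted. The sketch below is a hypothetical helper (`migrate_gitrepo_volumes` is not an existing tool) that converts each gitRepo volume in a Pod manifest into the emptyDir plus initContainer pattern shown above. It assumes the `alpine/git` image from that example and deliberately refuses manifests that pin a `revision`, since checking out a specific commit needs an extra step.

```python
import copy
import json


def migrate_gitrepo_volumes(pod: dict, git_image: str = "alpine/git:latest") -> dict:
    """Return a copy of `pod` with gitRepo volumes replaced by emptyDir + initContainer."""
    out = copy.deepcopy(pod)
    spec = out["spec"]
    for vol in spec.get("volumes", []):
        git = vol.pop("gitRepo", None)
        if git is None:
            continue
        if git.get("revision"):
            # Pinning a commit needs a checkout step; handle those manifests by hand.
            raise NotImplementedError(f"volume {vol['name']}: migrate 'revision' manually")
        vol["emptyDir"] = {}
        # Running clone from inside /repo mirrors gitRepo's layout: the repository
        # lands in a subdirectory unless 'directory' says otherwise.
        command = ["git", "-C", "/repo", "clone", "--depth=1", git["repository"]]
        if git.get("directory"):
            command.append(git["directory"])  # "." clones into the volume root
        spec.setdefault("initContainers", []).append({
            "name": f"git-clone-{vol['name']}",
            "image": git_image,
            "command": command,
            "volumeMounts": [{"name": vol["name"], "mountPath": "/repo"}],
        })
    return out


legacy = {
    "spec": {
        "containers": [{"name": "app", "image": "myapp:latest"}],
        "volumes": [{
            "name": "git-data",
            "gitRepo": {"repository": "https://github.com/org/repo.git", "directory": "."},
        }],
    }
}

migrated = migrate_gitrepo_volumes(legacy)
print(json.dumps(migrated["spec"]["initContainers"], indent=2))
```

Review the generated manifests before applying them: private repositories still need credentials mounted into the initContainer, which the old gitRepo volume never supported either.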

5.2 Fine-grained kubelet API Authorization (GA)

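Fine-grained kubelet API authorization has graduated to GA. Previously, a client that needed to read even a single kubelet endpoint, such as a monitoring agent querying /pods or /healthz, had to be granted the broad nodes/proxy permission, which also allows running commands in containers on that node. Starting in 1.36, access can be granted per endpoint through dedicated RBAC subresources, so a leaked or compromised credential of that kind no longer implies full kubelet access.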

5.3 SELinux Mount Relabeling (GA)

Volume security labeling on SELinux-enforcing systems has been dramatically improved. Instead of recursively traversing every file on a volume to change labels, 1.36 applies the label once at mount time via mount -o context=XYZ. On volumes with many files, the relabeling step drops from minutes to a near-constant cost, which shortens Pod startup accordingly.

# SELinux labeling policy configuration
apiVersion: v1
kind: Pod
metadata:
  name: selinux-optimized
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
    seLinuxChangePolicy: MountOption
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc

6. Developer Experience and Notable Features

6.1 kubectl kuberc (Beta)

kuberc separates kubectl user preferences and aliases into a dedicated file. Custom aliases, default output formats, and column settings can now be managed independently in ~/.kube/kuberc.

# ~/.kube/kuberc — kubectl user preferences
apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
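aliases:
- name: get-pods-wide
  command: get
  # prependArgs are inserted right after the command
  prependArgs:
  - pods
  # options set default flag values; flags given on the command line still win
  options:
  - name: output
    default: wide
  - name: all-namespaces
    default: "true"
- name: logs-tail
  command: logs
  options:
  - name: tail
    default: "100"
  - name: follow
    default: "true"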

6.2 ARCH Column Added

kubectl get node -o wide now includes an ARCH column, making it easy to identify node architectures in mixed ARM64/AMD64 clusters at a glance.

# kubectl get node -o wide output in 1.36
kubectl get nodes -o wide

NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP   OS-IMAGE       KERNEL     ARCH
node-arm-01   Ready    worker   45d   v1.36.0   10.0.1.10     Bottlerocket   6.1.94     arm64
node-amd-01   Ready    worker   45d   v1.36.0   10.0.1.20     Bottlerocket   6.1.94     amd64
node-gpu-01   Ready    gpu      10d   v1.36.0   10.0.1.30     Ubuntu 22.04   6.5.0-44   amd64

6.3 Service .spec.externalIPs Deprecation (KEP-5707)

Service.spec.externalIPs has been officially deprecated in v1.36. This field posed a security risk due to CVE-2020-8554, which enabled man-in-the-middle attacks on cluster traffic. Full removal is planned for v1.43 — if you're currently using it, plan migration to LoadBalancer Service, NodePort, or Gateway API.

7. v1.36 Upgrade Checklist — Production Migration Guide

Before upgrading production clusters to 1.36, verify the following items:

#	Check Item	Command / Method	Risk
1	gitRepo volume usage	`kubectl get pods -A -o json | jq '.items[].spec.volumes[]? | …` (see the gitRepo scan in the script below)
2	Portworx in-tree driver usage	`kubectl get pv -o json | jq '.items[] | …` (see the Portworx scan in the script below)
3	externalIPs Services	`kubectl get svc -A -o json | jq '.items[] | …` (see the externalIPs scan in the script below)
4	Non-canonical IP/CIDR formats	Audit IaC tool IP generation logic	Warning
5	Audit log configuration review	Check `--audit-log-maxsize`, `--audit-log-maxage` flags	Low
6	Flex Volume usage (kubeadm)	`find /usr/libexec/kubernetes/kubelet-plugins -type f`	Warning
7	SELinux policy compatibility	Verify `seLinuxChangePolicy` in mixed privileged/unprivileged Pod environments	Medium
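
For item 4, a small checker can be run over the addresses your IaC tooling generates. This is a minimal sketch that assumes "canonical" means the form Python's `ipaddress` module prints back, plus rejecting IPv4-mapped IPv6 addresses and CIDRs with host bits set. The function names are illustrative, and the exact rules the API server applies may differ in detail.

```python
import ipaddress


def is_canonical_ip(text: str) -> bool:
    """True if `text` is an IP address already written in its normalized form."""
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        return False  # unparsable, e.g. IPv4 octets with leading zeros
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return False  # IPv4-mapped IPv6 such as ::ffff:10.0.1.10
    return str(ip) == text  # catches uppercase hex and uncompressed IPv6


def is_canonical_cidr(text: str) -> bool:
    """True if `text` is a CIDR with no host bits set, in prefix-length notation."""
    try:
        net = ipaddress.ip_network(text, strict=True)  # strict rejects host bits
    except ValueError:
        return False
    return str(net) == text  # catches netmask notation and a missing prefix


for value in ["10.0.1.10", "010.0.1.10", "2001:DB8::1", "::ffff:10.0.1.10"]:
    print(f"{value:20} canonical={is_canonical_ip(value)}")
for value in ["10.0.0.0/24", "10.0.0.1/24", "10.0.0.0/255.255.255.0"]:
    print(f"{value:24} canonical={is_canonical_cidr(value)}")
```

Only `10.0.1.10` and `10.0.0.0/24` pass. The other inputs parse loosely in some tools but are exactly the kind of value worth normalizing before the upgrade.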

# Scan all major risk factors at once
# Abort on any kubectl/jq failure so an error is never reported as "nothing found"
set -euo pipefail

# report MATCHES NONE_MESSAGE: print the matches, or the all-clear message if empty
report() {
  if [ -n "$1" ]; then echo "$1"; else echo "$2 ✅"; fi
}

echo "=== gitRepo Volume Scan ==="
matches=$(kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.volumes[]?; .gitRepo != null)) |
  "\(.metadata.namespace)/\(.metadata.name)"
')
report "$matches" "No gitRepo volumes found"

echo ""
echo "=== Portworx In-tree PV Scan ==="
matches=$(kubectl get pv -o json | jq -r '
  .items[] |
  select(.spec.portworxVolume != null) |
  .metadata.name
')
report "$matches" "No Portworx in-tree PVs found"

echo ""
echo "=== externalIPs Service Scan ==="
matches=$(kubectl get svc -A -o json | jq -r '
  .items[] |
  select((.spec.externalIPs // []) | length > 0) |
  "\(.metadata.namespace)/\(.metadata.name): \(.spec.externalIPs)"
')
report "$matches" "No externalIPs Services found"
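
If you prefer to run the audit offline, for example against JSON dumps collected from several clusters, the same three checks can be expressed as pure Python functions. This is a sketch: the function names are hypothetical, and each one expects the parsed output of the corresponding `kubectl get ... -o json` command.

```python
def pods_with_gitrepo(pod_list: dict) -> list[str]:
    """Namespaced names of Pods that declare at least one gitRepo volume."""
    return [
        f"{pod['metadata']['namespace']}/{pod['metadata']['name']}"
        for pod in pod_list.get("items", [])
        if any(v.get("gitRepo") is not None
               for v in pod.get("spec", {}).get("volumes") or [])
    ]


def portworx_in_tree_pvs(pv_list: dict) -> list[str]:
    """Names of PersistentVolumes backed by the in-tree Portworx driver."""
    return [
        pv["metadata"]["name"]
        for pv in pv_list.get("items", [])
        if pv.get("spec", {}).get("portworxVolume") is not None
    ]


def services_with_external_ips(svc_list: dict) -> dict[str, list[str]]:
    """Map of namespaced Service name to its non-empty externalIPs list."""
    return {
        f"{svc['metadata']['namespace']}/{svc['metadata']['name']}": svc["spec"]["externalIPs"]
        for svc in svc_list.get("items", [])
        if svc.get("spec", {}).get("externalIPs")
    }


# Tiny in-memory sample; in practice use json.load() on a saved
# `kubectl get pods -A -o json` dump instead.
sample_pods = {"items": [
    {"metadata": {"namespace": "ci", "name": "builder"},
     "spec": {"volumes": [{"name": "src", "gitRepo": {"repository": "https://github.com/org/repo.git"}}]}},
    {"metadata": {"namespace": "web", "name": "frontend"},
     "spec": {"volumes": [{"name": "cache", "emptyDir": {}}]}},
]}

print(pods_with_gitrepo(sample_pods))
# prints ['ci/builder']
```

Keeping the checks free of kubectl calls makes them easy to unit test and to reuse in CI against rendered manifests.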

8. Conclusion — What 1.36 Means and How to Prepare

Kubernetes 1.36 makes its most meaningful progress along two axes: native AI workload support and security cleanup. DRA's GA graduation moves GPU resource scheduling from the Device Plugin's integer-based counting to attribute-based declarative scheduling. OCI VolumeSource GA decouples ML model weight deployment from container image builds, improving operational agility. MutatingAdmissionPolicy GA removes the operational burden of webhook servers, so mutation policies are managed as declarations rather than as code.

Ahead of the April 22 official release, we recommend testing the RC (Release Candidate) in staging environments first and using the upgrade checklist above to pre-audit gitRepo, Portworx, and externalIPs usage. If you're running AI/ML workloads, consider piloting DRA and OCI VolumeSource together.

References: Kubernetes v1.36 Sneak Peek (Official Blog), CHANGELOG-1.36.md (GitHub), Kubernetes 1.36: GA Features, Removals & Upgrade Guide (Kloia)

Originally published at ManoIT Tech Blog.
