As larger GPU-based workloads (like building Models) become more of a need for organizations, so does the need for GPUs. The problem is that they're incredibly expensive. For example, if you want to run a few Nvidia H100s in a Kubernetes cluster in the cloud, you're looking at hundreds of thousands of dollars in cost.
With GPU sharing, that cost, along with the overall management of the infrastructure, decreases significantly.
In this blog post, you'll learn how to implement sharing of Nvidia GPUs in Kubernetes.
The Benefit
As it stands right now, GPUs are incredibly expensive. If you look at a lot of the large “AI Factories” out of places like OpenAI, they’re spending billions of dollars just to have enough hardware to run Models.
There are two big things to think about here:
- That’s a lot of hardware/infrastructure to manage.
- It’s very expensive.
Having the ability to share one GPU across multiple Pods, much like memory, CPU, and Worker Nodes are shared in a Kubernetes cluster, not only allows organizations to take advantage of GPUs, but also keeps costs down and makes GPU-based workloads more readily available.
Nvidia Operator Deployment
The Nvidia GPU Operator helps ease the pain of getting hardware and software to communicate, much like any other Driver. Even on a desktop or a laptop, a Driver is the software that makes the hardware available to, and able to communicate with, the operating system.
As all Kubernetes Operators do, the Nvidia GPU Operator extends Kubernetes so the cluster is able to use the GPU itself.
💡
Operators allow you to extend the Kubernetes API to work with an API of your choosing. They also contain a Controller, which ensures that the current state of the deployment matches the desired state. For example, if you have a Controller that's watching Pods, it'll ensure that however many Replicas are supposed to be deployed are, in fact, deployed.
First, install the Nvidia Driver. This example uses Google's driver installer DaemonSet for GKE nodes running Container-Optimized OS.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
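The installer runs as a DaemonSet in the kube-system Namespace. Before moving on, you can confirm its Pods rolled out; the name filter below is an assumption based on the DaemonSet defined in the manifest above.
kubectl get pods -n kube-system | grep nvidia-driver-installer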
Next, create the gpu-operator Namespace, as that's where the Helm Chart for the Nvidia GPU Operator will be installed.
kubectl create ns gpu-operator
Because the GPU Operator deploys system-critical Pods and can take a significant amount of resources within a Kubernetes cluster, set up a Resource Quota to ensure that no more than 100 of those system-critical Pods can run in the gpu-operator Namespace at once.
kubectl apply -f - << EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-operator-quota
  namespace: gpu-operator
spec:
  hard:
    pods: 100
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
      - system-cluster-critical
EOF
Install the Nvidia GPU Operator via Helm.
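If Nvidia's Helm repository isn't already configured on your machine, add it first. This assumes the standard NGC chart repository location.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update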
helm install --wait gpu-operator \
-n gpu-operator \
nvidia/gpu-operator \
--set hostPaths.driverInstallDir=/home/kubernetes/bin/nvidia \
--set toolkit.installDir=/home/kubernetes/bin/nvidia \
--set cdi.enabled=true \
--set cdi.default=true \
--set driver.enabled=false
Wait 2-4 minutes and then check to ensure that the Pods within the GPU Operator are up and operational.
kubectl get pods -n gpu-operator
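If you'd rather not guess at the timing, one option (an addition to the original steps, assuming a recent GPU Operator release that reports readiness on the ClusterPolicy's .status.state) is to block until the ClusterPolicy reports ready:
kubectl wait --for=jsonpath='{.status.state}'=ready clusterpolicies.nvidia.com/cluster-policy --timeout=300s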
Once the Pods are up, confirm the GPU works by deploying a test Nvidia Pod.
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
Once the Pod completes, check its logs; you should see output indicating the vector addition test passed.
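A minimal check, using the Pod name from the manifest above:
kubectl logs cuda-vectoradd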
Next, you'll learn how to slice a GPU, which means one GPU can be used by multiple Pods.
GPU Slicing
When you hear “slicing”, it’s a method of taking one GPU and allowing it to be used across more than one Pod.
💡
There's also a method called MPS (Multi-Process Service), but it seems like time slicing is used the most right now.
The first thing you'll do is set up a Config Map for the slicing.
One thing to point out is the replica count. Notice how the replica count currently says 4? That means four (4) Pods can use the GPU. If you bumped it up to ten, ten (10) Pods could share the GPU. This, of course, depends on the type of GPU and whether the resources are available, just like any other piece of hardware.
kubectl apply -f - << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: plugin-config
  namespace: gpu-operator
data:
  time-slicing: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4
  mps: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      mps:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4
EOF
Next, patch the Cluster Policy so the device plugin picks up Nvidia's time-slicing configuration.
cat > patch.yaml << EOF
spec:
  devicePlugin:
    config:
      name: plugin-config
      default: time-slicing
EOF
kubectl patch clusterpolicies.nvidia.com/cluster-policy --type=merge --patch-file=patch.yaml
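After the patch, the Operator restarts the device plugin Pods so they pick up the new configuration. You can keep an eye on that with a simple filter (assuming the Pod names contain "device-plugin", as they do in a default GPU Operator install):
kubectl get pods -n gpu-operator | grep device-plugin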
Run the following command to confirm that the GPU is being shared; with the replica count set to 4, the node's allocatable nvidia.com/gpu count should now show 4 instead of 1.
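The command assumes $GPU_NODE_NAME holds the name of a GPU node. One way to grab it, assuming the Operator's feature discovery has labeled GPU nodes with nvidia.com/gpu.present=true (adjust the selector if your nodes are labeled differently):
GPU_NODE_NAME=$(kubectl get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].metadata.name}')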
kubectl describe node $GPU_NODE_NAME | grep "Allocatable:" -A7
You can now use a Job to test out that sharing/slicing the GPU worked.
💡
You can also use a Pod or whatever else you’d like. This is just an example.
kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: dcgm-prof-tester
spec:
  parallelism: 4
  template:
    metadata:
      labels:
        app: dcgm-prof-tester
    spec:
      restartPolicy: OnFailure
      containers:
      - name: dcgmproftester12
        image: nvcr.io/nvidia/cloud-native/dcgm:3.3.8-1-ubuntu22.04
        command: ["/usr/bin/dcgmproftester12"]
        args: ["--no-dcgm-validation", "-t 1004", "-d 30"]
        resources:
          limits:
            nvidia.com/gpu: 1
        securityContext:
          capabilities:
            add: ["SYS_ADMIN"]
EOF
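To confirm the slicing worked, check that all four Pods from the Job run at the same time; with -o wide, you should see them all scheduled on the same GPU node. The label selector comes from the Job template above.
kubectl get pods -l app=dcgm-prof-tester -o wide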
Congrats! You've not only deployed workloads on Kubernetes that use GPUs, but also shared a single GPU across multiple Pods.