A common pain point when setting up servers to run AI models is getting the NVIDIA drivers to work correctly with PyTorch and other machine-learning libraries.
In this guide, I will walk you through the installation steps needed to get your GPU working correctly with your AI models.
I am running Ubuntu 22.04. If you are running a different version, you may need to tweak the CUDA toolkit version to suit your distro.
Install Docker and some essential apt packages
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-ce -y
# Now add your user to the docker group
# You will need to log out and back in again for this to take effect
sudo groupadd docker
sudo usermod -aG docker $USER
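Once you have logged back out and in, it is worth a quick sanity check that the daemon is running and that your user can talk to it without sudo. The hello-world image is Docker's standard smoke test:

```shell
# Confirm the Docker CLI and daemon are installed and reachable
docker --version
# Pull and run the tiny hello-world image; it prints a greeting and exits
docker run --rm hello-world
```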
Set up NVIDIA GPU drivers
sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt update -y
sudo apt-get install -y linux-headers-$(uname -r)
sudo ubuntu-drivers install --gpgpu
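A reboot is usually needed so the freshly installed kernel module loads. After the machine comes back up, nvidia-smi should list your GPU and the driver version:

```shell
# Reboot so the new kernel module is loaded
sudo reboot
# After logging back in, confirm the driver sees your GPU
nvidia-smi
```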
Set up CUDA
sudo apt-get update -y
# You can find the right key to use for your distro here:
# https://developer.download.nvidia.com/compute/cuda/repos/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get -y install cuda-toolkit-12-
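The toolkit installs under /usr/local/cuda, which is not on your PATH by default. A sketch of wiring it up (the exact versioned directory on your system may differ; append the export lines to ~/.bashrc to make them permanent):

```shell
# Make the CUDA compiler and libraries visible to your shell
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Confirm the compiler is on the PATH and reports its version
nvcc --version
```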
Configure Docker to use the GPU and CUDA
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update -y
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
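After configuring the runtime, restart the Docker daemon so it picks up the change, then test that containers can see the GPU. The CUDA image tag below is just an example; any recent base tag from NVIDIA's registry for your Ubuntu version should work:

```shell
# Restart Docker so it loads the NVIDIA runtime configuration
sudo systemctl restart docker
# Run nvidia-smi inside a CUDA container to confirm GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```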
Conclusion
GPU setups can be tricky and painful; hopefully this guide goes a long way toward getting you up and running.
You should now be able to run your PyTorch or other machine-learning models on the GPU, either natively on the machine or inside Docker.
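As a final check (assuming PyTorch is already installed in your Python environment), you can ask it directly whether CUDA is visible:

```shell
# Prints True plus the GPU name if PyTorch can see the CUDA device
python3 -c "import torch; ok = torch.cuda.is_available(); print(ok); print(torch.cuda.get_device_name(0) if ok else 'no GPU detected')"
```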
Top comments (3)
I created an account to say this was the perfect post. (You did drop the 2 at the end of the line
sudo apt-get -y install cuda-toolkit-12-
). Setting up drivers and CUDA was such a pain the last time I installed Ubuntu. These instructions got me there in just a couple of minutes. Thank you!

Awesome! Thanks for the feedback, and glad this is working for you. The "-" at the end is on purpose: it will install the latest minor version, but you can also specify an exact version.
I have the issue that my kernel keeps getting automatically upgraded, and then the NVIDIA driver stops being recognized. I've had to fix it several times, and I never remember how. It would be great if you updated these instructions to include preventing the driver from breaking.
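One possible mitigation, not covered in the guide above, is to hold the kernel metapackages so unattended upgrades cannot pull in a new kernel until you are ready to update the driver alongside it. A sketch (package names assume the stock generic kernel; adjust if you run an HWE or other kernel flavor):

```shell
# Prevent automatic kernel upgrades from breaking the NVIDIA module
sudo apt-mark hold linux-image-generic linux-headers-generic
# Later, when you are ready to upgrade the kernel deliberately:
sudo apt-mark unhold linux-image-generic linux-headers-generic
```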