
Kevin Naidoo

Posted on • Updated on • Originally published at kevincoder.co.za

How to install NVIDIA drivers for machine learning on Ubuntu

A common pain point when setting up servers to run AI models is getting the NVIDIA drivers to work correctly with PyTorch and other machine-learning libraries.

In this guide, I will walk you through the installation steps needed to get your GPU working correctly with your AI models.

I am running Ubuntu 22.04. If you are running a different version, you may need to tweak the CUDA toolkit version to suit your distro.
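
As a rough sketch of that tweak: the keyring URL used later in this guide contains the directory `ubuntu2204`, which looks like the release number with the dot removed. Assuming that pattern holds for your release (check the repo listing to be sure, and note this only makes sense on Ubuntu), a small helper can derive the name from `/etc/os-release`. The function name `cuda_repo_slug` is made up for this example.

```shell
# Hypothetical helper: turn an Ubuntu version such as "22.04" into the
# directory name seen in the CUDA repo URL ("ubuntu2204").
# Assumes the "ubuntu" + version-without-dot pattern used in this guide's URL.
cuda_repo_slug() {
  printf 'ubuntu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Read VERSION_ID from os-release inside a command substitution (a subshell),
# so none of the file's variables leak into the current shell.
if [ -r /etc/os-release ]; then
  version_id="$(. /etc/os-release && printf '%s' "${VERSION_ID:-}")"
  if [ -n "$version_id" ]; then
    cuda_repo_slug "$version_id"
  fi
fi
```

If the printed name matches a directory under https://developer.download.nvidia.com/compute/cuda/repos/, that is the one to substitute into the keyring URL.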

Install Docker and some essential apt packages

sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" -y
sudo apt install docker-ce -y

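# Now add your user to the docker group (replace yourusername with your login).
# You will need to log out and back in again for this to take effect.
# -f makes groupadd succeed quietly if the docker group already exists.
sudo groupadd -f docker
sudo usermod -aG docker yourusername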
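
To confirm the group change took effect after logging back in, a quick check along these lines may help. `in_group` is a hypothetical helper written for this example, not part of Docker.

```shell
# Hypothetical helper: succeed if GROUP ($1) appears as a whole word
# in a space-separated list of group names ($2).
in_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the group names of the current login session.
if in_group docker "$(id -nG)"; then
  echo "docker group is active for this session"
else
  echo "docker group not active yet: log out and back in"
fi
```

Once the group is active, `docker run --rm hello-world` should work without sudo.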

Set up NVIDIA GPU drivers

sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt update -y
sudo apt-get install linux-headers-$(uname -r)
sudo ubuntu-drivers install --gpgpu
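
After the driver install, and often a reboot so the kernel module loads, it is worth checking that the driver is actually usable. The sketch below wraps `nvidia-smi`, which ships with the driver; the function name `gpu_driver_status` is just an illustration.

```shell
# Illustrative check: print each GPU's name and driver version,
# or explain that the driver tools are not available yet.
gpu_driver_status() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: driver not installed, or a reboot is pending"
    return 1
  fi
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

if gpu_driver_status; then
  echo "driver check passed"
else
  echo "driver check failed"
fi
```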

Set up CUDA

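# You can find the right key to use for your distro here:
# https://developer.download.nvidia.com/compute/cuda/repos/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Refresh the package index after installing the keyring, so apt can see the CUDA repo
sudo apt-get update -y

# apt reads a trailing "-" on a package name as a request to remove that package,
# so give a complete name. cuda-toolkit-12-2 pins CUDA 12.2.
sudo apt-get -y install cuda-toolkit-12-2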
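
The toolkit packages do not necessarily put `nvcc` on your `PATH`. Here is a minimal sketch, assuming the toolkit landed under `/usr/local/cuda` (the usual location for NVIDIA's deb packages, but verify on your machine):

```shell
# Assumption: the toolkit is installed under /usr/local/cuda.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# Prepend the CUDA bin directory only if it is not on PATH already.
case ":$PATH:" in
  *":$CUDA_HOME/bin:"*) ;;
  *) PATH="$CUDA_HOME/bin:$PATH"; export PATH ;;
esac

# nvcc is the CUDA compiler; its version banner confirms the toolkit is usable.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc not found under $CUDA_HOME/bin"
fi
```

To make this permanent, the same lines can go in your shell profile, for example `~/.bashrc`.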

Configure Docker to use the GPU and CUDA


curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -y

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker so it picks up the new NVIDIA runtime
sudo systemctl restart docker
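
Once Docker is running with the new configuration, the sketch below checks that an `nvidia` runtime entry was written to Docker's config and then runs `nvidia-smi` inside a throwaway container as a smoke test. The function name `docker_has_nvidia_runtime` and the choice of the `ubuntu` image are assumptions for illustration; `/etc/docker/daemon.json` is Docker's default config path.

```shell
# Illustrative helper: succeed if the given Docker daemon config
# mentions an "nvidia" runtime entry.
docker_has_nvidia_runtime() {
  daemon_json="${1:-/etc/docker/daemon.json}"
  [ -r "$daemon_json" ] && grep -q '"nvidia"' "$daemon_json"
}

if docker_has_nvidia_runtime; then
  # Smoke test: run nvidia-smi in a container that can see all GPUs.
  docker run --rm --gpus all ubuntu nvidia-smi \
    || echo "GPU container test failed: try restarting Docker and run it again"
else
  echo "nvidia runtime not found in /etc/docker/daemon.json"
fi
```

If the container prints the same GPU table you see on the host, Docker is passing the GPU through.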

Conclusion

GPU setups can be tricky and painful. Hopefully, this goes a long way toward getting you up and running.

Now you should be able to run any of your PyTorch or other machine-learning models on the GPU, either natively on the machine or using Docker.
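
For PyTorch specifically, a quick way to see whether the GPU is visible is `torch.cuda.is_available()`. The wrapper below is a sketch that assumes `python3` and `torch` are installed in the current environment; `torch_cuda_check` is a name made up here.

```shell
# Illustrative wrapper: prints "True" when PyTorch can see a CUDA device,
# "False" when it cannot, or a hint when python3 or torch is missing.
torch_cuda_check() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found"
    return 1
  fi
  python3 -c 'import torch; print(torch.cuda.is_available())' 2>/dev/null \
    || { echo "PyTorch is not importable in this environment"; return 1; }
}

torch_cuda_check || true
```

The same check can be run inside a container started with `--gpus all` to confirm the Docker path as well.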

Top comments (3)

Elliot Hallmark

I created an account to say this was the perfect post. (You did drop the 2 at the end of the line sudo apt-get -y install cuda-toolkit-12-). Setting up drivers and cuda was such a pain the last time I installed Ubuntu. These instructions got me there in just a couple minutes. Thank you!

Kevin Naidoo

Awesome! Thanks for the feedback and glad this is working for you. The "-" at the end is on purpose, it will install the latest minor versions but you can also specify an exact version as well.

Elliot Hallmark

I have the issue that my kernel keeps getting automatically upgraded and then the nvidia driver stops being recognized. I've had to fix it several times and I never remember how. Would be great if you updated these instructions to include preventing the driver from breaking