A common pain point when setting up servers to run AI models is getting the NVIDIA drivers to work correctly with PyTorch and other machine-learning libraries.
In this guide, I will walk you through the installation steps needed to get your GPU working correctly with your AI models.
I am running Ubuntu 22.04. If you are running a different version, you may need to tweak the CUDA toolkit version to suit your distro.
Install Docker and some essential apt packages
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-ce -y
# Now add your user to the docker group
# You will need to log out and back in again for this to take effect
sudo groupadd docker
sudo usermod -aG docker $USER
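Once you have logged back out and in, it is worth a quick sanity check that the daemon is running and that your user can talk to it without sudo. The hello-world image is Docker's standard smoke test:

```shell
# Confirm the Docker CLI and daemon are installed and reachable
docker --version
# Pull and run the tiny hello-world image; it prints a greeting and exits
docker run --rm hello-world
```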
Set up NVIDIA GPU drivers
sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt update -y
sudo apt-get install -y linux-headers-$(uname -r)
sudo ubuntu-drivers install --gpgpu
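A reboot is usually needed so the freshly installed kernel module loads. After the machine comes back up, nvidia-smi should list your GPU and the driver version:

```shell
# Reboot so the new kernel module is loaded
sudo reboot
# After logging back in, confirm the driver sees your GPU
nvidia-smi
```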
Set up CUDA
sudo apt-get update -y
# You can find the right key to use for your distro here:
# https://developer.download.nvidia.com/compute/cuda/repos/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get -y install cuda-toolkit-12-
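The toolkit installs under /usr/local/cuda, which is not on your PATH by default. A sketch of wiring it up (the exact versioned directory on your system may differ; append the export lines to ~/.bashrc to make them permanent):

```shell
# Make the CUDA compiler and libraries visible to your shell
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Confirm the compiler is on the PATH and reports its version
nvcc --version
```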
Configure Docker to use the GPU and CUDA
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update -y
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
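After configuring the runtime, restart the Docker daemon so it picks up the change, then test that containers can see the GPU. The CUDA image tag below is just an example; any recent base tag from NVIDIA's registry for your Ubuntu version should work:

```shell
# Restart Docker so it loads the NVIDIA runtime configuration
sudo systemctl restart docker
# Run nvidia-smi inside a CUDA container to confirm GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```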
Conclusion
GPU setups can be tricky and painful; hopefully this guide goes a long way toward getting you up and running.
You should now be able to run your PyTorch or other machine-learning models on the GPU, either natively on the machine or inside Docker.
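As a final check (assuming PyTorch is already installed in your Python environment), you can ask it directly whether CUDA is visible:

```shell
# Prints True plus the GPU name if PyTorch can see the CUDA device
python3 -c "import torch; ok = torch.cuda.is_available(); print(ok); print(torch.cuda.get_device_name(0) if ok else 'no GPU detected')"
```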
Top comments (3)
I created an account to say this was the perfect post. (You did drop the 2 at the end of the line
sudo apt-get -y install cuda-toolkit-12-
). Setting up drivers and CUDA was such a pain the last time I installed Ubuntu. These instructions got me there in just a couple of minutes. Thank you!

Awesome! Thanks for the feedback, and glad this is working for you. The "-" at the end is on purpose: it will install the latest minor version, but you can also specify an exact version.
I have the issue that my kernel keeps getting automatically upgraded, and then the NVIDIA driver stops being recognized. I've had to fix it several times, and I never remember how. It would be great if you updated these instructions to include preventing the driver from breaking.
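One possible mitigation, not covered in the guide above, is to hold the kernel metapackages so unattended upgrades cannot pull in a new kernel until you are ready to update the driver alongside it. A sketch (package names assume the stock generic kernel; adjust if you run an HWE or other kernel flavor):

```shell
# Prevent automatic kernel upgrades from breaking the NVIDIA module
sudo apt-mark hold linux-image-generic linux-headers-generic
# Later, when you are ready to upgrade the kernel deliberately:
sudo apt-mark unhold linux-image-generic linux-headers-generic
```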