Deploying large language models (LLMs) or generative AI on a bare-metal dedicated server gives you unmatched performance, zero virtualization overhead, and complete data privacy. However, out of the box, Docker containers are isolated from your host machine's physical hardware.
If you run a standard AI container, it simply cannot see your RTX 4090 or A100 GPU.
To break this isolation and achieve true Docker GPU passthrough, you need to bridge your container engine with your host’s hardware using the NVIDIA Container Toolkit.
In this guide, backed by our experience deploying thousands of AI-ready bare metal servers at GPUYard, we will walk you through the exact steps to securely configure Docker with NVIDIA GPUs on Ubuntu 24.04.
Prerequisites: The Bare Metal Foundation
Before configuring Docker, your server must recognize its hardware. At GPUYard, our bare-metal servers come pre-provisioned, but you should always verify your host environment:
- A Dedicated GPU Server: Running Ubuntu 24.04 LTS.
- Root or Sudo Access: Required for package installation.
- NVIDIA Drivers Installed: Verify this by running nvidia-smi in your terminal. You should see a table displaying your GPU model and CUDA version. (If you see "command not found," install the proprietary NVIDIA drivers first.)
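The driver check above can be sketched as a small pre-flight script (GPU_OK is just an illustrative variable name; the query flags are standard nvidia-smi options):

```shell
# Pre-flight sketch: confirm the host driver is visible before touching Docker.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_OK=yes
  # Print just the model and driver version instead of the full table
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  GPU_OK=no
  echo "nvidia-smi not found: install the proprietary NVIDIA drivers first"
fi
```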
Step 1: Avoid the Ubuntu 24.04 Docker "Snap" Trap 🛑
The most common reason developers fail to pass GPUs into Docker on Ubuntu 24.04 is the default installation method. If you installed Docker via the Ubuntu App Center or used snap install docker, GPU passthrough will fail with permission errors.
Snap packages use strict AppArmor confinement, preventing Docker from accessing the /dev/nvidia* hardware files on your host. We must remove the Snap version and use the official Docker APT repository.
Purge the Snap version:
sudo snap remove --purge docker
sudo apt-get remove docker docker-engine docker.io containerd runc
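Before purging, you can check whether your current Docker binary actually came from Snap. A quick heuristic sketch based on the install path (Snap places its binaries under /snap/bin):

```shell
# Heuristic check: Snap installs its docker wrapper under /snap/bin.
DOCKER_PATH=$(command -v docker || true)
case "$DOCKER_PATH" in
  /snap/*) DOCKER_SOURCE="snap" ;;   # confined build - purge it
  "")      DOCKER_SOURCE="none" ;;   # not installed yet
  *)       DOCKER_SOURCE="other" ;;  # likely the APT build
esac
echo "docker source: $DOCKER_SOURCE"
```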
Step 2: Install the Official Docker Engine
Now, install the unconfined, official Docker Engine directly from Docker’s verified repository.
Set up the repository and GPG keys:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
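For reference, here is the single line that pipeline writes into /etc/apt/sources.list.d/docker.list, with the architecture and codename hard-coded for illustration (on a real host, dpkg --print-architecture and /etc/os-release supply them):

```shell
# Sketch: the apt source line generated on an amd64 Ubuntu 24.04 ("noble")
# host - values hard-coded here purely for illustration.
ARCH=amd64
CODENAME=noble
SRC_LINE="deb [arch=${ARCH} signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu ${CODENAME} stable"
echo "$SRC_LINE"
```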
Install Docker:
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
Step 3: Install the NVIDIA Container Toolkit
With a clean Docker engine running, we install the NVIDIA Container Toolkit. This software acts as the critical translation layer between your bare-metal CUDA drivers and your isolated containers.
Add NVIDIA's production repository and install the toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
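If you are curious what the sed filter in that pipeline actually does, here is its effect on one sample line of NVIDIA's upstream .list file (the sample line content is illustrative):

```shell
# The sed expression prepends a signed-by option to each "deb https://" line,
# pinning the repository to the GPG key we just installed.
UPSTREAM='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'
SIGNED=$(printf '%s\n' "$UPSTREAM" | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g')
echo "$SIGNED"
```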
Step 4: Configure the Docker Runtime (daemon.json)
The toolkit is installed, but Docker needs to be explicitly instructed to use it. We will use the nvidia-ctk command-line utility to automatically inject the NVIDIA runtime into Docker's configuration file.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Expert Tip: You can verify this worked by running cat /etc/docker/daemon.json. You will see "nvidia" listed under the "runtimes" key.
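After nvidia-ctk runs, the relevant part of /etc/docker/daemon.json typically looks like the following minimal sketch (your file may contain additional keys such as storage or logging options):

```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
```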
Step 5: The Bare Metal Verification Test
Let's prove the isolation barrier is broken. We will spin up an official NVIDIA CUDA container and ask it to read our bare-metal hardware. Note that the container's base image does not need to match your host OS; the CUDA 12.2.2 images are built on Ubuntu 22.04 and run fine on a 24.04 host.
sudo docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubuntu22.04 nvidia-smi
If successful, the terminal will output your GPU statistics table. Because we used the --gpus all flag, this output proves that your Docker container now has direct, unrestricted access to your physical GPU! 🎉
Bonus: Deploying AI with Docker Compose
Running terminal commands is great for testing, but deploying production AI models (like Llama 3 or Stable Diffusion) calls for a docker-compose.yml. You must use the deploy specification to reserve GPU hardware.
Here is a template to deploy Ollama with full bare-metal GPU acceleration:
```yaml
services:
  ollama-ai:
    image: ollama/ollama:latest
    container_name: gpuyard-ollama
    restart: always
    ports:
      - "11434:11434"
    volumes:
      - ./ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all # Passes all available GPUs to the container
              capabilities: [gpu]
```
Save this as docker-compose.yml and run sudo docker compose up -d. You are now hosting your own private AI.
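Once the stack is up, a usage sketch along these lines exercises it (the container name matches the compose file above; llama3 is just an example model tag, and the script is guarded so it does nothing when the stack is not running):

```shell
# Usage sketch: pull a model inside the running container, then send one
# prompt to Ollama's HTTP API on the published port.
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' 2>/dev/null | grep -q '^gpuyard-ollama$'; then
  OLLAMA_UP=yes
  docker exec gpuyard-ollama ollama pull llama3   # example model tag
  curl -s http://localhost:11434/api/generate \
       -d '{"model":"llama3","prompt":"Say hello","stream":false}'
else
  OLLAMA_UP=no
  echo "gpuyard-ollama is not running; start it with: docker compose up -d"
fi
```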
Troubleshooting Common Errors
Even on standard Ubuntu 24.04 setups, you might encounter these snags:
- Error: "could not select device driver with capabilities: [[gpu]]"
  - Cause: Docker isn't aware of the NVIDIA runtime.
  - Fix: You likely forgot to restart the Docker daemon in Step 4. Run sudo systemctl restart docker.
- Error: "Failed to initialize NVML: Driver/library version mismatch"
  - Cause: Your host system updated the NVIDIA kernel drivers in the background, but the old driver is still loaded in memory.
  - Fix: A simple bare-metal server reboot (sudo reboot) will realign the kernel modules.
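For the first error, a quick diagnostic is to ask the daemon itself whether the nvidia runtime is registered; docker info exposes this in its Runtimes field:

```shell
# Diagnostic sketch: is the "nvidia" runtime registered with Docker?
if command -v docker >/dev/null 2>&1 \
   && docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'; then
  RUNTIME_MSG="nvidia runtime registered"
else
  RUNTIME_MSG="nvidia runtime missing: rerun nvidia-ctk and restart Docker"
fi
echo "$RUNTIME_MSG"
```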
Scale Your AI Infrastructure with GPUYard
Setting up the software is only half the battle; the hardware underneath dictates your AI's performance. Virtualized cloud environments slice up your VRAM and share your PCIe lanes.
If you want maximum token-per-second generation and uncompromising privacy, you need Bare Metal. Explore GPUYard’s high-performance Dedicated GPU Servers—custom-built for seamless Docker deployments and heavy AI workloads.