Rost

Posted on • Originally published at glukhov.org

Adding NVIDIA GPU Support to Docker Model Runner

Docker Model Runner is Docker's official tool for running AI models locally, but enabling NVIDIA GPU acceleration requires specific configuration.

Unlike standard docker run commands, docker model run doesn't support --gpus or -e flags, so GPU support must be configured at the Docker daemon level and during runner installation.

If you're looking for an alternative LLM hosting solution with easier GPU configuration, consider Ollama, which has built-in GPU support and simpler setup. However, Docker Model Runner offers better integration with Docker's ecosystem and OCI artifact distribution.

The header image was generated with the Flux 1 dev AI model.

Prerequisites

Before configuring GPU support, ensure you have:

  • An NVIDIA GPU with current NVIDIA drivers installed
  • Docker Engine
  • The NVIDIA Container Toolkit (nvidia-container-toolkit)
  • The Docker Model Runner plugin (the docker model CLI)

Verify your GPU is accessible:

nvidia-smi

Test Docker GPU access:

docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubi8 nvidia-smi

For more Docker commands and configuration options, see our Docker Cheatsheet.

Step 1: Configure Docker Daemon for NVIDIA Runtime

Docker Model Runner requires the NVIDIA runtime to be set as the default runtime in the Docker daemon configuration.

Find NVIDIA Container Runtime Path

First, locate where nvidia-container-runtime is installed:

which nvidia-container-runtime

This typically outputs /usr/bin/nvidia-container-runtime. Note this path for the next step.
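If the command prints nothing, the NVIDIA Container Toolkit is most likely not installed. On Debian/Ubuntu systems you can check for the package with:

dpkg -l | grep nvidia-container-toolkit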

Configure Docker Daemon

Create or update /etc/docker/daemon.json to set NVIDIA as the default runtime:

sudo tee /etc/docker/daemon.json > /dev/null << 'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

Important: If which nvidia-container-runtime returned a different path, update the "path" value in the JSON configuration accordingly.
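Also note that the tee command above overwrites any existing /etc/docker/daemon.json. If you already have daemon settings you want to keep, merge the new keys into the existing file instead, for example with jq (assuming jq is installed; adjust the runtime path as needed):

sudo jq '. * {"default-runtime": "nvidia", "runtimes": {"nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}}}' /etc/docker/daemon.json | sudo tee /etc/docker/daemon.json.new > /dev/null
sudo mv /etc/docker/daemon.json.new /etc/docker/daemon.json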

Restart Docker Service

Apply the configuration by restarting Docker:

sudo systemctl restart docker

Verify Configuration

Confirm the NVIDIA runtime is configured:

docker info | grep -i runtime

You should see Default Runtime: nvidia in the output.
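The relevant lines should look roughly like this (the exact set of runtimes listed varies by installation):

 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia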

Step 2: Install Docker Model Runner with GPU Support

Docker Model Runner must be installed or reinstalled with explicit GPU support. The runner container itself needs to be the CUDA-enabled version.

Stop Current Runner (if running)

If Docker Model Runner is already installed, stop it first:

docker model stop-runner

Install/Reinstall with CUDA Support

Install or reinstall Docker Model Runner with CUDA GPU support:

docker model reinstall-runner --gpu cuda

This command:

  • Pulls the CUDA-enabled version (docker/model-runner:latest-cuda) instead of the CPU-only version
  • Configures the runner container to use NVIDIA runtime
  • Enables GPU acceleration for all models

Note: If you've already installed Docker Model Runner without GPU support, you must reinstall it with the --gpu cuda flag. Simply configuring the Docker daemon is not enough—the runner container itself needs to be the CUDA-enabled version.
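To confirm which image the runner is actually using, you can inspect the runner container (the docker-model-runner container name matches the checks used later in this guide). After reinstalling with --gpu cuda it should report the CUDA-enabled image:

docker inspect docker-model-runner --format '{{.Config.Image}}'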

Available GPU Backends

Docker Model Runner supports multiple GPU backends:

  • cuda - NVIDIA CUDA (most common for NVIDIA GPUs)
  • rocm - AMD ROCm (for AMD GPUs)
  • musa - Moore Threads MUSA
  • cann - Huawei CANN
  • auto - Automatic detection (default, may not work correctly)
  • none - CPU only

For NVIDIA GPUs, always use --gpu cuda explicitly.
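The same flag works for switching backends later; for example, to revert to a CPU-only runner:

docker model reinstall-runner --gpu none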

Step 3: Verify GPU Access

After installation, verify that Docker Model Runner can access your GPU.

Check Runner Container GPU Access

Test GPU access from within the Docker Model Runner container:

docker exec docker-model-runner nvidia-smi

This should display your GPU information, confirming the container has GPU access.

Check Runner Status

Verify Docker Model Runner is running:

docker model status

You should see the runner is active with llama.cpp support.

Step 4: Test Model with GPU

Run a model and verify it's using the GPU.

Run a Model

Start a model inference:

docker model run ai/qwen3:14B-Q6_K "who are you?"
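If the model isn't available locally yet, docker model run should pull it automatically first. You can also list the models already downloaded to your machine:

docker model list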

Verify GPU Usage in Logs

Check the Docker Model Runner logs for GPU confirmation:

docker model logs | grep -i cuda

You should see messages indicating GPU usage:

  • using device CUDA0 (NVIDIA GeForce RTX 4080) - GPU device detected
  • offloaded 41/41 layers to GPU - Model layers loaded on GPU
  • CUDA0 model buffer size = 10946.13 MiB - GPU memory allocation
  • CUDA0 KV buffer size = 640.00 MiB - Key-value cache on GPU
  • CUDA0 compute buffer size = 306.75 MiB - Compute buffer on GPU

Monitor GPU Usage

In another terminal, monitor GPU usage in real-time:

nvidia-smi -l 1

You should see GPU memory usage and utilization increase when the model is running.
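For a more compact view, nvidia-smi can poll just utilization and memory using its standard query flags:

nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1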

For more advanced GPU monitoring options and tools, see our guide on GPU monitoring applications in Linux / Ubuntu.

Troubleshooting

Model Still Using CPU

If the model is still running on CPU:

  1. Verify Docker daemon configuration:
   docker info | grep -i runtime

Should show Default Runtime: nvidia

  2. Check runner container runtime:
   docker inspect docker-model-runner | grep -A 2 '"Runtime"'

Should show "Runtime": "nvidia"

  3. Reinstall runner with GPU support:
   docker model reinstall-runner --gpu cuda
  4. Check logs for errors:
   docker model logs | tail -50

GPU Not Detected

If GPU is not detected:

  1. Verify NVIDIA Container Toolkit is installed:
   dpkg -l | grep nvidia-container-toolkit
  2. Test GPU access with standard Docker:
   docker run --rm --gpus all nvidia/cuda:12.2.2-base-ubi8 nvidia-smi

For troubleshooting Docker issues, refer to our Docker Cheatsheet.

  3. Check NVIDIA drivers:
   nvidia-smi

Performance Issues

If GPU performance is poor:

  1. Check GPU utilization:
   nvidia-smi

Look for high GPU utilization percentage

  2. Verify model layers are on GPU:
   docker model logs | grep "offloaded.*layers to GPU"

All layers should be offloaded to GPU

  3. Check for memory issues:
   nvidia-smi

Ensure GPU memory isn't exhausted
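To see which processes are holding GPU memory (and whether something other than the model runner is competing for it), you can query the compute processes with nvidia-smi's standard query flags:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv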

Best Practices

  1. Always specify GPU backend explicitly: Use --gpu cuda instead of --gpu auto for NVIDIA GPUs to ensure correct configuration.

  2. Verify configuration after changes: Always check docker info | grep -i runtime after modifying Docker daemon settings.

  3. Monitor GPU usage: Use nvidia-smi to monitor GPU memory and utilization during model inference. For more advanced monitoring tools, see our guide on GPU monitoring applications in Linux / Ubuntu.

  4. Check logs regularly: Review docker model logs to ensure models are using GPU acceleration.

  5. Use appropriate model sizes: Ensure your GPU has sufficient memory for the model. Use quantized models (Q4, Q5, Q6, Q8) for better GPU memory efficiency. For help choosing the right GPU for your AI workloads, see our guide on Comparing NVidia GPU specs suitability for AI.
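As a rough rule of thumb (an estimate, not an exact figure), a quantized model's GPU weight footprint is about parameter count × bits per weight / 8. For the ai/qwen3:14B-Q6_K example above, assuming roughly 14.8 billion parameters at about 6.6 bits per weight, that gives 14.8e9 × 6.6 / 8 ≈ 12 GB, which is in the same range as the ~10.9 GiB CUDA0 model buffer reported in the logs (some tensors are quantized more aggressively). The KV cache and compute buffers add to that, so leave headroom beyond the weights alone.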
