DEV Community

Foram Jaguwala

🚀 Fixing Ollama Not Using GPU with Docker Desktop (Step-by-Step + Troubleshooting)

Running LLMs locally with Ollama is exciting… until you realize everything is running on the CPU 😅

I recently ran into this exact issue: models were working, but the GPU wasn't being used at all.

Here's how I fixed it using Docker Desktop with GPU support, along with the debugging steps that helped me understand the real problem.


🔴 The Problem

My initial setup:

  • Ollama installed locally ✅
  • Models running successfully ✅
  • GPU usage ❌

Result:

  • Slow responses
  • High CPU usage
  • Poor performance

🧠 Root Cause

After debugging, I realized:

The issue wasn't entirely Ollama; it was how my local environment handled GPU access.

Even though the GPU was available, it wasn't properly exposed to Ollama in my local setup.

However, the same GPU worked perfectly inside Docker, which confirmed that the environment played a major role.


🟢 The Solution: Docker Desktop + GPU

Instead of continuing to debug locally, I moved Ollama into a Docker container with GPU enabled.

This approach turned out to be much simpler and more reliable.


โš™๏ธ Prerequisites

Make sure you have:

  • Docker Desktop installed
  • NVIDIA GPU (RTX / GTX)
  • Latest NVIDIA drivers
  • WSL2 enabled (for Windows users)

✅ Step 1: Verify GPU Access in Docker (Critical Step)

Before running Ollama, verify that Docker can access your GPU:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

👉 If this command shows GPU details, your setup is correctly configured.


๐Ÿณ Step 2: Run Ollama with GPU

docker run -d \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
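If you prefer Docker Compose, the same container can be declared in a compose file. This is a minimal sketch of my own (service and volume names are my choice; the GPU reservation uses the `deploy.resources` syntax supported by recent Docker Compose versions):

```yaml
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama:
```

Start it with `docker compose up -d`; the container gets the same GPU access as the `--gpus all` flag in the `docker run` command above.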

⚡ Step 3: Run a Model

Option 1: Inside Container

docker exec -it ollama ollama run llama3.2:1b

Option 2: Using API (Recommended)

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3.2:1b",
  "prompt": "Hello",
  "stream": false
}'
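The same request can be made from Python with only the standard library. A small sketch (the helper name `build_generate_request` is mine; it assumes Ollama is listening on its default port 11434 and uses the non-streaming mode so the response is a single JSON object):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_generate_request(model, prompt):
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Send the request and print the model's answer.
    req = build_generate_request("llama3.2:1b", "Hello")
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
        print(body["response"])
```

With `"stream": false` the server returns one JSON object whose `response` field holds the full completion, which is easier to handle in scripts than the default line-by-line stream.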

๐Ÿ” Step 4: Confirm GPU Usage

Open another terminal and run:

nvidia-smi

👉 You should see GPU memory usage increase while the model is running.
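If you want to check this programmatically instead of eyeballing the `nvidia-smi` table, the tool can emit machine-readable output. A sketch of my own (the parser assumes `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`, which prints one used-memory value in MiB per GPU):

```python
import subprocess

def parse_memory_used(csv_output):
    """Parse nvidia-smi 'memory.used' CSV output: one MiB integer per line/GPU."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Query nvidia-smi and return used memory in MiB for each GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)

if __name__ == "__main__":
    # Run while a model is answering; the numbers should jump versus idle.
    print(gpu_memory_used_mib())
```

Sample the value before and during a request; a clear increase confirms Ollama is actually using the GPU.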


⚡ Results

After switching to Docker:

  • 🚀 Faster inference
  • 🔥 GPU utilization working
  • 🧠 Smooth local LLM experience

๐Ÿ› ๏ธ Troubleshooting Guide

Here are some real issues I encountered:


โŒ GPU Not Working in Docker

✅ Checklist:

  • nvidia-smi works on host
  • Docker Desktop is updated
  • WSL2 is enabled
  • Container is started with --gpus all

โŒ GPU Works in Docker but Not in Ollama

✅ Fix:

  • Restart the container
  • Re-run it with the --gpus all flag
  • Try a smaller model (e.g., mistral) to rule out VRAM limits

โŒ nvidia-smi Not Found

Cause:

NVIDIA drivers are not installed on the host

Fix:

Install the latest NVIDIA drivers and reboot the system


💡 Key Takeaways

  • GPU issues are often environment-related
  • Always verify GPU using a CUDA container first
  • Docker Desktop simplifies GPU access significantly
  • Running LLMs with GPU drastically improves performance

📌 Final Thoughts

If your Ollama setup is stuck on CPU:

👉 Don't spend too much time debugging locally
👉 Try Docker with GPU support

Itโ€™s simple, reliable, and works consistently.


🙌 Need Help?

If you get stuck at any step, feel free to reach out. Happy to help!
