Foram Jaguwala
πŸš€ Fixing Ollama Not Using GPU with Docker Desktop (Step-by-Step + Troubleshooting)

Running LLMs locally with Ollama is exciting… until you realize everything is running on CPU πŸ˜…

I recently ran into this exact issue β€” models were working, but GPU wasn’t being used at all.

Here’s how I fixed it using Docker Desktop with GPU support, along with the debugging steps that helped me understand the real problem.


πŸ”΄ The Problem

My initial setup:

  • Ollama installed locally βœ…
  • Models running successfully βœ…
  • GPU usage ❌

Result:

  • Slow responses
  • High CPU usage
  • Poor performance

🧠 Root Cause

After debugging, I realized:

The issue wasn’t entirely Ollama β€” it was how my local environment handled GPU access.

Even though the GPU was available, it wasn’t properly exposed to Ollama in my local setup.

However, the same GPU worked perfectly inside Docker, which confirmed that the environment played a major role.


🟒 The Solution: Docker Desktop + GPU

Instead of continuing to debug locally, I moved Ollama into a Docker container with GPU enabled.

This approach turned out to be much simpler and more reliable.


βš™οΈ Prerequisites

Make sure you have:

  • Docker Desktop installed
  • NVIDIA GPU (RTX / GTX)
  • Latest NVIDIA drivers
  • WSL2 enabled (for Windows users)
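The checklist above can be sanity-checked from a terminal before touching Ollama at all. This is a minimal sketch: it only confirms the tools are on PATH, not that the driver or WSL2 integration is actually healthy:

```shell
# check that the basic tooling is on PATH (does not verify driver health)
report=""
for tool in docker nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    report="${report}${tool}: found
"
  else
    report="${report}${tool}: MISSING
"
  fi
done
printf "%s" "$report"
```

If either line says MISSING, fix that first β€” nothing later in this post will work without it.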

βœ… Step 1: Verify GPU Access in Docker (Critical Step)

Before running Ollama, verify that Docker can access your GPU:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

πŸ‘‰ If this command shows GPU details, your setup is correctly configured.


🐳 Step 2: Run Ollama with GPU

docker run -d \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
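Once the container is up, you can check whether Ollama actually detected the GPU. This is a sketch under the assumption that recent ollama/ollama images log detected devices at startup (look for lines mentioning CUDA or "inference compute"):

```shell
# grep the container logs for GPU/CUDA detection lines
if docker ps --format '{{.Names}}' 2>/dev/null | grep -qx ollama; then
  docker logs ollama 2>&1 | grep -iE 'cuda|gpu' | tail -n 5
  status="checked logs"
else
  status="ollama container is not running"
fi
echo "$status"
```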

⚑ Step 3: Run a Model

Option 1: Inside Container

docker exec -it ollama ollama run llama3.2:1b

Option 2: Using API (Recommended)

curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
  "model": "llama3.2:1b",
  "prompt": "Hello"
}'
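One variant worth knowing: the generate endpoint streams JSON lines by default, so for a single consolidated response you can pass `"stream": false` (both that field and the `/api/version` endpoint are part of the documented Ollama API). The sketch below validates the payload as JSON first and only sends the request if the server is actually reachable:

```shell
# non-streaming request; note the Content-Type is application/json
payload='{"model": "llama3.2:1b", "prompt": "Hello", "stream": false}'

# fail early on malformed JSON
echo "$payload" | python3 -c 'import sys, json; json.load(sys.stdin)'

if curl -sf http://localhost:11434/api/version >/dev/null 2>&1; then
  curl -s http://localhost:11434/api/generate \
    -H 'Content-Type: application/json' \
    -d "$payload"
else
  echo "Ollama is not reachable on localhost:11434"
fi
```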

πŸ” Step 4: Confirm GPU Usage

Open another terminal and run:

nvidia-smi

πŸ‘‰ You should see GPU memory usage increasing while the model is running.
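If the full nvidia-smi table is too noisy, the query flags print just the numbers you care about (these flags come from nvidia-smi's own `--query-gpu` interface):

```shell
# one CSV sample of GPU name, utilization and memory in use
if command -v nvidia-smi >/dev/null 2>&1; then
  result=$(nvidia-smi --query-gpu=name,utilization.gpu,memory.used --format=csv)
else
  result="nvidia-smi not available on this machine"
fi
echo "$result"
```

Run it a few times while a prompt is generating β€” utilization.gpu should jump well above idle.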


⚑ Results

After switching to Docker:

  • πŸš€ Faster inference
  • πŸ”₯ GPU utilization working
  • 🧠 Smooth local LLM experience

πŸ› οΈ Troubleshooting Guide

Here are some real issues I encountered:


❌ GPU Not Working in Docker

βœ… Checklist:

  • nvidia-smi works on host
  • Docker Desktop is updated
  • WSL2 is enabled
  • Container is started with --gpus all

❌ GPU Works in Docker but Not in Ollama

βœ… Fix:

  • Restart container
  • Re-run with GPU flag
  • Try a smaller model (e.g., mistral)
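In shell form, the fix list above is roughly the following (mistral is just an example tag; any smaller model from the Ollama library works, and re-creating the container with `--gpus all` covers the GPU-flag step):

```shell
# restart the container, then pull a smaller model to retry with
if docker ps -a --format '{{.Names}}' 2>/dev/null | grep -qx ollama; then
  docker restart ollama
  docker exec ollama ollama pull mistral
  status="restarted and pulled mistral"
else
  status="ollama container not found"
fi
echo "$status"
```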

❌ nvidia-smi Not Found

Cause:

NVIDIA drivers are not installed (or not on PATH)

Fix:

Install the latest NVIDIA drivers and reboot the system


πŸ’‘ Key Takeaways

  • GPU issues are often environment-related
  • Always verify GPU using a CUDA container first
  • Docker Desktop simplifies GPU access significantly
  • Running LLMs with GPU drastically improves performance

πŸ“Œ Final Thoughts

If your Ollama setup is stuck on CPU:

πŸ‘‰ Don’t spend too much time debugging locally
πŸ‘‰ Try Docker with GPU support

It’s simple, reliable, and works consistently.


πŸ™Œ Need Help?

If you get stuck at any step, feel free to reach out β€” happy to help!
