Running LLMs locally with Ollama is exciting... until you realize everything is running on the CPU.
I recently ran into this exact issue: models were working, but the GPU wasn't being used at all.
Here's how I fixed it using Docker Desktop with GPU support, along with the debugging steps that helped me understand the real problem.
The Problem
My initial setup:
- Ollama installed locally ✅
- Models running successfully ✅
- GPU usage ❌
Result:
- Slow responses
- High CPU usage
- Poor performance
Root Cause
After debugging, I realized:
The issue wasn't entirely Ollama; it was how my local environment handled GPU access.
Even though the GPU was available, it wasn't properly exposed to Ollama in my local setup.
However, the same GPU worked perfectly inside Docker, which confirmed that the environment played a major role.
The Solution: Docker Desktop + GPU
Instead of continuing to debug locally, I moved Ollama into a Docker container with GPU enabled.
This approach turned out to be much simpler and more reliable.
Prerequisites
Make sure you have:
- Docker Desktop installed
- NVIDIA GPU (RTX / GTX)
- Latest NVIDIA drivers
- WSL2 enabled (for Windows users)
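A quick preflight check can confirm that the command-line tools from this list are actually on the PATH. This is only a minimal sketch: the function name `preflight` is made up for illustration, and it does not verify driver or Docker Desktop versions.

```shell
# Report every command from the argument list that is not on the PATH.
# Returns 0 when all are present, 1 otherwise.
preflight() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run inside WSL2 or a Linux shell):
#   preflight docker nvidia-smi && echo "tools found"
```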
Step 1: Verify GPU Access in Docker (Critical Step)
Before running Ollama, verify that Docker can access your GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
If this command prints the nvidia-smi table with your GPU, driver version, and CUDA version, Docker can access the GPU.
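If you want this check in a script, it can be wrapped in a small function that turns the result into a pass/fail message. The sketch below is one way to do it; `check_docker_gpu` and the `DOCKER` override variable are hypothetical names introduced here (the override just makes it possible to substitute a stub when testing).

```shell
# Run nvidia-smi inside a throwaway CUDA container and report the outcome.
# DOCKER is an assumed override; it defaults to the real docker CLI.
check_docker_gpu() {
  if "${DOCKER:-docker}" run --rm --gpus all \
       nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi >/dev/null 2>&1; then
    echo "GPU visible inside containers"
  else
    echo "GPU not visible inside containers" >&2
    return 1
  fi
}

# Usage:
#   check_docker_gpu || echo "fix Docker GPU access before starting Ollama"
```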
Step 2: Run Ollama with GPU
docker run -d \
--gpus all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
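The container takes a moment to start listening, so a request sent right after `docker run` may fail. A small polling helper can wait for the API first. This is a sketch under assumptions: `wait_for_ollama` is a made-up name, the default URL matches the `-p 11434:11434` mapping above, and it polls Ollama's `/api/version` endpoint once per second.

```shell
# Poll the Ollama API until it answers or the retry budget runs out.
# $1: base URL (default http://localhost:11434), $2: max attempts (default 30).
wait_for_ollama() {
  url="${1:-http://localhost:11434}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url/api/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_ollama && echo "Ollama is ready"
```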
Step 3: Run a Model
Option 1: Inside Container
docker exec -it ollama ollama run llama3.2:1b
Option 2: Using API (Recommended)
curl --location 'http://localhost:11434/api/generate' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama3.2:1b",
"prompt": "Hello"
}'
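By default, `/api/generate` streams its answer as a sequence of JSON objects. Adding `"stream": false` to the request body returns a single JSON object with the full text in its `response` field, which is easier to handle in scripts. The helpers below are a hedged sketch: the function names and the `OLLAMA_URL` variable are assumptions for illustration, and the escaping only covers backslashes and double quotes.

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a non-streaming /api/generate request body from a model name and a prompt.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request. OLLAMA_URL is an assumed override for the default port mapping.
ollama_generate() {
  curl -sS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    --header 'Content-Type: application/json' \
    --data "$(generate_payload "$1" "$2")"
}

# Usage:
#   ollama_generate llama3.2:1b "Hello"
```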
Step 4: Confirm GPU Usage
Open another terminal and run:
nvidia-smi
You should see GPU memory usage increasing while the model is running.
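Ollama can also report where a loaded model is running: `ollama ps` lists loaded models with a PROCESSOR column, which reads `100% GPU` when the model sits entirely in GPU memory. The helper below is a small sketch that checks for that string; `fully_on_gpu` is a made-up name, and the exact column layout may differ between Ollama versions.

```shell
# Reads `ollama ps` output on stdin.
# Succeeds if at least one loaded model reports "100% GPU".
fully_on_gpu() {
  grep -q '100% GPU'
}

# Usage (while a model is loaded):
#   docker exec ollama ollama ps | fully_on_gpu && echo "model is on the GPU"
#
# To watch utilization and memory on the host, refreshing every second:
#   nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```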
Results
After switching to Docker:
- Faster inference
- GPU utilization working
- Smooth local LLM experience
Troubleshooting Guide
Here are some real issues I encountered:
GPU Not Working in Docker
Checklist:
- nvidia-smi works on the host
- Docker Desktop is updated
- WSL2 is enabled
- Container is started with --gpus all
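GPU Works in Docker but Not in Ollama
Fix:
- Restart the container
- Recreate the container with the --gpus all flag (a restart alone keeps the original settings)
- Try a smaller model (e.g., llama3.2:1b) in case the current one does not fit in GPU memory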
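Before recreating the container, it can help to confirm whether it was started with a GPU request at all. `docker inspect` exposes this under `.HostConfig.DeviceRequests`, which contains a `"gpu"` capability when `--gpus all` was used. The helper below is a rough sketch that looks for that string; `has_gpu_request` is a hypothetical name.

```shell
# Reads the JSON from `docker inspect --format '{{json .HostConfig.DeviceRequests}}'`
# on stdin. Succeeds if a "gpu" capability was requested for the container.
has_gpu_request() {
  grep -q '"gpu"'
}

# Usage:
#   docker inspect --format '{{json .HostConfig.DeviceRequests}}' ollama \
#     | has_gpu_request || echo "container was started without --gpus"
#
# If the request is missing, remove the container and repeat Step 2:
#   docker rm -f ollama
#
# The container logs may also show whether a GPU was detected at startup:
#   docker logs ollama 2>&1 | grep -i -E 'gpu|cuda'
```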
nvidia-smi Not Found
Cause:
NVIDIA drivers are not installed
Fix:
Install the latest NVIDIA drivers and reboot the system
Key Takeaways
- GPU issues are often environment-related
- Always verify GPU using a CUDA container first
- Docker Desktop simplifies GPU access significantly
- Running LLMs with GPU drastically improves performance
Final Thoughts
If your Ollama setup is stuck on CPU:
- Don't spend too much time debugging locally
- Try Docker with GPU support
It's simple, reliable, and works consistently.
Need Help?
If you get stuck at any step, feel free to reach out. Happy to help!