This stack uses Ollama with Gemma 4 QAT to run a 12B model on a laptop GPU with 10 GB of VRAM. The latest Gemma 4 QAT (quantization-aware training) checkpoints are trained to tolerate low-precision weights, which reduces memory usage and enables compact local inference.
What you get
- Local Gemma 4 12B inference on 10GB VRAM hardware
- QAT compression that fits the model into ~6.7 GB VRAM
- A laptop-friendly private AI stack for writing, notes, and prompts
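As a rough sanity check on the ~6.7 GB figure, weight memory can be estimated as parameters times bits per weight divided by 8. The sketch below assumes 4-bit weights, which this guide does not state; `estimate_weights_gb` is a hypothetical helper, and the KV cache, runtime buffers, and any tensors kept at higher precision come on top of the result.

```shell
# Rough weight-memory estimate in GB:
#   parameters (in billions) x bits per weight / 8
# Assumption: the bit width is a guess, and overhead is not included.
estimate_weights_gb() {
  params_b="$1"   # parameter count in billions, e.g. 12
  bits="$2"       # bits per weight, e.g. 4
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 12 4    # prints 6.0  (assumed 4-bit weights)
estimate_weights_gb 12 16   # prints 24.0 (16-bit weights, for comparison)
```

Under that assumption, about 6 GB goes to weights, which is in line with the ~6.7 GB total once overhead is added, and shows why a 16-bit copy would not fit in 10 GB.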
Prerequisites
- A laptop GPU with at least 10 GB VRAM, such as one from the RX 6700 series
- Latest GPU drivers and Vulkan support
- Ollama installed locally
- Enough disk space for the model cache (~40 GB)
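The prerequisites above can be checked with a short script. This is a minimal sketch: `has_cmd`, `free_gb`, and `check_prereqs` are hypothetical helper names, `vulkaninfo` is assumed to come from your distribution's Vulkan tools package, and `$HOME` is assumed to be the filesystem that holds the model cache.

```shell
# Succeed if a command is available on PATH.
has_cmd() { command -v "$1" >/dev/null 2>&1; }

# Print free space, in whole GB, on the filesystem holding a directory.
free_gb() { df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'; }

# Report each missing prerequisite; prints nothing if all checks pass.
check_prereqs() {
  has_cmd ollama || echo "ollama not found on PATH"
  has_cmd vulkaninfo || echo "vulkaninfo not found (cannot confirm Vulkan support)"
  # 40 GB matches the disk estimate above; the cache location is an assumption.
  [ "$(free_gb "$HOME")" -ge 40 ] || echo "less than 40 GB free under $HOME"
}

# Usage: check_prereqs
```

If `vulkaninfo` is installed, running `vulkaninfo --summary` should list your GPU; an empty device list usually points to a driver problem.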
Setup
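brew install ollama
ollama serve
ollama pull gemma-4:12b
ollama run gemma-4:12b "Say hello"
ollama ps
Run ollama serve in its own terminal, since it stays in the foreground, and run the remaining commands in a second one. Ollama selects a quantization through the model tag rather than a pull flag, so if the QAT build is published under its own tag, pull and run that tag instead of gemma-4:12b. ollama ps lists only models that are currently loaded, which is why a prompt is sent first. If ollama ps then shows the model with GPU in the PROCESSOR column, your stack is ready.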
Use it
- Personal writing with faster local completion
- Private research without sending queries to the cloud
- Compact local AI demos on 10GB-class laptops
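For scripted use, prompts can also go to Ollama's local HTTP API instead of the interactive CLI. The sketch below assumes the server is listening on its default address, http://localhost:11434, and that the model tag is the one pulled in Setup; `generate_body` and `ask` are hypothetical helper names.

```shell
# Build the JSON body for Ollama's /api/generate endpoint.
# No JSON escaping is done here, so keep quotes and backslashes out of the prompt.
generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one prompt to the local server and print the raw JSON reply.
# Assumes the default address; nothing leaves the machine.
ask() {
  curl -s http://localhost:11434/api/generate -d "$(generate_body "$1" "$2")"
}

# Usage: ask gemma-4:12b "Summarize my notes in three bullet points"
```

Setting `"stream":false` returns a single JSON object rather than a stream of partial responses, which is easier to handle in a shell script.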
Troubleshooting
- Model won't load: verify Vulkan and free VRAM.
- Ollama falls back to CPU: check ollama ps and update drivers.
- Slow inference: close background apps and use the QAT model.
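To spot a CPU fallback without reading the table by eye, the output of ollama ps can be classified by its PROCESSOR column. This is a sketch that assumes the column contains values such as "100% GPU", "100% CPU", or a mixed "CPU/GPU" split, which may differ between Ollama versions; `ps_placement` is a hypothetical helper name.

```shell
# Classify `ollama ps` output read from stdin by its PROCESSOR column.
# Prints "gpu", "cpu", "split", or "none" (no model loaded).
# If several models are loaded, the last row wins.
ps_placement() {
  awk 'NR > 1 {
         if ($0 ~ /CPU\/GPU/) r = "split"
         else if ($0 ~ /% GPU/) r = "gpu"
         else if ($0 ~ /% CPU/) r = "cpu"
       }
       END { print (r == "" ? "none" : r) }'
}

# Usage: ollama ps | ps_placement
```

A result of "split" or "cpu" suggests the model did not fit entirely in VRAM or the GPU backend was not detected, which are the two cases the checklist above covers.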
Originally published on https://everylocalai.com/stack/gemma-4-qat-10gb-laptop