DEV Community

EveryLocalAI


Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

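This stack uses Ollama with Gemma 4 QAT to run a 12B model on a laptop GPU with 10 GB of VRAM. QAT (quantization-aware training) checkpoints are trained to hold up under low-bit weights, which brings the model's footprint down to roughly 6.7 GB and leaves headroom on a 10 GB card.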

What you get

  • Local Gemma 4 12B inference on 10GB VRAM hardware
  • QAT compression that fits the model into ~6.7 GB VRAM
  • A laptop-friendly private AI stack for writing, notes, and prompts
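As a rough sanity check on the ~6.7 GB figure, the sketch below estimates weight storage from parameter count and bits per weight. The 4-bit width is an assumption made for illustration (the article does not state the bit width), and reading the remainder as runtime overhead is likewise a guess.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: 12B parameters stored at 4 bits each.
weights_gb = weight_memory_gb(12, 4)

# Whatever remains of the article's ~6.7 GB would have to cover the KV cache,
# any tensors kept at higher precision, and runtime buffers.
overhead_gb = 6.7 - weights_gb

print(f"weights ~{weights_gb:.1f} GB, overhead ~{overhead_gb:.1f} GB")
```

For comparison, the same function gives about 24 GB for 16-bit weights, which is why an unquantized 12B model does not fit on a 10 GB card.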

Prerequisites

  • A laptop GPU with at least 10 GB of VRAM, such as one from the RX 6700 series
  • Latest GPU drivers and Vulkan support
  • Ollama installed locally
  • Enough disk space for the model cache (~40 GB)

Setup

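brew install ollama
# Start the server first; it stays in the foreground, so run the rest in a second terminal
ollama serve
# ollama pull has no quantization flag; the QAT build is chosen by model tag (confirm the exact tag in the Ollama library)
ollama pull gemma-4:12b
# Load the model with a short prompt so that it appears in ollama ps
ollama run gemma-4:12b "Hello"
ollama ps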

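If ollama ps lists the model and its PROCESSOR column reads 100% GPU, the model is running entirely in VRAM and your stack is ready. Note that ollama ps only lists models that are currently loaded.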
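The check can also be scripted. The following is a minimal sketch that reads the PROCESSOR column from ollama ps output. It assumes the table layout shown in the Ollama documentation (columns separated by two or more spaces); the sample text, model tag, and ID below are placeholders, not real output.

```python
import re
import subprocess

# Hypothetical sample in the assumed `ollama ps` layout; the ID is a placeholder.
SAMPLE = """NAME           ID              SIZE      PROCESSOR    UNTIL
gemma-4:12b    0123456789ab    6.7 GB    100% GPU     4 minutes from now
"""


def parse_ollama_ps(text: str) -> dict[str, str]:
    """Map each loaded model name to the value in its PROCESSOR column."""
    rows = [re.split(r"\s{2,}", ln.strip()) for ln in text.splitlines() if ln.strip()]
    if not rows or "PROCESSOR" not in rows[0]:
        return {}
    col = rows[0].index("PROCESSOR")  # located via the header, not hard-coded
    return {row[0]: row[col] for row in rows[1:] if len(row) > col}


def fully_on_gpu(processor: str) -> bool:
    """True only when no part of the model has spilled over to the CPU."""
    return processor.strip() == "100% GPU"


def live_status() -> dict[str, str]:
    """Run `ollama ps` on this machine (requires Ollama to be installed)."""
    out = subprocess.run(
        ["ollama", "ps"], capture_output=True, text=True, check=True
    ).stdout
    return parse_ollama_ps(out)


for name, processor in parse_ollama_ps(SAMPLE).items():
    print(name, "->", processor, "| fully on GPU:", fully_on_gpu(processor))
```

On a machine with Ollama running, calling live_status() in place of parse_ollama_ps(SAMPLE) applies the same check to the real output.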

Use it

  • Personal writing with faster local completion
  • Private research without sending queries to the cloud
  • Compact local AI demos on 10GB-class laptops
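For the writing and research uses above, a script can talk to the local server directly. This is a minimal sketch that assumes Ollama's default endpoint on localhost:11434 and reuses the model tag from the setup commands. The helper names build_request and complete are illustrative, not part of any Ollama client library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "gemma-4:12b"  # tag as written in the setup commands; adjust to what you pulled


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str = MODEL, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, complete("Tighten this sentence: ...") returns the model's reply as a string, and the prompt never leaves the machine.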

Troubleshooting

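  • Model won’t load: confirm the driver exposes Vulkan (for example with vulkaninfo) and that about 6.7 GB of VRAM is free before loading.
  • Ollama falls back to CPU: check the PROCESSOR column in ollama ps; if it shows CPU, update the GPU drivers and restart Ollama.
  • Slow inference: close background apps that hold VRAM and confirm that the QAT model, not a larger variant, is the one loaded.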
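To put a number on "slow", generation throughput can be computed from the timing fields Ollama includes in a non-streaming /api/generate response. This sketch assumes the eval_count (generated tokens) and eval_duration (nanoseconds) fields described in the Ollama API documentation; the sample values are made up for illustration and are not measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an /api/generate response body."""
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    # eval_duration is assumed to be in nanoseconds, so scale to seconds.
    return response.get("eval_count", 0) * 1e9 / duration_ns


# Hypothetical response fields, not a benchmark result.
sample = {"eval_count": 100, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

Comparing this figure before and after closing background apps, or between CPU and GPU runs, shows whether a change actually helped.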

Originally published on https://everylocalai.com/stack/gemma-4-qat-10gb-laptop
