Gemma 4 QAT on a 10 GB Laptop: Local AI in 6.7 GB of VRAM

#ai #llm #machinelearning #tutorial

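This stack uses Ollama and the Gemma 4 QAT checkpoint to run a 12B model on a laptop GPU with 10 GB of VRAM. QAT (quantization-aware training) simulates quantization while the model is trained, so the low-precision checkpoint needs far less memory than the full-precision model and typically loses less quality than a model quantized after training.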

What you get

Local Gemma 4 12B inference on hardware with 10 GB of VRAM
QAT weights that fit the model into ~6.7 GB of VRAM
A laptop-friendly private AI stack for writing, notes, and prompts
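A rough check of the ~6.7 GB figure, assuming (the post does not say this) that the QAT build stores most of its 12B weights at about 4 bits, i.e. 0.5 bytes per parameter:

$$
12 \times 10^{9}\ \text{params} \times 0.5\ \tfrac{\text{bytes}}{\text{param}} = 6.0 \times 10^{9}\ \text{bytes} \approx 6.0\ \text{GB}
$$

$$
10\ \text{GB} - 6.7\ \text{GB} \approx 3.3\ \text{GB of headroom}
$$

Under that assumption, the gap between 6.0 GB and 6.7 GB would go to tensors kept at higher precision plus the KV cache and runtime buffers, and the remaining ~3.3 GB is what the desktop, other apps, and longer contexts have to share.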

Prerequisites

A laptop with at least 10 GB of VRAM, such as one with an AMD Radeon RX 6700-series GPU
Latest GPU drivers and Vulkan support
Ollama installed locally
Enough disk space for the model cache (~40 GB)
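The prerequisites above can be checked with a short preflight script. This is a minimal sketch under assumptions not taken from the post: models are stored under $OLLAMA_MODELS or ~/.ollama/models (system-wide Linux installs may use a different path), the presence of `vulkaninfo` (from vulkan-tools) stands in for "Vulkan support", and the 40 GiB default mirrors the disk estimate above. It does not check VRAM.

```bash
#!/usr/bin/env bash
# Preflight sketch for the prerequisites list.
# Assumed model store: $OLLAMA_MODELS, else ~/.ollama/models.
# Assumed Vulkan probe: the `vulkaninfo` binary from vulkan-tools.

# has_cmd <name>: succeed if the command is on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# free_disk_gib <path>: print free space (GiB, rounded down) on the
# filesystem holding <path>, or its nearest existing parent directory.
free_disk_gib() {
  local path="$1"
  # The model directory may not exist yet, so walk up to one that does.
  while [[ ! -e "$path" ]]; do
    path=$(dirname "$path")
  done
  # -P: one POSIX-format line per filesystem; -k: sizes in 1024-byte blocks.
  # Column 4 is the available space.
  df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# preflight [min-disk-gib]: print each failed check; return 1 if any failed.
preflight() {
  local need_gib="${1:-40}"
  local model_dir="${OLLAMA_MODELS:-$HOME/.ollama/models}"
  local status=0 free

  has_cmd ollama || { echo "missing: ollama"; status=1; }
  has_cmd vulkaninfo || { echo "missing: vulkaninfo (cannot verify Vulkan)"; status=1; }

  free=$(free_disk_gib "$model_dir")
  if (( free < need_gib )); then
    echo "low disk: ${free} GiB free for $model_dir, want ${need_gib} GiB"
    status=1
  fi
  return "$status"
}
```

Running `preflight` with no argument uses the 40 GiB figure; pass a smaller number if you only keep the one QAT model.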

Setup

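brew install ollama
ollama serve
ollama pull gemma-4:12b-qat
ollama run gemma-4:12b-qat "Say hello in one sentence."
ollama ps

ollama serve keeps running in its terminal, so run the remaining commands in a second one. ollama pull has no --quantization flag: the QAT build is selected by its tag, and gemma-4:12b-qat is a placeholder for the QAT tag listed on the model's Ollama library page. ollama ps only lists models that are currently loaded, which is why ollama run comes first.

If ollama ps lists the model with 100% GPU in the PROCESSOR column, your stack is ready.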
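To script that last check, a small helper can parse `ollama ps` output. This is a sketch that assumes the PROCESSOR column reads `100% GPU` for a fully offloaded model, `100% CPU` for a CPU fallback, and a split such as `48%/52% CPU/GPU` otherwise; it reads from stdin so it can be tried on saved output as well as live output.

```bash
#!/usr/bin/env bash
# gpu_status <model-tag>: read `ollama ps` output on stdin and print
# gpu | partial | cpu | missing for the given model.
# The PROCESSOR strings matched below are assumptions based on typical
# `ollama ps` output; adjust them if your Ollama version prints otherwise.
gpu_status() {
  local model="$1" line
  # Skip the header row; keep the row whose first column is the model tag.
  line=$(awk -v m="$model" 'NR > 1 && $1 == m')
  if [[ -z "$line" ]]; then
    echo "missing"            # model not loaded (run it first)
  elif [[ "$line" == *"100% GPU"* ]]; then
    echo "gpu"                # fully offloaded to the GPU
  elif [[ "$line" == *"GPU"* ]]; then
    echo "partial"            # CPU/GPU split, e.g. "48%/52% CPU/GPU"
  else
    echo "cpu"                # CPU fallback
  fi
}
```

For example, `ollama ps | gpu_status <your-model-tag>` printing `partial` or `cpu` points to the CPU-fallback entry under Troubleshooting.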

Use it

Personal writing with local completions and no network round trip
Private research without sending queries to the cloud
Compact local AI demos on 10GB-class laptops

Troubleshooting

Model won’t load: verify Vulkan support and that enough VRAM is free.
Ollama falls back to CPU: check the PROCESSOR column of ollama ps and update your GPU drivers.
Slow inference: close background apps that hold VRAM and make sure you pulled the QAT build.
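For the "Model won’t load" case on Linux with the amdgpu driver, free VRAM can be read from the driver's sysfs counters. This is a sketch under assumptions: the `mem_info_vram_total` and `mem_info_vram_used` files (byte counts) are specific to amdgpu, and `card0` is only a default that may be the integrated GPU on hybrid-graphics laptops, so pass the right device directory if needed.

```bash
#!/usr/bin/env bash
# Sketch: compare free VRAM with what the model needs.
# Assumes Linux + amdgpu, which exposes byte counters such as
#   /sys/class/drm/cardN/device/mem_info_vram_total
#   /sys/class/drm/cardN/device/mem_info_vram_used

# vram_free_mib [device-dir]: print free VRAM in MiB, or fail if the
# counters are missing (other driver, wrong card number).
vram_free_mib() {
  local dir="${1:-/sys/class/drm/card0/device}" total used
  total=$(cat "$dir/mem_info_vram_total" 2>/dev/null) || return 1
  used=$(cat "$dir/mem_info_vram_used" 2>/dev/null) || return 1
  echo $(( (total - used) / 1024 / 1024 ))
}

# vram_fits <need-mib> [device-dir]: succeed if free VRAM >= need-mib.
vram_fits() {
  local need="$1" free
  free=$(vram_free_mib "${2:-}") || return 1
  (( free >= need ))
}
```

`vram_fits 7000` leaves a little slack above the ~6.7 GB model size; if it fails while the model is unloaded, another process is likely holding VRAM. `vulkaninfo --summary` from vulkan-tools is one way to confirm the GPU is visible to Vulkan at all.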

Originally published on https://everylocalai.com/stack/gemma-4-qat-10gb-laptop

DEV Community