DEV Community

Agbo, Daniel Onuoha

How to Set Up Gemma 4 on Windows, Linux, and Mac

Before running any model, you need Ollama (or llama.cpp) installed for your OS. Ollama is the simplest starting point across all three platforms.

Windows

  1. Download the installer from ollama.com/download and run the .exe file
  2. Follow the install wizard — Ollama adds itself to the system tray and starts automatically
  3. Open PowerShell or Command Prompt and verify:
ollama -v
  4. Pull and run a model:
ollama run gemma4:e4b

If you have an NVIDIA GPU, install the latest NVIDIA driver first so Ollama can detect and use the GPU automatically. Running nvidia-smi in PowerShell confirms that the driver sees the card.
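
Once Ollama is running on any of the three platforms, it also listens on a local HTTP API, by default at http://localhost:11434, which is handy for scripted checks. Below is a minimal sketch (it runs in Git Bash on Windows or in any Linux/macOS terminal) that polls /api/version until the server answers and builds a non-streaming request body for /api/generate. The helper names wait_for_ollama, json_escape, and generate_body are hypothetical, and the escaping covers only backslashes and double quotes. For arbitrary prompts, use a real JSON encoder such as jq.

```shell
#!/bin/sh
# Minimal sketch: readiness check and request body for Ollama's local HTTP API.
# Assumes the default address; override by setting OLLAMA_URL beforehand.
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# Poll GET /api/version until the server responds or attempts run out.
# Usage: wait_for_ollama [attempts] [seconds_between_attempts]
wait_for_ollama() {
  local attempts=${1:-10} delay=${2:-1} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 2 "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a non-streaming /api/generate body. Usage: generate_body MODEL PROMPT
generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Example:
#   wait_for_ollama && curl -s "$OLLAMA_URL/api/generate" \
#     -d "$(generate_body gemma4:e4b 'Why is the sky blue?')"
```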

Linux

  1. Install via the official script:
curl -fsSL https://ollama.com/install.sh | sh
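  2. Start the server. On systemd-based distributions, the install script normally registers Ollama as a service that starts automatically; otherwise, start it manually in a terminal. If the service is already running, ollama serve exits with an error because port 11434 is taken: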
ollama serve
  3. Verify the installation and GPU detection:
ollama -v
nvidia-smi   # for NVIDIA GPUs
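  4. For a persistent background service, run Ollama under systemd instead of running ollama serve in a terminal. The install script normally sets this up for you. On a manual install, the commands below create the ollama user, add you to its group, and enable the service, assuming a unit file already exists at /etc/systemd/system/ollama.service: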
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
  5. Pull and run:
ollama run gemma4:e4b
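
The systemd commands above enable a unit named ollama but do not create one. The install script normally writes it, but after a manual install you have to create it yourself. Here is a minimal sketch that prints such a unit, loosely modeled on the example in Ollama's Linux documentation. render_ollama_unit is a hypothetical helper, so compare its output against the current docs before installing it.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal systemd unit that runs `ollama serve`
# as the dedicated `ollama` user created by the useradd command.
# Usage: render_ollama_unit /path/to/ollama
render_ollama_unit() {
  if [ -z "$1" ]; then
    echo "usage: render_ollama_unit /path/to/ollama" >&2
    return 1
  fi
  cat <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=$1 serve
User=ollama
Group=ollama
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
EOF
}

# Example (needs root), then run daemon-reload and enable --now:
#   render_ollama_unit "$(command -v ollama)" | sudo tee /etc/systemd/system/ollama.service >/dev/null
```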

AMD GPU users need the additional ROCm package (ollama-linux-amd64-rocm.tar.zst) alongside the base install.
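
On its own, nvidia-smi only confirms that the driver sees your card. To check how much VRAM Ollama has to work with, you can request CSV output and summarize it. Here is a minimal sketch, assuming the --query-gpu=name,memory.total --format=csv,noheader,nounits output format, in which memory is reported in MiB. report_vram is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize GPU name and total VRAM from nvidia-smi CSV read on stdin.
# Expected input: one line per GPU, e.g. "NVIDIA GeForce RTX 3060, 12288",
# as produced by:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
report_vram() {
  # Split on a comma plus any following spaces; convert MiB to GiB.
  awk -F', *' 'NF >= 2 { printf "%s: %.1f GiB\n", $1, $2 / 1024 }'
}

# Example:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | report_vram
```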

macOS

  1. Download the .dmg from ollama.com/download, open it, and drag Ollama into Applications — or install via Homebrew:
brew install ollama
brew services start ollama
  2. If you installed the .dmg, launch Ollama from Applications once to grant permissions; its icon then appears in the menu bar
  3. Verify and run:
ollama --version
ollama run gemma4:e4b

Apple Silicon Macs (M1–M4) use Metal acceleration automatically, so no extra driver setup is needed. Because the GPU shares unified memory with the CPU, your Mac's total RAM determines the largest model you can run comfortably.
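
As a rough rule of thumb, quantized weights take about (parameters times bits per weight divided by 8) bytes. You also need headroom on top of that for the context (KV cache), the runtime, and macOS itself. The sketch below reads total memory with sysctl -n hw.memsize (which reports bytes) and estimates weight size. The helper names are hypothetical, and the parameter count and bits-per-weight you pass in are your own assumptions; 4-bit quants such as Q4_K_M land at roughly 4 to 5 bits per weight.

```shell
#!/bin/sh
# Convert a byte count to whole GiB (integer division, rounds down).
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Estimate weight size in GiB: params (billions) * bits per weight / 8.
# Ignores KV cache, runtime buffers, and OS overhead, so treat it as a floor.
# Usage: estimate_weights_gib PARAMS_BILLION BITS_PER_WEIGHT
estimate_weights_gib() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * 1e9 * b / 8 / 1073741824 }'
}

# Example on macOS (numbers are illustrative, not Gemma 4 specs):
#   echo "RAM: $(bytes_to_gib "$(sysctl -n hw.memsize)") GiB"
#   echo "Weights: ~$(estimate_weights_gib 8 4.5) GiB"
```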

Choosing llama.cpp Instead

If you'd rather trade convenience for direct control over build and inference settings, which can help with raw speed, build llama.cpp from source on any OS:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON   # NVIDIA GPU (needs the CUDA toolkit); on Mac use -DGGML_METAL=ON (Metal is on by default there); omit the flag for CPU-only builds
cmake --build build --config Release
# On Windows with the default Visual Studio generator, run build\bin\Release\llama-cli.exe instead
./build/bin/llama-cli -m gemma4-e4b-Q4_K_M.gguf -p "Explain Gemma 4 quantization"

Download the GGUF weights separately from Hugging Face's Gemma 4 collection before running this command.
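
The comment on the cmake line amounts to a small decision rule: CUDA when an NVIDIA GPU is present, Metal on a Mac, and no backend flag for CPU-only builds. Here is a minimal sketch of that rule as a function. ggml_backend_flag is a hypothetical helper, and treating the presence of nvidia-smi as "has an NVIDIA GPU" is an assumption; llama.cpp's build does not perform this check.

```shell
#!/bin/sh
# Pick the llama.cpp CMake backend flag following the rule in the build comment.
# $1: OS name as printed by `uname -s` (e.g. Linux, Darwin)
# $2: "yes" if an NVIDIA GPU with drivers is present, anything else otherwise
ggml_backend_flag() {
  case "$1" in
    Darwin)
      printf '%s\n' '-DGGML_METAL=ON'   # Mac: Metal backend
      ;;
    *)
      if [ "$2" = yes ]; then
        printf '%s\n' '-DGGML_CUDA=ON'  # NVIDIA: CUDA backend (needs the CUDA toolkit)
      fi                                # otherwise CPU-only: print nothing
      ;;
  esac
}

# Example:
#   has_nvidia=no; command -v nvidia-smi >/dev/null 2>&1 && has_nvidia=yes
#   cmake -B build $(ggml_backend_flag "$(uname -s)" "$has_nvidia")
```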
