DEV Community

Jovan Chan

Posted on • Originally published at runaihome.com

Regression: large GGUF models work on Ollama 0.24 but fail o Fix 2026


Vulkan OOM Regression in Ollama 0.30.x

Ollama 0.30.x introduced changes to Vulkan memory allocation that make the backend more aggressive about reserving VRAM for model tensors. On GPUs with 4 GB of VRAM, this causes out-of-memory errors as soon as a large quantized model such as gemma4:26b-a4b-it-q4_K_M starts loading. Ollama 0.24 is not affected, because it used a more conservative default allocation strategy.
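A rough size estimate shows why a 4 GB card is so tight for this model. The sketch below is an approximation. It assumes weight size ≈ parameters × bits per weight ÷ 8, reads the `26b` in the tag as roughly 26 billion parameters, and assumes about 4.85 bits per weight for Q4_K_M, although the real average varies by model. It ignores the KV cache and runtime buffers, which only add to the total. `estimate_gguf_gb` is a hypothetical helper for illustration, not part of Ollama.

```shell
#!/usr/bin/env bash
# Rough GGUF weight-size estimate: parameters (billions) x bits per weight / 8 = GB.
# Assumption: ~4.85 bits/weight average for Q4_K_M; the true figure varies by model.
# KV cache and compute buffers are not included, so real memory use is higher.
estimate_gguf_gb() {
  local params_billion=$1 bits_per_weight=$2
  awk -v p="$params_billion" -v b="$bits_per_weight" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

weights_gb=$(estimate_gguf_gb 26 4.85)
vram_gb=4

# awk exits 0 (success) only when the estimated weights exceed the VRAM size.
if awk -v w="$weights_gb" -v v="$vram_gb" 'BEGIN { exit !(w > v) }'; then
  echo "~${weights_gb} GB of weights vs ${vram_gb} GB of VRAM: most layers cannot live on the GPU"
fi
```

Under these assumptions the weights alone come to about 15.8 GB, roughly four times the card's VRAM. Only a fraction of the layers can be offloaded to the GPU, so how Ollama reserves that VRAM matters.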

Fix 1: Limit Vulkan Memory Fraction

Set an environment variable before running Ollama to cap Vulkan memory usage at 50% of available VRAM:

Windows (PowerShell):

$env:OLLAMA_VULKAN_MEMORY_FRACTION="0.5"
ollama run gemma4:26b-a4b-it-q4_K_M

Windows (Command Prompt):


set OLLAMA_VULKAN_MEMORY_FRACTION=0.5
ollama run gemma4:26b-a4b-it-q4_K_M
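A small sketch can turn the 0.5 fraction into an absolute budget. It assumes the fraction is applied to the card's total VRAM. The article says "available VRAM", and this sketch does not confirm which base Ollama actually uses. `vram_cap_mib` is a hypothetical helper, not an Ollama command. It runs in bash with awk, for example in Git Bash or WSL on Windows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts a memory fraction into an absolute cap in MiB.
# Assumption: the fraction applies to total VRAM; Ollama's actual base is unconfirmed.
vram_cap_mib() {
  local total_mib=$1 fraction=$2
  awk -v t="$total_mib" -v f="$fraction" 'BEGIN {
    # Reject fractions outside (0, 1] with a non-zero exit status.
    if (f <= 0 || f > 1) { print "fraction must be in (0, 1]" > "/dev/stderr"; exit 1 }
    printf "%d\n", t * f
  }'
}

vram_cap_mib 4096 0.5   # prints 2048
```

On a 4 GB (4096 MiB) card, a fraction of 0.5 therefore corresponds to about 2 GB reserved for Vulkan allocations.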
