This article was originally published on runaihome.com
Vulkan OOM Regression in Ollama 0.30.x
Ollama 0.30.x introduced changes to Vulkan memory allocation that make the backend more aggressive about reserving VRAM for model tensors. On systems with only 4 GB of VRAM, this triggers out-of-memory errors as soon as a large quantized model such as gemma4:26b-a4b-it-q4_K_M is loaded. The regression does not affect Ollama 0.24, which used a more conservative default allocation strategy.
Fix 1: Limit Vulkan Memory Fraction
Set the following environment variable in the same shell session, before running Ollama, to cap Vulkan memory usage at 50% of available VRAM:
Windows (PowerShell):
$env:OLLAMA_VULKAN_MEMORY_FRACTION="0.5"
ollama run gemma4:26b-a4b-it-q4_K_M
Windows (Command Prompt):
set OLLAMA_VULKAN_MEMORY_FRACTION=0.5
ollama run gemma4:26b-a4b-it-q4_K_M
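If you launch Ollama from a script rather than an interactive shell, the same setting can be applied to a single child process. The Python sketch below is a minimal illustration, not part of Ollama: the helpers vram_cap_bytes, build_env and run_model are hypothetical, and the sketch assumes, as described above, that OLLAMA_VULKAN_MEMORY_FRACTION is read as a fraction of total VRAM. It also spells out the arithmetic behind the cap: a fraction of 0.5 on a 4 GiB card leaves a 2 GiB budget.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Assumption: Ollama interprets this variable as a fraction of total VRAM.
# Nothing here queries Ollama; the helpers only prepare and launch a process.
ENV_VAR = "OLLAMA_VULKAN_MEMORY_FRACTION"
MODEL = "gemma4:26b-a4b-it-q4_K_M"


def _check_fraction(fraction: float) -> None:
    """Reject values outside (0, 1], which cannot describe a share of VRAM."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")


def vram_cap_bytes(total_vram_bytes: int, fraction: float) -> int:
    """Return the VRAM budget implied by a fraction (illustrative arithmetic only)."""
    _check_fraction(fraction)
    return int(total_vram_bytes * fraction)


def build_env(fraction: float, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the current (or given) environment and add the memory-fraction variable."""
    _check_fraction(fraction)
    env = dict(os.environ if base is None else base)
    env[ENV_VAR] = str(fraction)
    return env


def run_model(model: str = MODEL, fraction: float = 0.5) -> int:
    """Run `ollama run <model>` with the variable set for the child process only."""
    result = subprocess.run(["ollama", "run", model], env=build_env(fraction))
    return result.returncode
```

Calling run_model() starts `ollama run gemma4:26b-a4b-it-q4_K_M` with the variable set only for that child process, so the parent shell is left unchanged. Like the shell commands above, this only helps if the variable reaches the Ollama process that actually loads the model.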