Regression: large GGUF models work on Ollama 0.24 but fail o Fix 2026

#fix #tutorial #ai #troubleshooting

This article was originally published on runaihome.com

Vulkan OOM Regression in Ollama 0.30.x

Ollama 0.30.x introduced changes to Vulkan memory allocation that make the backend more aggressive about reserving VRAM for model tensors. On systems with 4 GB VRAM, this causes immediate out-of-memory errors when loading large quantized models like gemma4:26b-a4b-it-q4_K_M. The regression does not affect Ollama 0.24, which used a more conservative default allocation strategy.

Fix 1: Limit Vulkan Memory Fraction

Set the OLLAMA_VULKAN_MEMORY_FRACTION environment variable before running Ollama to cap Vulkan memory usage at 50% of available VRAM:

Windows (PowerShell):

$env:OLLAMA_VULKAN_MEMORY_FRACTION="0.5"
ollama run gemma4:26b-a4b-it-q4_K_M

Windows (Command Prompt):


set OLLAMA_VULKAN_MEMORY_FRACTION=0.5
ollama run gemma4
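To make the effect of the fraction concrete, here is a minimal Python sketch of the same idea. It works out the cap that a given fraction implies and builds a process environment with the variable set. The variable name comes from the commands above. The helper names (`vram_cap_gib`, `env_with_fraction`, `run_model`) are hypothetical and exist only for illustration, and the sketch assumes the fraction applies to the VRAM figure you pass in.

```python
import os
import subprocess

# Variable name as used in the article; everything else here is illustrative.
ENV_VAR = "OLLAMA_VULKAN_MEMORY_FRACTION"


def _check_fraction(fraction: float) -> None:
    """Reject values outside the half-open interval (0, 1]."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be greater than 0 and at most 1")


def vram_cap_gib(vram_gib: float, fraction: float) -> float:
    """Return the memory cap, in GiB, implied by a fraction of the given VRAM."""
    _check_fraction(fraction)
    return vram_gib * fraction


def env_with_fraction(fraction: float, base_env=None) -> dict:
    """Copy an environment mapping and set the memory-fraction variable in it."""
    _check_fraction(fraction)
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = str(fraction)
    return env


def run_model(model: str, fraction: float) -> int:
    """Start `ollama run <model>` with the variable set for that child process only."""
    completed = subprocess.run(
        ["ollama", "run", model],
        env=env_with_fraction(fraction),
    )
    return completed.returncode
```

For example, `vram_cap_gib(4, 0.5)` gives 2.0, so a 0.5 fraction on a 4 GB card leaves roughly 2 GB for the backend. Calling `run_model("gemma4:26b-a4b-it-q4_K_M", 0.5)` would mirror the PowerShell commands. If the variable is read by the Ollama server rather than the client, a server already running in the background would likely need a restart before it picks up the new value.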
