Running Flux Schnell (12B) on a legacy AMD RX 580 (8GB) — 100% Local, No Cloud, No ROCm

#ai #productivity #programming #webdev

I've developed a specialized pipeline to run State-Of-The-Art local AI models on legacy hardware — specifically the AMD RX 580 (Polaris architecture, 8GB VRAM) — without cloud APIs, subscriptions, or ROCm overhead on Windows.
The Problem:
Most platform layers introduce abstraction overhead that immediately chokes an 8GB VRAM card, causing instant DeviceMemoryAllocation (OOM) crashes.
The Solution — Hybrid Memory Segmentation:

Backend: native build of stable-diffusion.cpp running directly over the Vulkan API
Text encoders (clip_l + t5xxl_fp16) fully offloaded to host system RAM (~9.3GB)
Quantized diffusion weights (flux1-schnell-q4_k.gguf) pinned inside GPU VRAM (~6.5GB)
Flags: --vae-on-cpu + --vae-tiling for block-decoding, preventing OOM crashes at high resolution

Orchestration:
Custom modular .bat pipeline to clear ghost VRAM processes, spin up a headless ComfyUI instance on CPU (port 3030), and hook the sd-server.exe backend into a lightweight Vanilla JS/HTML5 dashboard hosted locally via Firebase.
It's not a speed demon (~14 min/image), but it runs 100% offline with zero cloud handshakes.
The whole point is democratization — proving you don't need a $2,000 GPU to run top-tier local AI.
Full docs, source code, and .bat scripts: https://setup-ia-local-rx580-vulkan.firebaseapp.com/

DEV Community

Running Flux Schnell (12B) on a legacy AMD RX 580 (8GB) — 100% Local, No Cloud, No ROCm

Top comments (0)