DEV Community

AIVisionsLab


Running FLUX.1 Schnell on an RX 580 8GB — GPU/CPU hybrid architecture

Cover image: generated by FLUX.1 Schnell running on the hybrid architecture described in this post.

The problem

FLUX.1 Schnell is a 12B-parameter model. At 16-bit precision those 12B weights alone take about 24GB, three times the RX 580's 8GB of VRAM.

The solution: use a 4-bit GGUF quantization of the diffusion model and split the components between GPU VRAM and CPU RAM.
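A back-of-the-envelope check of why quantization is needed, as a minimal sketch. It counts only the ~12B diffusion weights and ignores activations and compute buffers. The bits-per-weight figures are the nominal llama.cpp values (Q8_0 = 8.5, Q4_K = 4.5) and are an assumption here; real GGUF files differ somewhat because some tensors may be stored at other precisions, and GB vs GiB conventions vary.

```python
PARAMS = 12e9  # FLUX.1 Schnell diffusion model, ~12B parameters

# Approximate storage cost per weight, in bits (nominal values, assumed).
BITS_PER_WEIGHT = {"fp32": 32, "fp16/bf16": 16, "q8_0": 8.5, "q4_k": 4.5}

VRAM_GB = 8.0  # RX 580 8GB


def weights_gb(params: float, bits: float) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for fmt, bits in BITS_PER_WEIGHT.items():
    size = weights_gb(PARAMS, bits)
    # "fits" only means the weights fit; runtime buffers need headroom too.
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{fmt:>10}: {size:5.2f} GB -> {verdict} in {VRAM_GB:.0f} GB")
```

Only the 4-bit row leaves room on an 8GB card, which is why the diffusion model is the one component kept in VRAM.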


Memory map

| Component | File | Location | Size |
|---|---|---|---|
| Diffusion model | flux1-schnell-q4_k.gguf | GPU VRAM | ~6.5GB |
| VAE | ae.safetensors | CPU RAM | ~160MB |
| CLIP L | clip_l.safetensors | GPU VRAM | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | CPU RAM | ~9.3GB |

Total VRAM used: ~6.7GB / 8GB available
Total RAM used: ~9.5GB

The T5XXL encoder dominates RAM usage. If you're tight on RAM, switching to t5xxl_fp8.safetensors cuts the encoder to ~5GB.
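The totals are plain per-device sums over the table. A small sketch of that bookkeeping, useful when moving a component between devices or swapping in the fp8 T5XXL. The sizes are the approximate figures from the table, only weights are counted (the live VRAM reading during generation is higher), and `totals` / `with_fp8_t5` are hypothetical helper names.

```python
# Approximate sizes in GB from the memory map table.
# Placement: "gpu" = VRAM, "cpu" = system RAM.
COMPONENTS = {
    "diffusion": ("flux1-schnell-q4_k.gguf", "gpu", 6.5),
    "vae": ("ae.safetensors", "cpu", 0.16),
    "clip_l": ("clip_l.safetensors", "gpu", 0.235),
    "t5xxl": ("t5xxl_fp16.safetensors", "cpu", 9.3),
}


def totals(components: dict) -> dict:
    """Sum weight sizes per device."""
    out = {"gpu": 0.0, "cpu": 0.0}
    for _file, device, size_gb in components.values():
        out[device] += size_gb
    return out


def with_fp8_t5(components: dict) -> dict:
    """Same layout, but with the ~5GB fp8 T5XXL instead of fp16.

    Returns a new dict; the input layout is left unchanged.
    """
    swapped = dict(components)
    swapped["t5xxl"] = ("t5xxl_fp8.safetensors", "cpu", 5.0)
    return swapped


print(totals(COMPONENTS))               # GPU ~6.7 GB, CPU ~9.5 GB
print(totals(with_fp8_t5(COMPONENTS)))  # CPU drops to ~5.2 GB
```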


⚠️ Critical: use leejet GGUF, not city96

Two different FLUX GGUF conversions exist. They have similar names but are NOT interchangeable:

| Source | Works with |
|---|---|
| city96 on HuggingFace | ComfyUI + ComfyUI-GGUF node |
| leejet on HuggingFace | stable-diffusion.cpp ✅ |

Loading a city96 GGUF in sd-server fails with:

```
[ERROR] stable-diffusion.cpp:355 - get sd version from file failed
[ERROR] main.cpp:92 - new_sd_ctx_t failed
```

Download from: https://huggingface.co/leejet/FLUX.1-schnell-gguf
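If you are unsure which conversion a downloaded file is, you can at least inspect it. Below is a minimal reader for the GGUF header (format version 2 and later), written from the public GGUF format description in the ggml repository rather than from any stable-diffusion.cpp code; `read_gguf_header` is a hypothetical helper name. It checks the `GGUF` magic and returns the metadata and tensor names. It does not decide city96 vs leejet on its own: compare the tensor names against a file you know sd-server loads.

```python
import struct
from typing import BinaryIO

# Fixed-size GGUF metadata value types (type id -> little-endian struct format).
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one little-endian scalar, failing loudly on truncated files."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int):
    """Decode one metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_header(f: BinaryIO):
    """Return (version, metadata dict, tensor names) from a GGUF v2+ file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file (bad magic)")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")

    metadata = {}
    for _ in range(n_kv):
        key = _read_string(f)
        metadata[key] = _read_value(f, _read(f, "<I"))

    names = []
    for _ in range(n_tensors):
        names.append(_read_string(f))
        n_dims = _read(f, "<I")
        for _ in range(n_dims):
            _read(f, "<Q")  # skip each dimension
        _read(f, "<I")  # skip ggml tensor type
        _read(f, "<Q")  # skip offset into the data section
    return version, metadata, names


# Usage (path taken from the command in this post):
# with open(r"E:\models\flux1-schnell-q4_k.gguf", "rb") as f:
#     version, meta, names = read_gguf_header(f)
# print(version, len(names), names[:5])
```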


The command

```bat
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 ^
  --diffusion-model "E:\models\flux1-schnell-q4_k.gguf" ^
  --vae "E:\models\ae.safetensors" ^
  --clip_l "E:\models\clip_l.safetensors" ^
  --t5xxl "E:\models\t5xxl_fp16.safetensors" ^
  --cfg-scale 1.0 --steps 4 ^
  --clip-on-cpu --vae-on-cpu --vae-tiling
```

Flag breakdown:

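| Flag | Why |
|---|---|
| --clip-on-cpu | Frees ~235MB VRAM |
| --vae-on-cpu | Frees ~160MB VRAM |
| --vae-tiling | Decodes the image in tiles to prevent OOM at high resolution |
| --cfg-scale 1.0 | Required for FLUX Schnell, which is guidance-distilled; higher values distort the image |
| --steps 4 | Schnell converges in 4 steps by design |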
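If you launch the server from a script rather than a .bat file, the same flags can be assembled as an argument list. A minimal Python sketch, assuming Windows with sd-server.exe on PATH; `sd_server_args` and `launch` are hypothetical helper names, and the paths mirror the command above.

```python
import subprocess
from pathlib import PureWindowsPath

# Model folder from the command above; adjust to your install.
MODELS = PureWindowsPath(r"E:\models")


def sd_server_args(models=MODELS, t5_file="t5xxl_fp16.safetensors",
                   host="0.0.0.0", port=7860):
    """Build the sd-server.exe argument list used in this post."""
    return [
        "sd-server.exe",
        "--listen-ip", host,
        "--listen-port", str(port),
        "--diffusion-model", str(models / "flux1-schnell-q4_k.gguf"),
        "--vae", str(models / "ae.safetensors"),
        "--clip_l", str(models / "clip_l.safetensors"),
        "--t5xxl", str(models / t5_file),
        "--cfg-scale", "1.0",  # Schnell is guidance-distilled
        "--steps", "4",        # Schnell is built for few-step sampling
        "--clip-on-cpu", "--vae-on-cpu", "--vae-tiling",
    ]


def launch(**kwargs) -> subprocess.Popen:
    """Start sd-server in the background and return the process handle."""
    return subprocess.Popen(sd_server_args(**kwargs))
```

Passing a list instead of a command string sidesteps cmd quoting, and the `t5_file` parameter turns the fp8 swap into a one-argument change: `launch(t5_file="t5xxl_fp8.safetensors")`.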

Real benchmark

| Stage | Time |
|---|---|
| T5XXL conditioning | 11.49s |
| Sampling (4 steps @ 1024×1024) | ~838s (~14 min) |
| VAE decode (9 tiles) | 40.45s |
| Total | ~890s (~15 min) |
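The breakdown shows where the time goes. A small sketch that derives the total, the per-step cost, and each stage's share from the measured numbers (the ~838s sampling figure is itself approximate):

```python
# Stage timings in seconds, from the benchmark table.
STAGES = {
    "T5XXL conditioning": 11.49,
    "Sampling (4 steps)": 838.0,
    "VAE decode (9 tiles)": 40.45,
}
STEPS = 4

total = sum(STAGES.values())
per_step = STAGES["Sampling (4 steps)"] / STEPS

print(f"total: {total:.1f}s ({total / 60:.1f} min)")
print(f"per sampling step: {per_step:.1f}s")
for name, seconds in STAGES.items():
    print(f"{name:<22} {seconds / total:6.1%}")
```

With these numbers each sampling step takes roughly 210s and sampling accounts for about 94% of the run; T5XXL conditioning and VAE decode add under a minute combined.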

Terminal status during generation:

```
Listening on http://0.0.0.0:7860
VRAM: 7.6/8.0 GB | RAM: ~9.5 GB | Temp: 66°C
```

Windows Firewall fix

If OpenWebUI still can't reach the server with --listen-ip 0.0.0.0, allow the port through Windows Firewall:

```powershell
# Run as Administrator
New-NetFirewallRule -DisplayName "sd-server AIVisionsLab" `
  -Direction Inbound -Protocol TCP -LocalPort 7860 -Action Allow
```

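If OpenWebUI runs in Docker, note that Docker (here via WSL2) runs containers in an isolated network: 127.0.0.1 inside the container refers to the container itself, not to sd-server on the host. Use your machine's actual local IP instead.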
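To find that IP and test the port, a small standard-library Python sketch; `local_ip` and `port_open` are hypothetical helper names. `local_ip()` uses the common UDP-connect trick: connecting a UDP socket sends no packets, it only makes the OS choose the outgoing interface, whose address `getsockname()` reports. The target 192.0.2.1 is a documentation-only address (TEST-NET-1) and is never contacted. Run `local_ip()` on the sd-server machine, then run `port_open()` with that IP from the machine or container that hosts OpenWebUI; a check from the same host may succeed even when the firewall blocks outside traffic.

```python
import socket


def local_ip() -> str:
    """Best-guess LAN IP of this machine (falls back to loopback)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # No packet is sent; this only selects the outgoing interface.
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no route available


def port_open(host: str, port: int = 7860, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage: on the sd-server machine, print(local_ip());
# from the OpenWebUI side, port_open("<that IP>", 7860).
```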


Full documentation

📖 setup-ia-local-rx580-vulkan.web.app
📦 github.com/aivisionslab-studios/rx580-local-ai-guide
