AIVisionsLab

Posted on May 24 • Edited on Jun 12

Running FLUX.1 Schnell on an RX 580 8GB — GPU/CPU hybrid architecture

#flux #ai #tutorial #stablediffusion

Image above: generated by FLUX.1 Schnell running on the hybrid architecture described in this post.

The problem

FLUX.1 Schnell is a 12B-parameter model. At 16-bit precision its diffusion weights alone need about 24GB (12B × 2 bytes), three times the RX 580's 8GB of VRAM.

The solution: use a 4-bit quantized (GGUF Q4_K) diffusion model so it fits in VRAM, and split the remaining components between GPU VRAM and CPU RAM.
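Here is a back-of-the-envelope sketch of why quantization comes first. It assumes a round 12e9 parameters and ggml's nominal bits per weight (Q8_0 stores 32 weights in 34 bytes, Q4_K stores 256 weights in 144 bytes). Real GGUF files keep a few tensors at higher precision, so treat the results as estimates:

```python
# Rough weight-memory estimate for a ~12B-parameter diffusion model.
# Assumptions: 12e9 parameters (rounded); bf16 = 16 bits/weight;
# ggml Q8_0 = 8.5 bits/weight; ggml Q4_K = 4.5 bits/weight.
# Only weights are counted, not activations or compute buffers.

PARAMS = 12e9
GB = 1e9  # decimal gigabytes

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal GB."""
    return params * bits_per_weight / 8 / GB

for name, bpw in [("bf16", 16.0), ("q8_0", 8.5), ("q4_k", 4.5)]:
    print(f"{name:>5}: ~{weight_gb(PARAMS, bpw):.2f} GB")
```

The Q4_K estimate (~6.75GB) is in the same range as the ~6.5GB file in the memory map. Meanwhile 16-bit weights would need about three times the card's VRAM, and even Q8_0 would not fit.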

Memory map

| Component | File | Location | Size |
|---|---|---|---|
| Diffusion model | flux1-schnell-q4_k.gguf | GPU VRAM | ~6.5GB |
| VAE | ae.safetensors | CPU RAM | ~160MB |
| CLIP L | clip_l.safetensors | CPU RAM | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | CPU RAM | ~9.3GB |

Weights in VRAM: ~6.5GB / 8GB available
Weights in RAM: ~9.7GB

The T5XXL encoder dominates RAM usage. If you're tight on RAM, t5xxl_fp8.safetensors reduces it to ~5GB.
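Here is the same table as data, in a small sketch that sums weight sizes per device and reports VRAM headroom. Sizes are the approximate figures above. The placement assumes the `--clip-on-cpu` and `--vae-on-cpu` flags from the command below, which keep both text encoders and the VAE in system RAM. Runtime buffers come on top of these numbers:

```python
# Sketch: the memory map as data. Sizes are the approximate table values (GB).
# Placement assumes --clip-on-cpu and --vae-on-cpu, so only the diffusion
# model sits in VRAM. Runtime buffers are NOT included.
VRAM_GB = 8.0

components = {
    # name: (size in GB, device)
    "diffusion_q4_k": (6.5, "gpu"),
    "vae": (0.16, "cpu"),
    "clip_l": (0.235, "cpu"),
    "t5xxl": (9.3, "cpu"),  # fp16 file
}

def totals(parts: dict) -> dict:
    """Sum component sizes per device."""
    out = {"gpu": 0.0, "cpu": 0.0}
    for size_gb, device in parts.values():
        out[device] += size_gb
    return out

t = totals(components)
print(f"VRAM weights: {t['gpu']:.2f} GB (headroom {VRAM_GB - t['gpu']:.2f} GB)")
print(f"RAM weights:  {t['cpu']:.2f} GB")

# Swapping in the fp8 T5XXL (~5 GB) only changes the CPU side.
fp8 = {**components, "t5xxl": (5.0, "cpu")}
print(f"RAM weights with fp8 T5XXL: {totals(fp8)['cpu']:.2f} GB")
```

The terminal status later in this post shows 7.6GB of VRAM in use during generation. That means roughly 1GB beyond the weights goes to compute buffers and anything else using the card, such as the desktop.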

⚠️ Critical: use leejet GGUF, not city96

Two different GGUF formats exist for FLUX. They have similar names but are NOT interchangeable:

| Source | Intended for |
|---|---|
| city96 on Hugging Face | ComfyUI + ComfyUI-GGUF node |
| leejet on Hugging Face | stable-diffusion.cpp ✅ |

Loading a city96 GGUF in sd-server fails with:

```text
[ERROR] stable-diffusion.cpp:355 - get sd version from file failed
[ERROR] main.cpp:92 - new_sd_ctx_t failed
```

Download from: https://huggingface.co/leejet/FLUX.1-schnell-gguf
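If you're not sure which conversion a downloaded file is, you can inspect it before loading it. Below is a minimal, self-contained sketch of a GGUF header reader based on the published GGUF layout (little-endian, versions 2 and 3). It prints the format version, the metadata keys, and the first few tensor names. It makes no assumption about which naming or metadata each publisher uses, so run it on both files and compare:

```python
# Minimal GGUF inspector (a sketch): reads the header, metadata keys, and
# the first few tensor names. Does not read tensor data.
import struct
import sys
from typing import BinaryIO

# GGUF metadata value type ids -> struct format for fixed-size scalars
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9

def _read(f: BinaryIO, fmt: str):
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]

def _read_str(f: BinaryIO) -> str:
    n = _read(f, "<Q")  # uint64 length prefix, then UTF-8 bytes
    return f.read(n).decode("utf-8", errors="replace")

def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def inspect_gguf(f: BinaryIO, max_tensors: int = 5) -> dict:
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file (bad magic)")
    version = _read(f, "<I")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")
    meta = {}
    for _ in range(n_kv):
        key = _read_str(f)
        meta[key] = _read_value(f, _read(f, "<I"))
    names = []
    for _ in range(min(n_tensors, max_tensors)):
        names.append(_read_str(f))
        n_dims = _read(f, "<I")
        f.read(8 * n_dims + 4 + 8)  # skip dims (uint64 each), type (uint32), offset (uint64)
    return {"version": version, "tensor_count": n_tensors,
            "metadata_keys": sorted(meta), "first_tensors": names}

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for k, v in inspect_gguf(fh).items():
            print(f"{k}: {v}")
```

Usage: `python inspect_gguf.py E:\models\flux1-schnell-q4_k.gguf` (the script name is arbitrary).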

The command

```bat
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 ^
  --diffusion-model "E:\models\flux1-schnell-q4_k.gguf" ^
  --vae "E:\models\ae.safetensors" ^
  --clip_l "E:\models\clip_l.safetensors" ^
  --t5xxl "E:\models\t5xxl_fp16.safetensors" ^
  --cfg-scale 1.0 --steps 4 ^
  --clip-on-cpu --vae-on-cpu --vae-tiling
```
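If you launch the server from a script, a sketch like this keeps the model folder and port in one place. The flags are copied from the command above, the function name `sd_server_args` is hypothetical, and the paths are this post's examples:

```python
# Sketch: build the sd-server command above from one models folder.
# Flags are copied verbatim from the post; paths are examples.
import subprocess
import sys
from pathlib import PureWindowsPath

MODELS_DIR = PureWindowsPath(r"E:\models")

def sd_server_args(models: PureWindowsPath, port: int = 7860) -> list[str]:
    """Return the argv list for the hybrid GPU/CPU FLUX setup."""
    return [
        "sd-server.exe",
        "--listen-ip", "0.0.0.0", "--listen-port", str(port),
        "--diffusion-model", str(models / "flux1-schnell-q4_k.gguf"),
        "--vae", str(models / "ae.safetensors"),
        "--clip_l", str(models / "clip_l.safetensors"),
        "--t5xxl", str(models / "t5xxl_fp16.safetensors"),
        "--cfg-scale", "1.0", "--steps", "4",
        "--clip-on-cpu", "--vae-on-cpu", "--vae-tiling",
    ]

if __name__ == "__main__":
    args = sd_server_args(MODELS_DIR)
    print(" ".join(args))  # no quoting: fine while paths contain no spaces
    if "--run" in sys.argv:
        subprocess.run(args, check=True)  # blocks while the server runs
```

Passing a list (rather than one string) to `subprocess.run` avoids shell-quoting issues with the `^` line continuations used in `cmd`.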

Flag breakdown:

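| Flag | Why |
|---|---|
| `--clip-on-cpu` | Runs the text encoders (CLIP L and T5XXL) on the CPU: frees ~235MB for CLIP L and keeps the ~9.3GB T5XXL out of VRAM |
| `--vae-on-cpu` | Frees ~160MB VRAM and moves VAE decode off the GPU |
| `--vae-tiling` | Decodes the image in tiles to cap peak memory; prevents OOM at high resolution |
| `--cfg-scale 1.0` | Schnell is distilled to run without classifier-free guidance; 1.0 disables it, higher values distort the image |
| `--steps 4` | Schnell is distilled to converge in 4 steps |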
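Why 1.0 turns guidance off: in the standard classifier-free guidance update, the model output $v$ is mixed from a conditional and an unconditional pass with scale $s$. At $s = 1$ the unconditional term cancels:

```latex
\hat{v} = v_{\text{uncond}} + s\,\bigl(v_{\text{cond}} - v_{\text{uncond}}\bigr)
\qquad
s = 1 \;\Rightarrow\; \hat{v} = v_{\text{cond}}
```

At 1.0 the unconditional pass contributes nothing, so it can be skipped. Values above 1.0 push the output away from it, which a guidance-free distilled model like Schnell was not trained for and is the likely source of the distortion.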

Real benchmark

| Stage | Time |
|---|---|
| T5XXL conditioning | 11.49s |
| Sampling (4 steps @ 1024×1024) | ~838s (~14 min) |
| VAE decode (9 tiles) | 40.45s |
| Total | ~890s (~15 min) |
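Here is a quick sketch that sums the stage times using the measured values from the table (the sampling figure is itself approximate):

```python
# Sum the benchmark stages and show where the time goes.
# Values are the measured times from the table above, in seconds.
stages = {
    "T5XXL conditioning": 11.49,
    "sampling (4 steps)": 838.0,
    "VAE decode (9 tiles)": 40.45,
}
STEPS = 4

total = sum(stages.values())
print(f"total: {total:.1f} s (~{total / 60:.1f} min)")
print(f"per sampling step: {stages['sampling (4 steps)'] / STEPS:.1f} s")
for name, secs in stages.items():
    print(f"{name:<22} {secs:7.2f} s  {100 * secs / total:5.1f}%")
```

Sampling accounts for about 94% of the wall time, at roughly 210s per step. The text encoder and VAE running on the CPU are not the bottleneck in this run.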

Terminal status during generation:

```text
Listening on http://0.0.0.0:7860
VRAM: 7.6/8.0 GB | RAM: ~9.5 GB | Temp: 66°C
```

Windows Firewall fix

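If OpenWebUI can't reach the server even with --listen-ip 0.0.0.0, Windows Firewall is likely blocking inbound connections on port 7860. Add an inbound rule:

```powershell
# Run as Administrator
New-NetFirewallRule -DisplayName "sd-server AIVisionsLab" `
  -Direction Inbound -Protocol TCP -LocalPort 7860 -Action Allow
```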

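If OpenWebUI runs in Docker, remember that Docker runs in an isolated WSL2 network: 127.0.0.1 inside the container points at the container itself, not at Windows. Use your machine's actual local IP instead (for example http://192.168.x.x:7860).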
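To confirm the firewall rule worked, test the port from the machine (or container) that needs to reach the server. Here is a minimal sketch using only Python's `socket` module. The host is passed as an argument, for example your PC's local IP:

```python
# Sketch: check whether a TCP port accepts connections.
import socket
import sys

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

if __name__ == "__main__" and len(sys.argv) > 1:
    host = sys.argv[1]  # e.g. your PC's local IP, not 127.0.0.1 from inside Docker
    print(f"{host}:7860 reachable: {port_open(host, 7860)}")
```

A `False` from another machine while the server is running usually points back to the firewall rule or the IP address.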

Full documentation

📖 setup-ia-local-rx580-vulkan.web.app
📦 github.com/aivisionslab-studios/rx580-local-ai-guide

DEV Community