Image above: generated by FLUX.1 Schnell running on the hybrid architecture described in this post.
The problem
FLUX.1 Schnell is a 12B parameter model. At 16-bit precision the weights alone take roughly 24GB, three times the RX 580's 8GB of VRAM.
The solution: quantize the diffusion model to Q4_K and split the components between GPU VRAM and CPU RAM.
Memory map
| Component | File | Where | Size |
|---|---|---|---|
| Diffusion model | flux1-schnell-q4_k.gguf | GPU VRAM | ~6.5GB |
| VAE | ae.safetensors | CPU RAM | ~160MB |
| CLIP L | clip_l.safetensors | GPU VRAM | ~235MB |
| T5XXL | t5xxl_fp16.safetensors | CPU RAM | ~9.3GB |
Total VRAM used: ~6.7GB / 8GB available
Total RAM used: ~9.5GB
The T5XXL encoder dominates RAM usage. If you're tight on RAM, t5xxl_fp8.safetensors reduces it to ~5GB.
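As a quick sanity check, the totals can be recomputed from the table. This is only a sketch of the arithmetic: the sizes are the approximate figures from the memory map, and the 8GB budget is the RX 580's VRAM as stated in the post. It does not account for compute buffers allocated at runtime.

```python
# Approximate component sizes in GB, copied from the memory map.
COMPONENTS = [
    ("Diffusion model", "GPU VRAM", 6.5),
    ("VAE", "CPU RAM", 0.160),
    ("CLIP L", "GPU VRAM", 0.235),
    ("T5XXL", "CPU RAM", 9.3),
]
VRAM_BUDGET_GB = 8.0  # RX 580, per the post


def total_gb(components, location):
    """Sum the sizes of all components placed in the given location."""
    return sum(size for _, where, size in components if where == location)


vram = total_gb(COMPONENTS, "GPU VRAM")
ram = total_gb(COMPONENTS, "CPU RAM")
print(f"VRAM: {vram:.1f} GB of {VRAM_BUDGET_GB:.0f} GB")
print(f"RAM:  {ram:.1f} GB")
print(f"VRAM headroom for weights only: {VRAM_BUDGET_GB - vram:.2f} GB")
```

Swapping the T5XXL entry for the fp8 size (~5GB) shows how much RAM the smaller encoder saves.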
⚠️ Critical: use leejet GGUF, not city96
Two different GGUF formats exist for FLUX. They have similar names but are NOT interchangeable:
| Source | For |
|---|---|
| city96 on HuggingFace | ComfyUI + ComfyUI-GGUF node |
| leejet on HuggingFace | stable-diffusion.cpp ✅ |
Using city96 GGUF with sd-server returns:
[ERROR] stable-diffusion.cpp:355 - get sd version from file failed
[ERROR] main.cpp:92 - new_sd_ctx_t failed
Download from: https://huggingface.co/leejet/FLUX.1-schnell-gguf
The command
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 ^
--diffusion-model "E:\models\flux1-schnell-q4_k.gguf" ^
--vae "E:\models\ae.safetensors" ^
--clip_l "E:\models\clip_l.safetensors" ^
--t5xxl "E:\models\t5xxl_fp16.safetensors" ^
--cfg-scale 1.0 --steps 4 ^
--clip-on-cpu --vae-on-cpu --vae-tiling
Flag breakdown:
| Flag | Why |
|---|---|
| --clip-on-cpu | Frees ~235MB VRAM |
| --vae-on-cpu | Frees ~160MB VRAM |
| --vae-tiling | Prevents OOM at high resolution |
| --cfg-scale 1.0 | Required for FLUX; higher values distort |
| --steps 4 | Schnell converges in 4 steps by design |
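If you would rather launch the server from a script than retype the command, the same arguments can be assembled in Python. This is a minimal sketch, not part of the original setup: `build_sd_server_args` is a hypothetical helper, and the paths and flags are simply the ones from the command above. It only prints the command; the actual launch is left commented out.

```python
from pathlib import PureWindowsPath

MODEL_DIR = r"E:\models"  # directory used in the post; adjust to your setup


def build_sd_server_args(model_dir=MODEL_DIR, port=7860):
    """Return the sd-server argument list from the post (hypothetical helper)."""
    models = PureWindowsPath(model_dir)
    return [
        "sd-server.exe",
        "--listen-ip", "0.0.0.0",
        "--listen-port", str(port),
        "--diffusion-model", str(models / "flux1-schnell-q4_k.gguf"),
        "--vae", str(models / "ae.safetensors"),
        "--clip_l", str(models / "clip_l.safetensors"),
        "--t5xxl", str(models / "t5xxl_fp16.safetensors"),
        "--cfg-scale", "1.0",  # required for FLUX
        "--steps", "4",        # Schnell is designed for 4 steps
        "--clip-on-cpu", "--vae-on-cpu", "--vae-tiling",
    ]


if __name__ == "__main__":
    args = build_sd_server_args()
    print(" ".join(args))
    # To actually start the server on Windows:
    # import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids the `^` line continuations and the quoting rules of cmd.exe.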
Real benchmark
| Stage | Time |
|---|---|
| T5XXL conditioning | 11.49s |
| Sampling (4 steps @ 1024×1024) | ~838s (~14 min) |
| VAE decode (9 tiles) | 40.45s |
| Total | ~890s (~15 min) |
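To see where the time goes, the stage timings can be broken down a little further. The sketch below uses only the numbers from the table (the sampling figure is approximate) and derives the sum of the stages, the cost per sampling step, and the cost per VAE tile.

```python
# Stage timings in seconds, from the benchmark table (sampling is approximate).
STAGES_S = {
    "T5XXL conditioning": 11.49,
    "Sampling": 838.0,
    "VAE decode": 40.45,
}
STEPS = 4      # sampling steps
VAE_TILES = 9  # tiles decoded with --vae-tiling

total_s = sum(STAGES_S.values())
per_step_s = STAGES_S["Sampling"] / STEPS
per_tile_s = STAGES_S["VAE decode"] / VAE_TILES
sampling_share = STAGES_S["Sampling"] / total_s

print(f"Sum of stages:  {total_s:.2f} s ({total_s / 60:.1f} min)")
print(f"Per step:       {per_step_s:.1f} s")
print(f"Per VAE tile:   {per_tile_s:.2f} s")
print(f"Sampling share: {sampling_share:.1%}")
```

Sampling accounts for well over 90% of the run, so text encoding on the CPU and the tiled VAE decode are not the bottleneck on this card.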
Terminal status at generation:
Listening on http://0.0.0.0:7860
VRAM: 7.6/8.0 GB | RAM: ~9.5 GB | Temp: 66°C
Windows Firewall fix
If OpenWebUI can't reach the server even with --listen-ip 0.0.0.0:
# Run as Administrator
New-NetFirewallRule -DisplayName "sd-server AIVisionsLab" `
-Direction Inbound -Protocol TCP -LocalPort 7860 -Action Allow
Docker runs in an isolated WSL2 network, so inside the container 127.0.0.1 refers to the container itself, not the Windows host running sd-server. Use your machine's actual local IP instead.
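To confirm which address to use and whether the port is reachable, a small standard-library check can help. This is a sketch under a few assumptions: `local_ip` and `can_connect` are hypothetical helper names, the UDP "connect" to 8.8.8.8 is only a common trick for asking the OS which interface it would route through (no packet is sent), and port 7860 is the one from the command above.

```python
import socket


def local_ip():
    """Best-effort LAN IP of this machine; falls back to 127.0.0.1."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.connect(("8.8.8.8", 80))  # selects a route; sends nothing
            return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    ip = local_ip()
    print(f"Local IP: {ip}")
    print(f"sd-server reachable on {ip}:7860: {can_connect(ip, 7860)}")
```

Run `local_ip()` on the Windows host to find the address, then run `can_connect` with that address from the machine or container that needs to reach sd-server. A `False` result while the server is listening usually points at the firewall rule.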
Full documentation
📖 setup-ia-local-rx580-vulkan.web.app
📦 github.com/aivisionslab-studios/rx580-local-ai-guide