I ran Flux Schnell + LLMs on a $50 GPU. No CUDA. No cloud. No ROCm.

All images in this article were generated locally on the RX 580 8GB described below.

The narrative was clear

In 2026, every guide says the same thing:

"Your AMD RX 580 can't run AI. Buy a new GPU."

AMD dropped ROCm support for Polaris/GCN4 in v5.x.
DirectML crashed with OpaqueTensorImpl errors.
OpenVINO failed silently.

So we had an 8GB GPU sitting at 0% utilization while the CPU crawled through LLM responses at about 3 tokens/second.

We refused to buy a new GPU.

The fix: Vulkan

The ggml project, the engine behind llama.cpp and stable-diffusion.cpp, supports Vulkan as a GPU backend. Vulkan is an open standard, and AMD's drivers have supported it on the RX 580 since the card's 2017 launch.

No CUDA. No ROCm. No DirectML. Just Vulkan.

Results (real terminal logs, not benchmarks)

Workload	Model	Speed
LLM inference	Mistral 7B Q4	15–16 tok/s
Image generation	DreamShaper 8 GGUF	~72 s/image
Image generation	FLUX.1 Schnell (flux1-schnell-q4_k, hybrid)	~14 min/image @ 1024×1024

CPU baseline without GPU: 3–5 tok/s.
Vulkan uplift: 3–4× on a GPU that "doesn't support AI."
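
The uplift can be bracketed from the ranges quoted above. A quick sketch in Python: the function name is only illustrative, and the inputs are the measurements from this article.

```python
def speedup_range(gpu_tok_s, cpu_tok_s):
    """Return (worst-case, best-case) speedup from two (low, high) ranges."""
    gpu_low, gpu_high = gpu_tok_s
    cpu_low, cpu_high = cpu_tok_s
    # Worst case: slowest GPU run against the fastest CPU run.
    # Best case: fastest GPU run against the slowest CPU run.
    return gpu_low / cpu_high, gpu_high / cpu_low


# Measured ranges from the results above (tokens per second).
worst, best = speedup_range(gpu_tok_s=(15.0, 16.0), cpu_tok_s=(3.0, 5.0))
print(f"Vulkan uplift: {worst:.1f}x to {best:.1f}x")
```

With these inputs the band runs from 3.0× to about 5.3×, so the quoted 3–4× is on the conservative side.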

Hardware

GPU:     AMD RX 580 2048SP — 8GB GDDR5 (Polaris / GCN4)
CPU:     Intel Xeon E5-2690 v3 — 12c/24t (2014)
RAM:     32GB DDR4 REG ECC
Storage: NVMe 1TB — 1.7–3.5 GB/s
OS:      Windows 10 Pro + WSL2 Ubuntu 22.04

The NVMe alone reduced FLUX model load time from 25 minutes to 30 seconds.
Storage is as critical as the GPU.

Build llama.cpp with Vulkan

# Run in Developer PowerShell for VS
cd E:\
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j20

Validate:

cd build\bin\Release
.\llama-cli.exe --list-devices
# Expected: Vulkan0: AMD Radeon RX 580 2048SP ✅
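
The validation step can also be scripted. The Python sketch below parses device lines of the form shown above and builds a GPU-offloaded command line. The regular expression assumes the "VulkanN: name" layout from the expected output, and the function names are illustrative. `-m`, `-ngl`, `-p`, and `-n` are standard llama-cli options; `-ngl 99` simply asks for more layers than a 7B model has, so everything goes to the GPU.

```python
import re
import subprocess

# Matches lines such as "Vulkan0: AMD Radeon RX 580 2048SP".
# Assumption: device lines follow the "VulkanN: name" layout shown above.
DEVICE_LINE = re.compile(r"^\s*(Vulkan\d+):\s*(.+?)\s*$", re.MULTILINE)


def find_vulkan_devices(listing):
    """Return [(device_id, name), ...] parsed from --list-devices output."""
    return DEVICE_LINE.findall(listing)


def list_devices(exe):
    """Run `<exe> --list-devices` and return everything it printed."""
    result = subprocess.run(
        [exe, "--list-devices"], capture_output=True, text=True, check=True
    )
    # Both streams are scanned in case the listing goes to stderr.
    return result.stdout + result.stderr


def build_llama_cmd(exe, model_path, prompt, gpu_layers=99, max_tokens=128):
    """Build a llama-cli argument list that offloads layers to the GPU."""
    return [
        exe,
        "-m", model_path,         # GGUF model file (caller supplies the path)
        "-ngl", str(gpu_layers),  # number of layers to offload to the GPU
        "-p", prompt,
        "-n", str(max_tokens),    # number of tokens to generate
    ]


# Parse the expected line from the validation step above.
print(find_vulkan_devices("Vulkan0: AMD Radeon RX 580 2048SP\n"))
```

On the real machine, `find_vulkan_devices(list_devices(exe))` returning an empty list would mean the build did not pick up Vulkan, and the list from `build_llama_cmd` can be passed straight to `subprocess.run`.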

Build stable-diffusion.cpp with Vulkan

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j20

Run the server

:: Run in Command Prompt (cmd.exe); ^ is its line-continuation character
E:
cd "E:\stable-diffusion.cpp\build\bin\Release"
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 ^
  -m "E:\models\dreamshaper_8.safetensors"

In Open WebUI, go to Admin → Images, choose Automatic1111, and enter http://YOUR_LOCAL_IP:7860/ as the URL.
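
The same server can be exercised without Open WebUI. This minimal Python sketch assumes sd-server answers the AUTOMATIC1111-style `/sdapi/v1/txt2img` route, which the Automatic1111 setting above implies but which this article does not verify directly. The function names, step count, image size, and output filename are placeholders.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url, prompt, steps=20, width=512, height=512):
    """Build a POST request for an AUTOMATIC1111-style txt2img endpoint."""
    payload = {"prompt": prompt, "steps": steps, "width": width, "height": height}
    return urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",  # assumed route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body):
    """Decode the base64 images from an AUTOMATIC1111-style JSON response."""
    return [base64.b64decode(item) for item in json.loads(response_body)["images"]]


def generate(base_url, prompt, out_path="out.png", timeout=600):
    """Send one request and write the first returned image to disk."""
    request = build_txt2img_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        images = decode_images(response.read())
    with open(out_path, "wb") as handle:
        handle.write(images[0])
    return out_path
```

A call such as `generate("http://YOUR_LOCAL_IP:7860", "a lighthouse at dusk")` would then save one image. The generous timeout is deliberate: at roughly 72 s per image, a default short timeout would give up before the RX 580 finishes.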

⚠️ Critical: two types of GGUF

If you try to run FLUX and get new_sd_ctx_t failed, you downloaded the wrong GGUF.

Source	Compatible with
city96 (HuggingFace)	ComfyUI only
leejet (HuggingFace)	stable-diffusion.cpp ✅

Always use: https://huggingface.co/leejet/FLUX.1-schnell-gguf
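
Before switching repositories, it can be worth ruling out a broken or truncated download. The Python sketch below reads only the fixed GGUF header (magic bytes, version, tensor count, metadata count, as laid out in GGUF v2 and v3). It cannot tell a city96 file from a leejet one, since both are valid GGUF containers; it only confirms that the file is a GGUF at all. The function name is illustrative.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata count
    # (the layout used by GGUF v2 and v3).
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

A `ValueError` here points to a bad download (for example an HTML error page saved under a .gguf name) rather than to the city96/leejet mismatch.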

What failed (documented)

Attempt	Error	Why
DirectML	`OpaqueTensorImpl`	MS tensors can't talk to ComfyUI backends
ROCm	Kernel panics	GCN4 dropped in v5.x — permanent
OpenVINO	`No module 'ldm'`	Extension targets old A1111 arch
CPU + HDD	19 min/image	No GPU + mechanical I/O bottleneck