AIVisionsLab

Posted on May 24 • Edited on Jun 12

Everything that failed before Vulkan saved our RX 580 AI setup

#ai #amd #playwright #opensource

All images in this article were generated locally on the RX 580 8GB — after we fixed everything described below.

The graveyard

Before Vulkan worked, we tried everything. This is the technical autopsy.

1. DirectML — Microsoft's promise that crashed

The attempt: torch-directml with --directml flag in ComfyUI.

The GPU was detected as privateuseone:0, the device name PyTorch assigns to the DirectML backend. It looked promising.

Then this appeared on every run:

WARNING: torch-directml barely works, is very slow,
has not been updated in over 1 year and might be
removed soon, please don't use it.

NotImplementedError: Cannot access storage of OpaqueTensorImpl

Root cause: DirectML tensors are backed by OpaqueTensorImpl, a tensor implementation that exposes no storage. When ComfyUI's modern attention backends try to read a tensor's underlying memory, the access is refused and the NotImplementedError above is raised.

The project hasn't been updated in over a year. It's effectively abandoned.

Manual fix attempt: Downgrade to the May 2024 dev build:

pip uninstall torch torch-directml torchaudio
pip install torch==2.3.1+cpu --index-url https://download.pytorch.org/whl/cpu
pip install torch-directml==0.2.1.dev240521 --no-deps

This stops the crash, but performance is too slow to be usable.

2. ROCm — officially dead for GCN4

The attempt: AMD's official GPGPU framework.

The reality: AMD dropped official support for the Polaris/GCN4 architecture in ROCm v5.x, and the decision is permanent. There is no officially supported workaround.

On Windows: no native ROCm support at all.
On WSL2 with compatibility layers: kernel panics under heavy inference load.

The only working ROCm path for the RX 580 is a community Docker container built for gfx803, the RX 580's GPU target, as Amihart documented in January 2025. It works for Stable Diffusion, but it adds Docker overhead and doesn't support the modern FLUX architecture.

3. OpenVINO + Stable Diffusion Forge

The attempt: Intel's sd-webui-openvino extension inside Forge.

ModuleNotFoundError: No module named 'ldm'
ModuleNotFoundError: No module named 'sgm'
Error build_unet: Invalid backend: 'openvino'

Root cause: The extension was designed for the old AUTOMATIC1111 architecture. Forge completely restructured the codebase and replaced the native ldm and sgm modules. The OpenVINO injection fails at the foundation level.
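
The two ModuleNotFoundError lines can be checked before launching the UI. The snippet below is a minimal sketch, not part of Forge or the OpenVINO extension: missing_modules is a hypothetical helper that reports which top-level modules the current Python environment cannot find.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # 'ldm' and 'sgm' are the modules named in the errors above.
    for name in missing_modules(["ldm", "sgm"]):
        print(f"not importable here: {name}")
```

If both names are reported, the extension has nothing to hook into, which matches the failure described above.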

4. CPU + HDD — the baseline disaster

Before any GPU acceleration:

Boot time: 85 seconds
LLM response: 3–5 tok/s
Image generation: ~19 minutes per 512×512 image
FLUX 16GB model load: 25 minutes from HDD

The mechanical drive was as much of a bottleneck as the missing GPU acceleration.
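
To put the load time in perspective, here is a rough calculation of the average read rate it implies. It is a sketch under one assumption: "16GB" is treated as 16 × 1000 MB (decimal units). The function name is illustrative.

```python
MODEL_SIZE_MB = 16 * 1000      # 16 GB model, decimal megabytes (assumption)
HDD_LOAD_SECONDS = 25 * 60     # 25 minutes, from the baseline list above


def effective_throughput_mb_s(size_mb, seconds):
    """Average throughput in MB/s when size_mb is loaded in the given time."""
    return size_mb / seconds


hdd_rate = effective_throughput_mb_s(MODEL_SIZE_MB, HDD_LOAD_SECONDS)
print(f"Effective HDD load rate: {hdd_rate:.1f} MB/s")
```

That is roughly 10.7 MB/s on average, far below what a mechanical drive typically sustains on sequential reads. It suggests the load was not a clean sequential read, although the figures here do not show why.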

What actually worked

After all of this: Vulkan.

The ggml engine in llama.cpp and stable-diffusion.cpp uses Vulkan as a native GPU backend. The RX 580 has supported Vulkan 1.x since its 2017 drivers. No special installation. No compatibility layers. Just compile llama.cpp with -DGGML_VULKAN=ON (stable-diffusion.cpp enables the same backend through its own -DSD_VULKAN=ON option).
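
The build step can be scripted. The sketch below covers llama.cpp only and assumes a local checkout with CMake, a C++ compiler, and the Vulkan development files already installed. The helper names and the source_dir path are hypothetical. The two CMake invocations are the standard configure-then-build pair, using the flag quoted above.

```python
import subprocess
from pathlib import Path


def vulkan_build_commands(build_dir="build"):
    """Return the CMake configure and build commands for a Vulkan-enabled build."""
    return [
        ["cmake", "-B", build_dir, "-DGGML_VULKAN=ON"],
        ["cmake", "--build", build_dir, "--config", "Release"],
    ]


def build_with_vulkan(source_dir, build_dir="build"):
    """Run both commands inside source_dir (a hypothetical local checkout)."""
    for command in vulkan_build_commands(build_dir):
        # check=True stops at the first failing step instead of continuing.
        subprocess.run(command, cwd=Path(source_dir), check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so the script is safe to try.
    for command in vulkan_build_commands():
        print(" ".join(command))
```

Calling build_with_vulkan("path/to/llama.cpp") would run the build for real. The path is a placeholder.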

Results after switching:

LLM: 15–16 tok/s (from 3–5)
Image: ~72s (from ~19 min)
FLUX load: 30 seconds (from 25 min, after NVMe migration)
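
The same figures expressed as speedup factors, as a small worked calculation. It uses only the numbers listed above. The variable and function names are illustrative.

```python
def time_speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is, for time measurements."""
    return before_seconds / after_seconds


# Image generation: ~19 minutes down to ~72 seconds.
image_speedup = time_speedup(19 * 60, 72)

# FLUX model load: 25 minutes down to 30 seconds (includes the NVMe migration).
load_speedup = time_speedup(25 * 60, 30)

# LLM throughput is a rate, so the ratio is after / before.
llm_speedup_low = 15 / 5    # slowest new rate vs. fastest old rate
llm_speedup_high = 16 / 3   # fastest new rate vs. slowest old rate

print(f"Image generation: {image_speedup:.1f}x faster")
print(f"FLUX load: {load_speedup:.0f}x faster")
print(f"LLM: {llm_speedup_low:.1f}x to {llm_speedup_high:.1f}x faster")
```

Image generation comes out at roughly 16x and LLM throughput at about 3x to 5x. The 50x load improvement includes the move to NVMe, so it cannot be credited to Vulkan alone.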

The lesson

The GPU was never the problem. Every failure above came down to software support or, in one case, the wrong storage:

DirectML: abandoned by Microsoft
ROCm: architecture policy decision by AMD
OpenVINO: extension not maintained for modern frontends
HDD: wrong storage choice

The RX 580 was waiting for ggml + Vulkan.

Full documentation

📖 setup-ia-local-rx580-vulkan.web.app
📦 github.com/aivisionslab-studios/rx580-local-ai-guide
