<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AIVisionsLab</title>
    <description>The latest articles on DEV Community by AIVisionsLab (@aivisionslab).</description>
    <link>https://dev.to/aivisionslab</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946564%2F2e4047e5-fedf-4680-9e84-b8d8a1f32be6.png</url>
      <title>DEV Community: AIVisionsLab</title>
      <link>https://dev.to/aivisionslab</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aivisionslab"/>
    <language>en</language>
    <item>
      <title>Running Local AI on an AMD RX 580 in 2026 — The Complete Vulkan Guide</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Thu, 11 Jun 2026 23:35:39 +0000</pubDate>
      <link>https://dev.to/aivisionslab/running-local-ai-on-an-amd-rx-580-in-2026-the-complete-vulkan-guide-52a5</link>
      <guid>https://dev.to/aivisionslab/running-local-ai-on-an-amd-rx-580-in-2026-the-complete-vulkan-guide-52a5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6yluqzyzu6akde4khwu8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6yluqzyzu6akde4khwu8.png" alt=" " width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Your RX 580 can't run AI. Buy a new GPU."&lt;/p&gt;

&lt;p&gt;That was the consensus in 2026. AMD dropped ROCm support for the Polaris/GCN4 architecture in v5.x. DirectML fails with an OpaqueTensorImpl error. OpenVINO will not even load in Stable Diffusion Forge. Every mainstream AI stack had given up on this card.&lt;/p&gt;

&lt;p&gt;We didn't.&lt;/p&gt;

&lt;p&gt;This is the complete technical record of how we built a full local AI production stack on an AMD RX 580 8GB: running LLMs at 17 tok/s, generating images in 72 seconds, transcribing audio 150× faster than CPU, and even cloning voices. All offline. All free. All on a GPU that sells for under $50 on the used market.&lt;/p&gt;

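&lt;h2&gt;
  The Hardware
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;Spec&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;GPU&lt;/td&gt;&lt;td&gt;AMD RX 580 2048SP 8GB GDDR5 (Polaris / GCN4)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CPU&lt;/td&gt;&lt;td&gt;Intel Xeon E5-2690 v3, 12c/24t · 3.5GHz (2014)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;RAM&lt;/td&gt;&lt;td&gt;32GB DDR4 REG ECC, quad channel&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Storage&lt;/td&gt;&lt;td&gt;NVMe 1TB, 1.7–3.5 GB/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;OS&lt;/td&gt;&lt;td&gt;Windows 10 Pro + WSL2 / Ubuntu 26.04 LTS&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RX 580 2048SP is the mining variant, with 2048 shader processors instead of the original 2304. It is everywhere on the used market for under $50, and it works the same way through Vulkan.&lt;/p&gt;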

&lt;p&gt;One thing nobody talks about: storage matters as much as the GPU. Moving from HDD to NVMe cut FLUX.1 model load time from 25 minutes to 30 seconds. For loading, the bottleneck was never the GPU.&lt;/p&gt;

&lt;h2&gt;
  Why Vulkan?
&lt;/h2&gt;

&lt;p&gt;The entire mainstream AI stack runs on either CUDA (Nvidia only) or ROCm (which dropped Polaris in v5.x). That leaves legacy AMD GPUs with no official path.&lt;/p&gt;

&lt;p&gt;But there's a third option: Vulkan — a universal graphics/compute API that works on any modern GPU, including the RX 580, which has supported Vulkan 1.x since its 2017 drivers.&lt;/p&gt;

&lt;p&gt;The ggml project (the engine behind llama.cpp, whisper.cpp and stable-diffusion.cpp) implements a Vulkan compute backend in C++, with its kernels written as Vulkan compute shaders. This means you can compile directly against the Vulkan API and bypass the ROCm/CUDA ecosystem completely. No ROCm or CUDA toolkit, no compatibility layers: just the regular graphics driver (plus the Vulkan SDK at build time) and the GPU doing math.&lt;/p&gt;

&lt;h2&gt;
  What We Tried Before Vulkan (And Why It All Failed)
&lt;/h2&gt;

&lt;p&gt;Before finding the working path, we hit every dead end:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DirectML + ComfyUI:&lt;/strong&gt; the GPU is detected as privateuseone:0, but then:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NotImplementedError: Cannot access storage of OpaqueTensorImpl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;DirectML wraps tensor data in opaque objects that ComfyUI's attention backends cannot read. On top of that, Microsoft hasn't updated it since September 2024, so it appears to be abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROCm on Polaris:&lt;/strong&gt; AMD officially dropped GCN4/Polaris in ROCm v5.x. Compatibility layers via WSL2 caused kernel panics under inference load, and ROCm offers no Windows support for this card. Dead end by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenVINO + Stable Diffusion Forge:&lt;/strong&gt; Intel's extension was built for the old Automatic1111 architecture, and Forge restructured everything. Result:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ModuleNotFoundError: No module named 'ldm'
ModuleNotFoundError: No module named 'sgm'
Error build_unet: Invalid backend: 'openvino'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;CPU-only + HDD:&lt;/strong&gt; our baseline before any optimization was an 85-second startup and ~19 minutes per 512×512 image. The mechanical HDD competing with memory paging made it completely unusable.&lt;/p&gt;

&lt;p&gt;The pattern: every "AMD-compatible" option either targets newer hardware, is abandoned, or is incompatible with modern pipelines. Vulkan was the only path that actually worked on this card.&lt;/p&gt;

&lt;h2&gt;
  The Architecture: Dual-Path Stack
&lt;/h2&gt;

&lt;p&gt;The core insight of this project is that not every workload fits in 8GB of VRAM. The solution is to route each workload to the GPU or the CPU depending on whether it fits:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenWebUI  :3000  (Docker)
    │
    ├──► llama-server  :8081  ──►  RX 580 Vulkan  [llama.cpp]
    │         └── Ollama      :11434  ──►  CPU fallback
    │
    ├──► sd-server     :7860  ──►  RX 580 Vulkan  [stable-diffusion.cpp]
    │         ├── SD 1.5 GGUF      ──►  72s / image
    │         └── FLUX hybrid      ──►  ~14 min / image
    │
    └──► ComfyUI       :8188  ──►  Xeon CPU WSL2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Path 1 — GPU Vulkan: LLM inference + SD 1.5 image generation. Fast, responsive, daily driver.&lt;/p&gt;

&lt;p&gt;Path 2 — CPU Xeon: FLUX.1 16GB models, AnimateDiff video pipelines. The 32GB ECC RAM acts as "virtual VRAM" for models that don't fit on the card.&lt;/p&gt;
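&lt;p&gt;The routing decision behind the two paths can be sketched in a few lines of Python. This is a hypothetical helper, not part of llama.cpp, stable-diffusion.cpp or OpenWebUI. It compares a model's weight size with the usable VRAM, keeping an assumed 1GB of headroom for compute buffers, and sends anything larger to the Xeon path.&lt;/p&gt;

```python
# Hypothetical routing helper: decides whether a model runs on the
# RX 580 (Vulkan) path or the Xeon CPU path, based only on its size.
VRAM_GB = 8.0        # RX 580 8GB
HEADROOM_GB = 1.0    # assumed reserve for compute buffers and the desktop


def choose_path(model_gb: float, vram_gb: float = VRAM_GB,
                headroom_gb: float = HEADROOM_GB) -> str:
    """Return "gpu-vulkan" if the weights fit in usable VRAM, else "cpu-xeon"."""
    usable = vram_gb - headroom_gb
    if model_gb > usable:
        return "cpu-xeon"
    return "gpu-vulkan"


if __name__ == "__main__":
    # Rough example sizes: Mistral 7B Q4_K_M (~4.4GB) and a 16GB FLUX.1 checkpoint.
    for name, size in [("Mistral 7B Q4_K_M", 4.4), ("FLUX.1 16GB", 16.0)]:
        print(f"{name}: {choose_path(size)}")
```

&lt;p&gt;The headroom value is a guess. Tune it against the VRAM usage your own runs report.&lt;/p&gt;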

&lt;h2&gt;
  Building llama.cpp with Vulkan
&lt;/h2&gt;

&lt;p&gt;Run in the Developer PowerShell for Visual Studio:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;cd E:\
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Validate GPU detection:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;cd build\bin\Release
.\llama-cli.exe --list-devices
# Expected: Vulkan0: AMD Radeon RX 580 2048SP ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Start the LLM server:&lt;/p&gt;

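&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;.\llama-server.exe -m "E:\models\Mistral-7B-Q4_K_M.gguf" `
  --host 0.0.0.0 --port 8081 --device Vulkan0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;How to verify it's actually using the GPU:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggml_vulkan: Found 1 Vulkan device(s)
ggml_vulkan: 0 = AMD Radeon RX 580 2048SP | VRAM: 8192MB
17.77 t/s  ← RX 580 Vulkan ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you see 3–5 t/s and no ggml_vulkan line, it's running on CPU. In our setup the &lt;code&gt;--device Vulkan0&lt;/code&gt; flag was mandatory. Depending on your llama.cpp version, you may also need &lt;code&gt;-ngl 99&lt;/code&gt; (&lt;code&gt;--n-gpu-layers&lt;/code&gt;) so that all layers are offloaded to the GPU.&lt;/p&gt;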
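&lt;p&gt;To run the same check from a script, you can call llama-server's native &lt;code&gt;/completion&lt;/code&gt; endpoint and read the generation speed from the &lt;code&gt;timings&lt;/code&gt; block of its JSON reply. This is a minimal sketch that assumes two things: the server started above is reachable on port 8081, and your build reports &lt;code&gt;timings.predicted_per_second&lt;/code&gt;. Recent llama.cpp builds do, but verify on yours. The 6 tok/s cut-off is a hypothetical threshold placed between the 3–5 t/s CPU figure and the ~17 t/s Vulkan figure above.&lt;/p&gt;

```python
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8081/completion"  # assumes llama-server on port 8081
CPU_CEILING_TOK_S = 6.0  # hypothetical cut-off between CPU (3-5 t/s) and Vulkan (~17 t/s)


def complete(prompt: str, n_predict: int = 64, url: str = LLAMA_URL) -> dict:
    """POST a prompt to llama-server's /completion endpoint and return the parsed JSON."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


def generation_speed(response: dict) -> float:
    """Extract generated tokens per second from the response's timings block."""
    return float(response["timings"]["predicted_per_second"])


def likely_backend(tok_per_s: float, cpu_ceiling: float = CPU_CEILING_TOK_S) -> str:
    """Rough guess: speeds at or below the CPU ceiling suggest the GPU is not in use."""
    return "vulkan-gpu" if tok_per_s > cpu_ceiling else "cpu"
```

&lt;p&gt;Usage: &lt;code&gt;speed = generation_speed(complete("Explain Vulkan in one sentence."))&lt;/code&gt;, then &lt;code&gt;likely_backend(speed)&lt;/code&gt;. Speed alone is only a hint. The &lt;code&gt;ggml_vulkan&lt;/code&gt; lines in the server log remain the clearer check.&lt;/p&gt;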

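&lt;h2&gt;
  Building stable-diffusion.cpp with Vulkan
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;mkdir&lt;/code&gt; and &lt;code&gt;cd&lt;/code&gt; are on separate lines because Windows PowerShell 5.1, the default Developer PowerShell, does not support the &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; operator. stable-diffusion.cpp's README documents &lt;code&gt;-DSD_VULKAN=ON&lt;/code&gt; for Vulkan builds. If your build falls back to the CPU, reconfigure with that option.&lt;/p&gt;

&lt;p&gt;Start the image server:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;E:
cd "E:\stable-diffusion.cpp\build\bin\Release"
.\sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 `
  -m "E:\models\dreamshaper8.gguf"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  FLUX.1 Schnell: Running a 16GB Model on 8GB VRAM
&lt;/h2&gt;

&lt;p&gt;FLUX.1 Schnell is a 12B-parameter SOTA model that nominally requires 16GB of VRAM. Here's how we run it on 8GB:&lt;/p&gt;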

&lt;p&gt;The strategy is memory segmentation — put the diffusion model on VRAM, offload everything else to RAM:&lt;/p&gt;

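&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;File&lt;/th&gt;&lt;th&gt;Where&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Diffusion model&lt;/td&gt;&lt;td&gt;flux1-schnell-q4_k.gguf&lt;/td&gt;&lt;td&gt;GPU VRAM (~6.5GB)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;VAE&lt;/td&gt;&lt;td&gt;ae.safetensors&lt;/td&gt;&lt;td&gt;CPU RAM (~160MB)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CLIP L&lt;/td&gt;&lt;td&gt;clip_l.safetensors&lt;/td&gt;&lt;td&gt;GPU VRAM (~235MB)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;T5XXL&lt;/td&gt;&lt;td&gt;t5xxl_fp16.safetensors&lt;/td&gt;&lt;td&gt;CPU RAM (~9.3GB)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 ^
  --diffusion-model "E:\models\flux1-schnell-q4_k.gguf" ^
  --vae "E:\models\ae.safetensors" ^
  --clip_l "E:\models\clip_l.safetensors" ^
  --t5xxl "E:\models\t5xxl_fp16.safetensors" ^
  --cfg-scale 1.0 --steps 4 --clip-on-cpu --vae-on-cpu --vae-tiling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;⚠️ &lt;code&gt;--vae-tiling&lt;/code&gt; is not optional. Without it, VAE decode runs out of memory and crashes the server.&lt;/p&gt;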
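&lt;p&gt;If you prefer to launch the FLUX server from a script, passing the arguments as a Python list avoids cmd's &lt;code&gt;^&lt;/code&gt; line continuations and its quoting rules. A minimal sketch follows that reproduces the batch command above flag for flag. It assumes models live in &lt;code&gt;E:\models&lt;/code&gt; and that &lt;code&gt;sd-server.exe&lt;/code&gt; is on the PATH or in the working directory; adapt both to your setup.&lt;/p&gt;

```python
import subprocess
from pathlib import PureWindowsPath

MODELS = PureWindowsPath(r"E:\models")  # assumed model folder, as in the batch example
SD_SERVER = "sd-server.exe"             # assumes sd-server.exe is on PATH or in the cwd


def flux_server_args(models: PureWindowsPath = MODELS, port: int = 7860) -> list:
    """Build the sd-server command line for the FLUX.1 Schnell GPU/CPU split."""
    return [
        SD_SERVER,
        "--listen-ip", "0.0.0.0",
        "--listen-port", str(port),
        "--diffusion-model", str(models / "flux1-schnell-q4_k.gguf"),
        "--vae", str(models / "ae.safetensors"),
        "--clip_l", str(models / "clip_l.safetensors"),
        "--t5xxl", str(models / "t5xxl_fp16.safetensors"),
        "--cfg-scale", "1.0",
        "--steps", "4",
        "--clip-on-cpu", "--vae-on-cpu",
        "--vae-tiling",  # without it, VAE decode runs out of memory
    ]


def start_flux_server() -> subprocess.Popen:
    """Launch sd-server in the background and return the process handle."""
    return subprocess.Popen(flux_server_args())
```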

&lt;p&gt;Timing per 1024×1024 image:&lt;/p&gt;

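&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;Time&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;T5XXL conditioning&lt;/td&gt;&lt;td&gt;11.49s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sampling (4 steps)&lt;/td&gt;&lt;td&gt;~838s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;VAE decode (9 tiles)&lt;/td&gt;&lt;td&gt;40.45s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;~14 min&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Critical: two GGUF formats for FLUX&lt;/strong&gt;&lt;/p&gt;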

&lt;p&gt;This trips up almost everyone. There are two different GGUF distributions for FLUX:&lt;/p&gt;

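&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Source&lt;/th&gt;&lt;th&gt;Compatible with&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;city96 (HuggingFace)&lt;/td&gt;&lt;td&gt;ComfyUI + ComfyUI-GGUF node only&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;leejet (HuggingFace)&lt;/td&gt;&lt;td&gt;stable-diffusion.cpp ✅&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Using a city96 GGUF in sd-server returns:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ERROR] main.cpp:92 - new_sd_ctx_t failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Always download from: &lt;a href="https://huggingface.co/leejet/FLUX.1-schnell-gguf" rel="noopener noreferrer"&gt;huggingface.co/leejet/FLUX.1-schnell-gguf&lt;/a&gt;&lt;/p&gt;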

&lt;h2&gt;
  whisper.cpp: Audio Transcription on the RX 580
&lt;/h2&gt;

&lt;p&gt;This is where the numbers get absurd.&lt;/p&gt;

&lt;p&gt;Build whisper.cpp with Vulkan:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_VULKAN=ON -DGGML_HIPBLAS=OFF -DGGML_HIP=OFF -DGGML_CUDA=OFF
cmake --build build --config Release -j4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Transcribe a video (MP4 → TXT):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;# Extract audio first as 16 kHz mono 16-bit WAV (Whisper requires WAV on Windows)
ffmpeg -i "video.mp4" -ar 16000 -ac 1 -c:a pcm_s16le "audio.wav"

# Transcribe
.\build\bin\Release\whisper-cli.exe `
  -m models\ggml-large-v3-turbo.bin `
  -f "audio.wav" -l pt --output-txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Performance on a 15-minute video (Windows):&lt;/p&gt;

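&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;Time&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Model load&lt;/td&gt;&lt;td&gt;4s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Mel spectrogram&lt;/td&gt;&lt;td&gt;1.2s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;GPU encode&lt;/td&gt;&lt;td&gt;73s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Decode + batch&lt;/td&gt;&lt;td&gt;168s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;307s&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;VRAM used: only 2.6GB of 8GB. CPU stays at ~5%.&lt;/p&gt;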

&lt;p&gt;On Linux (Ubuntu 26.04, Mesa RADV), same hardware, same model:&lt;/p&gt;

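&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Windows&lt;/th&gt;&lt;th&gt;Linux&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Time (106s audio)&lt;/td&gt;&lt;td&gt;307s&lt;/td&gt;&lt;td&gt;23.58s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;VRAM used&lt;/td&gt;&lt;td&gt;2.6GB&lt;/td&gt;&lt;td&gt;1.6GB&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is a 13× reduction in wall-clock time on the same GPU. Mesa RADV's Vulkan compute path appears to be far more efficient for this workload than the Windows AMD driver.&lt;/p&gt;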
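&lt;p&gt;There is a possible mismatch in these numbers. The Windows measurement above was taken on a 15-minute video, while this comparison row is labeled as 106 seconds of audio. A fairer way to compare runs is the real-time factor (RTF): processing time divided by audio duration, where lower is better. The sketch below applies that formula to the figures quoted in this post. It treats the Windows run as 900 seconds of audio, which is an assumption based on the "15-minute video" label.&lt;/p&gt;

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Seconds of compute per second of audio (lower is faster)."""
    if audio_s > 0:
        return processing_s / audio_s
    raise ValueError("audio_s must be positive")


# Figures quoted in this post. Treating the Windows run as 900 s of audio
# is an assumption based on the "15-minute video" label.
windows_rtf = real_time_factor(307.0, 900.0)
linux_rtf = real_time_factor(23.58, 106.0)

print(f"Windows RTF: {windows_rtf:.3f}")
print(f"Linux RTF:   {linux_rtf:.3f}")
print(f"Speedup per second of audio: {windows_rtf / linux_rtf:.2f}x")
```

&lt;p&gt;If both runs used the same 106-second clip, the 13× figure stands as measured. If they did not, the per-second comparison (about 1.5× under the assumption above) is the one to trust.&lt;/p&gt;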

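&lt;h2&gt;
  Windows vs Linux: Full Benchmark Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Workload&lt;/th&gt;&lt;th&gt;Windows 10&lt;/th&gt;&lt;th&gt;Ubuntu 26.04 (Mesa RADV)&lt;/th&gt;&lt;th&gt;Winner&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;LLM Qwen3 4B @ 99 layers&lt;/td&gt;&lt;td&gt;~15–17 tok/s&lt;/td&gt;&lt;td&gt;~35 tok/s&lt;/td&gt;&lt;td&gt;🏆 Linux (2×)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;LLM Qwen3.6 35B @ max layers&lt;/td&gt;&lt;td&gt;7.62 tok/s (max 10 ngl)&lt;/td&gt;&lt;td&gt;5.18 tok/s (max 20 ngl)&lt;/td&gt;&lt;td&gt;🏆 Windows (~1.5×)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;SD 1.5 DreamShaper (50 steps)&lt;/td&gt;&lt;td&gt;~72s&lt;/td&gt;&lt;td&gt;~85s&lt;/td&gt;&lt;td&gt;🏆 Windows&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;FLUX Schnell (4 steps, 512×512)&lt;/td&gt;&lt;td&gt;~84s&lt;/td&gt;&lt;td&gt;~52s&lt;/td&gt;&lt;td&gt;🏆 Linux&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Whisper large-v3-turbo&lt;/td&gt;&lt;td&gt;307s · 2.6GB&lt;/td&gt;&lt;td&gt;23.58s · 1.6GB&lt;/td&gt;&lt;td&gt;🏆 Linux&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Why Linux is faster for LLMs: Mesa RADV's memory management appears more aggressive and efficient. It allows up to 20 GPU layers for the 35B model, where the Windows AMD driver caps out at 10, and it doubles throughput on the 4B model. The extra layers did not translate into speed on the 35B model, though: Windows was still faster there (7.62 vs 5.18 tok/s).&lt;/p&gt;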

&lt;p&gt;Why Windows wins SD 1.5: the proprietary AMD driver appears to handle this specific workload more consistently, at a steady 1.44s/it versus 1.65s/it on Linux.&lt;/p&gt;
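&lt;p&gt;The ratios in this comparison follow directly from the raw numbers. A small sketch that recomputes them using only figures quoted in the table and the paragraph above. The 16 tok/s Windows value for Qwen3 4B is the midpoint of the ~15–17 range, which is an assumption.&lt;/p&gt;

```python
def speedup(fast: float, slow: float) -> float:
    """How many times larger `fast` is than `slow` (for tok/s, higher is better)."""
    return fast / slow


def sd_time_s(sec_per_it: float, steps: int) -> float:
    """Sampling time for a diffusion run: seconds per iteration times step count."""
    return sec_per_it * steps


qwen4b = speedup(35.0, 16.0)    # Linux ~35 tok/s vs Windows ~15-17 tok/s (midpoint)
qwen35b = speedup(7.62, 5.18)   # Windows 7.62 tok/s vs Linux 5.18 tok/s
sd_win = sd_time_s(1.44, 50)    # DreamShaper, 50 steps on Windows
sd_linux = sd_time_s(1.65, 50)  # same run on Linux

print(f"Qwen3 4B, Linux over Windows: {qwen4b:.2f}x")
print(f"Qwen3.6 35B, Windows over Linux: {qwen35b:.2f}x")
print(f"SD 1.5 sampling: {sd_win:.1f}s (Windows) vs {sd_linux:.1f}s (Linux)")
```

&lt;p&gt;The 50-step sampling times (72s and 82.5s) line up with the ~72s and ~85s totals in the table. The remaining few seconds on Linux are presumably model loading and VAE decode.&lt;/p&gt;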

&lt;h2&gt;
  Voice Cloning: Applio RVC on AMD Windows
&lt;/h2&gt;

&lt;p&gt;We also built a full voice cloning pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text → Balabolka (TTS) → WAV → Applio RVC → Cloned Voice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key insight: instead of relying on TTS output alone (which sounds robotic), we use a high-quality neural voice (Antônio Neural, a Microsoft Neural voice) to supply prosody and emotion. RVC then converts the voice identity to our target voice (Yuri). Result: 80–95% naturalness vs 60–70% for pure TTS.&lt;/p&gt;

&lt;p&gt;AMD-specific critical findings:&lt;/p&gt;

&lt;p&gt;DirectML is effectively dead for RVC — torch-directml is locked to torch==2.4.1 while Applio requires torch==2.7.1. Irreconcilable conflict.&lt;/p&gt;

&lt;p&gt;Use CPU mode. On Xeon E5-2690 v3 (24 threads): ~6 min/epoch, ~20 hours for 200 epochs. Inference after training: 2 hours of audio → ~30 minutes processing.&lt;/p&gt;
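&lt;p&gt;The training and inference figures above are simple proportions, so they are easy to re-estimate for a different epoch count. A minimal sketch using the ~6 min/epoch and 2h-of-audio-in-~30-min numbers measured on the Xeon E5-2690 v3; other CPUs will differ.&lt;/p&gt;

```python
def training_hours(epochs: int, minutes_per_epoch: float) -> float:
    """Total RVC training time in hours for a given number of epochs."""
    return epochs * minutes_per_epoch / 60.0


def realtime_multiple(audio_minutes: float, processing_minutes: float) -> float:
    """How many minutes of audio are converted per minute of processing."""
    return audio_minutes / processing_minutes


print(training_hours(200, 6.0))        # 20.0 hours on the Xeon E5-2690 v3
print(realtime_multiple(120.0, 30.0))  # 4.0, i.e. 4x faster than real time
```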

&lt;p&gt;The silent failure trap:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# NEVER set these: they silently break feature extraction
#   set CUDA_VISIBLE_DEVICES=-1
#   set ROCM_VISIBLE_DEVICES=-1
# Training will print "Model trained successfully" but produce nothing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Always verify that &lt;code&gt;logs/project/extracted/&lt;/code&gt; contains .npy files before starting training.&lt;/p&gt;
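&lt;p&gt;That check is easy to script. The sketch below counts the .npy files under the extraction folder before you start training. It assumes the &lt;code&gt;logs/project/extracted/&lt;/code&gt; layout described above, with &lt;code&gt;project&lt;/code&gt; replaced by your model name, and it does not depend on any Applio API.&lt;/p&gt;

```python
from pathlib import Path


def count_extracted_features(extracted_dir: str) -> int:
    """Count .npy feature files anywhere under Applio's extraction folder."""
    root = Path(extracted_dir)
    if not root.is_dir():
        return 0
    return sum(1 for _ in root.rglob("*.npy"))


def ready_to_train(extracted_dir: str) -> bool:
    """True only if feature extraction actually produced .npy files."""
    return count_extracted_features(extracted_dir) > 0


if __name__ == "__main__":
    folder = "logs/project/extracted"  # "project" stands for your model name
    n = count_extracted_features(folder)
    print(f"{n} .npy files found" if n else "No .npy files: do not start training")
```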

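&lt;h2&gt;
  The Community Timeline
&lt;/h2&gt;

&lt;p&gt;This project didn't happen in isolation. Three independent researchers, same GPU, same conclusion:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Date&lt;/th&gt;&lt;th&gt;Author&lt;/th&gt;&lt;th&gt;Contribution&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Jan 2025&lt;/td&gt;&lt;td&gt;艾米心 Amihart&lt;/td&gt;&lt;td&gt;First LLM via Vulkan on RX 580: 24.56 tok/s on Debian&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Dec 2025&lt;/td&gt;&lt;td&gt;DH / DadHacks&lt;/td&gt;&lt;td&gt;First SD via Vulkan: the stable-diffusion.cpp breakthrough&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;2026&lt;/td&gt;&lt;td&gt;AIVisionsLab&lt;/td&gt;&lt;td&gt;Full Windows + Linux production stack, voice cloning, transcription&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The shared foundation: ggml by Georgi Gerganov, whose Vulkan compute backend bypasses the CUDA and ROCm ecosystems entirely.&lt;/p&gt;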

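&lt;h2&gt;
  Real Benchmarks Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Workload&lt;/th&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;Backend&lt;/th&gt;&lt;th&gt;Result&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;LLM inference&lt;/td&gt;&lt;td&gt;Mistral 7B Q4_K_M&lt;/td&gt;&lt;td&gt;RX 580 Vulkan (Win)&lt;/td&gt;&lt;td&gt;17–18 tok/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;LLM inference&lt;/td&gt;&lt;td&gt;Qwen3 4B Q4_K_M&lt;/td&gt;&lt;td&gt;RX 580 Vulkan (Linux)&lt;/td&gt;&lt;td&gt;~35 tok/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;LLM baseline&lt;/td&gt;&lt;td&gt;Mistral 7B Q4_K_M&lt;/td&gt;&lt;td&gt;Xeon CPU only&lt;/td&gt;&lt;td&gt;3–5 tok/s&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Image gen&lt;/td&gt;&lt;td&gt;DreamShaper 8 SD1.5&lt;/td&gt;&lt;td&gt;RX 580 Vulkan&lt;/td&gt;&lt;td&gt;~72s / 512×512&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Image gen&lt;/td&gt;&lt;td&gt;flux1-schnell-q4_k&lt;/td&gt;&lt;td&gt;GPU+CPU hybrid&lt;/td&gt;&lt;td&gt;~14 min @ 1024×1024&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Audio transcription&lt;/td&gt;&lt;td&gt;Whisper large-v3-turbo&lt;/td&gt;&lt;td&gt;RX 580 Vulkan (Linux)&lt;/td&gt;&lt;td&gt;23.58s / 106s audio&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Video frames&lt;/td&gt;&lt;td&gt;AnimateDiff&lt;/td&gt;&lt;td&gt;Xeon WSL2 CPU&lt;/td&gt;&lt;td&gt;~141s/frame&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Voice inference&lt;/td&gt;&lt;td&gt;Applio RVC&lt;/td&gt;&lt;td&gt;Xeon CPU&lt;/td&gt;&lt;td&gt;~30 min / 2h audio&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  Troubleshooting: The Most Common Failures
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;generate_image returned no results / frozen terminal:&lt;/strong&gt; a bug in sd-server with Seed: -1. Fix: set a fixed integer seed (42, 1337) in OpenWebUI.&lt;/p&gt;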

&lt;p&gt;&lt;strong&gt;new_sd_ctx_t failed with FLUX:&lt;/strong&gt; you're using a city96 GGUF. Download from leejet instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker can't reach sd-server:&lt;/strong&gt; Windows Defender Firewall blocks the Docker subnet (172.x.x.x). Run as Administrator:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;New-NetFirewallRule -DisplayName "sd-server AIVisionsLab" `
  -Direction Inbound -Protocol TCP -LocalPort 7860 -Action Allow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;--override-tensor exps=CPU slows down Vulkan:&lt;/strong&gt; this flag is tuned for CUDA over PCIe on Nvidia. Under Vulkan, the CPU↔GPU transfer overhead wiped out any gains in our tests. Don't apply CUDA-optimized flags to Vulkan backends.&lt;/p&gt;

&lt;h2&gt;
  Full Documentation
&lt;/h2&gt;

&lt;p&gt;This post covers the core architecture. Full guides for each component:&lt;/p&gt;

&lt;p&gt;📖 Master documentation (PT/EN): &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;br&gt;
💻 GitHub repository: &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;br&gt;
🎥 YouTube: @aivisionslab-hub&lt;/p&gt;

&lt;h2&gt;
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The idea that legacy AMD GPUs can't run AI reflects a software problem, not a hardware limitation. The RX 580 has supported Vulkan since 2017, and the compute capability was always there.&lt;/p&gt;

&lt;p&gt;What changed is that ggml and its ecosystem built Vulkan backends that bypass the CUDA and ROCm stacks entirely. The result is a GPU from 2017 running SOTA models from 2026: locally, privately, for free.&lt;/p&gt;

&lt;p&gt;RX 580 (2017) + Xeon (2014) + Vulkan + ggml = SOTA AI in 2026&lt;br&gt;
The problem was never the GPU.&lt;/p&gt;

&lt;p&gt;AIVisionsLab — Documenting local AI on legacy hardware. São Paulo, Brazil 🇧🇷&lt;/p&gt;


&lt;p&gt;Canonical URL: &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;https://setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>amd</category>
      <category>vulkan</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Three researchers. One GPU. Two years. How the RX 580 became an AI platform.</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Sun, 24 May 2026 13:20:37 +0000</pubDate>
      <link>https://dev.to/aivisionslab/three-researchers-one-gpu-two-years-how-the-rx-580-became-an-ai-platform-5989</link>
      <guid>https://dev.to/aivisionslab/three-researchers-one-gpu-two-years-how-the-rx-580-became-an-ai-platform-5989</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All images in this article were generated on the RX 580 8GB — the same GPU everyone said couldn't run AI.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  This is collective knowledge
&lt;/h2&gt;

&lt;p&gt;Three independent researchers. No coordination. Same GPU. Same conclusion.&lt;/p&gt;




&lt;h2&gt;
  
  
  January 2025 — 艾米心 Amihart
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Debian Linux&lt;br&gt;
&lt;strong&gt;Published:&lt;/strong&gt; &lt;a href="https://medium.com/@amihart" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amihart was the first to document LLM inference via Vulkan on the RX 580.&lt;/p&gt;

&lt;p&gt;Compiled &lt;code&gt;llama.cpp&lt;/code&gt; with &lt;code&gt;-DGGML_VULKAN=on&lt;/code&gt; on Debian, on a system with a Celeron G6900 CPU, and measured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU only:&lt;/strong&gt; 5.45 tok/s&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RX 580 via Vulkan:&lt;/strong&gt; 24.56 tok/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A 4.5× uplift on hardware that officially "doesn't support AI."&lt;/p&gt;

&lt;p&gt;But then came this line — honest, and correct for the time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Sadly, even though Vulkan seems to do a pretty good job with the RX580, I am unaware of any way to get Vulkan to work with Stable Diffusion. If you want to use Stable Diffusion, you will need ROCm."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sentence opened a question that the next researcher answered.&lt;/p&gt;


&lt;h2&gt;
  
  
  December 2025 — DH / DadHacks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Linux/Debian&lt;br&gt;
&lt;strong&gt;Published:&lt;/strong&gt; &lt;a href="https://dadhacks.org/2025/12/05/ai-image-generation-on-rx-580-using-vulkan-a-cost-effective-solution/" rel="noopener noreferrer"&gt;dadhacks.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;DadHacks showed that Amihart's limitation no longer held. This was not a criticism, but proof that the software had evolved.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;stable-diffusion.cpp&lt;/code&gt; had matured. With &lt;code&gt;-DSD_VULKAN=ON&lt;/code&gt; (equivalent to &lt;code&gt;-DGGML_VULKAN=ON&lt;/code&gt; in newer versions), image generation via Vulkan on the RX 580 worked.&lt;/p&gt;

&lt;p&gt;That included FLUX.1 Schnell in Q4 quantization, with CPU offloading for the components that exceeded VRAM.&lt;/p&gt;

&lt;p&gt;The barrier Amihart correctly identified in January had fallen by December.&lt;/p&gt;


&lt;h2&gt;
  
  
  2026 — AIVisionsLab
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Windows 10 Pro + WSL2&lt;br&gt;
&lt;strong&gt;Published:&lt;/strong&gt; &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The third step was integration.&lt;/p&gt;

&lt;p&gt;Both previous projects ran on Linux. Neither connected everything into a unified daily-use system on Windows. Neither documented the failures (DirectML, ROCm, OpenVINO). Neither built automation scripts. Neither integrated OpenWebUI.&lt;/p&gt;

&lt;p&gt;AIVisionsLab filled those gaps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Windows stack with &lt;code&gt;.bat&lt;/code&gt; automation&lt;/li&gt;
&lt;li&gt;OpenWebUI integration via Docker with firewall notes&lt;/li&gt;
&lt;li&gt;Dual architecture: GPU Vulkan for fast models, Xeon CPU WSL2 for FLUX 16GB&lt;/li&gt;
&lt;li&gt;Documented every failure with root cause analysis&lt;/li&gt;
&lt;li&gt;Discovered the critical GGUF incompatibility: city96 vs leejet formats&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The question each project answered
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Amihart&lt;/td&gt;
&lt;td&gt;Can LLMs run on Vulkan RX 580?&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes.&lt;/strong&gt; 24.56 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DadHacks&lt;/td&gt;
&lt;td&gt;Can Stable Diffusion run on Vulkan RX 580?&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes.&lt;/strong&gt; sd.cpp works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIVisionsLab&lt;/td&gt;
&lt;td&gt;Can all this run integrated on Windows daily?&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Yes.&lt;/strong&gt; Full stack documented&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  The common denominator
&lt;/h2&gt;

&lt;p&gt;All three converge on the same engine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ggml (Georgi Gerganov)
  ├── llama.cpp    → LLMs via Vulkan
  └── stable-diffusion.cpp (leejet) → Images via Vulkan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ggml&lt;/code&gt; implements deep learning tensor operations in C/C++ behind pluggable compute backends, Vulkan among them. That single design decision freed legacy AMD hardware from the CUDA/ROCm dependency trap.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three philosophies, same conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amihart:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Despite how ancient this card is, it is technically possible to use it for AI."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;DadHacks:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"This setup provides an accessible pathway for leveraging existing hardware investments without requiring expensive upgrades or specialized software stacks like ROCm."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;AIVisionsLab:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Commercial planned obsolescence is a market choice, not an engineering barrier. Legacy hardware doesn't die — it's liberated by the right software."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Full documentation
&lt;/h2&gt;

&lt;p&gt;📖 &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt; — complete guide in PT/EN/ES/FR/AR&lt;br&gt;
📦 &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;br&gt;
🤗 &lt;a href="https://huggingface.co/aivisionslab/ai-local-rx580-stack" rel="noopener noreferrer"&gt;huggingface.co/aivisionslab/ai-local-rx580-stack&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>amd</category>
      <category>hystory</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Running FLUX.1 Schnell on an RX 580 8GB — GPU/CPU hybrid architecture</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Sun, 24 May 2026 13:18:29 +0000</pubDate>
      <link>https://dev.to/aivisionslab/running-flux1-schnell-on-an-rx-580-8gb-gpucpu-hybrid-architecture-ipb</link>
      <guid>https://dev.to/aivisionslab/running-flux1-schnell-on-an-rx-580-8gb-gpucpu-hybrid-architecture-ipb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Image above: generated by FLUX.1 Schnell running on the hybrid architecture described in this post.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;FLUX.1 Schnell is a 12B parameter model. Full precision needs more VRAM than the RX 580 has.&lt;/p&gt;

&lt;p&gt;The solution: &lt;strong&gt;split the components between GPU and CPU RAM&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory map
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Diffusion model&lt;/td&gt;
&lt;td&gt;flux1-schnell-q4_k.gguf&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPU VRAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~6.5GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VAE&lt;/td&gt;
&lt;td&gt;ae.safetensors&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CPU RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~160MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLIP L&lt;/td&gt;
&lt;td&gt;clip_l.safetensors&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;GPU VRAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~235MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T5XXL&lt;/td&gt;
&lt;td&gt;t5xxl_fp16.safetensors&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;CPU RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~9.3GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total VRAM used:&lt;/strong&gt; ~6.7GB / 8GB available&lt;br&gt;
&lt;strong&gt;Total RAM used:&lt;/strong&gt; ~9.5GB&lt;/p&gt;

&lt;p&gt;The T5XXL encoder dominates RAM usage. If you're tight on RAM, &lt;code&gt;t5xxl_fp8.safetensors&lt;/code&gt; reduces it to ~5GB.&lt;/p&gt;
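&lt;p&gt;The totals above are per-device sums of the component sizes. The sketch below recomputes them from the memory map and checks the GPU side against the 8GB card. The dictionary is our own bookkeeping, not an sd-server format. It counts weights only; compute buffers come on top, which is presumably why the terminal status further down reports more VRAM in use.&lt;/p&gt;

```python
# Approximate component sizes (GB) and placement, copied from the memory map above.
COMPONENTS = {
    "flux1-schnell-q4_k.gguf": (6.5, "gpu"),
    "ae.safetensors": (0.16, "cpu"),
    "clip_l.safetensors": (0.235, "gpu"),
    "t5xxl_fp16.safetensors": (9.3, "cpu"),
}


def totals(components: dict) -> dict:
    """Sum component sizes per device."""
    out = {"gpu": 0.0, "cpu": 0.0}
    for size_gb, device in components.values():
        out[device] += size_gb
    return out


def fits_vram(components: dict, vram_gb: float = 8.0) -> bool:
    """True if the weights placed on the GPU fit in the card's VRAM."""
    return vram_gb >= totals(components)["gpu"]


t = totals(COMPONENTS)
print(f"VRAM: {t['gpu']:.2f} GB, RAM: {t['cpu']:.2f} GB, fits: {fits_vram(COMPONENTS)}")
```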


&lt;h2&gt;
  
  
  ⚠️ Critical: use leejet GGUF, not city96
&lt;/h2&gt;

&lt;p&gt;Two different GGUF formats exist for FLUX. They have similar names but are NOT interchangeable:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;city96 on HuggingFace&lt;/td&gt;
&lt;td&gt;ComfyUI + ComfyUI-GGUF node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;leejet on HuggingFace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;stable-diffusion.cpp ✅&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Using a city96 GGUF with sd-server returns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ERROR] stable-diffusion.cpp:355 - get sd version from file failed
[ERROR] main.cpp:92 - new_sd_ctx_t failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Download from: &lt;code&gt;https://huggingface.co/leejet/FLUX.1-schnell-gguf&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The command
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;sd&lt;/span&gt;&lt;span class="na"&gt;-server&lt;/span&gt;.exe &lt;span class="na"&gt;--listen-ip &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.0.0.0 &lt;span class="na"&gt;--listen-port &lt;/span&gt;&lt;span class="m"&gt;7860&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--diffusion-model &lt;/span&gt;&lt;span class="s2"&gt;"E:\models\flux1-schnell-q4_k.gguf"&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--vae &lt;/span&gt;&lt;span class="s2"&gt;"E:\models\ae.safetensors"&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--clip&lt;/span&gt;_l &lt;span class="s2"&gt;"E:\models\clip_l.safetensors"&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--t&lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="kd"&gt;xxl&lt;/span&gt; &lt;span class="s2"&gt;"E:\models\t5xxl_fp16.safetensors"&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--cfg-scale &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;.0 &lt;span class="na"&gt;--steps &lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;--clip-on-cpu --vae-on-cpu --vae-tiling
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Flag breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Flag&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--clip-on-cpu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frees ~235MB VRAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--vae-on-cpu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frees ~160MB VRAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--vae-tiling&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prevents OOM at high resolution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--cfg-scale 1.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required for FLUX — higher values distort&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--steps 4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Schnell converges in 4 steps by design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Real benchmark
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;T5XXL conditioning&lt;/td&gt;
&lt;td&gt;11.49s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sampling (4 steps @ 1024×1024)&lt;/td&gt;
&lt;td&gt;~838s (~14 min)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VAE decode (9 tiles)&lt;/td&gt;
&lt;td&gt;40.45s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~14 min&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
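&lt;p&gt;Summing the three measured stages is a quick sanity check on the total. They add up to about 890 seconds, roughly 14.8 minutes. A minimal sketch of that arithmetic, which also shows how much of the run is spent sampling:&lt;/p&gt;

```python
# Stage times in seconds, taken from the benchmark table above.
STAGES_S = {
    "T5XXL conditioning": 11.49,
    "Sampling (4 steps @ 1024x1024)": 838.0,
    "VAE decode (9 tiles)": 40.45,
}

total_s = sum(STAGES_S.values())
sampling_s = STAGES_S["Sampling (4 steps @ 1024x1024)"]
sampling_share = sampling_s / total_s

print(f"Total: {total_s:.2f} s ({total_s / 60:.1f} min)")
print(f"Sampling share: {sampling_share:.0%}")
print(f"Per sampling step: {sampling_s / 4:.1f} s")
```

&lt;p&gt;Sampling accounts for roughly 94% of the wall-clock time, about 210 seconds per step. Any meaningful speedup therefore has to come from the diffusion model on the GPU rather than from the text encoder or the VAE.&lt;/p&gt;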

&lt;p&gt;Terminal status at generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Listening on http://0.0.0.0:7860
VRAM: 7.6/8.0 GB | RAM: ~9.5 GB | Temp: 66°C
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Windows Firewall fix
&lt;/h2&gt;

&lt;p&gt;If OpenWebUI can't reach the server even with &lt;code&gt;--listen-ip 0.0.0.0&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run as Administrator&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;New-NetFirewallRule&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DisplayName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sd-server AIVisionsLab"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Direction&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Inbound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Protocol&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;TCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-LocalPort&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;7860&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Allow&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If OpenWebUI runs in Docker, it sits on an isolated WSL2 network where &lt;code&gt;127.0.0.1&lt;/code&gt; refers to the container itself, not to the Windows host. Use your machine's actual local IP instead.&lt;/p&gt;
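&lt;p&gt;A short script can find that IP and check that port 7860 actually answers. Below is a minimal Python sketch, assuming Python is available wherever you run it. &lt;code&gt;lan_ip&lt;/code&gt; and &lt;code&gt;port_open&lt;/code&gt; are hypothetical helper names. The first uses the common UDP-connect trick to read the address of the interface the OS would route through. The second attempts a plain TCP connection.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import socket


def lan_ip():
    """Best-effort IPv4 address of the interface used for outbound traffic, or None.

    connect() on a UDP socket sends no packets; it only makes the OS choose
    a route, whose local address is then read back with getsockname().
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.connect(("192.0.2.1", 80))  # TEST-NET-1 documentation address
            return s.getsockname()[0]
        except OSError:
            return None  # no usable route, e.g. offline


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run &lt;code&gt;port_open(ip, 7860)&lt;/code&gt; from another machine on the LAN to exercise the firewall rule. If the OpenWebUI container has Python, you can run it from inside the container instead. Run on the Windows host itself, the check mostly confirms that &lt;code&gt;sd-server&lt;/code&gt; is listening.&lt;/p&gt;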




&lt;h2&gt;
  
  
  Full documentation
&lt;/h2&gt;

&lt;p&gt;📖 &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;br&gt;
📦 &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>flux</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>stablediffusion</category>
    </item>
    <item>
      <title>Everything that failed before Vulkan saved our RX 580 AI setup</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Sun, 24 May 2026 13:14:16 +0000</pubDate>
      <link>https://dev.to/aivisionslab/everything-that-failed-before-vulkan-saved-our-rx-580-ai-setup-4apj</link>
      <guid>https://dev.to/aivisionslab/everything-that-failed-before-vulkan-saved-our-rx-580-ai-setup-4apj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All images in this article were generated locally on the RX 580 8GB — after we fixed everything described below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The graveyard
&lt;/h2&gt;

&lt;p&gt;Before Vulkan worked, we tried everything. This is the technical autopsy.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. DirectML — Microsoft's promise that crashed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The attempt:&lt;/strong&gt; torch-directml with the &lt;code&gt;--directml&lt;/code&gt; flag in ComfyUI.&lt;/p&gt;

&lt;p&gt;The GPU was detected as &lt;code&gt;privateuseone:0&lt;/code&gt;. It looked promising.&lt;/p&gt;

&lt;p&gt;Then this appeared on every run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WARNING: torch-directml barely works, is very slow,
has not been updated in over 1 year and might be
removed soon, please don't use it.

NotImplementedError: Cannot access storage of OpaqueTensorImpl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; torch-directml represents its tensors with PyTorch's &lt;code&gt;OpaqueTensorImpl&lt;/code&gt;, a tensor type whose underlying storage cannot be accessed. When ComfyUI's newer attention code paths try to read a tensor's raw storage, PyTorch raises the &lt;code&gt;NotImplementedError&lt;/code&gt; above instead of returning the data.&lt;/p&gt;

&lt;p&gt;The project hasn't been updated in over a year. It's effectively abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual fix attempt:&lt;/strong&gt; Downgrade to the May 2024 dev build:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip uninstall torch torch-directml torchaudio
pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.3.1+cpu &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cpu
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch-directml&lt;span class="o"&gt;==&lt;/span&gt;0.2.1.dev240521 &lt;span class="nt"&gt;--no-deps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
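&lt;p&gt;Before relaunching ComfyUI, it can be worth confirming that the active environment really has the pinned builds. This matters most when several Python installs are present. Below is a minimal sketch using the standard library's &lt;code&gt;importlib.metadata&lt;/code&gt;. &lt;code&gt;pin_mismatches&lt;/code&gt; is a hypothetical helper, and the pins are copied from the commands above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the downgrade commands above.
PINS = {
    "torch": "2.3.1+cpu",
    "torch-directml": "0.2.1.dev240521",
}


def pin_mismatches(pins):
    """Map each package whose installed version differs from its pin to the
    installed version, or to None when the package is not installed."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;pin_mismatches(PINS)&lt;/code&gt; returns an empty dict when both pins are in place. Otherwise it maps each offending package to the version actually installed, or to &lt;code&gt;None&lt;/code&gt; if the package is missing.&lt;/p&gt;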



&lt;p&gt;This stops the crash, but performance is too slow to be usable.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. ROCm — officially dead for GCN4
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The attempt:&lt;/strong&gt; AMD's official GPGPU framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The reality:&lt;/strong&gt; AMD dropped official support for the Polaris/GCN4 architecture in ROCm v5.x. Permanently. There is no official workaround.&lt;/p&gt;

&lt;p&gt;On Windows: no native ROCm support at all.&lt;br&gt;
On WSL2 with compatibility layers: kernel panics under heavy inference load.&lt;/p&gt;

&lt;p&gt;The only working ROCm path for the RX 580 is via Docker containers that emulate &lt;code&gt;gfx803&lt;/code&gt;, which is what &lt;a href="https://medium.com/@amihart" rel="noopener noreferrer"&gt;Amihart documented in January 2025&lt;/a&gt;. It works for Stable Diffusion, but it adds Docker overhead and does not support the newer FLUX architecture.&lt;/p&gt;


&lt;h2&gt;
  
  
  3. OpenVINO + Stable Diffusion Forge
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The attempt:&lt;/strong&gt; Intel's &lt;code&gt;sd-webui-openvino&lt;/code&gt; extension inside Forge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ModuleNotFoundError: No module named 'ldm'
ModuleNotFoundError: No module named 'sgm'
Error build_unet: Invalid backend: 'openvino'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The extension was designed for the old AUTOMATIC1111 architecture. Forge completely restructured the codebase and replaced the native &lt;code&gt;ldm&lt;/code&gt; and &lt;code&gt;sgm&lt;/code&gt; modules. The OpenVINO injection fails at the foundation level.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. CPU + HDD — the baseline disaster
&lt;/h2&gt;

&lt;p&gt;Before any GPU acceleration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boot time: &lt;strong&gt;85 seconds&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;LLM response: &lt;strong&gt;3–5 tok/s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Image generation: &lt;strong&gt;~19 minutes per 512×512 image&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;FLUX 16GB model load: &lt;strong&gt;25 minutes from HDD&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mechanical drive was as much of a bottleneck as the missing GPU acceleration.&lt;/p&gt;
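&lt;p&gt;Dividing the model size by the load time gives a feel for the effective read rate. Below is a small Python sketch. It assumes "16GB" means 16 × 10^9 bytes and that the whole load time is spent reading the file. The 30-second figure is the post-NVMe load time reported in the results below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Effective read throughput implied by the FLUX load times.
# Assumptions: "16GB" = 16 * 10**9 bytes, and load time is all file reading.
MODEL_BYTES = 16 * 10**9


def effective_mb_per_s(size_bytes, seconds):
    """Average megabytes (10**6 bytes) per second over the whole load."""
    return size_bytes / seconds / 10**6


hdd_mb_s = effective_mb_per_s(MODEL_BYTES, 25 * 60)  # 25 minutes on the HDD
nvme_mb_s = effective_mb_per_s(MODEL_BYTES, 30)      # 30 seconds after the NVMe move
print(f"HDD:  ~{hdd_mb_s:.0f} MB/s")
print(f"NVMe: ~{nvme_mb_s:.0f} MB/s")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That works out to roughly 11 MB/s from the HDD versus roughly 533 MB/s from the NVMe. The NVMe number is well below typical NVMe read speeds, which suggests the 30-second load is no longer limited by raw I/O alone.&lt;/p&gt;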




&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;After all of this: &lt;strong&gt;Vulkan&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ggml&lt;/code&gt; engine in &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;stable-diffusion.cpp&lt;/code&gt; uses Vulkan as a native GPU backend. The RX 580 has supported Vulkan 1.x since its 2017 drivers. At runtime you need nothing beyond the normal GPU driver and no compatibility layers. Building requires the Vulkan SDK and a single CMake option: &lt;code&gt;-DGGML_VULKAN=ON&lt;/code&gt; for llama.cpp and &lt;code&gt;-DSD_VULKAN=ON&lt;/code&gt; for stable-diffusion.cpp.&lt;/p&gt;

&lt;p&gt;Results after switching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LLM: &lt;strong&gt;15–16 tok/s&lt;/strong&gt; (from 3–5)&lt;/li&gt;
&lt;li&gt;Image: &lt;strong&gt;~72s&lt;/strong&gt; (from ~19 min)&lt;/li&gt;
&lt;li&gt;FLUX load: &lt;strong&gt;30 seconds&lt;/strong&gt; (from 25 min, after NVMe migration)&lt;/li&gt;
&lt;/ul&gt;
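&lt;p&gt;The before/after pairs above can be turned into speedup factors. Below is a small Python sketch. Collapsing ranges such as 3–5 tok/s to their midpoints is an assumption of this sketch. Note that the FLUX load gain comes from the NVMe migration, not from Vulkan.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Speedup factors from the before/after numbers above.
# Assumption: ranges such as "3-5 tok/s" are reduced to their midpoints.


def midpoint(lo, hi):
    return (lo + hi) / 2


def speedup(before, after, higher_is_better):
    """How many times better 'after' is than 'before'."""
    return after / before if higher_is_better else before / after


# metric: (before, after, higher_is_better)
RESULTS = {
    "LLM tok/s": (midpoint(3, 5), midpoint(15, 16), True),
    "image seconds": (19 * 60, 72, False),
    "FLUX load seconds (NVMe, not Vulkan)": (25 * 60, 30, False),
}

for metric, (before, after, higher_is_better) in RESULTS.items():
    print(f"{metric}: {speedup(before, after, higher_is_better):.1f}x")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This prints roughly 3.9x for the LLM, 15.8x for image generation and 50.0x for the FLUX load.&lt;/p&gt;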




&lt;h2&gt;
  
  
  The lesson
&lt;/h2&gt;

&lt;p&gt;The hardware was never the problem. Every failure above was a software problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DirectML: abandoned by Microsoft&lt;/li&gt;
&lt;li&gt;ROCm: architecture policy decision by AMD&lt;/li&gt;
&lt;li&gt;OpenVINO: extension not maintained for modern frontends&lt;/li&gt;
&lt;li&gt;HDD: wrong storage choice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The RX 580 was waiting for &lt;code&gt;ggml&lt;/code&gt; + Vulkan.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Full documentation
&lt;/h2&gt;

&lt;p&gt;📖 &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;br&gt;
📦 &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>amd</category>
      <category>playwright</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I ran Flux Schnell + an LLM on an R$300 GPU. No CUDA. No cloud. No ROCm.</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Sun, 24 May 2026 13:08:33 +0000</pubDate>
      <link>https://dev.to/aivisionslab/rodei-flux-schnell-llm-numa-gpu-de-r300-sem-cuda-sem-cloud-sem-rocm-575e</link>
      <guid>https://dev.to/aivisionslab/rodei-flux-schnell-llm-numa-gpu-de-r300-sem-cuda-sem-cloud-sem-rocm-575e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All images in this article were generated locally on the RX 580 8GB described below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The narrative was clear
&lt;/h2&gt;

&lt;p&gt;In 2026, every guide says the same thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Sua AMD RX 580 não roda IA. Compra uma GPU nova."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AMD dropped ROCm support for Polaris/GCN4 in v5.x.&lt;br&gt;
DirectML crashed with &lt;code&gt;OpaqueTensorImpl&lt;/code&gt; errors.&lt;br&gt;
OpenVINO failed silently.&lt;/p&gt;

&lt;p&gt;An 8GB GPU sat at 0% utilization while the CPU answered LLM prompts at 3 tokens per second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We refused to buy a new GPU.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The solution: Vulkan
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ggml&lt;/code&gt; project, the engine behind &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;stable-diffusion.cpp&lt;/code&gt;, supports Vulkan as a GPU backend. Vulkan is an open standard, and the RX 580 has supported it natively since its 2017 drivers.&lt;/p&gt;

&lt;p&gt;No CUDA. No ROCm. No DirectML. Just Vulkan.&lt;/p&gt;


&lt;h2&gt;
  
  
  Real results (terminal logs, not synthetic benchmarks)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Mistral 7B Q4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15–16 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;DreamShaper 8 GGUF&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~72s/image&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1 Schnell&lt;/td&gt;
&lt;td&gt;flux1-schnell-q4_k (hybrid)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~14 min @ 1024×1024&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CPU-only baseline: &lt;strong&gt;3–5 tok/s&lt;/strong&gt;.&lt;br&gt;
Vulkan uplift: &lt;strong&gt;3–4×&lt;/strong&gt; on a GPU that "doesn't support AI."&lt;/p&gt;


&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPU:     AMD RX 580 2048SP — 8GB GDDR5 (Polaris / GCN4)
CPU:     Intel Xeon E5-2690 v3 — 12c/24t (2014)
RAM:     32GB DDR4 REG ECC
Storage: NVMe 1TB — 1.7–3.5 GB/s
OS:      Windows 10 Pro + WSL2 Ubuntu 22.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;The NVMe alone reduced FLUX model load time from &lt;strong&gt;25 minutes to 30 seconds&lt;/strong&gt;.&lt;br&gt;
Storage is as critical as the GPU.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Build llama.cpp with Vulkan
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Executar no Developer PowerShell do VS&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;E:\&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/ggerganov/llama.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;llama.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DGGML_VULKAN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nx"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Release&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-j20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Validate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build\bin\Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\llama-cli.exe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--list-devices&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Esperado: Vulkan0: AMD Radeon RX 580 2048SP ✅&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Build stable-diffusion.cpp with Vulkan
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--recursive&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/leejet/stable-diffusion.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;stable-diffusion.cpp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mkdir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DGGML_VULKAN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nx"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Release&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-j20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Run the server
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;E&lt;/span&gt;:
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"E:\stable-diffusion.cpp\build\bin\Release"&lt;/span&gt;
&lt;span class="kd"&gt;sd&lt;/span&gt;&lt;span class="na"&gt;-server&lt;/span&gt;.exe &lt;span class="na"&gt;--listen-ip &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.0.0.0 &lt;span class="na"&gt;--listen-port &lt;/span&gt;&lt;span class="m"&gt;7860&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="s2"&gt;"E:\models\dreamshaper_8.safetensors"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect OpenWebUI → Admin → Images → Automatic1111 → &lt;code&gt;http://YOUR_LOCAL_IP:7860/&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ Critical: two incompatible kinds of GGUF
&lt;/h2&gt;

&lt;p&gt;If you try to run FLUX and get &lt;code&gt;new_sd_ctx_t failed&lt;/code&gt;, you downloaded the wrong GGUF.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Compatible with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;city96&lt;/strong&gt; (HuggingFace)&lt;/td&gt;
&lt;td&gt;ComfyUI only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;leejet&lt;/strong&gt; (HuggingFace)&lt;/td&gt;
&lt;td&gt;stable-diffusion.cpp ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Always use: &lt;code&gt;https://huggingface.co/leejet/FLUX.1-schnell-gguf&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What failed (with root causes)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DirectML&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OpaqueTensorImpl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MS tensors incompatible with ComfyUI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROCm&lt;/td&gt;
&lt;td&gt;Kernel panics&lt;/td&gt;
&lt;td&gt;GCN4 dropped in v5.x, permanently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenVINO&lt;/td&gt;
&lt;td&gt;&lt;code&gt;No module 'ldm'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extension targets the old A1111 architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU + HDD&lt;/td&gt;
&lt;td&gt;19 min/image&lt;/td&gt;
&lt;td&gt;No GPU + mechanical I/O bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full documentation
&lt;/h2&gt;

&lt;p&gt;📖 Master guide (PT/EN/ES/FR/AR) with diagrams, benchmarks, and automation scripts:&lt;br&gt;
👉 &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📦 GitHub (scripts + docs):&lt;br&gt;
👉 &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The problem was never the card.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>opensource</category>
      <category>programacao</category>
    </item>
    <item>
      <title>I ran Flux Schnell + LLMs on a $50 GPU. No CUDA. No cloud. No ROCm.</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Sun, 24 May 2026 13:04:51 +0000</pubDate>
      <link>https://dev.to/aivisionslab/i-ran-flux-schnell-llms-on-a-50-gpu-no-cuda-no-cloud-no-rocm-55ap</link>
      <guid>https://dev.to/aivisionslab/i-ran-flux-schnell-llms-on-a-50-gpu-no-cuda-no-cloud-no-rocm-55ap</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;All images in this article were generated locally on the RX 580 8GB described below.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The narrative was clear
&lt;/h2&gt;

&lt;p&gt;In 2026, every guide says the same thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Your AMD RX 580 can't run AI. Buy a new GPU."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AMD dropped ROCm support for Polaris/GCN4 in v5.x.&lt;br&gt;
DirectML crashed with &lt;code&gt;OpaqueTensorImpl&lt;/code&gt; errors.&lt;br&gt;
OpenVINO failed silently.&lt;/p&gt;

&lt;p&gt;So we had an 8GB GPU sitting at 0% utilization while the CPU crawled through LLM responses at 3 tokens/second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We refused to buy a new GPU.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The fix: Vulkan
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ggml&lt;/code&gt; project, the engine behind &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;stable-diffusion.cpp&lt;/code&gt;, supports Vulkan as a GPU backend. Vulkan is an open standard, and the RX 580 has supported it natively since its 2017 drivers.&lt;/p&gt;

&lt;p&gt;No CUDA. No ROCm. No DirectML. Just Vulkan.&lt;/p&gt;


&lt;h2&gt;
  
  
  Results (real terminal logs, not synthetic benchmarks)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM inference&lt;/td&gt;
&lt;td&gt;Mistral 7B Q4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15–16 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;DreamShaper 8 GGUF&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~72s/image&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1 Schnell&lt;/td&gt;
&lt;td&gt;flux1-schnell-q4_k (hybrid)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~14 min @ 1024×1024&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;CPU baseline without GPU: &lt;strong&gt;3–5 tok/s&lt;/strong&gt;.&lt;br&gt;
Vulkan uplift: &lt;strong&gt;3–4×&lt;/strong&gt; on a GPU that "doesn't support AI."&lt;/p&gt;


&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPU:     AMD RX 580 2048SP — 8GB GDDR5 (Polaris / GCN4)
CPU:     Intel Xeon E5-2690 v3 — 12c/24t (2014)
RAM:     32GB DDR4 REG ECC
Storage: NVMe 1TB — 1.7–3.5 GB/s
OS:      Windows 10 Pro + WSL2 Ubuntu 22.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;The NVMe alone reduced FLUX model load time from &lt;strong&gt;25 minutes to 30 seconds&lt;/strong&gt;.&lt;br&gt;
Storage is as critical as the GPU.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Build llama.cpp with Vulkan
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run in Developer PowerShell for VS&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;E:\&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/ggerganov/llama.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;llama.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DGGML_VULKAN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nx"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Release&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-j20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Validate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build\bin\Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\llama-cli.exe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--list-devices&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="c"&gt;# Expected: Vulkan0: AMD Radeon RX 580 2048SP ✅&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
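&lt;p&gt;When scripting the setup, you can automate the same check by scanning the output for &lt;code&gt;VulkanN:&lt;/code&gt; lines. Below is a minimal Python sketch. The line format is assumed from the expected output shown above, and the executable path is a placeholder. Both stdout and stderr are scanned because it is not stated here which stream the device list goes to.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Parse "llama-cli --list-devices" output for Vulkan devices.
# The "VulkanN: name" format is assumed from the expected output above.
import re
import subprocess

VULKAN_LINE = re.compile(r"^\s*(Vulkan\d+):\s*(.+?)\s*$")


def vulkan_devices(text):
    """Return (device_id, description) pairs for every VulkanN line in text."""
    devices = []
    for line in text.splitlines():
        match = VULKAN_LINE.match(line)
        if match:
            devices.append((match.group(1), match.group(2)))
    return devices


def list_vulkan_devices(exe=r".\llama-cli.exe"):
    """Run the (placeholder) executable and scan both output streams."""
    result = subprocess.run([exe, "--list-devices"], capture_output=True, text=True)
    return vulkan_devices(result.stdout + "\n" + result.stderr)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An empty list from &lt;code&gt;list_vulkan_devices()&lt;/code&gt; likely means the build found no Vulkan device, for example because of a driver problem or a build without &lt;code&gt;-DGGML_VULKAN=ON&lt;/code&gt;. It can also mean the output format differs from the one assumed here.&lt;/p&gt;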






&lt;h2&gt;
  
  
  Build stable-diffusion.cpp with Vulkan
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;git&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--recursive&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://github.com/leejet/stable-diffusion.cpp&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;stable-diffusion.cpp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mkdir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DGGML_VULKAN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DCMAKE_BUILD_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Release&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nx"&gt;cmake&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Release&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-j20&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Run the server
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;E&lt;/span&gt;:
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"E:\stable-diffusion.cpp\build\bin\Release"&lt;/span&gt;
&lt;span class="kd"&gt;sd&lt;/span&gt;&lt;span class="na"&gt;-server&lt;/span&gt;.exe &lt;span class="na"&gt;--listen-ip &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.0.0.0 &lt;span class="na"&gt;--listen-port &lt;/span&gt;&lt;span class="m"&gt;7860&lt;/span&gt; &lt;span class="se"&gt;^
&lt;/span&gt;  &lt;span class="na"&gt;-m &lt;/span&gt;&lt;span class="s2"&gt;"E:\models\dreamshaper_8.safetensors"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Connect OpenWebUI → Admin → Images → Automatic1111 → &lt;code&gt;http://YOUR_LOCAL_IP:7860/&lt;/code&gt;&lt;/p&gt;
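&lt;p&gt;Clients other than OpenWebUI can talk to the server the same way. Below is a minimal Python sketch using only the standard library. As the Automatic1111 setting above implies, it assumes that &lt;code&gt;sd-server&lt;/code&gt; accepts the AUTOMATIC1111-style &lt;code&gt;POST /sdapi/v1/txt2img&lt;/code&gt; request and answers with base64-encoded images under &lt;code&gt;"images"&lt;/code&gt;. Verify this against your build. The helper names and default values are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# ASSUMPTION: sd-server speaks the AUTOMATIC1111-style txt2img API that
# OpenWebUI's "Automatic1111" mode uses. Check your build before relying on it.
import base64
import json
import urllib.request


def txt2img_request(base_url, prompt, steps=20, width=512, height=512):
    """Build (but do not send) a POST /sdapi/v1/txt2img request; defaults are examples."""
    payload = {"prompt": prompt, "steps": steps, "width": width, "height": height}
    return urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def txt2img(base_url, prompt, out_path="out.png", timeout=1800):
    """Send the request and save the first returned image (base64 PNG assumed)."""
    with urllib.request.urlopen(txt2img_request(base_url, prompt), timeout=timeout) as resp:
        images = json.load(resp)["images"]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(images[0]))
    return out_path
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;txt2img("http://YOUR_LOCAL_IP:7860", "a lighthouse at dusk")&lt;/code&gt; writes the first returned image to &lt;code&gt;out.png&lt;/code&gt;. The generous timeout reflects the generation times reported in this guide, which range from about 72 seconds to about 14 minutes.&lt;/p&gt;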




&lt;h2&gt;
  
  
  ⚠️ Critical: two types of GGUF
&lt;/h2&gt;

&lt;p&gt;If you try to run FLUX and get &lt;code&gt;new_sd_ctx_t failed&lt;/code&gt; — you downloaded the wrong GGUF.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Compatible with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;city96&lt;/strong&gt; (HuggingFace)&lt;/td&gt;
&lt;td&gt;ComfyUI only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;leejet&lt;/strong&gt; (HuggingFace)&lt;/td&gt;
&lt;td&gt;stable-diffusion.cpp ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Always use: &lt;code&gt;https://huggingface.co/leejet/FLUX.1-schnell-gguf&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What failed (documented)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DirectML&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OpaqueTensorImpl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;MS tensors can't talk to ComfyUI backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROCm&lt;/td&gt;
&lt;td&gt;Kernel panics&lt;/td&gt;
&lt;td&gt;GCN4 dropped in v5.x — permanent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenVINO&lt;/td&gt;
&lt;td&gt;&lt;code&gt;No module 'ldm'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extension targets old A1111 arch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU + HDD&lt;/td&gt;
&lt;td&gt;19 min/image&lt;/td&gt;
&lt;td&gt;No GPU + mechanical I/O bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Full documentation
&lt;/h2&gt;

&lt;p&gt;📖 Complete guide (PT/EN/ES/FR/AR) with architecture diagrams, benchmarks, automation scripts:&lt;br&gt;
👉 &lt;a href="https://setup-ia-local-rx580-vulkan.web.app" rel="noopener noreferrer"&gt;setup-ia-local-rx580-vulkan.web.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📦 GitHub (scripts + docs):&lt;br&gt;
👉 &lt;a href="https://github.com/aivisionslab-studios/rx580-local-ai-guide" rel="noopener noreferrer"&gt;github.com/aivisionslab-studios/rx580-local-ai-guide&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The problem was never the GPU.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Running Flux Schnell (12B) + LLMs on a Legacy AMD RX 580 (8 GB) via Vulkan: The Complete Architecture Guide [2026]</title>
      <dc:creator>AIVisionsLab</dc:creator>
      <pubDate>Fri, 22 May 2026 18:24:02 +0000</pubDate>
      <link>https://dev.to/aivisionslab/zapusk-flux-schnell-12b-llm-na-ustarievshiei-amd-rx-580-8-gb-chieriez-vulkan-polnoie-arkhitiekturnoie-273d</link>
      <guid>https://dev.to/aivisionslab/zapusk-flux-schnell-12b-llm-na-ustarievshiei-amd-rx-580-8-gb-chieriez-vulkan-polnoie-arkhitiekturnoie-273d</guid>
      <description>&lt;p&gt;Many considered the RX 580 "dead" for AI in 2026: ecosystems built only around CUDA, ROCm dropping Polaris support from version 5.x onward, and DirectML that never matured. This is a detailed technical report on how we proved otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU:&lt;/strong&gt; AMD RX 580 2048SP, 8 GB GDDR5 VRAM (native Vulkan 1.x support)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU:&lt;/strong&gt; Intel Xeon E5-2690 v3, 12 cores/24 threads @ 3.5 GHz boost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM:&lt;/strong&gt; 32 GB DDR4 REG ECC, quad channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; 1 TB NVMe (critical for eliminating I/O bottlenecks)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS:&lt;/strong&gt; Windows 10 Pro + WSL2 Ubuntu 22.04.5&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why don't other solutions work?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CUDA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Nvidia only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ROCm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Polaris support dropped in v5.x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DirectML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;&lt;code&gt;OpaqueTensorImpl&lt;/code&gt; error in CLIPTextEncode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenVINO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Missing &lt;code&gt;ldm/sgm&lt;/code&gt; modules in Forge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The fatal DirectML error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NotImplementedError: Cannot access storage of OpaqueTensorImpl

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The DirectML backend wraps memory in opaque tensors (&lt;code&gt;OpaqueTensorImpl&lt;/code&gt;) whose underlying storage ComfyUI's attention backends cannot read. This is a dead end.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: A Two-Tier Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  PATH 1: GPU Vulkan (RX 580 acceleration)
&lt;/h3&gt;

&lt;p&gt;A native build of &lt;code&gt;stable-diffusion.cpp&lt;/code&gt;, compiled with &lt;code&gt;-DGGML_VULKAN=ON&lt;/code&gt;. The &lt;code&gt;ggml&lt;/code&gt; engine drives the GPU directly through Vulkan, with no need for ROCm or CUDA. SD 1.5 GGUF models generate an image in roughly 72 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  PATH 2: CPU Xeon (heavy SOTA models)
&lt;/h3&gt;

&lt;p&gt;FLUX.1 Schnell (16 GB) exceeds the card's physical VRAM. ComfyUI therefore runs on the CPU inside WSL2, using ECC RAM as stable "virtual VRAM". A 768x768 generation takes ~24 minutes.&lt;/p&gt;
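&lt;p&gt;The choice between the two paths comes down to whether a model's weights fit in the card's 8 GB of VRAM. The sketch below expresses that rule in Python. It is illustrative only: the function name &lt;code&gt;choose_path&lt;/code&gt; and the 1 GB reserve for activations and driver overhead are assumptions, while the two model sizes come from this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
# Hypothetical routing rule for the two-tier setup (not the author's script).
VRAM_GB = 8.0  # RX 580 2048SP, 8 GB GDDR5


def choose_path(model_gb, reserve_gb=1.0, vram_gb=VRAM_GB):
    """Return "gpu-vulkan" if the weights fit in VRAM, else "cpu-wsl2".

    reserve_gb is an assumed safety margin for activations and the
    Vulkan driver, not a measured value.
    """
    budget_gb = vram_gb - reserve_gb
    # Zero overflow means the weights fit on the card.
    overflow_gb = max(0.0, model_gb - budget_gb)
    return "cpu-wsl2" if overflow_gb else "gpu-vulkan"


if __name__ == "__main__":
    # Sizes from this article: Flux Q4_K diffusion model vs. FLUX.1 Schnell.
    for name, size_gb in [("flux1-schnell-q4_k.gguf", 6.5), ("FLUX.1 Schnell", 16.0)]:
        print(f"{name}: {choose_path(size_gb)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run as a script, this prints &lt;code&gt;gpu-vulkan&lt;/code&gt; for the 6.5 GB Q4_K file and &lt;code&gt;cpu-wsl2&lt;/code&gt; for the 16 GB model. In practice the text encoders and VAE also need memory, which is why the next table splits them across devices.&lt;/p&gt;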

&lt;h3&gt;
  
  
  Hybrid memory segmentation (Flux 12B Q4_K)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Memory allocation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Diffusion Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;flux1-schnell-q4_k.gguf&lt;/td&gt;
&lt;td&gt;GPU VRAM ~6.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VAE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ae.safetensors&lt;/td&gt;
&lt;td&gt;CPU RAM ~160 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CLIP L&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;clip_l.safetensors&lt;/td&gt;
&lt;td&gt;GPU VRAM ~235 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T5XXL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;t5xxl_fp16.safetensors&lt;/td&gt;
&lt;td&gt;CPU RAM ~9.3 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
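&lt;p&gt;Adding up the approximate sizes in this table shows how much VRAM headroom the split leaves. A quick check in Python, transcribing the table into a hypothetical &lt;code&gt;ALLOCATION&lt;/code&gt; dict and assuming 1 GB = 1024 MB:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
# Approximate sizes transcribed from the table above, in MB (1 GB taken as 1024 MB).
ALLOCATION = {
    "flux1-schnell-q4_k.gguf": ("gpu", 6.5 * 1024),
    "ae.safetensors": ("cpu", 160),
    "clip_l.safetensors": ("gpu", 235),
    "t5xxl_fp16.safetensors": ("cpu", 9.3 * 1024),
}
VRAM_MB = 8 * 1024  # RX 580, 8 GB


def totals_by_device(allocation):
    """Sum the approximate footprint of each component per device."""
    totals = {"gpu": 0.0, "cpu": 0.0}
    for device, size_mb in allocation.values():
        totals[device] += size_mb
    return totals


if __name__ == "__main__":
    totals = totals_by_device(ALLOCATION)
    headroom_mb = VRAM_MB - totals["gpu"]
    print(f"GPU: {totals['gpu']:.0f} MB, CPU: {totals['cpu']:.0f} MB, "
          f"VRAM headroom: {headroom_mb:.0f} MB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This prints &lt;code&gt;GPU: 6891 MB, CPU: 9683 MB, VRAM headroom: 1301 MB&lt;/code&gt;. Roughly 1.3 GB remains for activations and Vulkan buffers, while the ~9.3 GB T5XXL encoder, which alone would overflow the card, sits in system RAM.&lt;/p&gt;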

&lt;h2&gt;
  
  
  Launch command
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sd-server.exe &lt;span class="nt"&gt;--listen-ip&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--listen-port&lt;/span&gt; 7860 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--diffusion-model&lt;/span&gt; &lt;span class="s2"&gt;"E:&lt;/span&gt;&lt;span class="se"&gt;\m&lt;/span&gt;&lt;span class="s2"&gt;odels&lt;/span&gt;&lt;span class="se"&gt;\f&lt;/span&gt;&lt;span class="s2"&gt;lux1-schnell-q4_k.gguf"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vae&lt;/span&gt; &lt;span class="s2"&gt;"E:&lt;/span&gt;&lt;span class="se"&gt;\m&lt;/span&gt;&lt;span class="s2"&gt;odels&lt;/span&gt;&lt;span class="se"&gt;\a&lt;/span&gt;&lt;span class="s2"&gt;e.safetensors"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--clip_l&lt;/span&gt; &lt;span class="s2"&gt;"E:&lt;/span&gt;&lt;span class="se"&gt;\m&lt;/span&gt;&lt;span class="s2"&gt;odels&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s2"&gt;lip_l.safetensors"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--t5xxl&lt;/span&gt; &lt;span class="s2"&gt;"E:&lt;/span&gt;&lt;span class="se"&gt;\m&lt;/span&gt;&lt;span class="s2"&gt;odels&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s2"&gt;5xxl_fp16.safetensors"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cfg-scale&lt;/span&gt; 1.0 &lt;span class="nt"&gt;--steps&lt;/span&gt; 4 &lt;span class="nt"&gt;--clip-on-cpu&lt;/span&gt; &lt;span class="nt"&gt;--vae-on-cpu&lt;/span&gt; &lt;span class="nt"&gt;--vae-tiling&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--vae-on-cpu&lt;/code&gt; and &lt;code&gt;--vae-tiling&lt;/code&gt; are mandatory. Without them, a &lt;code&gt;DeviceMemoryAllocation&lt;/code&gt; error occurs immediately.&lt;/p&gt;
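&lt;p&gt;If you orchestrate the servers from Python rather than &lt;code&gt;.bat&lt;/code&gt; scripts, the same launch can be expressed as an argument list for &lt;code&gt;subprocess&lt;/code&gt;. This is a minimal sketch that reuses only the flags from the command above; the helper names &lt;code&gt;build_sd_server_args&lt;/code&gt; and &lt;code&gt;launch&lt;/code&gt; are hypothetical, and it assumes &lt;code&gt;sd-server.exe&lt;/code&gt; is on the Windows &lt;code&gt;PATH&lt;/code&gt; with the models in &lt;code&gt;E:\models&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import subprocess
from pathlib import PureWindowsPath

# Same model folder as in the command above (assumed layout).
MODELS = PureWindowsPath("E:/models")


def build_sd_server_args(host="0.0.0.0", port=7860):
    """Build the sd-server.exe argument list used in this guide."""
    return [
        "sd-server.exe",
        "--listen-ip", host,
        "--listen-port", str(port),
        "--diffusion-model", str(MODELS / "flux1-schnell-q4_k.gguf"),
        "--vae", str(MODELS / "ae.safetensors"),
        "--clip_l", str(MODELS / "clip_l.safetensors"),
        "--t5xxl", str(MODELS / "t5xxl_fp16.safetensors"),
        "--cfg-scale", "1.0",
        "--steps", "4",
        "--clip-on-cpu",
        # Both flags below are required to avoid DeviceMemoryAllocation errors.
        "--vae-on-cpu",
        "--vae-tiling",
    ]


def launch():
    """Start sd-server as a child process (Windows host assumed)."""
    return subprocess.Popen(build_sd_server_args())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing a list sidesteps shell quoting and line-continuation rules, which differ between cmd (&lt;code&gt;^&lt;/code&gt;), PowerShell (backtick), and bash (&lt;code&gt;\&lt;/code&gt;).&lt;/p&gt;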

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM inference&lt;/td&gt;
&lt;td&gt;CPU only&lt;/td&gt;
&lt;td&gt;3–5 tokens/s ❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM inference&lt;/td&gt;
&lt;td&gt;RX 580 Vulkan&lt;/td&gt;
&lt;td&gt;15–16 tokens/s ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SD 1.5, 20 steps&lt;/td&gt;
&lt;td&gt;DirectML&lt;/td&gt;
&lt;td&gt;~450 s + crash ❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SD 1.5, 20 steps&lt;/td&gt;
&lt;td&gt;Native Vulkan&lt;/td&gt;
&lt;td&gt;~72 s ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flux 1024x1024&lt;/td&gt;
&lt;td&gt;Xeon CPU WSL2&lt;/td&gt;
&lt;td&gt;~24 min ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: model loading time dropped from 25 min (HDD) to 4 min (NVMe).&lt;/em&gt;&lt;/p&gt;
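&lt;p&gt;The ratios behind these numbers are easy to check. A small Python helper (the name &lt;code&gt;speedup&lt;/code&gt; is hypothetical, for illustration) computes them from the table and the note above; for the LLM rows it uses the range endpoints to give a conservative and an optimistic figure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
def speedup(old, new, higher_is_better):
    """How many times better `new` is than `old`.

    Throughput (tokens/s): higher is better, so new / old.
    Durations (seconds, minutes): lower is better, so old / new.
    """
    return new / old if higher_is_better else old / new


# (old, new, higher_is_better), copied from the benchmark table and note.
CASES = {
    "LLM, conservative (5 vs 15 tok/s)": (5, 15, True),
    "LLM, optimistic (3 vs 16 tok/s)": (3, 16, True),
    "SD 1.5, 20 steps (450 s vs 72 s)": (450, 72, False),
    "Model load (25 min HDD vs 4 min NVMe)": (25, 4, False),
}

if __name__ == "__main__":
    for name, (old, new, higher) in CASES.items():
        print(f"{name}: {speedup(old, new, higher):.2f}x")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This yields 3.00x to 5.33x for LLM inference and 6.25x for both SD 1.5 generation and model loading. The DirectML run also crashed, so its ratio understates the practical gap.&lt;/p&gt;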

&lt;h2&gt;
  
  
  Service map
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenWebUI Docker :3000
  ├── llama-server.exe :8081  (Vulkan — RX 580)
  ├── sd-server.exe    :7860  (Vulkan — RX 580)
  └── ComfyUI          :8188  (CPU — Xeon WSL2)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
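&lt;p&gt;To confirm that every service in this map is actually listening, a plain TCP connect per port is enough. A minimal sketch, assuming it runs on the Windows host and that WSL2 forwards ComfyUI's port to &lt;code&gt;localhost&lt;/code&gt; (the default WSL2 behavior, though it depends on your networking configuration); &lt;code&gt;SERVICES&lt;/code&gt; and &lt;code&gt;probe&lt;/code&gt; are hypothetical names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
import socket

# Ports taken from the service map above.
SERVICES = {
    "OpenWebUI": 3000,
    "llama-server": 8081,
    "sd-server": 7860,
    "ComfyUI": 8188,
}


def probe(host="127.0.0.1", services=SERVICES, timeout=0.5):
    """Return {service: bool}: True if a TCP connection to its port succeeds."""
    status = {}
    for name, port in services.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:  # refused, timed out, unreachable, ...
            status[name] = False
    return status


if __name__ == "__main__":
    for name, up in probe().items():
        print(f"{name:13} {SERVICES[name]:5} {'up' if up else 'down'}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A &lt;code&gt;False&lt;/code&gt; only means nothing accepted the connection on that port; it does not check whether the server behind it is healthy.&lt;/p&gt;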



&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;Full documentation, &lt;code&gt;.bat&lt;/code&gt; orchestration scripts, and compiled binaries:&lt;br&gt;
👉 &lt;a href="https://setup-ia-local-rx580-vulkan.firebaseapp.com/" rel="noopener noreferrer"&gt;https://setup-ia-local-rx580-vulkan.firebaseapp.com/&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Hardware doesn't die. It just gets a second life with the right software.&lt;/strong&gt; &lt;em&gt;Using older AMD cards for AI? Let's discuss buffer optimization and latency in the comments.&lt;/em&gt;&lt;/p&gt;





</description>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
