<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GeneLab_999</title>
    <description>The latest articles on DEV Community by GeneLab_999 (@genelab_999).</description>
    <link>https://dev.to/genelab_999</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3743545%2Ffccaf9b5-3165-4db8-9ec2-2ef6aeb18202.png</url>
      <title>DEV Community: GeneLab_999</title>
      <link>https://dev.to/genelab_999</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/genelab_999"/>
    <language>en</language>
    <item>
      <title>🛠️ I Built a One-Click ComfyUI Setup for RTX 5090 on Windows — No WSL2, No Docker</title>
      <dc:creator>GeneLab_999</dc:creator>
      <pubDate>Mon, 02 Mar 2026 15:18:28 +0000</pubDate>
      <link>https://dev.to/genelab_999/i-built-a-one-click-comfyui-setup-for-rtx-5090-on-windows-no-wsl2-no-docker-4n1i</link>
      <guid>https://dev.to/genelab_999/i-built-a-one-click-comfyui-setup-for-rtx-5090-on-windows-no-wsl2-no-docker-4n1i</guid>
      <description>&lt;p&gt;I bought an RTX 5090. 32GB VRAM. The most powerful consumer GPU on the planet.&lt;/p&gt;

&lt;p&gt;Then I tried to run ComfyUI on Windows. It broke immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RuntimeError: sm_120 is not compatible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Three days later, I had a fully working solution. I packaged it and open-sourced it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell" rel="noopener noreferrer"&gt;ComfyUI-Win-Blackwell&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the whole story.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why RTX 50-series Breaks Everything
&lt;/h2&gt;

&lt;p&gt;NVIDIA's Blackwell architecture (RTX 5090/5080/5070) reports a new CUDA compute capability, &lt;code&gt;sm_120&lt;/code&gt;. The problem? PyTorch's stable release doesn't ship kernels for it.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pip install torch&lt;/code&gt; → &lt;strong&gt;doesn't work on Blackwell&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You need PyTorch nightly with CUDA 13.0 (cu130)&lt;/li&gt;
&lt;li&gt;But then &lt;strong&gt;xformers&lt;/strong&gt; (the standard ComfyUI speed boost) forces PyTorch back to stable&lt;/li&gt;
&lt;li&gt;And custom nodes silently pull stable PyTorch through their dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a dependency trap. Every fix creates a new problem.&lt;/p&gt;
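Here's the trap in miniature as a pure-Python predicate. This is a sketch: the function name is mine and the version strings are illustrative -- on a real install you'd read `torch.__version__` and `torch.cuda.get_arch_list()` from the torch you actually have, instead of passing strings in.

```python
# Sketch: detect the Blackwell dependency trap from a version string and a
# kernel arch list. Illustrative only -- on a real machine, query the
# installed torch for torch.__version__ and torch.cuda.get_arch_list().

def blackwell_ready(torch_version, arch_list):
    """True only for a cu130 build that actually ships sm_120 kernels."""
    return torch_version.endswith("+cu130") and "sm_120" in arch_list

# A stable wheel silently pulled in by a custom node fails the check:
print(blackwell_ready("2.6.0+cu124", ["sm_80", "sm_90"]))               # False
# A cu130 nightly with Blackwell kernels passes:
print(blackwell_ready("2.9.0.dev20260301+cu130", ["sm_90", "sm_120"]))  # True
```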


&lt;h2&gt;
  
  
  The 5 Rules I Discovered
&lt;/h2&gt;

&lt;p&gt;After 3 days of trial and error, I distilled everything into 5 rules. Break any one of them and your environment dies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 1: Use PyTorch nightly cu130&lt;/strong&gt; — stable doesn't have sm_120 kernels.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 2: Never install xformers&lt;/strong&gt; — it force-downgrades PyTorch to stable. This is the trap that got me twice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 3: Strip &lt;code&gt;torch&lt;/code&gt; from every requirements.txt&lt;/strong&gt; — custom nodes list torch as a dependency, and pip will happily replace your nightly build with stable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 4: Verify PyTorch after every custom node install&lt;/strong&gt; — run &lt;code&gt;python -c "import torch; print(torch.__version__)"&lt;/code&gt; and check that it still says &lt;code&gt;cu130&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 5: Clear proxy environment variables&lt;/strong&gt; — system proxies block pip and git silently on Windows.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip:&lt;/strong&gt; Rule 2 was the hardest to figure out. xformers installs &lt;em&gt;successfully&lt;/em&gt;, ComfyUI &lt;em&gt;starts&lt;/em&gt; fine, and then crashes mid-inference with &lt;code&gt;sm_120 is not compatible&lt;/code&gt;. You don't even realize PyTorch was downgraded until you check the version.&lt;/p&gt;
&lt;/blockquote&gt;
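Rule 3 is easy to automate. Here's a minimal sketch of the stripping step -- the `TORCH_FAMILY` set and the parsing are simplified on purpose (no extras, no URLs, no version ranges), so treat it as the idea rather than a full requirements parser:

```python
# Sketch of Rule 3: drop torch-family pins from a requirements.txt before
# handing it to pip, so pip cannot replace the nightly build.

TORCH_FAMILY = {"torch", "torchvision", "torchaudio", "xformers"}

def strip_torch(lines):
    """Return the requirement lines whose package is not in TORCH_FAMILY."""
    kept = []
    for line in lines:
        name = line.strip().lower().split(";")[0]   # drop environment markers
        for sep in ("==", "~=", "!=", " "):         # crude version-pin stripping
            name = name.split(sep)[0]
        if name not in TORCH_FAMILY:
            kept.append(line)
    return kept

reqs = ["numpy", "torch==2.6.0", "torchvision", "einops", "xformers"]
print(strip_torch(reqs))  # ['numpy', 'einops']
```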


&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I automated all 5 rules into a one-click setup:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI-Win-Blackwell
&lt;span class="c"&gt;# Double-click setup.bat — done in ~20 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;What &lt;code&gt;setup.bat&lt;/code&gt; handles:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.13 environment&lt;/li&gt;
&lt;li&gt;PyTorch nightly cu130 (not stable, not cu128)&lt;/li&gt;
&lt;li&gt;triton-windows + torch.compile (replaces xformers)&lt;/li&gt;
&lt;li&gt;ComfyUI core + custom dependencies (with torch stripped out)&lt;/li&gt;
&lt;li&gt;28 verified custom nodes&lt;/li&gt;
&lt;li&gt;Post-install verification that cu130 is still intact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also built companion tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;verify_env.py&lt;/code&gt; — Blackwell-specific environment checker (sm_120, cu130, Triton, torch.compile)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fix_windows_compat.py&lt;/code&gt; — Converts Linux workflow JSON paths to Windows format, replaces SageAttention with SDPA&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;update.bat&lt;/code&gt; — Updates everything while preserving Blackwell compatibility&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  What I Verified
&lt;/h2&gt;

&lt;p&gt;I tested 28 custom nodes one by one. Install → check PyTorch version → run test → record result. That was the most tedious part.&lt;/p&gt;
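The loop itself is trivial; the discipline is checking the environment after every single install. A sketch of the bookkeeping, with `check_env` standing in for the real `python -c "import torch; print(torch.__version__)"` probe, and node names and version strings purely illustrative:

```python
# Sketch of the per-node verification loop. check_env stands in for the real
# probe (run the torch version check inside the environment); the node names
# and version strings below are illustrative, not my actual test log.

def check_env(torch_version):
    """The environment is intact only while the build tag still says cu130."""
    return "cu130" in torch_version

def verify_nodes(nodes, version_after_install):
    """Record OK / BROKEN ENV per node from the torch build it left behind."""
    results = {}
    for node in nodes:
        ok = check_env(version_after_install[node])
        results[node] = "OK" if ok else "BROKEN ENV"
    return results

report = verify_nodes(
    ["NodeA", "NodeB"],
    {"NodeA": "2.9.0.dev20260301+cu130",   # nightly survived the install
     "NodeB": "2.6.0+cu124"},              # a dependency downgraded torch
)
print(report)  # {'NodeA': 'OK', 'NodeB': 'BROKEN ENV'}
```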

&lt;p&gt;I also tested 5 Image-to-Video pipelines on 32GB VRAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HunyuanVideo 1.5 I2V&lt;/strong&gt; (8.3B params, ~16GB) — Smooth. My top recommendation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kandinsky 5.0 Lite I2V&lt;/strong&gt; (2B, ~4GB) — Very smooth. Great for quick tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LTX-2 I2V&lt;/strong&gt; (19B, ~25GB) — Works in FP8. Tight but fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LongCat-Video TI2V&lt;/strong&gt; (13.6B, ~14.5GB) — Works with adjustments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kandinsky 5.0 Pro I2V&lt;/strong&gt; (19B, ~40GB) — Needs CPU offload. Slow.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Why Not Just Use WSL2 or Docker?
&lt;/h2&gt;

&lt;p&gt;The short answer: performance.&lt;/p&gt;

&lt;p&gt;Loading safetensors through WSL2's NTFS translation layer is noticeably slower. Docker has the same issue plus additional setup complexity. For a tool like ComfyUI where you're iterating on workflows and loading large models frequently, native Windows file I/O makes a real difference.&lt;/p&gt;

&lt;p&gt;Also, most AI artists using ComfyUI on Windows aren't Docker experts. A &lt;code&gt;.bat&lt;/code&gt; file they can double-click is the right UX.&lt;/p&gt;


&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;If you have an RTX 5090/5080/5070 and want to run ComfyUI on Windows without WSL2 or Docker, give it a try:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell" rel="noopener noreferrer"&gt;github.com/hiroki-abe-58/ComfyUI-Win-Blackwell&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/hiroki-abe-58" rel="noopener noreferrer"&gt;
        hiroki-abe-58
      &lt;/a&gt; / &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell" rel="noopener noreferrer"&gt;
        ComfyUI-Win-Blackwell
      &lt;/a&gt;
    &lt;/h2&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;ComfyUI for GeForce RTX 50-Series (Blackwell)&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The first fully documented, Windows-native ComfyUI setup for NVIDIA GeForce RTX 5090/5080/5070 (Blackwell architecture, sm_120) with CUDA 13.0.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Other languages:&lt;/strong&gt; &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell/README_ja.md" rel="noopener noreferrer"&gt;日本語&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell/README_zh.md" rel="noopener noreferrer"&gt;中文&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell/README_ko.md" rel="noopener noreferrer"&gt;한국어&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What Makes This Special&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;RTX 50-series GPUs (Blackwell, Compute Capability sm_120) are &lt;strong&gt;not supported by PyTorch stable releases&lt;/strong&gt; as of early 2026. Running ComfyUI on these GPUs requires specific versions and workarounds that are not documented anywhere else in a single, reproducible package.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Technical Highlights&lt;/h3&gt;
&lt;/div&gt;

&lt;p&gt;&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;GPU Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;NVIDIA Blackwell (sm_120) -- RTX 5090 / 5080 / 5070&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;CUDA Version&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;13.0 (cu130) -- the latest CUDA runtime&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;PyTorch&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Nightly cu130 build (not stable, not cu128)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;3.13 (latest)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;Triton&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;triton-windows fork (official Triton is Linux-only)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;xformers&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Deliberately excluded (causes PyTorch downgrade)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;Custom Nodes&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;28 verified nodes including video &amp;amp; music generation&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;Platform&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;Windows Native (no WSL2, no Docker required)&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Why This Is Unique&lt;/h3&gt;

&lt;/div&gt;


&lt;ol&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Blackwell + Windows Native +&lt;/strong&gt;…&lt;/p&gt;


&lt;/li&gt;

&lt;/ol&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/hiroki-abe-58/ComfyUI-Win-Blackwell" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;





&lt;p&gt;MIT licensed. Stars and PRs welcome — especially if you verify additional custom nodes on Blackwell hardware.&lt;/p&gt;

&lt;p&gt;Have you tried running AI tools on RTX 50-series? What was your experience? Let me know in the comments! 👇&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this helpful, consider following me for more AI + GPU content!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;📝 Japanese version: &lt;a href="https://qiita.com/GeneLab_999" rel="noopener noreferrer"&gt;Qiita&lt;/a&gt; / &lt;a href="https://zenn.dev/rick_lyric" rel="noopener noreferrer"&gt;Zenn&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;🐦 Follow me on X: &lt;a href="https://x.com/geneLab_999" rel="noopener noreferrer"&gt;@geneLab_999&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;💻 GitHub: &lt;a href="https://github.com/hiroki-abe-58" rel="noopener noreferrer"&gt;hiroki-abe-58&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>ai</category>
      <category>showdev</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Run 6 AI Services Simultaneously on RTX 5090 + WSL2 + Docker (And You Can Too)</title>
      <dc:creator>GeneLab_999</dc:creator>
      <pubDate>Sat, 21 Feb 2026 22:53:44 +0000</pubDate>
      <link>https://dev.to/genelab_999/how-i-run-6-ai-services-simultaneously-on-rtx-5090-wsl2-docker-and-you-can-too-539a</link>
      <guid>https://dev.to/genelab_999/how-i-run-6-ai-services-simultaneously-on-rtx-5090-wsl2-docker-and-you-can-too-539a</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi3gqfdvscvd8l33emrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi3gqfdvscvd8l33emrs.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I built a multi-service local AI stack (image gen, video gen, voice synthesis, voice cloning) running on RTX 5090 via WSL2 Docker. The key breakthrough was solving the GPU driver passthrough layer that nobody documented. Here's the architecture, the critical &lt;code&gt;gpu-run&lt;/code&gt; function, and everything I learned the hard way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Solved
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pxxeronj3qvsolju6yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pxxeronj3qvsolju6yb.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In August 2025, I bought an RTX 5090. Blackwell architecture. 32GB GDDR7. Compute capability &lt;code&gt;sm_120&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;And nobody could make it work with WSL2 + Docker + PyTorch.&lt;/p&gt;

&lt;p&gt;The issue wasn't any single component. &lt;code&gt;nvidia-smi&lt;/code&gt; worked fine in containers. &lt;code&gt;libcuda.so.1&lt;/code&gt; loaded correctly. But &lt;code&gt;torch.cuda.is_available()&lt;/code&gt; kept returning &lt;code&gt;False&lt;/code&gt;, with a cryptic &lt;code&gt;Error 500: named symbol not found&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I spent roughly 40 hours debugging. Here's what I found, and how I turned it into a production multi-service AI environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Root Cause
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbrbuumujv100b3qcazv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdbrbuumujv100b3qcazv.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The failure point was in the &lt;strong&gt;interaction layer&lt;/strong&gt; between WSL2's driver mounting and Docker's GPU runtime.&lt;/p&gt;

&lt;p&gt;When you run &lt;code&gt;--gpus all&lt;/code&gt; in a Docker container on WSL2, the NVIDIA Container Toolkit mounts &lt;code&gt;/usr/lib/wsl/lib&lt;/code&gt; into the container. This directory contains &lt;code&gt;libcuda.so.1&lt;/code&gt; and friends. For most GPUs, this is enough.&lt;/p&gt;

&lt;p&gt;For the RTX 5090, it's not.&lt;/p&gt;

&lt;p&gt;The actual driver binaries live in a &lt;strong&gt;separate directory&lt;/strong&gt;: &lt;code&gt;/usr/lib/wsl/drivers/nvmdi.inf_amd64_&amp;lt;hash&amp;gt;&lt;/code&gt;. This directory contains the real &lt;code&gt;libcuda.so.1.1&lt;/code&gt;, &lt;code&gt;libnvdxgdmal.so.1&lt;/code&gt;, &lt;code&gt;libnvidia-ptxjitcompiler.so.1&lt;/code&gt;, and other dependencies that the PyTorch CUDA runtime needs to initialize the Blackwell architecture.&lt;/p&gt;

&lt;p&gt;Without mounting this directory AND setting &lt;code&gt;LD_LIBRARY_PATH&lt;/code&gt; to include it, PyTorch's CUDA initialization hits a dead end -- it finds &lt;code&gt;libcuda.so.1&lt;/code&gt; but can't resolve the sm_120-specific symbols.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: &lt;code&gt;gpu-run&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Here's the function that makes everything work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gpu-run &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;D BN
  &lt;span class="nv"&gt;D&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; /usr/lib/wsl/drivers/nvmdi.inf_amd64_&lt;span class="k"&gt;*&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;return &lt;/span&gt;1
  &lt;span class="nv"&gt;BN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$D&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Using driver path: &lt;/span&gt;&lt;span class="nv"&gt;$D&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; /usr/lib/wsl/lib:/usr/lib/wsl/lib:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$D&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;:/usr/lib/wsl/drivers/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;:ro &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;LD_LIBRARY_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/lib/wsl/lib:/usr/lib/wsl/drivers/&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Finds the driver directory dynamically&lt;/strong&gt; -- the hash suffix changes with driver updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mounts both WSL lib paths&lt;/strong&gt; -- the standard &lt;code&gt;/usr/lib/wsl/lib&lt;/code&gt; AND the driver-specific directory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sets &lt;code&gt;LD_LIBRARY_PATH&lt;/code&gt;&lt;/strong&gt; to prioritize these paths for symbol resolution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Verification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;gpu-run.sh
gpu-run torch-wsl-cu128 python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import torch
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('VRAM:', torch.cuda.get_device_properties(0).total_memory // 1024**3, 'GB')
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Using driver path: /usr/lib/wsl/drivers/nvmdi.inf_amd64_fb80e95fa979ce23
PyTorch: 2.9.0.dev20250812+cu128
CUDA available: True
GPU: NVIDIA GeForce RTX 5090
VRAM: 32 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Dockerfile Template
&lt;/h2&gt;

&lt;p&gt;Every AI service in my stack uses a variation of this base:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nvidia/cuda:12.8.0-devel-ubuntu22.04&lt;/span&gt;

&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; TZ=Asia/Tokyo&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DEBIAN_FRONTEND=noninteractive&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PYTHONUNBUFFERED=1&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; CUDA_HOME=/usr/local/cuda&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-snf&lt;/span&gt; /usr/share/zoneinfo/&lt;span class="nv"&gt;$TZ&lt;/span&gt; /etc/localtime &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$TZ&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /etc/timezone

&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    python3 python3-pip python3-dev git ffmpeg ca-certificates &lt;span class="se"&gt;\
&lt;/span&gt;    build-essential cmake ninja-build libsndfile1 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nv"&gt;numpy&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;1.26.4

&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    torch torchvision torchaudio &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/nightly/cu128

&lt;span class="k"&gt;RUN &lt;/span&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import torch; print('PyTorch:', torch.__version__); assert 'cu128' in torch.__version__"&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;nvidia/cuda:12.8.0-devel-ubuntu22.04&lt;/code&gt;&lt;/strong&gt; -- CUDA 12.8 is the minimum for sm_120. Using &lt;code&gt;devel&lt;/code&gt; (not &lt;code&gt;runtime&lt;/code&gt;) because some AI frameworks compile CUDA extensions at build time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyTorch nightly cu128&lt;/strong&gt; -- as of early 2026, stable PyTorch still has incomplete Blackwell support. Nightly cu128 is non-negotiable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;numpy pinned to 1.26.4&lt;/strong&gt; -- numpy 2.x breaks several AI frameworks that haven't updated their C extensions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install torch LAST&lt;/strong&gt; -- many &lt;code&gt;requirements.txt&lt;/code&gt; files include torch. If you install dependencies first, they'll pull in a stable torch that doesn't support sm_120. Always install your carefully selected torch version as the final step.&lt;/li&gt;
&lt;/ul&gt;
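The "install torch LAST" rule can be enforced mechanically. A sketch of the filter step (the regex is deliberately crude -- it drops anything starting with `torch`, which is exactly what we want here, but add names if your nodes pin other GPU packages):

```shell
# Sketch of the filter that keeps custom-node requirements from replacing the
# pinned nightly (file contents illustrative):
printf 'numpy==1.26.4\ntorch==2.6.0\ntorchaudio\nlibrosa\n' > requirements.txt
grep -vE '^(torch|xformers)' requirements.txt
# prints:
#   numpy==1.26.4
#   librosa
# In the Dockerfile, the filtered list feeds pip, and torch stays last:
#   RUN grep -vE '^(torch|xformers)' requirements.txt | pip3 install -r /dev/stdin
```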




&lt;h2&gt;
  
  
  Docker Compose Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugigk21n3lqhq2cywzsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fugigk21n3lqhq2cywzsj.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's how six AI services coexist in a single &lt;code&gt;compose.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;comfyui&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./apps/comfyui&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;comfyui:wsl-cu12&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comfyui"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/${WSL_DRV_BN}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CUDA_VISIBLE_DEVICES=0&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/usr/lib/wsl/lib:/usr/lib/wsl/lib:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;${WSL_DRV_DIR}:/usr/lib/wsl/drivers/${WSL_DRV_BN}:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data/comfyui-models:/app/models&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./shared/models:/shared/models:ro&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8188:8188"&lt;/span&gt;
    &lt;span class="na"&gt;ipc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
    &lt;span class="na"&gt;ulimits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
      &lt;span class="na"&gt;stack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;67108864&lt;/span&gt;

  &lt;span class="na"&gt;sbv2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./apps/sbv2&lt;/span&gt;
      &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sbv2:wsl-cu12&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sbv2"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/${WSL_DRV_BN}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/usr/lib/wsl/lib:/usr/lib/wsl/lib:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;${WSL_DRV_DIR}:/usr/lib/wsl/drivers/${WSL_DRV_BN}:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data/sbv2-models:/opt/models&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000:5000"&lt;/span&gt;
    &lt;span class="na"&gt;ipc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
    &lt;span class="na"&gt;ulimits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;memlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
      &lt;span class="na"&gt;stack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;67108864&lt;/span&gt;

  &lt;span class="na"&gt;cosyvoice&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosyvoice"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7865:7865"&lt;/span&gt;

  &lt;span class="na"&gt;rvc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rvc"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7866:7866"&lt;/span&gt;

  &lt;span class="na"&gt;framepack&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;profiles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;framepack"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7862:7862"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Each service follows the same WSL driver mount pattern -- I've abbreviated the later ones for readability.)&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.env&lt;/code&gt; file is auto-generated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;WSL_DRV_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; /usr/lib/wsl/drivers/nvmdi.inf_amd64_&lt;span class="k"&gt;*&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;WSL_DRV_BN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;basename&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WSL_DRV_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
WSL_DRV_DIR=&lt;/span&gt;&lt;span class="nv"&gt;$WSL_DRV_DIR&lt;/span&gt;&lt;span class="sh"&gt;
WSL_DRV_BN=&lt;/span&gt;&lt;span class="nv"&gt;$WSL_DRV_BN&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Design Decisions That Saved My Sanity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Docker Profiles for Resource Isolation
&lt;/h3&gt;

&lt;p&gt;With 32GB VRAM, you can't run everything simultaneously. Video generation alone can eat 24GB. Docker profiles let me spin up exactly what I need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--profile&lt;/span&gt; comfyui up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker compose &lt;span class="nt"&gt;--profile&lt;/span&gt; sbv2 &lt;span class="nt"&gt;--profile&lt;/span&gt; cosyvoice up &lt;span class="nt"&gt;-d&lt;/span&gt;
docker compose &lt;span class="nt"&gt;--profile&lt;/span&gt; all up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Shared Model Directory
&lt;/h3&gt;

&lt;p&gt;AI models are enormous. Flux checkpoints, HunyuanVideo weights, voice models -- easily 200GB+. Instead of duplicating them per container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/ai-workspace-correct/
  shared/
    models/           # Cross-service shared models
    hf_cache/         # HuggingFace cache (persistent)
  data/
    comfyui-models/   # Service-specific models
    sbv2-models/
    cosyvoice-models/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service mounts &lt;code&gt;shared/models&lt;/code&gt; read-only. Service-specific models go in their own &lt;code&gt;data/&lt;/code&gt; directory.&lt;/p&gt;
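&lt;p&gt;Concretely, the mount pattern can be sketched in compose like this (a sketch only; the service name and exact paths are illustrative, not the repo's actual file):&lt;/p&gt;

```yaml
# Illustrative fragment: every service mounts the shared model pool
# read-only, plus its own writable, service-specific data directory.
services:
  comfyui:
    volumes:
      - ./shared/models:/opt/shared/models:ro       # cross-service, read-only
      - ./shared/hf_cache:/root/.cache/huggingface  # persistent HF cache
      - ./data/comfyui-models:/opt/models           # service-specific, writable
```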

&lt;h3&gt;
  
  
  3. Port Allocation Strategy
&lt;/h3&gt;

&lt;p&gt;I carved out port ranges by domain:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Services&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5000-5009&lt;/td&gt;
&lt;td&gt;Voice synthesis&lt;/td&gt;
&lt;td&gt;Style-BERT-VITS2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7860-7869&lt;/td&gt;
&lt;td&gt;Voice/Video AI&lt;/td&gt;
&lt;td&gt;FramePack, CosyVoice, RVC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8180-8189&lt;/td&gt;
&lt;td&gt;Image AI&lt;/td&gt;
&lt;td&gt;ComfyUI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This avoids collisions and makes firewall rules predictable.&lt;/p&gt;
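&lt;p&gt;The convention is small enough to capture as data. A hedged sketch (the dict and function names are mine, not part of the stack):&lt;/p&gt;

```python
from typing import Optional

# Port ranges from the table above; the domain names are illustrative.
PORT_RANGES = {
    "voice_synthesis": range(5000, 5010),  # Style-BERT-VITS2
    "voice_video_ai": range(7860, 7870),   # FramePack, CosyVoice, RVC
    "image_ai": range(8180, 8190),         # ComfyUI
}

def domain_for(port: int) -> Optional[str]:
    """Return the domain a port belongs to, or None if it is unallocated."""
    for name, ports in PORT_RANGES.items():
        if port in ports:
            return name
    return None
```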

&lt;h3&gt;
  
  
  4. The torchaudio Trap
&lt;/h3&gt;

&lt;p&gt;This one cost me hours. Several voice synthesis frameworks use &lt;code&gt;torchaudio.info()&lt;/code&gt; and &lt;code&gt;torchaudio.load()&lt;/code&gt;. The nightly cu128 build of torchaudio has breaking API changes. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;

&lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wav_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;samplerate&lt;/span&gt;
&lt;span class="n"&gt;audio_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wav_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I patch these at Docker build time with &lt;code&gt;sed&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/import torchaudio/import torchaudio\nimport soundfile as sf/'&lt;/span&gt; /opt/app/webui.py &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/torchaudio.info(prompt_wav).sample_rate/sf.info(prompt_wav).samplerate/g'&lt;/span&gt; /opt/app/webui.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Lessons Learned (The Hard Way)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewd146y03lfz0thx71y1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewd146y03lfz0thx71y1.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Never let &lt;code&gt;requirements.txt&lt;/code&gt; install torch.&lt;/strong&gt;&lt;br&gt;
Strip &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;torchvision&lt;/code&gt;, &lt;code&gt;torchaudio&lt;/code&gt; from every &lt;code&gt;requirements.txt&lt;/code&gt; before installing. Then install your nightly cu128 build as the final step. If you don't, pip will happily overwrite your working torch with a stable version that can't see your GPU.&lt;/p&gt;
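&lt;p&gt;A minimal sketch of that workflow, run against a made-up &lt;code&gt;requirements.txt&lt;/code&gt; so the filtering step is visible (the grep pattern is one reasonable choice, not the repo's exact script):&lt;/p&gt;

```shell
# Demo on a throwaway requirements file (contents are made up):
printf 'torch==2.5.0\ntorchaudio\neinops\nnumpy\n' > requirements.txt

# Strip any torch/torchvision/torchaudio pins the upstream project ships:
grep -vE '^(torch|torchvision|torchaudio)([^A-Za-z0-9_-]|$)' requirements.txt > requirements.notorch.txt
cat requirements.notorch.txt

# Then install the rest, and the nightly build LAST so nothing overwrites it:
#   pip install -r requirements.notorch.txt
#   pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```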

&lt;p&gt;&lt;strong&gt;2. Driver updates break the hash.&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;nvmdi.inf_amd64_&amp;lt;hash&amp;gt;&lt;/code&gt; directory changes when you update NVIDIA drivers. The &lt;code&gt;gpu-run&lt;/code&gt; function handles this with dynamic lookup. But if you hardcode the path anywhere, you'll have a bad time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;ipc: host&lt;/code&gt; is non-negotiable for AI workloads.&lt;/strong&gt;&lt;br&gt;
Without it, PyTorch's shared memory operations fail silently or with cryptic errors. Always set it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
This environment variable enables PyTorch's expandable-segments allocation strategy. Without it, even 32GB of VRAM hits fragmentation errors on large models that should, on paper, fit comfortably.&lt;/p&gt;
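&lt;p&gt;If you launch outside Docker, the same knob has to be in the environment before the first CUDA allocation (i.e. before &lt;code&gt;import torch&lt;/code&gt; in most programs). A minimal Python-side sketch, mirroring the values from the compose file:&lt;/p&gt;

```python
import os

# Must be set before PyTorch initializes its CUDA allocator;
# values mirror the compose file earlier in the post.
os.environ.setdefault(
    "PYTORCH_CUDA_ALLOC_CONF",
    "expandable_segments:True,max_split_size_mb:512",
)
```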

&lt;p&gt;&lt;strong&gt;5. Document everything as if you'll have amnesia tomorrow.&lt;/strong&gt;&lt;br&gt;
I wrote my setup docs with the goal of "restore everything from scratch in 30 minutes." That document has saved me three times already.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Stack (February 2026)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ComfyUI&lt;/td&gt;
&lt;td&gt;Image generation (Flux, SDXL)&lt;/td&gt;
&lt;td&gt;8188&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style-BERT-VITS2&lt;/td&gt;
&lt;td&gt;Japanese TTS voice synthesis&lt;/td&gt;
&lt;td&gt;5000&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CosyVoice&lt;/td&gt;
&lt;td&gt;Multi-speaker voice synthesis&lt;/td&gt;
&lt;td&gt;7865&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RVC&lt;/td&gt;
&lt;td&gt;Real-time voice conversion&lt;/td&gt;
&lt;td&gt;7866&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FramePack&lt;/td&gt;
&lt;td&gt;Video generation (HunyuanVideo)&lt;/td&gt;
&lt;td&gt;7862&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All running on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU&lt;/strong&gt;: RTX 5090 32GB GDDR7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CPU&lt;/strong&gt;: Intel Core Ultra 9 285K&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM&lt;/strong&gt;: 64GB DDR5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS&lt;/strong&gt;: Windows 11 Pro + WSL2 Ubuntu 22.04&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container runtime&lt;/strong&gt;: Docker with NVIDIA Container Toolkit&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Is This Still Unique?
&lt;/h2&gt;

&lt;p&gt;As of February 2026, there are published examples of single-service RTX 5090 + Docker setups (vLLM, ComfyUI, basic PyTorch). What I haven't found elsewhere is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;multi-service Docker Compose stack&lt;/strong&gt; orchestrating 5+ AI services on Blackwell&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;specific WSL2 driver mount solution&lt;/strong&gt; documented with the &lt;code&gt;nvmdi.inf_amd64_*&lt;/code&gt; path&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;systematic approach to dependency isolation&lt;/strong&gt; across services sharing one GPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-grade patterns&lt;/strong&gt; for model sharing, port management, and environment recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've done something similar, I'd genuinely love to hear about it. Drop a comment or reach out.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ~40 hours of debugging, 200+ GB of model files, and an unreasonable amount of stubbornness. Based in Tokyo.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;#rtx5090&lt;/code&gt; &lt;code&gt;#docker&lt;/code&gt; &lt;code&gt;#wsl2&lt;/code&gt; &lt;code&gt;#pytorch&lt;/code&gt; &lt;code&gt;#cuda&lt;/code&gt; &lt;code&gt;#blackwell&lt;/code&gt; &lt;code&gt;#ai&lt;/code&gt; &lt;code&gt;#selfhosted&lt;/code&gt;&lt;/p&gt;

</description>
      <category>cuda</category>
      <category>linux</category>
      <category>showdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built a Voice Cloning GUI That Supports 10 Languages — Here's What I Learned Wrestling with CUDA on Windows</title>
      <dc:creator>GeneLab_999</dc:creator>
      <pubDate>Sat, 21 Feb 2026 22:05:52 +0000</pubDate>
      <link>https://dev.to/genelab_999/i-built-a-voice-cloning-gui-that-supports-10-languages-heres-what-i-learned-wrestling-with-cuda-30gp</link>
      <guid>https://dev.to/genelab_999/i-built-a-voice-cloning-gui-that-supports-10-languages-heres-what-i-learned-wrestling-with-cuda-30gp</guid>
      <description>&lt;p&gt;Have you ever recorded yourself speaking and thought, &lt;em&gt;"I wish I could just type what I want to say and have my own voice read it back"&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;That's exactly the rabbit hole I fell down when Alibaba dropped &lt;a href="https://github.com/QwenLM/Qwen3-TTS" rel="noopener noreferrer"&gt;Qwen3-TTS&lt;/a&gt; — an open-source TTS model that can clone any voice from just &lt;strong&gt;3 seconds of audio&lt;/strong&gt;. Ten languages. 97ms latency. Apache 2.0 license. On paper, it was everything I'd ever wanted.&lt;/p&gt;

&lt;p&gt;In practice? It assumed Linux. FlashAttention 2 (recommended) doesn't run on Windows. And voice cloning required you to &lt;em&gt;manually transcribe&lt;/em&gt; your reference audio — which kind of defeats the purpose of a "quick clone" workflow.&lt;/p&gt;

&lt;p&gt;So I did what any developer would do: I forked it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/hiroki-abe-58" rel="noopener noreferrer"&gt;
        hiroki-abe-58
      &lt;/a&gt; / &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;
        Qwen3-TTS-JP
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Japanese GUI + Whisper auto-transcription for Qwen3-TTS. RTX 5090 tested.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Qwen3-TTS-JP&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;English&lt;/strong&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ja.md" rel="noopener noreferrer"&gt;日本語&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_zh.md" rel="noopener noreferrer"&gt;中文&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ko.md" rel="noopener noreferrer"&gt;한국어&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ru.md" rel="noopener noreferrer"&gt;Русский&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_es.md" rel="noopener noreferrer"&gt;Español&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_it.md" rel="noopener noreferrer"&gt;Italiano&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_de.md" rel="noopener noreferrer"&gt;Deutsch&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_fr.md" rel="noopener noreferrer"&gt;Français&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_pt.md" rel="noopener noreferrer"&gt;Português&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;Windows-native&lt;/strong&gt; fork of Qwen3-TTS with a modern, multilingual Web UI.&lt;/p&gt;
&lt;p&gt;The original Qwen3-TTS was developed primarily for Linux environments, and FlashAttention 2 is recommended. However, FlashAttention 2 does not work on Windows. This fork enables &lt;strong&gt;direct execution on Windows without WSL2 or Docker&lt;/strong&gt;, provides a &lt;strong&gt;modern Web UI supporting 10 languages&lt;/strong&gt;, and adds automatic transcription via Whisper.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mac (Apple Silicon) users:&lt;/strong&gt; For the best experience on Mac, please use &lt;strong&gt;&lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-Mac-GeneLab" rel="noopener noreferrer"&gt;Qwen3-TTS-Mac-GeneLab&lt;/a&gt;&lt;/strong&gt; -- fully optimized for Apple Silicon with MLX + PyTorch dual engine, 8bit/4bit quantization, and 10-language Web UI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Custom Voice -- Speech synthesis with preset speakers&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/CustomVoice.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FCustomVoice.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Voice Design -- Describe voice characteristics to synthesize&lt;/h3&gt;
&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/VoiceDesign.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FVoiceDesign.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Voice Clone -- Clone voice from reference audio&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/VoiceClone.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FVoiceClone.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Settings -- GPU / VRAM / Model information&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/Settings.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FSettings.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Related Projects&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;br&gt;
&lt;thead&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;br&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/thead&gt;
&lt;br&gt;
&lt;tbody&gt;
&lt;br&gt;
&lt;tr&gt;
&lt;br&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;br&gt;
&lt;td&gt;&lt;strong&gt;This&lt;/strong&gt;&lt;/td&gt;
&lt;br&gt;
&lt;/tr&gt;
&lt;br&gt;
&lt;/tbody&gt;
&lt;br&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/p&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;





&lt;p&gt;&lt;strong&gt;Qwen3-TTS-JP&lt;/strong&gt; started as a personal fix — a Japanese-localized fork with Whisper auto-transcription bolted on. But as people started using it, I realized the same pain points existed for developers everywhere. So I expanded it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10-language Web UI&lt;/strong&gt; — Japanese, English, Chinese, Korean, German, French, Russian, Portuguese, Spanish, Italian. The UI auto-detects your browser locale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native Windows support&lt;/strong&gt; — No WSL. No Docker. Just Python + CUDA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whisper auto-transcription&lt;/strong&gt; — Upload 3 seconds of audio, Whisper handles the rest. Pick from 5 model sizes (tiny → large-v3) depending on your speed/accuracy tradeoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RTX 5090 (Blackwell) tested&lt;/strong&gt; — I developed this on a Blackwell GPU, so sm_120 architecture is a first-class citizen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mac support&lt;/strong&gt; — Apple Silicon users get a &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-Mac-GeneLab" rel="noopener noreferrer"&gt;dedicated fork&lt;/a&gt; with MLX + PyTorch dual engine and 4bit/8bit quantization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Architecture in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Qwen3-TTS isn't your typical TTS pipeline. Instead of the usual &lt;code&gt;Text → LM → DiT → Audio&lt;/code&gt; cascade, it uses a discrete multi-codebook LM that goes straight from text to audio codes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional:  Text → Language Model → Intermediate Repr → DiT → Audio
Qwen3-TTS:    Text → Language Model → Audio Codes → Decoder → Audio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bypasses the information bottleneck that makes most TTS systems sound robotic. The result is eerily human-sounding output — with emotion, prosody, and natural pauses all preserved.&lt;/p&gt;

&lt;p&gt;The dual-track streaming architecture means it starts generating audio from the &lt;em&gt;first character&lt;/em&gt; of input. That 97ms first-packet latency is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting It Running (It's Actually Easy Now)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/hiroki-abe-58/Qwen3-TTS-JP.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Qwen3-TTS-JP

python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="c"&gt;# Windows&lt;/span&gt;
.venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate
&lt;span class="c"&gt;# Linux/Mac&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate

pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;faster-whisper

&lt;span class="c"&gt;# RTX 30/40 series&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch torchvision torchaudio &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cu124

&lt;span class="c"&gt;# RTX 50 series (Blackwell) — needs nightly&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--pre&lt;/span&gt; torch torchvision torchaudio &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/nightly/cu128
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Launch the GUI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Voice cloning mode&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; qwen_tts.cli.demo Qwen/Qwen3-TTS-12Hz-1.7B-Base &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ip&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;--port&lt;/span&gt; 7860 &lt;span class="nt"&gt;--no-flash-attn&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://127.0.0.1:7860&lt;/code&gt;. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Actually Build With This
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting for developers. This isn't just a toy — the Python API is clean enough to integrate into real projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice Cloning in 5 Lines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qwen_tts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-TTS-12Hz-1.7B-Base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_voice_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is my cloned voice. It only needed 3 seconds of audio.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;English&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_voice.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# 3 seconds is enough
&lt;/span&gt;    &lt;span class="n"&gt;ref_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, testing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Whisper can auto-generate this
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Design a Voice From Scratch
&lt;/h3&gt;

&lt;p&gt;No reference audio needed — just describe what you want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_voice_design&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Welcome back, adventurer. Your quest awaits.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;English&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deep male voice, 45 years old, slight British accent, warm and commanding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cross-Lingual Cloning
&lt;/h3&gt;

&lt;p&gt;Clone a voice in one language, generate speech in another. The model preserves the speaker's timbre across languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_voice_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bonjour, comment allez-vous aujourd&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hui?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;French&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;english_speaker.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi, this is a test recording.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Use Cases I've Seen
&lt;/h2&gt;

&lt;p&gt;Since releasing this fork, I've seen developers use it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Game dev&lt;/strong&gt; — Generating NPC dialogue dynamically instead of recording thousands of audio files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Podcasting&lt;/strong&gt; — Creating consistent intro/outro narration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt; — Multilingual audio versions of documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Localization&lt;/strong&gt; — Same voice, 10 languages, zero re-recording&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping&lt;/strong&gt; — Testing voice UX before hiring voice actors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GPU Compatibility
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Recommended Model&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 5090&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;Tested &amp;amp; verified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;Works great&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4070&lt;/td&gt;
&lt;td&gt;12GB&lt;/td&gt;
&lt;td&gt;0.6B or 1.7B (tight)&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3080&lt;/td&gt;
&lt;td&gt;10GB&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;Works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple Silicon&lt;/td&gt;
&lt;td&gt;16GB+&lt;/td&gt;
&lt;td&gt;Via Mac fork&lt;/td&gt;
&lt;td&gt;MLX optimized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
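&lt;p&gt;As a rule of thumb, the table above collapses into a tiny helper. This is just a sketch (&lt;code&gt;pick_model&lt;/code&gt; is a hypothetical name, and the 16GB cutoff mirrors the "1.7B (tight)" caveat for 12GB cards):&lt;/p&gt;

```python
def pick_model(vram_gb: float) -> str:
    # Hypothetical helper mapping available VRAM to a model size,
    # following the compatibility table: 1.7B wants headroom,
    # 0.6B runs almost anywhere.
    if vram_gb >= 16:
        return "1.7B"
    if vram_gb >= 12:
        return "1.7B (tight)"
    return "0.6B"
```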

&lt;p&gt;If you're VRAM-constrained, the 0.6B model is surprisingly capable — and FlashAttention 2 can help on Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;flash-attn &lt;span class="nt"&gt;--no-build-isolation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Things I Learned the Hard Way
&lt;/h2&gt;

&lt;p&gt;A few gotchas from building this that might save you time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows cp932 encoding hell.&lt;/strong&gt; Japanese Windows defaults to cp932 encoding, which chokes on Unicode output from the model. The fix is wrapping stdout/stderr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextIOWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TextIOWrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;FlashAttention 2 doesn't compile on Windows.&lt;/strong&gt; The solution is to use PyTorch's built-in SDPA (Scaled Dot Product Attention) via &lt;code&gt;--no-flash-attn&lt;/code&gt;. The performance hit is minimal for single-user inference.&lt;/p&gt;
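&lt;p&gt;That backend choice can be made defensively at import time. A minimal sketch (the &lt;code&gt;ATTN_BACKEND&lt;/code&gt; name is mine, not part of the project):&lt;/p&gt;

```python
try:
    # FlashAttention 2 is effectively Linux-only; treat it as optional.
    import flash_attn  # noqa: F401
    ATTN_BACKEND = "flash_attention_2"
except ImportError:
    # PyTorch's built-in SDPA is the portable fallback.
    ATTN_BACKEND = "sdpa"
```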

&lt;p&gt;&lt;strong&gt;Blackwell (sm_120) needs nightly PyTorch.&lt;/strong&gt; As of early 2026, stable PyTorch doesn't support RTX 50-series. Nightly builds with cu128 work, but you'll see warnings about torchao version mismatches. They're cosmetic — ignore them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SoX is optional.&lt;/strong&gt; The model prints warnings about missing SoX, but it works fine without it. Don't waste time installing it on Windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;I'm currently exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vLLM integration for production-grade serving&lt;/li&gt;
&lt;li&gt;Fine-tuning workflows for custom voice models&lt;/li&gt;
&lt;li&gt;Streaming WebSocket API for real-time applications&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;If you're working on anything voice-related — games, accessibility, content creation, or just want to mess around with state-of-the-art TTS — give it a spin:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows/Linux:&lt;/strong&gt; &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;Qwen3-TTS-JP&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Mac (Apple Silicon):&lt;/strong&gt; &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-Mac-GeneLab" rel="noopener noreferrer"&gt;Qwen3-TTS-Mac-GeneLab&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stars are appreciated — they help other developers find the project.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;I'm curious:&lt;/strong&gt; What would you build with 3-second voice cloning? Drop your ideas in the comments — I'd love to hear what use cases I haven't thought of yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ethical Note
&lt;/h2&gt;

&lt;p&gt;Voice cloning is powerful tech. Please use it responsibly — clone only with consent, disclose AI-generated audio, and don't use it for fraud or impersonation. The Apache 2.0 license gives you freedom, but with great power... you know the rest.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>ComfyUI-AceMusic: The First Full Implementation of ACE-Step 1.5 Features That "Weren't Yet Supported"</title>
      <dc:creator>GeneLab_999</dc:creator>
      <pubDate>Wed, 04 Feb 2026 15:06:27 +0000</pubDate>
      <link>https://dev.to/genelab_999/comfyui-acemusic-the-first-full-implementation-of-ace-step-15-features-that-werent-yet-2kje</link>
      <guid>https://dev.to/genelab_999/comfyui-acemusic-the-first-full-implementation-of-ace-step-15-features-that-werent-yet-2kje</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;On February 3rd, 2026, the official ComfyUI blog announced ACE-Step 1.5 support with a notable caveat: &lt;strong&gt;"Cover, Repaint, and other features aren't yet supported in ComfyUI."&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;The next day, I released ComfyUI-AceMusic — a complete implementation of all 15 ACE-Step 1.5 features as ComfyUI nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key highlights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;World-first&lt;/strong&gt;: Full Cover, Repaint, Edit, Retake, Extend support in ComfyUI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;15 nodes&lt;/strong&gt; covering every ACE-Step 1.5 capability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular architecture&lt;/strong&gt; that eliminates widget ordering issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows + Python 3.13+ compatible&lt;/strong&gt; using soundfile/scipy instead of problematic torchaudio backends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeartMuLa interoperability&lt;/strong&gt; for hybrid AI music workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-AceMusic" rel="noopener noreferrer"&gt;github.com/hiroki-abe-58/ComfyUI-AceMusic&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Official Support Was Incomplete
&lt;/h2&gt;

&lt;p&gt;ACE-Step 1.5 is a game-changer for open-source music generation. It outperforms most commercial alternatives, runs on consumer hardware (4GB VRAM), and generates full songs in under 10 seconds on an RTX 3090.&lt;/p&gt;

&lt;p&gt;When ComfyUI announced native support, the community was excited. But there was a catch.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://blog.comfy.org/p/ace-step-15-is-now-available-in-comfyui" rel="noopener noreferrer"&gt;official ComfyUI blog&lt;/a&gt; (February 3rd, 2026):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"ACE-Step 1.5 has a few more tricks up its sleeve. &lt;strong&gt;These aren't yet supported in ComfyUI&lt;/strong&gt;, but we have no doubt the community will figure it out."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The "tricks" they mentioned? Only the most powerful features of ACE-Step 1.5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Official Support&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transform any song into a different style&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repaint&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Regenerate specific sections of audio&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Change tags/lyrics while preserving melody&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retake&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create variations of existing audio&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Add new content before/after audio&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So I built them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What ComfyUI-AceMusic Offers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Complete Feature Coverage
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Node&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Loader&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Downloads and caches ACE-Step 1.5 models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Settings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Configure generation parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text-to-Music generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lyrics Input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dedicated lyrics input with section markers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Caption Input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Style/genre description input&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transform existing audio into different styles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repaint&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Regenerate specific time ranges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retake&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create variations with same settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Add content to beginning or end&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Change tags/lyrics, preserve melody (FlowEdit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Conditioning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combine parameters into conditioning object&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generator (from Cond)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generate from conditioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load LoRA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Load fine-tuned adapters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Understand&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extract metadata from audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Create Sample&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generate params from natural language&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Comparison with Existing Implementations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;ACE-Step Version&lt;/th&gt;
&lt;th&gt;Cover&lt;/th&gt;
&lt;th&gt;Repaint&lt;/th&gt;
&lt;th&gt;Edit&lt;/th&gt;
&lt;th&gt;Retake&lt;/th&gt;
&lt;th&gt;Extend&lt;/th&gt;
&lt;th&gt;Win + Py 3.13+&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ComfyUI Native&lt;/td&gt;
&lt;td&gt;1.5&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Untested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;billwuhao&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Untested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ryanontheinside&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Untested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ComfyUI-AceMusic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✅&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Modular Architecture
&lt;/h3&gt;

&lt;p&gt;Previous implementations crammed 30+ parameters into a single node, causing widget ordering issues — a known ComfyUI quirk where a mismatch in input field order can silently assign values to the wrong parameters.&lt;/p&gt;

&lt;p&gt;ComfyUI-AceMusic separates concerns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Model Loader] → Model loading only
[Settings] → Generation parameters only  
[Lyrics Input] → Lyrics entry only
[Caption Input] → Style description only
[Generator] → Generation execution only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eliminates widget ordering bugs&lt;/li&gt;
&lt;li&gt;Improves workflow readability&lt;/li&gt;
&lt;li&gt;Makes nodes reusable across different workflows&lt;/li&gt;
&lt;li&gt;Follows single-responsibility principle&lt;/li&gt;
&lt;/ul&gt;
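&lt;p&gt;In ComfyUI terms, "one node, one job" looks roughly like this. A simplified sketch following ComfyUI's custom-node convention; the real Settings node exposes more parameters:&lt;/p&gt;

```python
class AceMusicSettingsSketch:
    # Simplified single-responsibility node: it only packs generation
    # parameters into one object, so downstream nodes never depend
    # on widget ordering.
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "duration": ("FLOAT", {"default": 60.0, "min": 1.0}),
            "language": ("STRING", {"default": "English"}),
        }}

    RETURN_TYPES = ("ACE_SETTINGS",)
    FUNCTION = "build"

    def build(self, duration, language):
        return ({"duration": duration, "language": language},)
```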

&lt;h3&gt;
  
  
  2. Cross-Platform Compatibility
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: &lt;code&gt;torchaudio&lt;/code&gt; backends can fail on Windows + Python 3.13+.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Use &lt;code&gt;soundfile&lt;/code&gt; and &lt;code&gt;scipy&lt;/code&gt; instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Problematic approach
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torchaudio&lt;/span&gt;
&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torchaudio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Fails on Windows 3.13+
&lt;/span&gt;
&lt;span class="c1"&gt;# ComfyUI-AceMusic approach
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;
&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Works everywhere
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't just a workaround — it's a more robust solution that works across all platforms without requiring specific backend configurations.&lt;/p&gt;
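&lt;p&gt;The portability idea extends further: even without soundfile, Python's stdlib can read WAV files. A dependency-free sketch (not the ComfyUI-AceMusic implementation, which uses soundfile/scipy):&lt;/p&gt;

```python
import wave

def load_wav_stdlib(path):
    # Read raw PCM frames plus the sample rate using only the stdlib.
    # Returns (frames_bytes, sample_rate); decoding to float arrays
    # is left to the caller.
    with wave.open(path, "rb") as w:
        return w.readframes(w.getnframes()), w.getframerate()
```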

&lt;h3&gt;
  
  
  3. HeartMuLa Interoperability
&lt;/h3&gt;

&lt;p&gt;The AUDIO type in ComfyUI-AceMusic is compatible with HeartMuLa outputs, enabling hybrid workflows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[HeartMuLa Generator] → [AceMusic Cover] → [AceMusic Extend] → [Output]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you combine the strengths of different music generation models in a single workflow.&lt;/p&gt;
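&lt;p&gt;Interoperability hinges on ComfyUI's AUDIO convention: a dict carrying the waveform and its sample rate. A minimal structural check (a sketch; real nodes also expect the waveform to be a torch tensor):&lt;/p&gt;

```python
def looks_like_comfy_audio(obj) -> bool:
    # ComfyUI's AUDIO type is a dict with "waveform" and "sample_rate"
    # keys; any node pack that emits this shape can feed AceMusic nodes.
    return isinstance(obj, dict) and "waveform" in obj and "sample_rate" in obj
```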




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Via ComfyUI Manager (Recommended):&lt;/strong&gt;&lt;br&gt;
Search for "ComfyUI-AceMusic" and install.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI/custom_nodes
git clone https://github.com/hiroki-abe-58/ComfyUI-AceMusic.git
&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI-AceMusic
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Install ACE-Step 1.5&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/ace-step/ACE-Step.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Models auto-download from Hugging Face on first use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Workflow (Text-to-Music)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Add &lt;strong&gt;AceMusic Model Loader&lt;/strong&gt; → set device to &lt;code&gt;cuda&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;AceMusic Settings&lt;/strong&gt; → configure duration, language, etc.&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;AceMusic Lyrics Input&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   [Verse]
   Walking down the empty street
   Thinking about you and me

   [Chorus]
   We belong together
   Now and forever
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Add &lt;strong&gt;AceMusic Caption Input&lt;/strong&gt;: &lt;code&gt;pop, female vocal, energetic&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Connect all to &lt;strong&gt;AceMusic Generator&lt;/strong&gt; → &lt;strong&gt;Preview Audio&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Load the example workflow: &lt;code&gt;workflow/AceMusic_Lyrics_v3.json&lt;/code&gt;&lt;/p&gt;
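&lt;p&gt;The bracketed section markers above are plain text, so you can preprocess them yourself. A sketch (the parser is mine, not part of the node pack):&lt;/p&gt;

```python
import re

def split_sections(lyrics: str) -> dict:
    # Group lyric lines under their [Section] markers.
    sections, current = {}, None
    for line in lyrics.splitlines():
        line = line.strip()
        m = re.fullmatch(r"\[(\w+)\]", line)
        if m:
            current = m.group(1)
            sections.setdefault(current, [])
        elif line and current is not None:
            sections[current].append(line)
    return sections
```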

&lt;h3&gt;
  
  
  Cover Workflow (Style Transfer)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Cover] → [Preview Audio]
                               ↑
[Caption Input] ───────────────┘
"jazz piano trio, smooth, relaxed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pop → Jazz arrangement&lt;/li&gt;
&lt;li&gt;Rock → Acoustic version&lt;/li&gt;
&lt;li&gt;EDM → Orchestral arrangement&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Repaint Workflow (Section Regeneration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Load Audio] ──────────────────┐
                               ↓
[Model Loader] → [Settings] → [AceMusic Repaint] → [Preview Audio]
                               ↑
[Time Range: 30-45s] ──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fix a problematic chorus&lt;/li&gt;
&lt;li&gt;Improve the intro&lt;/li&gt;
&lt;li&gt;Regenerate specific vocal sections&lt;/li&gt;
&lt;/ul&gt;
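&lt;p&gt;One practical detail when repainting: the requested time range has to stay inside the clip. A tiny guard like this (a hypothetical helper, not a node in the pack) avoids off-the-end requests:&lt;/p&gt;

```python
def clamp_repaint_range(start_s: float, end_s: float, total_s: float):
    # Clamp a requested repaint window to [0, total_s],
    # keeping start at or before end.
    start = max(0.0, min(start_s, total_s))
    end = max(start, min(end_s, total_s))
    return start, end
```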




&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Generation Speed
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Device&lt;/th&gt;
&lt;th&gt;RTF (27 steps)&lt;/th&gt;
&lt;th&gt;Time for 1 min audio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 5090&lt;/td&gt;
&lt;td&gt;~50x&lt;/td&gt;
&lt;td&gt;~1.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090&lt;/td&gt;
&lt;td&gt;34.48x&lt;/td&gt;
&lt;td&gt;1.74s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A100&lt;/td&gt;
&lt;td&gt;27.27x&lt;/td&gt;
&lt;td&gt;2.20s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3090&lt;/td&gt;
&lt;td&gt;12.76x&lt;/td&gt;
&lt;td&gt;4.70s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;M2 Max&lt;/td&gt;
&lt;td&gt;2.27x&lt;/td&gt;
&lt;td&gt;26.43s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
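&lt;p&gt;RTF here is the real-time factor: seconds of audio produced per second of compute, so the last column is just duration divided by RTF. For the RTX 4090 row:&lt;/p&gt;

```python
def wall_clock_seconds(audio_seconds: float, rtf: float) -> float:
    # Real-time factor (RTF): seconds of audio generated per second
    # of compute, so generation time = duration / RTF.
    return audio_seconds / rtf

print(round(wall_clock_seconds(60, 34.48), 2))  # prints 1.74
```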

&lt;h3&gt;
  
  
  VRAM Requirements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normal&lt;/td&gt;
&lt;td&gt;8GB+&lt;/td&gt;
&lt;td&gt;Full speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU Offload&lt;/td&gt;
&lt;td&gt;~4GB&lt;/td&gt;
&lt;td&gt;Slower but works on limited VRAM&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CUDA out of memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Insufficient GPU memory&lt;/td&gt;
&lt;td&gt;Enable &lt;code&gt;cpu_offload&lt;/code&gt; or reduce &lt;code&gt;duration&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ModuleNotFoundError: acestep&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ACE-Step not installed&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install git+https://github.com/ace-step/ACE-Step.git&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;soundfile not found&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Missing dependency&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install soundfile scipy&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Model download failed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Network issue&lt;/td&gt;
&lt;td&gt;Check Hugging Face access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;torchaudio backend error&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Windows + Python 3.13+ issue&lt;/td&gt;
&lt;td&gt;Ensure soundfile is properly installed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Environment Check Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;ComfyUI-AceMusic Environment Checker&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Python version
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version_info&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python 3.10+ required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# PyTorch + CUDA
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ PyTorch: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ CUDA: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;vram&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_device_properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;total_memory&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ GPU VRAM: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;vram&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; GB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CUDA not available&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ImportError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PyTorch not installed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ACE-Step
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;acestep&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ ACE-Step: installed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ImportError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ACE-Step not installed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Audio libraries
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ soundfile: installed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ImportError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;soundfile not installed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Results
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;❌ Issues found:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Environment OK!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;When I saw the official announcement saying "these features aren't yet supported," I knew exactly what needed to be done. The ACE-Step team built an incredible model with Cover, Repaint, Edit, and other powerful features — but without ComfyUI support, most users couldn't access them.&lt;/p&gt;

&lt;p&gt;The hardest part was the &lt;code&gt;torchaudio&lt;/code&gt; issue. On Windows with Python 3.13+, the audio backends just don't work reliably. The solution was to bypass torchaudio entirely and use soundfile/scipy for all audio I/O. It's a more robust approach that should work on any platform.&lt;/p&gt;

&lt;p&gt;The modular architecture came from frustration with existing implementations. Stuffing 30+ parameters into one node isn't just ugly — it causes real bugs. Separating concerns made the nodes more reliable and the workflows more readable.&lt;/p&gt;
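&lt;p&gt;To make that concrete, here is a hypothetical single-purpose node in ComfyUI's node API. The class, parameter names, and defaults are illustrative, not the actual AceMusic nodes; the point is that one node owns one concern and hands a config object to the next:&lt;/p&gt;

```python
class AceSamplerSettings:
    # Hypothetical single-purpose node: sampler settings only,
    # emitted as one config object downstream nodes consume,
    # instead of 30-plus parameters piled into a single node.
    RETURN_TYPES = ("SAMPLER_CONFIG",)
    FUNCTION = "build"
    CATEGORY = "audio"

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "steps": ("INT", {"default": 27, "min": 1, "max": 200}),
            "guidance_scale": ("FLOAT", {"default": 7.5, "min": 0.0}),
        }}

    def build(self, steps, guidance_scale):
        return ({"steps": steps, "guidance_scale": guidance_scale},)

# ComfyUI discovers custom nodes through this mapping in __init__.py:
NODE_CLASS_MAPPINGS = {"AceSamplerSettings": AceSamplerSettings}
```

&lt;p&gt;Each node stays small enough to test in isolation, and a workflow graph reads like a sentence instead of a wall of widgets.&lt;/p&gt;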

&lt;p&gt;This is what open source is about. The official team sets the direction, and the community fills in the gaps. I'm proud to contribute to the music generation ecosystem.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/hiroki-abe-58/ComfyUI-AceMusic" rel="noopener noreferrer"&gt;github.com/hiroki-abe-58/ComfyUI-AceMusic&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACE-Step 1.5&lt;/strong&gt;: &lt;a href="https://github.com/ace-step/ACE-Step-1.5" rel="noopener noreferrer"&gt;github.com/ace-step/ACE-Step-1.5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ComfyUI Official Blog&lt;/strong&gt;: &lt;a href="https://blog.comfy.org/p/ace-step-15-is-now-available-in-comfyui" rel="noopener noreferrer"&gt;ACE-Step 1.5 Announcement&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeartMuLa&lt;/strong&gt; (compatible): &lt;a href="https://github.com/filliptm/ComfyUI_FL-HeartMuLa" rel="noopener noreferrer"&gt;github.com/filliptm/ComfyUI_FL-HeartMuLa&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;Apache 2.0&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you find this useful, consider starring the repo. And if you build something cool with it, I'd love to see it!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>news</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Run Qwen3-TTS on Windows with RTX 5090: Voice Cloning in 3 Seconds</title>
      <dc:creator>GeneLab_999</dc:creator>
      <pubDate>Sat, 31 Jan 2026 09:37:20 +0000</pubDate>
      <link>https://dev.to/genelab_999/run-qwen3-tts-on-windows-with-rtx-5090-voice-cloning-in-3-seconds-elc</link>
      <guid>https://dev.to/genelab_999/run-qwen3-tts-on-windows-with-rtx-5090-voice-cloning-in-3-seconds-elc</guid>
      <description>&lt;h1&gt;
  
  
  Run Qwen3-TTS on Windows with RTX 5090: The Complete Guide to Voice Cloning in 3 Seconds
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Clone any voice with just 3 seconds of audio — now with native Windows support and the latest Blackwell GPUs&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Qwen3-TTS-JP&lt;/strong&gt; is a fork of Alibaba's Qwen3-TTS that adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Native Windows support&lt;/strong&gt; (no WSL required!)&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;RTX 5090 / Blackwell GPU tested&lt;/strong&gt; and optimized&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Auto-transcription&lt;/strong&gt; via Whisper integration&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Localized GUI&lt;/strong&gt; (Japanese, easy to adapt)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/hiroki-abe-58" rel="noopener noreferrer"&gt;
        hiroki-abe-58
      &lt;/a&gt; / &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;
        Qwen3-TTS-JP
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Japanese GUI + Whisper auto-transcription for Qwen3-TTS. RTX 5090 tested.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Qwen3-TTS-JP&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;English&lt;/strong&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ja.md" rel="noopener noreferrer"&gt;日本語&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_zh.md" rel="noopener noreferrer"&gt;中文&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ko.md" rel="noopener noreferrer"&gt;한국어&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_ru.md" rel="noopener noreferrer"&gt;Русский&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_es.md" rel="noopener noreferrer"&gt;Español&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_it.md" rel="noopener noreferrer"&gt;Italiano&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_de.md" rel="noopener noreferrer"&gt;Deutsch&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_fr.md" rel="noopener noreferrer"&gt;Français&lt;/a&gt; | &lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/docs/README_pt.md" rel="noopener noreferrer"&gt;Português&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A &lt;strong&gt;Windows-native&lt;/strong&gt; fork of Qwen3-TTS with a modern, multilingual Web UI.&lt;/p&gt;
&lt;p&gt;The original Qwen3-TTS was developed primarily for Linux environments, and FlashAttention 2 is recommended. However, FlashAttention 2 does not work on Windows. This fork enables &lt;strong&gt;direct execution on Windows without WSL2 or Docker&lt;/strong&gt;, provides a &lt;strong&gt;modern Web UI supporting 10 languages&lt;/strong&gt;, and adds automatic transcription via Whisper.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mac (Apple Silicon) users:&lt;/strong&gt; For the best experience on Mac, please use &lt;strong&gt;&lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-Mac-GeneLab" rel="noopener noreferrer"&gt;Qwen3-TTS-Mac-GeneLab&lt;/a&gt;&lt;/strong&gt; -- fully optimized for Apple Silicon with MLX + PyTorch dual engine, 8bit/4bit quantization, and 10-language Web UI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Custom Voice -- Speech synthesis with preset speakers&lt;/h3&gt;
&lt;/div&gt;
&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/CustomVoice.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FCustomVoice.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Voice Design -- Describe voice characteristics to synthesize&lt;/h3&gt;
&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/VoiceDesign.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FVoiceDesign.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Voice Clone -- Clone voice from reference audio&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/VoiceClone.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FVoiceClone.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h3 class="heading-element"&gt;Settings -- GPU / VRAM / Model information&lt;/h3&gt;

&lt;/div&gt;

&lt;p&gt;
    &lt;a rel="noopener noreferrer" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP/assets/Settings.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fhiroki-abe-58%2FQwen3-TTS-JP%2Fassets%2FSettings.png" width="90%"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Related Projects&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Repository&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Windows&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;This&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  The Problem: Getting Qwen3-TTS Running on Windows
&lt;/h2&gt;

&lt;p&gt;When Alibaba released Qwen3-TTS in January 2026, the AI community was amazed: &lt;strong&gt;3 seconds of reference audio is all you need to clone a voice&lt;/strong&gt;. Ten languages supported, 97ms latency, emotion control — impressive specs on paper.&lt;/p&gt;

&lt;p&gt;But there was a catch.&lt;/p&gt;

&lt;p&gt;The official repo assumed Linux. CUDA setup was finicky. And if you wanted to use the voice cloning feature, you had to &lt;strong&gt;manually transcribe your reference audio&lt;/strong&gt; — defeating the purpose of a quick workflow.&lt;/p&gt;

&lt;p&gt;I'd just upgraded to an RTX 5090 (Blackwell architecture), eager to push local AI to its limits. After days of wrestling with environments, I got it working and decided to package the solution for everyone else.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes This Fork Different?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Native Windows Support
&lt;/h3&gt;

&lt;p&gt;No WSL. No Docker required (you can still containerize if you want, but it is not a prerequisite). Just Python and CUDA, and you're good to go.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/hiroki-abe-58/Qwen3-TTS-JP.git
&lt;span class="nb"&gt;cd &lt;/span&gt;Qwen3-TTS-JP

&lt;span class="c"&gt;# Create virtual environment&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
.venv&lt;span class="se"&gt;\S&lt;/span&gt;cripts&lt;span class="se"&gt;\a&lt;/span&gt;ctivate

&lt;span class="c"&gt;# Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;faster-whisper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Works on Windows 10/11 with a CUDA-capable GPU (tested on RTX 30/40/50 series).&lt;/p&gt;

&lt;h3&gt;
  
  
  2. RTX 5090 (Blackwell) Tested
&lt;/h3&gt;

&lt;p&gt;This fork was developed and tested on an RTX 5090. Blackwell cards report the new sm_120 compute capability, which many prebuilt PyTorch wheels still lack, so plenty of AI repos break on them. This one doesn't.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 5090&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;✅ Works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090&lt;/td&gt;
&lt;td&gt;24GB&lt;/td&gt;
&lt;td&gt;1.7B&lt;/td&gt;
&lt;td&gt;✅ Works&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 3080&lt;/td&gt;
&lt;td&gt;10GB&lt;/td&gt;
&lt;td&gt;0.6B&lt;/td&gt;
&lt;td&gt;✅ Works&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
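&lt;p&gt;A quick way to tell whether your own PyTorch wheel can drive Blackwell is to check its compiled kernel list (the helper name here is ours, not part of the fork):&lt;/p&gt;

```python
def blackwell_supported():
    # True only when PyTorch is installed, CUDA is usable, and the
    # wheel was compiled with sm_120 kernels (RTX 5090 / Blackwell).
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return "sm_120" in torch.cuda.get_arch_list()

print("Blackwell-ready PyTorch:", blackwell_supported())
```

&lt;p&gt;If this prints False on a 5090, your wheel predates sm_120 support, and the familiar "sm_120 is not compatible" RuntimeError follows from exactly that.&lt;/p&gt;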

&lt;h3&gt;
  
  
  3. Whisper Auto-Transcription
&lt;/h3&gt;

&lt;p&gt;The original Qwen3-TTS requires you to provide the transcript of your reference audio. This fork integrates &lt;strong&gt;faster-whisper&lt;/strong&gt; to do it automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload 3 seconds of audio&lt;/li&gt;
&lt;li&gt;Whisper transcribes it&lt;/li&gt;
&lt;li&gt;Qwen3-TTS clones the voice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No manual typing. Choose from five Whisper model sizes, three of them shown below:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;tiny&lt;/td&gt;
&lt;td&gt;39M&lt;/td&gt;
&lt;td&gt;⚡⚡⚡⚡⚡&lt;/td&gt;
&lt;td&gt;★★&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;small&lt;/td&gt;
&lt;td&gt;244M&lt;/td&gt;
&lt;td&gt;⚡⚡⚡&lt;/td&gt;
&lt;td&gt;★★★★&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;large-v3&lt;/td&gt;
&lt;td&gt;1.5B&lt;/td&gt;
&lt;td&gt;⚡&lt;/td&gt;
&lt;td&gt;★★★★★&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
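&lt;p&gt;The auto-transcription step is essentially a thin wrapper around faster-whisper. A minimal sketch (the wrapper function name is ours; the fork wires the equivalent into the GUI):&lt;/p&gt;

```python
def transcribe_reference(path, model_size="small"):
    # Transcribe a short reference clip so generate_voice_clone can
    # receive its ref_text automatically instead of typed by hand.
    from faster_whisper import WhisperModel  # lazy import: models are large

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(path)
    return " ".join(seg.text.strip() for seg in segments)

if __name__ == "__main__":
    ref_text = transcribe_reference("reference.wav")
    print(ref_text)  # pass as ref_text= to generate_voice_clone
```

&lt;p&gt;For a 3-second clip, even large-v3 finishes in well under a second on a 5090, so transcription never becomes the bottleneck.&lt;/p&gt;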




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Launch the GUI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; qwen_tts.demo Qwen/Qwen3-TTS-12Hz-1.7B-Base &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--ip&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8000&lt;/code&gt; in your browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python API
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;soundfile&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sf&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;qwen_tts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3TTSModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-TTS-12Hz-1.7B-Base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Clone a voice with 3-second reference
&lt;/span&gt;&lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_voice_clone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is my cloned voice speaking!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;English&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_audio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reference.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ref_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, this is a test.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;wavs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;sr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Content Creators
&lt;/h3&gt;

&lt;p&gt;Clone your own voice for consistent narration across videos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Game Developers
&lt;/h3&gt;

&lt;p&gt;Create character voices without expensive voice actors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_voice_design&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hero, your quest awaits!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;English&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deep male voice, 40 years old, British accent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Podcasters
&lt;/h3&gt;

&lt;p&gt;Quick voice-over generation for intros and outros.&lt;/p&gt;




&lt;h2&gt;
  
  
  Supported Languages
&lt;/h2&gt;

&lt;p&gt;🇨🇳 Chinese | 🇺🇸 English | 🇯🇵 Japanese | 🇰🇷 Korean | 🇩🇪 German | 🇫🇷 French | 🇷🇺 Russian | 🇧🇷 Portuguese | 🇪🇸 Spanish | 🇮🇹 Italian&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CUDA out of memory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Switch to the 0.6B model (FlashAttention 2 also cuts memory, but it doesn't work on Windows)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;faster-whisper not found&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install faster-whisper&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Ethical Note
&lt;/h2&gt;

&lt;p&gt;Voice cloning is powerful. Please:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only clone voices &lt;strong&gt;with consent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Don't use for fraud or misinformation&lt;/li&gt;
&lt;li&gt;Disclose AI-generated audio&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/hiroki-abe-58/Qwen3-TTS-JP" rel="noopener noreferrer"&gt;This Fork&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/QwenLM/Qwen3-TTS" rel="noopener noreferrer"&gt;Original Qwen3-TTS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2601.15621" rel="noopener noreferrer"&gt;Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If this helped, please ⭐ the repo!&lt;/p&gt;

&lt;p&gt;Questions? Drop a comment below! 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>python</category>
    </item>
  </channel>
</rss>
