DEV Community

GeneLab_999

How I Run 6 AI Services Simultaneously on RTX 5090 + WSL2 + Docker (And You Can Too)

TL;DR:

I built a multi-service local AI stack (image gen, video gen, voice synthesis, voice cloning) running on RTX 5090 via WSL2 Docker. The key breakthrough was solving the GPU driver passthrough layer that nobody documented. Here's the architecture, the critical gpu-run function, and everything I learned the hard way.


The Problem Nobody Solved

In August 2025, I bought an RTX 5090. Blackwell architecture. 32GB GDDR7. Compute capability sm_120.

And nobody could make it work with WSL2 + Docker + PyTorch.

The issue wasn't any single component. nvidia-smi worked fine in containers. libcuda.so.1 loaded correctly. But PyTorch kept returning torch.cuda.is_available() = False with a cryptic Error 500: named symbol not found.

I spent roughly 40 hours debugging. Here's what I found, and how I turned it into a production multi-service AI environment.


The Root Cause

The failure point was in the interaction layer between WSL2's driver mounting and Docker's GPU runtime.

When you run --gpus all in a Docker container on WSL2, the NVIDIA Container Toolkit mounts /usr/lib/wsl/lib into the container. This directory contains libcuda.so.1 and friends. For most GPUs, this is enough.

For the RTX 5090, it's not.

The actual driver binaries live in a separate directory: /usr/lib/wsl/drivers/nvmdi.inf_amd64_<hash>. This directory contains the real libcuda.so.1.1, libnvdxgdmal.so.1, libnvidia-ptxjitcompiler.so.1, and other dependencies that the PyTorch CUDA runtime needs to initialize the Blackwell architecture.

Without mounting this directory AND setting LD_LIBRARY_PATH to include it, PyTorch's CUDA initialization hits a dead end -- it finds libcuda.so.1 but can't resolve the sm_120-specific symbols.
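You can confirm this layering on your own machine before reaching for the fix. Here's a quick host-side check (a hypothetical helper of mine, not part of the stack; the `nvmdi.inf_amd64_*` pattern is the one from this setup):

```shell
# Hypothetical diagnostic: confirm the driver-private directory exists and
# actually carries the libraries PyTorch needs for sm_120.
# Takes the drivers base directory as an optional argument.
check_wsl_driver () {
  local base=${1:-/usr/lib/wsl/drivers} drv
  drv=$(ls -d "$base"/nvmdi.inf_amd64_* 2>/dev/null | head -n1)
  [ -n "$drv" ] || { echo "no nvmdi.inf_amd64_* directory under $base"; return 1; }
  echo "driver dir: $drv"
  if ls "$drv" | grep -Eq 'libcuda\.so\.1\.1|ptxjitcompiler'; then
    echo "sm_120 driver libraries present"
  else
    echo "driver dir present but expected CUDA libs missing"
  fi
}
# Usage on the WSL2 host: check_wsl_driver
```

If the second message fires even though `nvidia-smi` works, you're looking at exactly the split described above: the stub layer is mounted, the driver-private layer is not.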


The Solution: gpu-run

Here's the function that makes everything work:

gpu-run () {
  local D BN
  # The exit status of this pipeline is head's, not ls's, so test the
  # result explicitly instead of chaining "|| return 1" onto the assignment.
  D=$(ls -d /usr/lib/wsl/drivers/nvmdi.inf_amd64_* 2>/dev/null | head -n1)
  [ -n "$D" ] || { echo "WSL NVIDIA driver directory not found" >&2; return 1; }
  BN=$(basename "$D")
  echo "Using driver path: $D"
  docker run --rm --gpus all \
    -v /usr/lib/wsl/lib:/usr/lib/wsl/lib:ro \
    -v "$D":/usr/lib/wsl/drivers/"$BN":ro \
    -e LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/"$BN" \
    "$@"
}

What this does:

  1. Finds the driver directory dynamically -- the hash suffix changes with driver updates
  2. Mounts both WSL lib paths -- the standard /usr/lib/wsl/lib AND the driver-specific directory
  3. Sets LD_LIBRARY_PATH to prioritize these paths for symbol resolution

Verification:

source gpu-run.sh
gpu-run torch-wsl-cu128 python3 -c "
import torch
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('VRAM:', torch.cuda.get_device_properties(0).total_memory // 1024**3, 'GB')
"

Output:

Using driver path: /usr/lib/wsl/drivers/nvmdi.inf_amd64_fb80e95fa979ce23
PyTorch: 2.9.0.dev20250812+cu128
CUDA available: True
GPU: NVIDIA GeForce RTX 5090
VRAM: 32 GB

The Dockerfile Template

Every AI service in my stack uses a variation of this base:

FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

ENV TZ=Asia/Tokyo
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV CUDA_HOME=/usr/local/cuda

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip python3-dev git ffmpeg ca-certificates \
    build-essential cmake ninja-build libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir numpy==1.26.4

RUN pip3 install --no-cache-dir --pre \
    torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/nightly/cu128

RUN python3 -c "import torch; print('PyTorch:', torch.__version__); assert 'cu128' in torch.__version__"

WORKDIR /app

Key decisions:

  • nvidia/cuda:12.8.0-devel-ubuntu22.04 -- CUDA 12.8 is the minimum for sm_120. Using devel (not runtime) because some AI frameworks compile CUDA extensions at build time.
  • PyTorch nightly cu128 -- as of early 2026, stable PyTorch still has incomplete Blackwell support. Nightly cu128 is non-negotiable.
  • numpy pinned to 1.26.4 -- numpy 2.x breaks several AI frameworks that haven't updated their C extensions.
  • Install torch LAST -- many requirements.txt files include torch. If you install dependencies first, they'll pull in a stable torch that doesn't support sm_120. Always install your carefully selected torch version as the final step.

Docker Compose Architecture

Here's how six AI services coexist in a single compose.yaml:

services:
  comfyui:
    build:
      context: ./apps/comfyui
      dockerfile: Dockerfile
    image: comfyui:wsl-cu12
    profiles: ["comfyui", "all"]
    runtime: nvidia
    environment:
      - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/${WSL_DRV_BN}
      - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512
      - CUDA_VISIBLE_DEVICES=0
    volumes:
      - /usr/lib/wsl/lib:/usr/lib/wsl/lib:ro
      - ${WSL_DRV_DIR}:/usr/lib/wsl/drivers/${WSL_DRV_BN}:ro
      - ./data/comfyui-models:/app/models
      - ./shared/models:/shared/models:ro
    ports:
      - "8188:8188"
    ipc: host
    ulimits:
      memlock: -1
      stack: 67108864

  sbv2:
    build:
      context: ./apps/sbv2
      dockerfile: Dockerfile
    image: sbv2:wsl-cu12
    profiles: ["sbv2", "all"]
    runtime: nvidia
    environment:
      - LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/lib/wsl/drivers/${WSL_DRV_BN}
      - PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512
    volumes:
      - /usr/lib/wsl/lib:/usr/lib/wsl/lib:ro
      - ${WSL_DRV_DIR}:/usr/lib/wsl/drivers/${WSL_DRV_BN}:ro
      - ./data/sbv2-models:/opt/models
    ports:
      - "5000:5000"
    ipc: host
    ulimits:
      memlock: -1
      stack: 67108864

  cosyvoice:
    profiles: ["cosyvoice", "all"]
    ports:
      - "7865:7865"

  rvc:
    profiles: ["rvc", "all"]
    ports:
      - "7866:7866"

  framepack:
    profiles: ["framepack", "all"]
    ports:
      - "7862:7862"

(Each service follows the same WSL driver mount pattern -- I've abbreviated the later ones for readability.)

The .env file is auto-generated:

WSL_DRV_DIR=$(ls -d /usr/lib/wsl/drivers/nvmdi.inf_amd64_* | head -n1)
WSL_DRV_BN=$(basename "$WSL_DRV_DIR")
cat > .env << EOF
WSL_DRV_DIR=$WSL_DRV_DIR
WSL_DRV_BN=$WSL_DRV_BN
EOF

Design Decisions That Saved My Sanity

1. Docker Profiles for Resource Isolation

With 32GB VRAM, you can't run everything simultaneously. Video generation alone can eat 24GB. Docker profiles let me spin up exactly what I need:

docker compose --profile comfyui up -d
docker compose --profile sbv2 --profile cosyvoice up -d
docker compose --profile all up -d

2. Shared Model Directory

AI models are enormous. Flux checkpoints, HunyuanVideo weights, voice models -- easily 200GB+. Instead of duplicating them per container:

~/ai-workspace-correct/
  shared/
    models/           # Cross-service shared models
    hf_cache/         # HuggingFace cache (persistent)
  data/
    comfyui-models/   # Service-specific models
    sbv2-models/
    cosyvoice-models/

Each service mounts shared/models read-only. Service-specific models go in their own data/ directory.
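Scaffolding that layout is a one-time step (paths exactly as above, run from the workspace root):

```shell
# Create the shared and per-service model directories once; -p makes this
# idempotent, so it's safe to re-run after a wipe.
mkdir -p shared/models shared/hf_cache \
         data/comfyui-models data/sbv2-models data/cosyvoice-models
```

Pointing `HF_HOME` at `shared/hf_cache` inside each container keeps HuggingFace downloads out of the images; `HF_HOME` is the standard HuggingFace env var, though wiring it into every service this way is my suggestion rather than something shown in the compose file above.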

3. Port Allocation Strategy

I carved out port ranges by domain:

Range       Domain            Services
5000-5009   Voice synthesis   Style-BERT-VITS2
7860-7869   Voice/Video AI    FramePack, CosyVoice, RVC
8180-8189   Image AI          ComfyUI

This avoids collisions and makes firewall rules predictable.
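If you want to enforce the convention, a tiny helper does it (hypothetical sketch; the domain names are mine, the ranges are from the table above):

```shell
# Map a service domain to its reserved host-port block.
port_range () {
  case "$1" in
    voice-synth) echo "5000-5009" ;;
    voice-video) echo "7860-7869" ;;
    image)       echo "8180-8189" ;;
    *) echo "unknown domain: $1" >&2; return 1 ;;
  esac
}
port_range image   # prints 8180-8189
```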

4. The torchaudio Trap

This one cost me hours. Several voice synthesis frameworks call torchaudio.info() and torchaudio.load(), and the nightly cu128 build of torchaudio ships breaking API changes in exactly those calls. The fix is to bypass torchaudio and read audio metadata with soundfile instead:

import soundfile as sf

sample_rate = sf.info(wav_path).samplerate
audio_data, sr = sf.read(wav_path)

I patch these at Docker build time with sed:

RUN sed -i 's/import torchaudio/import torchaudio\nimport soundfile as sf/' /opt/app/webui.py && \
    sed -i 's/torchaudio.info(prompt_wav).sample_rate/sf.info(prompt_wav).samplerate/g' /opt/app/webui.py

Lessons Learned (The Hard Way)

1. Never let requirements.txt install torch.
Strip torch, torchvision, torchaudio from every requirements.txt before installing. Then install your nightly cu128 build as the final step. If you don't, pip will happily overwrite your working torch with a stable version that can't see your GPU.
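In practice the stripping is a one-liner before pip install (a sketch; the file names and the toy requirements.txt are illustrative):

```shell
# Toy requirements.txt to demonstrate the filter.
cat > requirements.txt << 'EOF'
torch==2.4.0
torchvision==0.19.0
torchaudio==2.4.0
numpy==1.26.4
soundfile
EOF

# Drop every torch* pin so pip can't overwrite the nightly cu128 build.
grep -v '^torch' requirements.txt > requirements.notorch.txt
cat requirements.notorch.txt
# -> numpy==1.26.4
#    soundfile

# Then, in the Dockerfile: install the filtered requirements first, and the
# nightly cu128 torch as the very last step:
#   pip3 install --no-cache-dir -r requirements.notorch.txt
#   pip3 install --no-cache-dir --pre torch torchvision torchaudio \
#     --index-url https://download.pytorch.org/whl/nightly/cu128
```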

2. Driver updates break the hash.
The nvmdi.inf_amd64_<hash> directory changes when you update NVIDIA drivers. The gpu-run function handles this with dynamic lookup. But if you hardcode the path anywhere, you'll have a bad time.
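The compose side can be made equally resilient. A sketch (refresh_wsl_env is my name, not part of the stack above): regenerate .env whenever the recorded driver path has gone stale.

```shell
# Rewrite .env when the driver directory it records no longer matches the one
# on disk (i.e. after an NVIDIA driver update changed the hash). Takes the
# drivers base directory as an optional argument.
refresh_wsl_env () {
  local base=${1:-/usr/lib/wsl/drivers} current recorded
  current=$(ls -d "$base"/nvmdi.inf_amd64_* 2>/dev/null | head -n1)
  [ -n "$current" ] || { echo "no WSL NVIDIA driver dir under $base" >&2; return 1; }
  recorded=$(grep '^WSL_DRV_DIR=' .env 2>/dev/null | cut -d= -f2-)
  if [ "$recorded" != "$current" ]; then
    printf 'WSL_DRV_DIR=%s\nWSL_DRV_BN=%s\n' "$current" "$(basename "$current")" > .env
    echo "refreshed .env for $(basename "$current")"
  fi
}
```

Run it from the compose directory before `docker compose up`, and stale hashes stop being a failure mode.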

3. ipc: host is non-negotiable for AI workloads.
Without it, PyTorch's shared memory operations fail silently or with cryptic errors. Always set it.

4. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
This environment variable enables PyTorch's memory-efficient allocation strategy. Without it, even 32GB of VRAM fragments under large models, and you hit out-of-memory errors on workloads that should theoretically fit.

5. Document everything as if you'll have amnesia tomorrow.
I wrote my setup docs with the goal of "restore everything from scratch in 30 minutes." That document has saved me three times already.


Current Stack (February 2026)

Service            Purpose                          Port   Status
ComfyUI            Image generation (Flux, SDXL)    8188   Stable
Style-BERT-VITS2   Japanese TTS voice synthesis     5000   Stable
CosyVoice          Multi-speaker voice synthesis    7865   Stable
RVC                Real-time voice conversion       7866   Stable
FramePack          Video generation (HunyuanVideo)  7862   Stable

All running on:

  • GPU: RTX 5090 32GB GDDR7
  • CPU: Intel Core Ultra 9 285K
  • RAM: 64GB DDR5
  • OS: Windows 11 Pro + WSL2 Ubuntu 22.04
  • Container runtime: Docker with NVIDIA Container Toolkit

Is This Still Unique?

As of February 2026, there are published examples of single-service RTX 5090 + Docker setups (vLLM, ComfyUI, basic PyTorch). What I haven't found elsewhere is:

  • A multi-service Docker Compose stack orchestrating 5+ AI services on Blackwell
  • The specific WSL2 driver mount solution documented with the nvmdi.inf_amd64_* path
  • A systematic approach to dependency isolation across services sharing one GPU
  • Production-grade patterns for model sharing, port management, and environment recovery

If you've done something similar, I'd genuinely love to hear about it. Drop a comment or reach out.


Built with ~40 hours of debugging, 200+ GB of model files, and an unreasonable amount of stubbornness. Based in Tokyo.

#rtx5090 #docker #wsl2 #pytorch #cuda #blackwell #ai #selfhosted
