DEV Community

StemSplit

Complete Guide to Setting Up Demucs Locally for AI Stem Separation

If you're a music producer, audio engineer, or just someone who wants to separate vocals from instrumentals, Meta's Demucs is one of the most powerful AI models available. In this guide, I'll walk you through setting up Demucs locally on your machine.

Why Demucs?

Demucs (Deep Extractor for Music Sources) is a state-of-the-art neural network developed by Meta (formerly Facebook) Research that can separate audio into:

  • 🎤 Vocals
  • 🥁 Drums
  • 🎸 Bass
  • 🎹 Other instruments

Unlike browser-based tools, running Demucs locally gives you:

  • ✅ No file size limits
  • ✅ No upload time
  • ✅ Complete privacy (files never leave your machine)
  • ✅ Batch processing capabilities
  • ✅ Free unlimited usage

Prerequisites

Before we start, make sure you have:

# Check Python version (need 3.8+)
python --version

# Check if pip is installed
pip --version

You'll need:

  • Python 3.8 or higher
  • At least 4GB of free disk space
  • (Optional) NVIDIA GPU with CUDA for faster processing
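
If you'd rather check these requirements from a script, here is a minimal sketch using only the standard library. The function name and the returned keys are my own choices for illustration; the thresholds simply mirror the list above.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version from the list above
MIN_FREE_GB = 4      # minimum free disk space from the list above


def check_prerequisites(path="."):
    """Return a dict describing whether this machine meets the basics."""
    # Free space on the drive that contains `path`, in GiB
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= MIN_FREE_GB,
        "free_gb": round(free_gb, 1),
    }


print(check_prerequisites())
```

Run it from the folder where you plan to create the virtual environment, so the disk check looks at the right drive.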

Installation Steps

1. Create a Virtual Environment

It's best practice to use a virtual environment to avoid dependency conflicts:

# Create virtual environment
python -m venv demucs-env

# Activate it
# On macOS/Linux:
source demucs-env/bin/activate
# On Windows:
demucs-env\Scripts\activate

2. Install Demucs

pip install demucs

That's it! pip installs PyTorch and the other Python dependencies automatically. If Demucs later fails to read MP3 files, installing FFmpeg usually fixes it.

3. Verify Installation

demucs --help

You should see the help menu with all available options.

Basic Usage

Separate a Single File

The simplest command:

demucs path/to/your/audio.mp3

Demucs writes the results to separated/<model name>/<track name>/. With the default model, that is separated/htdemucs/audio/, containing 4 stems:

  • vocals.wav
  • drums.wav
  • bass.wav
  • other.wav
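
If you want to pick up the stems from a script, you need to know where they land. Here is a small sketch that computes the expected paths, assuming the default separated/<model name>/<track name>/ layout and the default htdemucs model. The helper name expected_stem_paths is hypothetical, not part of Demucs.

```python
from pathlib import Path

STEMS = ("vocals", "drums", "bass", "other")


def expected_stem_paths(audio_path, model="htdemucs", out_dir="separated", ext="wav"):
    """Map each stem name to the file Demucs is expected to write for it."""
    track = Path(audio_path).stem  # file name without its extension
    base = Path(out_dir) / model / track
    return {stem: base / f"{stem}.{ext}" for stem in STEMS}


paths = expected_stem_paths("path/to/your/audio.mp3")
for stem, path in paths.items():
    print(stem, "->", path)
```

If you change the model with -n or the output folder with -o, pass the same values as model and out_dir.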

Use Different Models

Demucs has several models with different quality/speed tradeoffs:

# Default model (Hybrid Transformer Demucs)
demucs -n htdemucs audio.mp3

# Fine-tuned model (roughly 4x slower, results may be slightly better)
demucs -n htdemucs_ft audio.mp3

# Quantized MDX model (smaller download, quality can be slightly worse)
demucs -n mdx_extra_q audio.mp3

Extract Only Vocals

If you only need the vocals, or an instrumental for karaoke:

demucs --two-stems=vocals audio.mp3

This outputs:

  • vocals.wav - isolated vocals
  • no_vocals.wav - instrumental (everything else)

Advanced Tips

GPU Acceleration

If you have an NVIDIA GPU with CUDA:

# Install PyTorch with CUDA support first
# (cu118 targets CUDA 11.8; use the index URL that matches your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then use Demucs normally; it uses the GPU automatically when one is available
demucs audio.mp3

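On a recent NVIDIA GPU, separation is typically many times faster than on CPU. The exact speedup depends on your hardware, the model, and the track length.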
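
To confirm that PyTorch can actually see your GPU before a long run, you can check from Python. This is a minimal sketch; pick_device is a hypothetical helper, and it reports "cpu" when PyTorch is not installed at all.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a Demucs dependency
    except ImportError:
        return "cpu"  # PyTorch is missing, so there is nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Demucs would run on: {device}")
```

If this prints "cpu" on a machine with an NVIDIA card, the installed PyTorch build most likely lacks CUDA support. You can also force a device on the command line with the -d flag, for example demucs -d cpu audio.mp3.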

Batch Processing

Process multiple files:

demucs song1.mp3 song2.mp3 song3.mp3

Or use a wildcard:

demucs *.mp3
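
Wildcards are expanded by your shell, and not every shell does that (the classic Windows command prompt, for example). As an alternative, here is a sketch that collects the files in Python and hands them to the demucs command. The function names are my own; only the demucs command and its -o flag come from this guide.

```python
import subprocess
from pathlib import Path


def build_batch_command(folder, pattern="*.mp3", out_dir=None):
    """Build one demucs command covering every matching file in `folder`."""
    files = sorted(str(p) for p in Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    cmd = ["demucs"]
    if out_dir is not None:
        cmd += ["-o", str(out_dir)]
    return cmd + files


def separate_folder(folder, **kwargs):
    """Run demucs on the folder; raises CalledProcessError if demucs fails."""
    subprocess.run(build_batch_command(folder, **kwargs), check=True)
```

With the virtual environment active, calling separate_folder("songs") would process every MP3 in a folder named songs.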

Custom Output Directory

demucs -o ./output audio.mp3

MP3 Output (Save Disk Space)

By default, Demucs outputs WAV files (large). For MP3:

demucs --mp3 audio.mp3
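
To get a feel for the savings, here is a rough back-of-the-envelope estimate. It assumes 16-bit stereo WAV at 44.1 kHz and MP3 at 320 kbps. Both figures are assumptions on my part, so adjust them to match your own settings.

```python
def wav_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Uncompressed PCM size: every sample of every channel is stored."""
    return seconds * sample_rate * channels * bytes_per_sample


def mp3_bytes(seconds, kbps=320):
    """Constant-bitrate MP3 size: kilobits per second converted to bytes."""
    return seconds * kbps * 1000 // 8


track_seconds = 4 * 60  # a 4-minute song
stems = 4               # vocals, drums, bass, other

wav_total = stems * wav_bytes(track_seconds)
mp3_total = stems * mp3_bytes(track_seconds)

print(f"WAV: {wav_total / 1e6:.1f} MB")  # WAV: 169.3 MB
print(f"MP3: {mp3_total / 1e6:.1f} MB")  # MP3: 38.4 MB
```

Under these assumptions, one 4-minute song takes roughly 170 MB as WAV stems versus under 40 MB as MP3, which adds up quickly when batch processing.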

Comparison: Demucs vs Spleeter

I've tested both extensively. Here's the breakdown:

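Feature | Demucs | Spleeter
Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐
Speed | Medium | Fast
GPU Support | ✅ Yes | ✅ Yes
Maintenance | Bug fixes only | Largely inactive
Models | Multiple | Limited

Verdict: In my testing, Demucs produces better quality, especially for vocals, while Spleeter is faster. Neither project sees frequent updates: the Demucs README describes the project as no longer actively maintained (important bug fixes only), and Spleeter receives few updates. Check each repository for its current status.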


Common Issues & Solutions

Issue: "CUDA out of memory"

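Solution: Reduce the segment length, or fall back to the CPU. The Hybrid Transformer models (htdemucs, htdemucs_ft) accept a segment of at most 7.8 seconds, so pick a smaller value:

demucs --segment 5 audio.mp3

# Or run on the CPU instead (slower, but not limited by GPU memory)
demucs -d cpu audio.mp3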

Issue: Poor quality on speech

Solution: Demucs is trained on music, so results on spoken audio vary. The fine-tuned model may give a cleaner vocal stem:

demucs -n htdemucs_ft audio.mp3

Issue: Too slow on CPU

Solution: Stick with the default htdemucs model, which is roughly 4x faster than htdemucs_ft, or consider cloud processing:

demucs -n htdemucs audio.mp3

Python API Usage

You can also use Demucs in your own Python scripts:

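from demucs import pretrained
from demucs.apply import apply_model
import torchaudio

# Load the pretrained model
model = pretrained.get_model('htdemucs')

# Load audio; the model expects stereo input at model.samplerate (44.1 kHz)
waveform, sr = torchaudio.load('audio.mp3')
if sr != model.samplerate:
    waveform = torchaudio.functional.resample(waveform, sr, model.samplerate)
if waveform.shape[0] == 1:
    waveform = waveform.repeat(2, 1)  # duplicate mono to get two channels

# Apply separation; [None] adds the batch dimension, [0] removes it again
stems = apply_model(model, waveform[None])[0]

# stems has shape (sources, channels, samples). The order follows
# model.sources, which for htdemucs is ['drums', 'bass', 'other', 'vocals']
for name, stem in zip(model.sources, stems):
    torchaudio.save(f'{name}.wav', stem, model.samplerate)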

When to Use Cloud vs Local

Use Local Demucs When:

  • Processing sensitive/private audio
  • Batch processing many files
  • You have a decent GPU
  • No internet or slow connection

Use Cloud Services When:

  • Quick one-off separations
  • No technical setup desired
  • Processing on mobile
  • Need a user-friendly interface

If you prefer a web interface, I built StemSplit.io which runs Demucs in the cloud with a simple drag-and-drop UI. It also includes features like BPM/key detection and format conversion.

Next Steps

Now that you have Demucs running, you can:

  1. Build a karaoke maker - Extract instrumentals from songs
  2. Create sample packs - Isolate drums from your favorite tracks
  3. Learn music production - Study individual instruments in songs
  4. Build an API - Wrap Demucs in a Flask/FastAPI service
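
For the API idea in particular, the core of any Flask or FastAPI service is a function that turns request options into a demucs command and runs it. Here is a minimal sketch using only the standard library. The function names and parameters are hypothetical; the flags are the ones used earlier in this guide.

```python
import subprocess


def build_demucs_command(audio_path, model="htdemucs", two_stems=None,
                         out_dir=None, mp3=False):
    """Translate keyword options into a demucs command-line argument list."""
    cmd = ["demucs", "-n", model]
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")
    if out_dir:
        cmd += ["-o", str(out_dir)]
    if mp3:
        cmd.append("--mp3")
    cmd.append(str(audio_path))
    return cmd


def separate(audio_path, **options):
    """Run demucs and return the completed process; raises if demucs fails."""
    cmd = build_demucs_command(audio_path, **options)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


print(build_demucs_command("song.mp3", two_stems="vocals", out_dir="output"))
```

Passing the arguments as a list instead of a single shell string means file names with spaces or special characters are handled safely, which matters once uploads come from users.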


Have questions about stem separation or AI audio processing? Drop them in the comments below! 👇

This article was originally published on stemsplit.io/blog
