StemSplit

Posted on Jan 28

Complete Guide to Setting Up Demucs Locally for AI Stem Separation

#ai #music

If you're a music producer, audio engineer, or just someone who wants to separate vocals from instrumentals, Meta's Demucs is one of the most powerful AI models available. In this guide, I'll walk you through setting up Demucs locally on your machine.

Why Demucs?

Demucs (Deep Extractor for Music Sources) is a state-of-the-art neural network developed by Meta (formerly Facebook) Research that can separate audio into:

🎤 Vocals
🥁 Drums
🎸 Bass
🎹 Other instruments

Unlike browser-based tools, running Demucs locally gives you:

✅ No file size limits
✅ No upload time
✅ Complete privacy (files never leave your machine)
✅ Batch processing capabilities
✅ Free unlimited usage

Prerequisites

Before we start, check your Python and pip versions (on some systems the commands are python3 and pip3):

# Check Python version (need 3.8+)
python --version

# Check if pip is installed
pip --version

You'll need:

Python 3.8 or higher
At least 4GB of free disk space
(Optional) NVIDIA GPU with CUDA for faster processing
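If you'd rather check everything in one go, here is a minimal sketch of a prerequisite check using only the standard library. The function name `check_prerequisites` and the dictionary keys are my own, and looking for `nvidia-smi` on the PATH is only a hint that an NVIDIA driver is present, not proof that CUDA will work.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)            # minimum Python version listed above
MIN_FREE_BYTES = 4 * 1024**3   # 4 GB of free disk space


def check_prerequisites(path="."):
    """Return a dict mapping each prerequisite to True/False for this machine."""
    free_bytes = shutil.disk_usage(path).free
    return {
        "python_3.8+": sys.version_info >= MIN_PYTHON,
        # pip is importable by the interpreter running this script
        "pip": importlib.util.find_spec("pip") is not None,
        "4GB_free_disk": free_bytes >= MIN_FREE_BYTES,
        # Optional: nvidia-smi on PATH suggests an NVIDIA driver is installed
        "nvidia_gpu_hint": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for `nvidia_gpu_hint` is fine: Demucs will simply run on the CPU.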

Installation Steps

1. Create a Virtual Environment

It's best practice to use a virtual environment to avoid dependency conflicts:

# Create virtual environment
python -m venv demucs-env

# Activate it
# On macOS/Linux:
source demucs-env/bin/activate
# On Windows:
demucs-env\Scripts\activate

2. Install Demucs

pip install demucs

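That's it! pip installs Demucs's Python dependencies, including PyTorch, automatically. To read MP3 and other compressed formats you may also need FFmpeg installed on your system.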

3. Verify Installation

demucs --help

You should see the help menu with all available options.
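If the command isn't found, the usual culprit is an inactive virtual environment. As a rough diagnostic (the function name `installation_status` and its keys are my own invention), you can ask Python what it sees:

```python
import importlib.util
import shutil
import sys


def installation_status():
    """Report whether Demucs is visible from the current Python environment."""
    return {
        # Inside a venv, sys.prefix differs from the base interpreter's prefix
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # The demucs package can be located by this interpreter
        "package_found": importlib.util.find_spec("demucs") is not None,
        # The demucs command-line entry point is on PATH
        "cli_on_path": shutil.which("demucs") is not None,
    }


print(installation_status())
```

If `in_virtualenv` is False, activate demucs-env again and rerun the check.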

Basic Usage

Separate a Single File

The simplest command:

demucs path/to/your/audio.mp3

Demucs writes its results to separated/htdemucs/audio/ (a folder named after the model, then one named after the track), containing 4 stems:

vocals.wav
drums.wav
bass.wav
other.wav
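If you're scripting around Demucs, it helps to know where those files will land. This is a small sketch that assumes the default layout of output folder, then model name, then track name; `expected_outputs` is a hypothetical helper of mine, not part of Demucs.

```python
from pathlib import Path

DEFAULT_STEMS = ("vocals", "drums", "bass", "other")


def expected_outputs(track, model="htdemucs", out_dir="separated", ext="wav"):
    """Map each stem name to the path where Demucs is expected to write it."""
    track_name = Path(track).stem  # "audio.mp3" -> "audio"
    folder = Path(out_dir) / model / track_name
    return {stem: folder / f"{stem}.{ext}" for stem in DEFAULT_STEMS}


for stem, path in expected_outputs("path/to/your/audio.mp3").items():
    print(stem, "->", path.as_posix())
```

Pass `out_dir` or `ext="mp3"` if you change the output directory or format on the command line.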

Use Different Models

Demucs ships several pretrained models with different quality/speed tradeoffs:

# Default model (Hybrid Transformer Demucs)
demucs -n htdemucs audio.mp3

# Fine-tuned variant (roughly 4x slower, often slightly better results)
demucs -n htdemucs_ft audio.mp3

# Quantized MDX model (smaller download, quality can be slightly worse)
demucs -n mdx_extra_q audio.mp3

Extract Only Vocals

If you only need the vocals, or an instrumental for karaoke:

demucs --two-stems=vocals audio.mp3

This outputs:

vocals.wav - isolated vocals
no_vocals.wav - instrumental (everything else)
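Here is one way this could be wrapped for a karaoke script. Treat it as a sketch: `two_stem_command` and `make_karaoke` are names I made up, and the returned path assumes the default output layout (output folder, model name, track name).

```python
import subprocess
from pathlib import Path


def two_stem_command(track, stem="vocals", model="htdemucs", out_dir="separated"):
    """Build the Demucs command that splits a track into `stem` and the rest."""
    return ["demucs", f"--two-stems={stem}", "-n", model, "-o", str(out_dir), str(track)]


def make_karaoke(track, model="htdemucs", out_dir="separated"):
    """Run Demucs and return the expected path of the instrumental file."""
    subprocess.run(two_stem_command(track, "vocals", model, out_dir), check=True)
    return Path(out_dir) / model / Path(track).stem / "no_vocals.wav"


print(" ".join(two_stem_command("audio.mp3")))
```

Calling `make_karaoke("audio.mp3")` would run the separation and hand back the path to no_vocals.wav.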

Advanced Tips

GPU Acceleration

If you have an NVIDIA GPU with CUDA:

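# If pip already installed a CPU-only PyTorch, remove it first
pip uninstall torch torchvision torchaudio

# Install a CUDA-enabled PyTorch build (cu118 targets CUDA 11.8;
# pick the index URL that matches your setup from pytorch.org)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then use Demucs normally - it auto-detects the GPU
demucs audio.mp3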

GPU processing is typically many times faster than CPU, though the exact speedup depends on your hardware and the model you pick.
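To confirm that PyTorch can actually see your GPU, a quick check like the following can help. The helper names `pick_device` and `demucs_command` are hypothetical; `-d` is the Demucs flag for choosing the device explicitly.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


def demucs_command(track, device=None):
    """Build a Demucs command that pins the device explicitly with -d."""
    device = device or pick_device()
    return ["demucs", "-d", device, str(track)]


print(" ".join(demucs_command("audio.mp3")))
```

If this prints `-d cpu` on a machine with an NVIDIA card, the installed PyTorch build is probably CPU-only.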

Batch Processing

Process multiple files:

demucs song1.mp3 song2.mp3 song3.mp3

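Or use a wildcard (your shell expands it on macOS/Linux; Windows shells generally pass the pattern through unexpanded):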

demucs *.mp3
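For a batch run that behaves the same on every platform, you can collect the files in Python and pass them to Demucs yourself. This is only a sketch: the extension list is my assumption, and `collect_tracks` and `batch_command` are hypothetical helpers.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}  # assumed set; adjust as needed


def collect_tracks(folder):
    """Return the audio files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def batch_command(folder):
    """Build a single Demucs command covering every track in `folder`."""
    tracks = collect_tracks(folder)
    if not tracks:
        raise ValueError(f"no audio files found in {folder}")
    return ["demucs", *[str(t) for t in tracks]]
```

Running it is then a one-liner: `subprocess.run(batch_command("my_songs"), check=True)`.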

Custom Output Directory

demucs -o ./output audio.mp3

MP3 Output (Save Disk Space)

By default, Demucs outputs WAV files (large). For MP3:

demucs --mp3 audio.mp3
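To get a feel for the savings, here is a back-of-the-envelope estimate. It assumes 16-bit stereo WAV at 44.1 kHz and MP3 at 320 kbps, and it ignores file headers, so treat the numbers as approximate.

```python
SAMPLE_RATE = 44_100    # samples per second (assumed)
CHANNELS = 2            # stereo
BYTES_PER_SAMPLE = 2    # 16-bit PCM (assumed)
MP3_BITRATE = 320_000   # bits per second (assumed)


def wav_bytes(seconds, stems=4):
    """Approximate total size of uncompressed WAV stems."""
    return seconds * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * stems


def mp3_bytes(seconds, stems=4, bitrate=MP3_BITRATE):
    """Approximate total size of constant-bitrate MP3 stems."""
    return seconds * bitrate // 8 * stems


song = 4 * 60  # a four-minute song, in seconds
print(f"WAV: {wav_bytes(song) / 1e6:.1f} MB, MP3: {mp3_bytes(song) / 1e6:.1f} MB")
```

Under those assumptions, four stems of a four-minute song come to about 169.3 MB as WAV versus 38.4 MB as MP3.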

Comparison: Demucs vs Spleeter

I've tested both extensively. Here's the breakdown:

Feature	Demucs	Spleeter
Quality	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Speed	Medium	Fast
GPU Support	✅ Yes	✅ Yes
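Maintenance	Limited (bug fixes only)	Inactive
Models	Multiple	Limited

Verdict: Demucs produces better quality, especially for vocals. Spleeter is faster but no longer updated. Keep in mind that the original Demucs repository is no longer under active development either; according to its README, only important bug fixes are handled in the author's fork.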

Common Issues & Solutions

Issue: "CUDA out of memory"

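Solution: Reduce the segment size so each chunk needs less GPU memory. The Transformer-based models (including the default htdemucs) reject segments longer than the roughly 7.8 seconds they were trained on, so use a value of 7 or lower:

demucs --segment 7 audio.mp3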
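If you process files unattended, you might want an automatic fallback. The sketch below tries progressively less memory-hungry commands; the names `fallback_commands` and `separate_with_fallback` are hypothetical and the segment value of 7 is my assumption. Note that it retries on any failure, not only on out-of-memory errors.

```python
import subprocess


def fallback_commands(track, segment=7):
    """Yield Demucs commands from most to least GPU-memory-hungry."""
    yield ["demucs", str(track)]                              # default settings
    yield ["demucs", "--segment", str(segment), str(track)]   # shorter chunks
    yield ["demucs", "-d", "cpu", str(track)]                 # skip the GPU entirely


def separate_with_fallback(track, run=subprocess.run):
    """Run each command in turn and return the first one that succeeds."""
    for command in fallback_commands(track):
        result = run(command)
        if result.returncode == 0:
            return command
    raise RuntimeError(f"all Demucs attempts failed for {track}")
```

The `run` parameter exists so the function can be exercised with a stand-in runner instead of a real Demucs call.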

Issue: Poor quality on speech

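Solution: Try the fine-tuned model, which can give cleaner vocal separation. Demucs is trained on music, so results on pure speech may still be limited: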

demucs -n htdemucs_ft audio.mp3

Issue: Too slow on CPU

Solution: Stay on the default htdemucs model (the fine-tuned htdemucs_ft takes roughly four times as long), or consider cloud processing:

demucs -n htdemucs audio.mp3

Python API Usage

You can also use Demucs in your own Python scripts:

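import torch
import torchaudio
from demucs import pretrained
from demucs.apply import apply_model

# Load the pretrained model (weights are downloaded on first use)
model = pretrained.get_model('htdemucs')
model.eval()

# Load audio; the model expects stereo input at model.samplerate (44.1 kHz)
waveform, sr = torchaudio.load('audio.mp3')
if sr != model.samplerate:
    waveform = torchaudio.functional.resample(waveform, sr, model.samplerate)
if waveform.shape[0] == 1:
    waveform = waveform.repeat(2, 1)  # duplicate mono into two channels

# Apply separation; the input shape is (batch, channels, samples)
with torch.no_grad():
    separated = apply_model(model, waveform[None])[0]

# Map each output to its name; the order comes from model.sources
# (for htdemucs: drums, bass, other, vocals)
stems = dict(zip(model.sources, separated))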

When to Use Cloud vs Local

Use Local Demucs When:

Processing sensitive/private audio
Batch processing many files
You have a decent GPU
No internet or slow connection

Use Cloud Services When:

Quick one-off separations
No technical setup desired
Processing on mobile
Need a user-friendly interface

If you prefer a web interface, I built StemSplit.io which runs Demucs in the cloud with a simple drag-and-drop UI. It also includes features like BPM/key detection and format conversion.

Next Steps

Now that you have Demucs running, you can:

Build a karaoke maker - Extract instrumentals from songs
Create sample packs - Isolate drums from your favorite tracks
Learn music production - Study individual instruments in songs
Build an API - Wrap Demucs in a Flask/FastAPI service

Resources

📦 Demucs GitHub
🔧 StemSplit API Docs
💬 Join our community discussions
🐦 Follow us on Twitter for updates

Have questions about stem separation or AI audio processing? Drop them in the comments below! 👇

This article was originally published on stemsplit.io/blog
