If you're a music producer, audio engineer, or just someone who wants to separate vocals from instrumentals, Meta's Demucs is one of the most powerful AI models available. In this guide, I'll walk you through setting up Demucs locally on your machine.
Why Demucs?
Demucs (Deep Extractor for Music Sources) is a state-of-the-art neural network developed by Meta (formerly Facebook) Research that can separate audio into:
- 🎤 Vocals
- 🥁 Drums
- 🎸 Bass
- 🎹 Other instruments
Unlike browser-based tools, running Demucs locally gives you:
- ✅ No file size limits
- ✅ No upload time
- ✅ Complete privacy (files never leave your machine)
- ✅ Batch processing capabilities
- ✅ Free unlimited usage
Prerequisites
Before we start, make sure you have:
# Check Python version (need 3.8+)
python --version
# Check if pip is installed
pip --version
You'll need:
- Python 3.8 or higher
- At least 4GB of free disk space
- (Optional) NVIDIA GPU with CUDA for faster processing
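The same checks can be scripted. Here is a minimal sketch, assuming the thresholds listed above (Python 3.8+ and roughly 4GB of free disk space). The function name `check_prerequisites` is my own and not part of Demucs.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)            # minimum Python version listed above
MIN_FREE_BYTES = 4 * 1024**3   # roughly 4GB, as listed above


def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or higher is required" % MIN_PYTHON)
    free = shutil.disk_usage(path).free
    if free < MIN_FREE_BYTES:
        problems.append("only %.1f GB of free disk space" % (free / 1024**3))
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All good" if not issues else "\n".join(issues))
```

It only inspects the interpreter and the disk, so it is safe to run before installing anything.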
Installation Steps
1. Create a Virtual Environment
It's best practice to use a virtual environment to avoid dependency conflicts:
# Create virtual environment
python -m venv demucs-env
# Activate it
# On macOS/Linux:
source demucs-env/bin/activate
# On Windows:
demucs-env\Scripts\activate
2. Install Demucs
pip install demucs
That's it! The package manager handles all dependencies automatically.
3. Verify Installation
demucs --help
You should see the help menu with all available options.
Basic Usage
Separate a Single File
The simplest command:
demucs path/to/your/audio.mp3
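Demucs will create a separated folder containing a subfolder named after the model (htdemucs by default) and, inside it, one folder per track with 4 stems:
- vocals.wav
- drums.wav
- bass.wav
- other.wav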
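If you want to pick up the stems from a script, it helps to know where they land. The sketch below assumes the default layout, `separated/<model>/<track name>/<stem>.wav`. The helper `expected_stems` is hypothetical and only builds paths; it does not run Demucs.

```python
from pathlib import Path

# The four stems produced by the default 4-source models
SOURCES = ("vocals", "drums", "bass", "other")


def expected_stems(track, model="htdemucs", out_dir="separated", ext="wav"):
    """Map each stem name to the path where Demucs is expected to write it."""
    track_name = Path(track).stem  # file name without its extension
    folder = Path(out_dir) / model / track_name
    return {source: folder / f"{source}.{ext}" for source in SOURCES}


if __name__ == "__main__":
    for name, path in expected_stems("path/to/your/audio.mp3").items():
        print(name, "->", path)
```

If you pass a different model with `-n` or a different folder with `-o`, pass the same values to `model` and `out_dir`.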
Use Different Models
Demucs ships several pretrained models with different quality/speed tradeoffs:
# Default model (Hybrid Transformer Demucs)
demucs -n htdemucs audio.mp3
# Fine-tuned model (about 4x slower, may give slightly better results)
demucs -n htdemucs_ft audio.mp3
# Quantized MDX model (smaller download, quality can be slightly worse)
demucs -n mdx_extra_q audio.mp3
Extract Only Vocals
If you only need the vocals, or an instrumental for karaoke:
demucs --two-stems=vocals audio.mp3
This outputs:
- vocals.wav - isolated vocals
- no_vocals.wav - instrumental (everything else)
Advanced Tips
GPU Acceleration
If you have an NVIDIA GPU with CUDA:
# Install PyTorch with CUDA support first (cu118 targets CUDA 11.8; pick the build that matches your setup)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Then use Demucs normally - it auto-detects GPU
demucs audio.mp3
GPU processing is usually much faster than CPU, though the exact speedup depends on your hardware and the length of the track.
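To confirm which device will actually be used, you can ask PyTorch directly. This is a small sketch; `pick_device` is a hypothetical helper, and it falls back to CPU when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the script runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Demucs would run on:", pick_device())
```

If this prints `cpu` on a machine with an NVIDIA card, the installed PyTorch build most likely lacks CUDA support. You can also force a device yourself with the `-d` flag, for example `demucs -d cpu audio.mp3`.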
Batch Processing
Process multiple files:
demucs song1.mp3 song2.mp3 song3.mp3
Or use a wildcard:
demucs *.mp3
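Wildcards are expanded by the shell, so `*.mp3` may not work in every terminal (the classic Windows command prompt, for instance). A small Python wrapper avoids that. The sketch below assumes `demucs` is on your PATH. The names `find_tracks` and `separate_folder` and the extension list are my own choices.

```python
import subprocess
from pathlib import Path

# Extensions to pick up; adjust to whatever formats you actually use
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def find_tracks(folder):
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def separate_folder(folder, out_dir="separated"):
    """Run one demucs call over every audio file found in `folder`."""
    tracks = find_tracks(folder)
    if not tracks:
        return []
    cmd = ["demucs", "-o", str(out_dir)] + [str(t) for t in tracks]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return tracks
```

Calling `separate_folder("my_songs")` passes all files to a single `demucs` invocation, so the model is loaded only once.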
Custom Output Directory
demucs -o ./output audio.mp3
MP3 Output (Save Disk Space)
By default, Demucs outputs WAV files (large). For MP3:
demucs --mp3 audio.mp3
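The flags covered so far (`-n`, `--two-stems`, `-o`, `--mp3`) can be combined in one call. If you launch Demucs from a script, a tiny command builder keeps that tidy. This is only a sketch; `build_demucs_command` is a hypothetical helper, not something Demucs provides.

```python
def build_demucs_command(track, model=None, two_stems=None, out_dir=None, mp3=False):
    """Assemble a demucs command line as a list suitable for subprocess.run."""
    cmd = ["demucs"]
    if model:
        cmd += ["-n", model]                        # choose a pretrained model
    if two_stems:
        cmd.append(f"--two-stems={two_stems}")      # e.g. vocals + no_vocals
    if out_dir:
        cmd += ["-o", str(out_dir)]                 # custom output directory
    if mp3:
        cmd.append("--mp3")                         # write MP3 instead of WAV
    cmd.append(str(track))
    return cmd


if __name__ == "__main__":
    print(" ".join(build_demucs_command("audio.mp3", two_stems="vocals", mp3=True)))
```

Returning a list rather than a single string means file names with spaces need no extra quoting when passed to `subprocess.run`.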
Comparison: Demucs vs Spleeter
I've tested both extensively. Here's the breakdown:
| Feature | Demucs | Spleeter |
|---|---|---|
| Quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Speed | Medium | Fast |
| GPU Support | ✅ Yes | ✅ Yes |
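| Maintenance | Limited (important bug fixes only) | Inactive |
| Models | Multiple | Limited |
Verdict: Demucs produces better quality, especially for vocals. Spleeter is faster but no longer updated. Keep in mind that the original Demucs repository is no longer actively maintained either; its README points to the author's fork, which only receives important bug fixes.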
Common Issues & Solutions
Issue: "CUDA out of memory"
Solution: Reduce the segment size (the htdemucs models accept at most 7.8 seconds), or fall back to CPU with -d cpu:
demucs --segment 7 audio.mp3
Issue: Poor quality on speech
Solution: Demucs is trained on music, so results on spoken audio vary. The fine-tuned model may help:
demucs -n htdemucs_ft audio.mp3
Issue: Too slow on CPU
Solution: Stick with the default htdemucs model (htdemucs_ft takes about 4x longer), or consider cloud processing:
demucs -n htdemucs audio.mp3
Python API Usage
You can also use Demucs in your own Python scripts:
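import torchaudio
from demucs import pretrained
from demucs.apply import apply_model

# Load the pretrained model (weights are downloaded on first use)
model = pretrained.get_model('htdemucs')
# Load audio and resample it to the rate the model expects
waveform, sr = torchaudio.load('audio.mp3')
waveform = torchaudio.functional.resample(waveform, sr, model.samplerate)
# The model expects stereo input; duplicate the channel for mono files
if waveform.shape[0] == 1:
    waveform = waveform.repeat(2, 1)
# Apply separation; the output shape is (batch, sources, channels, samples)
stems = apply_model(model, waveform[None])[0]
# Stem order follows model.sources (for htdemucs: drums, bass, other, vocals)
separated = dict(zip(model.sources, stems))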
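Once you have the stems in Python, building an instrumental is just a matter of summing everything except the vocals. The sketch below assumes you have arranged the model output into a dict keyed by source name (for example by zipping `model.sources` with the per-source tensors). `mix_except` is a hypothetical helper and works with any values that support `+`.

```python
def mix_except(stems, excluded="vocals"):
    """Sum every stem except `excluded`, e.g. to build an instrumental."""
    others = [audio for name, audio in stems.items() if name != excluded]
    if not others:
        raise ValueError("no stems left to mix")
    mix = others[0]
    for audio in others[1:]:
        mix = mix + audio  # element-wise sum for tensors or arrays
    return mix
```

With tensors of identical shape, `mix_except(separated)` gives the backing track, which you could then write to disk with `torchaudio.save`.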
When to Use Cloud vs Local
Use Local Demucs When:
- Processing sensitive/private audio
- Batch processing many files
- You have a decent GPU
- No internet or slow connection
Use Cloud Services When:
- Quick one-off separations
- No technical setup desired
- Processing on mobile
- Need a user-friendly interface
If you prefer a web interface, I built StemSplit.io which runs Demucs in the cloud with a simple drag-and-drop UI. It also includes features like BPM/key detection and format conversion.
Next Steps
Now that you have Demucs running, you can:
- Build a karaoke maker - Extract instrumentals from songs
- Create sample packs - Isolate drums from your favorite tracks
- Learn music production - Study individual instruments in songs
- Build an API - Wrap Demucs in a Flask/FastAPI service
Resources
- 📦 Demucs GitHub
- 🔧 StemSplit API Docs
- 💬 Join our community discussions
- 🐦 Follow us on Twitter for updates
Have questions about stem separation or AI audio processing? Drop them in the comments below!
This article was originally published on stemsplit.io/blog