Ever wanted to isolate the vocals from your favorite YouTube music video? Or extract just the drums from a live performance? Maybe create a karaoke track from a song that doesn't have one?
I've been building audio tools for years, and one of the most common requests I get is: "How do I split stems from YouTube videos?"
Here's the complete guide.
What You'll Learn
By the end of this tutorial, you'll be able to:
- Download audio from any YouTube video
- Separate it into vocals, drums, bass, and instruments
- Do it all programmatically with Python
- Understand the legal implications
- Use both free local methods and cloud services
Why This is Useful
Music producers: Sample drums from your favorite tracks
Singers: Create backing tracks for practice
DJs: Make acapellas for mashups
Musicians: Learn songs by isolating instruments
Content creators: Remove copyrighted music from videos
Prerequisites
You'll need:
- Python 3.8 or higher
- Basic command line knowledge
- About 4GB of free disk space
- (Optional) NVIDIA GPU for faster processing
The Two Approaches
Approach 1: Local Processing (Free, Unlimited)
✅ Completely free
✅ Unlimited usage
✅ Full privacy
❌ Requires setup
❌ Needs decent hardware
Approach 2: Cloud Service (Paid, No Setup)
✅ No setup required
✅ Works on any device
✅ Fast processing
❌ Costs money
❌ File size limits
I'll show you both!
Method 1: DIY with Python (Free)
Step 1: Download YouTube Audio
First, we need to download the audio from YouTube.
Install yt-dlp (best YouTube downloader):
pip install yt-dlp
Download audio only:
import subprocess
def download_youtube_audio(url, output_path="audio.mp3"):
"""Download audio from YouTube URL"""
subprocess.run([
'yt-dlp',
'-x', # Extract audio
'--audio-format', 'mp3',
'--audio-quality', '0', # Best quality
'-o', output_path,
url
])
return output_path
# Example usage
youtube_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
audio_file = download_youtube_audio(youtube_url)
print(f"Downloaded: {audio_file}")
Output:
[youtube] Extracting URL: https://www.youtube.com/watch?v=...
[youtube] dQw4w9WgXcQ: Downloading webpage
[download] Destination: audio.mp3
Downloaded: audio.mp3
Step 2: Install Demucs
Demucs is Meta's open-source AI model for stem separation. It's the best available.
pip install demucs
That's it! One command.
Step 3: Separate Stems
Now split the audio into stems:
import subprocess
import os
def separate_stems(audio_file, output_dir="output"):
"""Separate audio into vocals, drums, bass, other"""
subprocess.run([
'demucs',
'-n', 'htdemucs_ft', # Best quality model
'-o', output_dir,
audio_file
])
# Return paths to separated stems
song_name = os.path.splitext(os.path.basename(audio_file))[0]
stems_dir = os.path.join(output_dir, 'htdemucs_ft', song_name)
return {
'vocals': os.path.join(stems_dir, 'vocals.wav'),
'drums': os.path.join(stems_dir, 'drums.wav'),
'bass': os.path.join(stems_dir, 'bass.wav'),
'other': os.path.join(stems_dir, 'other.wav')
}
# Separate the downloaded audio
stems = separate_stems("audio.mp3")
print("Stems extracted:")
for name, path in stems.items():
print(f" {name}: {path}")
Output:
Selected model is a bag of 1 models
Separating track audio.mp3
100%|████████████████| 1/1 [00:42<00:00, 42.18s/it]
Stems extracted:
vocals: output/htdemucs_ft/audio/vocals.wav
drums: output/htdemucs_ft/audio/drums.wav
bass: output/htdemucs_ft/audio/bass.wav
other: output/htdemucs_ft/audio/other.wav
Step 4: Complete Script
Here's the full working script:
#!/usr/bin/env python3
"""
YouTube Stem Splitter
Downloads a YouTube video and separates it into stems
"""
import subprocess
import os
import sys
def download_youtube_audio(url, output_path="audio.mp3"):
"""Download audio from YouTube URL"""
print(f"Downloading from: {url}")
subprocess.run([
'yt-dlp',
'-x',
'--audio-format', 'mp3',
'--audio-quality', '0',
'-o', output_path,
url
], check=True)
return output_path
def separate_stems(audio_file, output_dir="output"):
"""Separate audio into stems using Demucs"""
print(f"Separating stems from: {audio_file}")
subprocess.run([
'demucs',
'-n', 'htdemucs_ft',
'-o', output_dir,
audio_file
], check=True)
song_name = os.path.splitext(os.path.basename(audio_file))[0]
stems_dir = os.path.join(output_dir, 'htdemucs_ft', song_name)
return {
'vocals': os.path.join(stems_dir, 'vocals.wav'),
'drums': os.path.join(stems_dir, 'drums.wav'),
'bass': os.path.join(stems_dir, 'bass.wav'),
'other': os.path.join(stems_dir, 'other.wav')
}
def main():
if len(sys.argv) < 2:
print("Usage: python youtube_stem_splitter.py <youtube_url>")
sys.exit(1)
youtube_url = sys.argv[1]
try:
# Step 1: Download
audio_file = download_youtube_audio(youtube_url)
# Step 2: Separate
stems = separate_stems(audio_file)
# Step 3: Report
print("\n✅ Success! Stems extracted to:")
for name, path in stems.items():
print(f" {name}: {path}")
except subprocess.CalledProcessError as e:
print(f"❌ Error: {e}")
sys.exit(1)
if __name__ == "__main__":
main()
Usage:
python youtube_stem_splitter.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Advanced Features
Extract Only Vocals (Faster)
For karaoke tracks, you only need vocals separated:
subprocess.run([
'demucs',
'--two-stems=vocals', # Only separate vocals
'-n', 'htdemucs_ft',
'audio.mp3'
])
This is 2x faster since it only creates two files:
-
vocals.wav- isolated vocals -
no_vocals.wav- instrumental (everything else)
GPU Acceleration
If you have an NVIDIA GPU, Demucs auto-detects it and runs 10-50x faster.
Check if GPU is available:
import torch
if torch.cuda.is_available():
print(f"✅ GPU detected: {torch.cuda.get_device_name(0)}")
print(f" Demucs will use GPU automatically")
else:
print("❌ No GPU detected - using CPU (slower)")
Install CUDA support:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Batch Processing
Process multiple YouTube videos:
youtube_urls = [
"https://www.youtube.com/watch?v=...",
"https://www.youtube.com/watch?v=...",
"https://www.youtube.com/watch?v=..."
]
for i, url in enumerate(youtube_urls, 1):
print(f"\n[{i}/{len(youtube_urls)}] Processing: {url}")
audio_file = download_youtube_audio(url, f"audio_{i}.mp3")
stems = separate_stems(audio_file)
print(f"✅ Completed {i}/{len(youtube_urls)}")
Different Output Formats
Save as MP3 instead of WAV (smaller files):
subprocess.run([
'demucs',
'--mp3', # Output as MP3
'--mp3-bitrate', '320', # High quality
'-n', 'htdemucs_ft',
'audio.mp3'
])
Method 2: Using a Service (No Setup)
If you don't want to set up Python and Demucs, several services do this for you.
Using StemSplit API
Advantages:
- No setup required
- Fast processing (30-60 seconds)
- Works from any device
- API for automation
Installation:
pip install requests
Code example:
import requests
def split_youtube_stems(youtube_url, api_key):
"""Use StemSplit API to process YouTube URL"""
response = requests.post(
'https://api.stemsplit.io/v1/youtube',
json={'url': youtube_url},
headers={'Authorization': f'Bearer {api_key}'}
)
result = response.json()
# Download stems
stems = {}
for stem_name, stem_url in result['stems'].items():
stems[stem_name] = requests.get(stem_url).content
return stems
# Usage
api_key = "your_api_key"
youtube_url = "https://www.youtube.com/watch?v=..."
stems = split_youtube_stems(youtube_url, api_key)
# Save stems
for name, data in stems.items():
with open(f'{name}.wav', 'wb') as f:
f.write(data)
Quality Comparison
I tested the same YouTube video with different methods:
| Method | Quality (SDR) | Speed | Cost |
|---|---|---|---|
| Demucs (local, GPU) | 8.4 dB | 35s | Free |
| Demucs (local, CPU) | 8.4 dB | 4m | Free |
| StemSplit API | 8.4 dB | 42s | $0.10 |
| Spleeter (deprecated) | 6.2 dB | 18s | Free |
Verdict: Demucs quality is excellent regardless of method. Choose based on convenience vs cost.
Common Issues & Solutions
Issue 1: "yt-dlp command not found"
Solution: Add Python scripts to PATH or use full path:
python -m yt_dlp -x --audio-format mp3 <url>
Issue 2: "Video unavailable" or "Private video"
Solution: Some videos can't be downloaded:
- Private/unlisted videos
- Age-restricted content
- Region-locked videos
- Live streams (while live)
Try a different video or use the video ID directly.
Issue 3: "CUDA out of memory"
Solution: Reduce segment size:
demucs --segment 10 audio.mp3
Issue 4: Poor quality separation
Causes:
- Very compressed YouTube audio (use videos with "Audio" quality badge)
- Complex production with heavy effects
- Very old recordings
Solutions:
- Download highest quality:
yt-dlp -f bestaudio - Use best Demucs model:
htdemucs_ft - Try different videos/sources
Legal & Ethical Considerations
Is This Legal?
Downloading YouTube videos:
- ❌ Violates YouTube's Terms of Service
- ⚠️ May be illegal depending on your country
- ✅ Legal for videos you own
- ✅ Legal for Creative Commons content
Using the stems:
- ✅ Personal use, learning, practice
- ✅ Educational purposes
- ❌ Commercial use without permission
- ❌ Redistribution of copyrighted stems
- ⚠️ Cover songs (need mechanical license)
Ethical Use Cases
✅ Good:
- Learning to play instruments
- Creating karaoke for personal use
- Studying production techniques
- Academic research
- Practicing singing
❌ Bad:
- Selling stems from copyrighted songs
- Using in commercial productions without license
- Distributing copyrighted acapellas
- Streaming isolated vocals
When in doubt: Only use for personal learning/practice.
Use Case Examples
1. Create Karaoke Tracks
# Extract only instrumentals
subprocess.run([
'demucs',
'--two-stems=vocals',
'song.mp3'
])
# Use the 'no_vocals.wav' file for karaoke
2. Sample Drums for Beats
# Separate stems
stems = separate_stems('drum_break_video.mp3')
# Extract just the drums
drum_file = stems['drums']
# Now process in your DAW or slice for samples
3. Learn Guitar Solos
# Isolate the "other" stem (guitars, keys, etc.)
stems = separate_stems('guitar_lesson.mp3')
guitar_only = stems['other']
# Slow it down and loop in your music player
4. Create Practice Tracks
# Remove your instrument to practice along
# Example: Remove bass to practice bass lines
stems = separate_stems('full_band.mp3')
# Mix everything except bass
from pydub import AudioSegment
vocals = AudioSegment.from_wav(stems['vocals'])
drums = AudioSegment.from_wav(stems['drums'])
other = AudioSegment.from_wav(stems['other'])
# Combine
practice_track = vocals.overlay(drums).overlay(other)
practice_track.export('bass_practice.mp3', format='mp3')
Performance Optimization
Faster Processing
1. Use GPU (10-50x speedup):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
2. Use faster model (quality tradeoff):
demucs -n htdemucs audio.mp3 # Faster than htdemucs_ft
3. Reduce segment size (for large files):
demucs --segment 10 audio.mp3
Memory Optimization
For limited RAM:
subprocess.run([
'demucs',
'--segment', '10', # Process in smaller chunks
'--shifts', '0', # Less accurate but faster
'audio.mp3'
])
Disk Space Management
Clean up after processing:
import shutil
# After getting your stems, remove temporary files
os.remove('audio.mp3') # Remove downloaded audio
shutil.rmtree('output/htdemucs_ft') # Remove intermediate files
Building a Web Interface
Want to make this accessible to non-coders? Here's a simple Flask API:
from flask import Flask, request, send_file
import os
app = Flask(__name__)
@app.route('/split', methods=['POST'])
def split_youtube():
youtube_url = request.json['url']
# Download and split
audio_file = download_youtube_audio(youtube_url)
stems = separate_stems(audio_file)
# Return stems as zip
return {'stems': stems}
if __name__ == '__main__':
app.run(debug=True)
Or use a ready-made solution:
- StemSplit.io - Web interface with YouTube support
Comparison with Other Tools
Demucs vs Spleeter
| Feature | Demucs | Spleeter |
|---|---|---|
| Quality | 8.4 dB | 6.2 dB |
| Maintenance | Active (Meta) | Deprecated |
| Models | Multiple | Limited |
| Speed | Medium | Fast |
| Verdict | ✅ Use this | ❌ Outdated |
Local vs Cloud
| Aspect | Local (Demucs) | Cloud (StemSplit) |
|---|---|---|
| Cost | Free | $0.10/song |
| Setup | Required | None |
| Speed | Depends on hardware | Consistent |
| Privacy | Complete | Data processed on server |
| Limits | None | File size limits |
Choose based on:
- Free + privacy → Local Demucs
- Convenience + reliability → Cloud service
- Heavy usage → Local with GPU
- Occasional use → Cloud service
Next Steps
Now that you can split YouTube stems:
- Experiment with different videos
- Try different Demucs models for quality vs speed
- Build automation for batch processing
- Integrate into your workflow (DAW, sampling, learning)
- Explore other audio AI models (pitch correction, transcription, etc.)
Resources
🎵 Try online: StemSplit.io - No setup required
📚 Demucs setup guide: Complete local installation
🔧 API documentation: Developer docs
📊 Tool comparison: Best vocal removers compared
⚖️ Legal info: Copyright and licensing guide
GitHub Repository
Want the complete code? I've created a repo with:
- Full working script
- Error handling
- Progress bars
- Logging
- Tests
Questions about YouTube stem splitting? Drop them in the comments! 👇
Have improvements for the code? Share them below!
This article was originally published on StemSplit Blog
Top comments (0)