StemSplit

Posted on Jan 29

How to Extract Stems from YouTube Videos Using Python (Free & Easy)

#ai #music #podcast

Ever wanted to isolate the vocals from your favorite YouTube music video? Or extract just the drums from a live performance? Maybe create a karaoke track from a song that doesn't have one?

I've been building audio tools for years, and one of the most common requests I get is: "How do I split stems from YouTube videos?"

Here's the complete guide.

What You'll Learn

By the end of this tutorial, you'll be able to:

Download audio from any YouTube video
Separate it into vocals, drums, bass, and instruments
Do it all programmatically with Python
Understand the legal implications
Use both free local methods and cloud services

Why This is Useful

Music producers: Sample drums from your favorite tracks

Singers: Create backing tracks for practice

DJs: Make acapellas for mashups

Musicians: Learn songs by isolating instruments

Content creators: Remove copyrighted music from videos

Prerequisites

You'll need:

Python 3.8 or higher
Basic command line knowledge
About 4GB of free disk space
FFmpeg installed and on your PATH (yt-dlp needs it to extract and convert audio)
(Optional) NVIDIA GPU for faster processing
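If you want to confirm the list above before going further, a small preflight check can help. This is a minimal sketch: the function names are my own, and it assumes the tools are installed as the commands yt-dlp, demucs, and ffmpeg. Until you finish Steps 1 and 2, it will naturally report yt-dlp and demucs as missing.

```python
import shutil
import sys

# Command names assumed for this sketch; ffmpeg is what yt-dlp's -x option relies on.
REQUIRED_TOOLS = ["yt-dlp", "demucs", "ffmpeg"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_disk_gb(path="."):
    """Return the free disk space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def check_prerequisites(min_python=(3, 8), min_free_gb=4.0):
    """Collect human-readable problems; an empty list means you're ready."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for tool in missing_tools():
        problems.append(f"'{tool}' not found on PATH")
    if free_disk_gb() < min_free_gb:
        problems.append(f"Less than {min_free_gb:g} GB of free disk space")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"Problem: {problem}")
```

An empty result means everything on the list was found.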

The Two Approaches

Approach 1: Local Processing (Free, Unlimited)

✅ Completely free

✅ Unlimited usage

✅ Full privacy

❌ Requires setup

❌ Needs decent hardware

Approach 2: Cloud Service (Paid, No Setup)

✅ No setup required

✅ Works on any device

✅ Fast processing

❌ Costs money

❌ File size limits

I'll show you both!

Method 1: DIY with Python (Free)

Step 1: Download YouTube Audio

First, we need to download the audio from YouTube.

Install yt-dlp (best YouTube downloader):

pip install yt-dlp

Download audio only:

import subprocess

def download_youtube_audio(url, output_path="audio.mp3"):
    """Download audio from YouTube URL"""
    subprocess.run([
        'yt-dlp',
        '-x',  # Extract audio
        '--audio-format', 'mp3',
        '--audio-quality', '0',  # Best quality
        '-o', output_path,
        url
    ], check=True)  # Raise CalledProcessError if yt-dlp fails
    return output_path

# Example usage
youtube_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
audio_file = download_youtube_audio(youtube_url)
print(f"Downloaded: {audio_file}")

Output:

[youtube] Extracting URL: https://www.youtube.com/watch?v=...
[youtube] dQw4w9WgXcQ: Downloading webpage
[download] Destination: audio.mp3
Downloaded: audio.mp3
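One thing to watch: a fixed name like audio.mp3 gets overwritten every time you download another video. A possible workaround is to name each file after its video ID. The sketch below uses only the standard library. Both helper names are hypothetical, and it only handles the common watch and youtu.be URL shapes.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com", "music.youtube.com")


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, or None if not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Regular links carry the ID in the "v" query parameter
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def audio_filename(url, extension="mp3"):
    """Build a per-video file name, falling back to 'audio' for unknown URLs."""
    return f"{extract_video_id(url) or 'audio'}.{extension}"
```

You could then call download_youtube_audio(url, audio_filename(url)) so each video gets its own file.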

Step 2: Install Demucs

Demucs is an open-source stem separation model from Meta's AI research team. In my tests, it's the best freely available option.

pip install demucs

That's it! One command. (The first run also downloads the pretrained model weights, so expect a short wait.)

Step 3: Separate Stems

Now split the audio into stems:

import subprocess
import os

def separate_stems(audio_file, output_dir="output"):
    """Separate audio into vocals, drums, bass, other"""
    subprocess.run([
        'demucs',
        '-n', 'htdemucs_ft',  # Best quality model
        '-o', output_dir,
        audio_file
    ], check=True)  # Raise CalledProcessError if Demucs fails

    # Return paths to separated stems
    song_name = os.path.splitext(os.path.basename(audio_file))[0]
    stems_dir = os.path.join(output_dir, 'htdemucs_ft', song_name)

    return {
        'vocals': os.path.join(stems_dir, 'vocals.wav'),
        'drums': os.path.join(stems_dir, 'drums.wav'),
        'bass': os.path.join(stems_dir, 'bass.wav'),
        'other': os.path.join(stems_dir, 'other.wav')
    }

# Separate the downloaded audio
stems = separate_stems("audio.mp3")

print("Stems extracted:")
for name, path in stems.items():
    print(f"  {name}: {path}")

Output:

Selected model is a bag of 1 models
Separating track audio.mp3
100%|████████████████| 1/1 [00:42<00:00, 42.18s/it]

Stems extracted:
  vocals: output/htdemucs_ft/audio/vocals.wav
  drums: output/htdemucs_ft/audio/drums.wav
  bass: output/htdemucs_ft/audio/bass.wav
  other: output/htdemucs_ft/audio/other.wav
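Note that separate_stems builds the paths without checking that Demucs actually wrote them. A small check like the following can catch a silent failure early. This is a sketch, and find_missing_stems is a name I made up for it.

```python
import os


def find_missing_stems(stems):
    """Return the names of stems whose files are missing or empty."""
    return [
        name
        for name, path in stems.items()
        if not os.path.isfile(path) or os.path.getsize(path) == 0
    ]
```

Usage would look like `missing = find_missing_stems(stems)` right after the call to separate_stems, followed by an error message if the list is not empty.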

Step 4: Complete Script

Here's the full working script:

#!/usr/bin/env python3
"""
YouTube Stem Splitter
Downloads a YouTube video and separates it into stems
"""

import subprocess
import os
import sys

def download_youtube_audio(url, output_path="audio.mp3"):
    """Download audio from YouTube URL"""
    print(f"Downloading from: {url}")
    subprocess.run([
        'yt-dlp',
        '-x',
        '--audio-format', 'mp3',
        '--audio-quality', '0',
        '-o', output_path,
        url
    ], check=True)
    return output_path

def separate_stems(audio_file, output_dir="output"):
    """Separate audio into stems using Demucs"""
    print(f"Separating stems from: {audio_file}")
    subprocess.run([
        'demucs',
        '-n', 'htdemucs_ft',
        '-o', output_dir,
        audio_file
    ], check=True)

    song_name = os.path.splitext(os.path.basename(audio_file))[0]
    stems_dir = os.path.join(output_dir, 'htdemucs_ft', song_name)

    return {
        'vocals': os.path.join(stems_dir, 'vocals.wav'),
        'drums': os.path.join(stems_dir, 'drums.wav'),
        'bass': os.path.join(stems_dir, 'bass.wav'),
        'other': os.path.join(stems_dir, 'other.wav')
    }

def main():
    if len(sys.argv) < 2:
        print("Usage: python youtube_stem_splitter.py <youtube_url>")
        sys.exit(1)

    youtube_url = sys.argv[1]

    try:
        # Step 1: Download
        audio_file = download_youtube_audio(youtube_url)

        # Step 2: Separate
        stems = separate_stems(audio_file)

        # Step 3: Report
        print("\n✅ Success! Stems extracted to:")
        for name, path in stems.items():
            print(f"  {name}: {path}")

    except subprocess.CalledProcessError as e:
        print(f"❌ Error: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()

Usage:

python youtube_stem_splitter.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

Advanced Features

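Extract Only Vocals

For karaoke tracks, you only need two files: the vocals and everything else.

import subprocess

subprocess.run([
    'demucs',
    '--two-stems=vocals',  # Write vocals plus one "everything else" mix
    '-n', 'htdemucs_ft',
    'audio.mp3'
], check=True)

This isn't faster: Demucs still runs the full separation and then mixes the non-vocal stems back together. What you gain is convenience, since you end up with just two files:

vocals.wav - isolated vocals
no_vocals.wav - instrumental (everything else)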

GPU Acceleration

If you have an NVIDIA GPU, Demucs detects it automatically and runs much faster (roughly 7x in my test below: 35 seconds versus 4 minutes).

Check if GPU is available:

import torch

if torch.cuda.is_available():
    print(f"✅ GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"   Demucs will use GPU automatically")
else:
    print("❌ No GPU detected - using CPU (slower)")

Install a CUDA-enabled PyTorch build (cu118 targets CUDA 11.8; pick the index URL that matches your setup):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Batch Processing

Process multiple YouTube videos:

youtube_urls = [
    "https://www.youtube.com/watch?v=...",
    "https://www.youtube.com/watch?v=...",
    "https://www.youtube.com/watch?v=..."
]

for i, url in enumerate(youtube_urls, 1):
    print(f"\n[{i}/{len(youtube_urls)}] Processing: {url}")
    audio_file = download_youtube_audio(url, f"audio_{i}.mp3")
    stems = separate_stems(audio_file)
    print(f"✅ Completed {i}/{len(youtube_urls)}")
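With a long list, one unavailable video shouldn't abort the whole run. Here's a hedged sketch of a batch loop that records failures and keeps going. It assumes the download and separation functions raise subprocess.CalledProcessError on failure (which requires check=True, as in the complete script), and process_batch is a hypothetical helper name.

```python
import subprocess


def process_batch(urls, process_one):
    """Run process_one(index, url) for each URL, collecting failures instead of stopping."""
    succeeded, failed = [], []
    for index, url in enumerate(urls, 1):
        try:
            succeeded.append((url, process_one(index, url)))
        except (subprocess.CalledProcessError, OSError) as error:
            # Remember what went wrong and move on to the next URL
            failed.append((url, str(error)))
    return succeeded, failed
```

In this tutorial, process_one could be a small function that calls download_youtube_audio(url, f"audio_{index}.mp3") and then separate_stems on the result.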

Different Output Formats

Save as MP3 instead of WAV (smaller files):

subprocess.run([
    'demucs',
    '--mp3',  # Output as MP3
    '--mp3-bitrate', '320',  # High quality
    '-n', 'htdemucs_ft',
    'audio.mp3'
])
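Since the snippets in this section differ only in a few flags, it can be handy to build the command in one place. The helper below is a sketch of that idea; build_demucs_command and its parameter names are my own, while the flags themselves (-n, -o, --two-stems, --mp3, --mp3-bitrate, -d) are the Demucs CLI options used in this article.

```python
def build_demucs_command(audio_file, model="htdemucs_ft", output_dir="output",
                         two_stems=None, mp3_bitrate=None, device=None):
    """Return the demucs argument list for subprocess.run()."""
    command = ["demucs", "-n", model, "-o", output_dir]
    if two_stems:
        # e.g. two_stems="vocals" writes vocals.wav and no_vocals.wav
        command.append(f"--two-stems={two_stems}")
    if mp3_bitrate:
        # Write MP3 files at the given bitrate instead of WAV
        command += ["--mp3", "--mp3-bitrate", str(mp3_bitrate)]
    if device:
        # "cpu" or "cuda"; leave unset to let Demucs decide
        command += ["-d", device]
    command.append(audio_file)
    return command
```

For example, subprocess.run(build_demucs_command("audio.mp3", two_stems="vocals", mp3_bitrate=320), check=True) combines the karaoke and MP3 options.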

Method 2: Using a Service (No Setup)

If you don't want to set up Python and Demucs, several services do this for you.

Using StemSplit API

Advantages:

No setup required
Fast processing (30-60 seconds)
Works from any device
API for automation

Installation:

pip install requests

Code example:

import requests

def split_youtube_stems(youtube_url, api_key):
    """Use StemSplit API to process YouTube URL"""
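    response = requests.post(
        'https://api.stemsplit.io/v1/youtube',
        json={'url': youtube_url},
        headers={'Authorization': f'Bearer {api_key}'},
        timeout=300  # Don't hang forever if the server stalls
    )
    response.raise_for_status()  # Fail early on HTTP errors (bad key, bad URL)

    result = response.json()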

    # Download stems
    stems = {}
    for stem_name, stem_url in result['stems'].items():
        stems[stem_name] = requests.get(stem_url).content

    return stems

# Usage
api_key = "your_api_key"
youtube_url = "https://www.youtube.com/watch?v=..."

stems = split_youtube_stems(youtube_url, api_key)

# Save stems
for name, data in stems.items():
    with open(f'{name}.wav', 'wb') as f:
        f.write(data)

Get API access →

Quality Comparison

I tested the same YouTube video with different methods:

Method	Quality (SDR)	Speed	Cost
Demucs (local, GPU)	8.4 dB	35s	Free
Demucs (local, CPU)	8.4 dB	4m	Free
StemSplit API	8.4 dB	42s	$0.10
Spleeter (deprecated)	6.2 dB	18s	Free

Verdict: Demucs quality is excellent regardless of method. Choose based on convenience vs cost.
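If SDR (signal-to-distortion ratio) is new to you: it compares the energy of the true stem with the energy of the error in the separated stem, in decibels, so higher is better. Here's a minimal sketch of the plain energy-ratio definition. Benchmark tooling usually computes it in a more elaborate way, so treat this as an illustration of the idea and not as how the numbers in the table were produced.

```python
import math


def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB: 10 * log10(signal energy / error energy)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal is silent")
    if error_energy == 0:
        return math.inf  # Perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)
```

Because the scale is logarithmic, a 2.2 dB gap (8.4 versus 6.2) corresponds to roughly 1.66 times less error energy under this definition, since 10 ** (2.2 / 10) is about 1.66.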

Common Issues & Solutions

Issue 1: "yt-dlp command not found"

Solution: Add Python scripts to PATH or use full path:

python -m yt_dlp -x --audio-format mp3 <url>

Issue 2: "Video unavailable" or "Private video"

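Solution: Some videos can't be downloaded without extra steps:

Private videos (unlisted videos work if you have the link)
Age-restricted content (needs a logged-in session, for example via yt-dlp's --cookies-from-browser option)
Region-locked videos
Live streams (while live)

If none of these apply, update yt-dlp (pip install -U yt-dlp), since changes on YouTube's side regularly break older versions. Otherwise, try a different video.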

Issue 3: "CUDA out of memory"

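Solution: Reduce the segment size. The Transformer-based htdemucs models only accept segments up to about 7 seconds, so start there and go lower if you still run out of memory:

demucs --segment 7 audio.mp3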
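You could also automate the fallback: try the default settings, then smaller segments, then CPU. The following is a sketch under a few assumptions. Both function names are hypothetical, the --segment value of 7 reflects the limit of the htdemucs models, and any non-zero exit is treated as a reason to try the next option even though the cause may not be memory.

```python
import subprocess


def demucs_attempts(audio_file, model="htdemucs_ft"):
    """Return demucs commands ordered from fastest to most memory-friendly."""
    base = ["demucs", "-n", model]
    return [
        base + [audio_file],                    # Default settings
        base + ["--segment", "7", audio_file],  # Smaller chunks, less GPU memory
        base + ["-d", "cpu", audio_file],       # Last resort: run on CPU
    ]


def run_first_successful(commands, runner=subprocess.run):
    """Try each command in order and return the first one that exits cleanly."""
    last_error = None
    for command in commands:
        try:
            runner(command, check=True)
            return command
        except subprocess.CalledProcessError as error:
            last_error = error
    raise RuntimeError("All attempts failed") from last_error
```

Calling run_first_successful(demucs_attempts("audio.mp3")) returns whichever command finally worked, which is useful to log.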

Issue 4: Poor quality separation

Causes:

Very compressed YouTube audio (use videos with "Audio" quality badge)
Complex production with heavy effects
Very old recordings

Solutions:

Download highest quality: yt-dlp -f bestaudio
Use best Demucs model: htdemucs_ft
Try different videos/sources

Legal & Ethical Considerations

Is This Legal?

Downloading YouTube videos:

❌ Violates YouTube's Terms of Service
⚠️ May be illegal depending on your country
✅ Legal for videos you own
✅ Legal for Creative Commons content

Using the stems:

✅ Personal use, learning, practice
✅ Educational purposes
❌ Commercial use without permission
❌ Redistribution of copyrighted stems
⚠️ Cover songs (need mechanical license)

Ethical Use Cases

✅ Good:

Learning to play instruments
Creating karaoke for personal use
Studying production techniques
Academic research
Practicing singing

❌ Bad:

Selling stems from copyrighted songs
Using in commercial productions without license
Distributing copyrighted acapellas
Streaming isolated vocals

When in doubt: Only use for personal learning/practice.

Complete legal guide →

Use Case Examples

1. Create Karaoke Tracks

# Extract only instrumentals
subprocess.run([
    'demucs',
    '--two-stems=vocals',
    'song.mp3'
])
# Without -n and -o, Demucs uses its defaults, so the karaoke track
# ends up at separated/htdemucs/song/no_vocals.wav

2. Sample Drums for Beats

# Separate stems
stems = separate_stems('drum_break_video.mp3')

# Extract just the drums
drum_file = stems['drums']

# Now process in your DAW or slice for samples

3. Learn Guitar Solos

# Isolate the "other" stem (guitars, keys, etc.)
stems = separate_stems('guitar_lesson.mp3')
guitar_only = stems['other']

# Slow it down and loop in your music player

4. Create Practice Tracks

# Remove your instrument to practice along
# Example: Remove bass to practice bass lines
stems = separate_stems('full_band.mp3')

# Mix everything except bass
from pydub import AudioSegment

vocals = AudioSegment.from_wav(stems['vocals'])
drums = AudioSegment.from_wav(stems['drums'])
other = AudioSegment.from_wav(stems['other'])

# Combine
practice_track = vocals.overlay(drums).overlay(other)
practice_track.export('bass_practice.mp3', format='mp3')
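If you'd rather not install pydub, the same mix can be done with the standard library, at least for WAV output. This is a rough sketch. It assumes the stems are 16-bit PCM WAV files of identical length and format (which is what Demucs writes by default, as far as I know), and mix_wavs is a name I chose. Pure Python is slow here, so expect a noticeable wait on full-length songs.

```python
import array
import sys
import wave


def mix_wavs(input_paths, output_path):
    """Sum 16-bit PCM WAV files sample by sample, clipping to the int16 range."""
    mixed, params = None, None
    for path in input_paths:
        with wave.open(path, "rb") as wav:
            if wav.getsampwidth() != 2:
                raise ValueError(f"{path}: only 16-bit PCM is supported")
            fmt = (wav.getnchannels(), wav.getframerate(), wav.getnframes())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path}: format does not match the first file")
            samples = array.array("h")
            samples.frombytes(wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if mixed is None:
            mixed = list(samples)
        else:
            mixed = [a + b for a, b in zip(mixed, samples)]
    if mixed is None:
        raise ValueError("no input files given")

    # Clip so loud passages don't wrap around when stored as int16
    clipped = array.array("h", (max(-32768, min(32767, s)) for s in mixed))
    if sys.byteorder == "big":
        clipped.byteswap()
    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(2)
        out.setframerate(params[1])
        out.writeframes(clipped.tobytes())
    return output_path
```

For the bass practice track, that would be mix_wavs([stems['vocals'], stems['drums'], stems['other']], 'bass_practice.wav').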

Performance Optimization

Faster Processing

1. Use a GPU (about 7x faster in my test above):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

2. Use faster model (quality tradeoff):

demucs -n htdemucs audio.mp3  # Faster than htdemucs_ft

3. Reduce segment size if memory is the bottleneck (this saves memory, not time; htdemucs models accept at most about 7 seconds):

demucs --segment 7 audio.mp3

Memory Optimization

For limited RAM:

subprocess.run([
    'demucs',
    '--segment', '7',  # Process in smaller chunks (htdemucs models cap out near 7 seconds)
    '--shifts', '0',    # Less accurate but faster
    'audio.mp3'
])

Disk Space Management

Clean up after processing:

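import os
import shutil

# Copy the stems you want to keep somewhere safe first
shutil.copytree('output/htdemucs_ft/audio', 'stems/audio', dirs_exist_ok=True)

# Then remove the temporary files
os.remove('audio.mp3')  # Remove downloaded audio
shutil.rmtree('output')  # Remove the Demucs output folder (your copies remain)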
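Keep in mind that the Demucs output folder is where the stems themselves live, so anything you want to keep has to leave that folder before it's deleted. A slightly more defensive variant moves the stems first and only then cleans up. This is a sketch, and collect_stems and its parameters are hypothetical names.

```python
import os
import shutil


def collect_stems(stems, keep_dir, temp_paths=()):
    """Move stem files into keep_dir, then delete temporary files and folders."""
    os.makedirs(keep_dir, exist_ok=True)
    kept = {}
    for name, path in stems.items():
        destination = os.path.join(keep_dir, os.path.basename(path))
        shutil.move(path, destination)
        kept[name] = destination

    # Only clean up once every stem has been moved successfully
    for temp in temp_paths:
        if os.path.isdir(temp):
            shutil.rmtree(temp)
        elif os.path.isfile(temp):
            os.remove(temp)
    return kept
```

With the functions from this tutorial, a call could look like collect_stems(stems, 'my_stems', temp_paths=['audio.mp3', 'output']).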

Building a Web Interface

Want to make this accessible to non-coders? Here's a simple Flask API:

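from flask import Flask, jsonify, request

# Reuse the functions from the complete script above
from youtube_stem_splitter import download_youtube_audio, separate_stems

app = Flask(__name__)

@app.route('/split', methods=['POST'])
def split_youtube():
    youtube_url = request.get_json()['url']

    # Download and split (this blocks until the separation finishes)
    audio_file = download_youtube_audio(youtube_url)
    stems = separate_stems(audio_file)

    # Return the stem file paths as JSON
    return jsonify({'stems': stems})

if __name__ == '__main__':
    app.run(debug=True)  # Development only; never expose debug mode publicly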
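To hand the stems back as a single download, one option is to bundle them into a zip archive first. Here's a minimal sketch using the standard library; zip_stems is a hypothetical helper, and it expects the same name-to-path dictionary that separate_stems returns.

```python
import os
import zipfile


def zip_stems(stems, zip_path="stems.zip"):
    """Write each stem file into one zip archive, stored as <stem name>.<extension>."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, path in stems.items():
            extension = os.path.splitext(path)[1]
            archive.write(path, arcname=f"{name}{extension}")
    return zip_path
```

Inside the Flask route, the resulting path could then be returned with send_file(zip_path, as_attachment=True).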

Or use a ready-made solution:

StemSplit.io - Web interface with YouTube support

Comparison with Other Tools

Demucs vs Spleeter

Feature	Demucs	Spleeter
Quality	8.4 dB	6.2 dB
Maintenance	Bug fixes only (Meta repo archived)	Deprecated
Models	Multiple	Limited
Speed	Medium	Fast
Verdict	✅ Use this	❌ Outdated

Full comparison →

Local vs Cloud

Aspect	Local (Demucs)	Cloud (StemSplit)
Cost	Free	$0.10/song
Setup	Required	None
Speed	Depends on hardware	Consistent
Privacy	Complete	Data processed on server
Limits	None	File size limits

Choose based on:

Free + privacy → Local Demucs
Convenience + reliability → Cloud service
Heavy usage → Local with GPU
Occasional use → Cloud service

Next Steps

Now that you can split YouTube stems:

Experiment with different videos
Try different Demucs models for quality vs speed
Build automation for batch processing
Integrate into your workflow (DAW, sampling, learning)
Explore other audio AI models (pitch correction, transcription, etc.)

Resources

🎵 Try online: StemSplit.io - No setup required

📚 Demucs setup guide: Complete local installation

🔧 API documentation: Developer docs

📊 Tool comparison: Best vocal removers compared

⚖️ Legal info: Copyright and licensing guide

GitHub Repository

Want the complete code? I've created a repo with:

Full working script
Error handling
Progress bars
Logging
Tests

Questions about YouTube stem splitting? Drop them in the comments! 👇

Have improvements for the code? Share them below!

This article was originally published on StemSplit Blog