Discord has 500 million users. Most music bots stream audio from YouTube. This one splits an uploaded track into stems (vocals, drums, bass, and other) and drops download links straight into your DMs.
Here's the full walkthrough: slash command, file upload, async API job, polling loop, and a result embed that would make your server's musicians happy.
What You'll Learn
- ✅ How to create a /split slash command with discord.py 2.x app_commands
- ✅ How to accept audio file uploads through Discord interactions
- ✅ How to call the StemSplit REST API and poll for async job completion
- ✅ How to DM the user individual stem download links when the job finishes
- ✅ How to handle file size limits, API errors, and Discord's 3-second interaction timeout
- ✅ A working docker-compose.yml to deploy it anywhere
Prerequisites
pip install "discord.py>=2.4" requests python-dotenv
You need three things:
- A Discord bot token: create a bot at discord.com/developers/applications, enable the MESSAGE CONTENT intent and the bot scope with Send Messages, Read Message History, Attach Files, and Use Application Commands permissions.
- A StemSplit API key: grab one from the free tier at StemSplit's developer portal (5 minutes of processing, no card). The same key powers this bot and any other integration you build.
- A .env file in your project root:
DISCORD_TOKEN=your_discord_bot_token_here
STEMSPLIT_API_KEY=your_stemsplit_api_key_here
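Reading these values with os.environ["..."] raises a bare KeyError when one is missing, so it can help to validate both at startup. Here is a minimal sketch, assuming a hypothetical load_settings helper (it is not part of the bot below) that accepts any mapping so it can be exercised without touching the real environment:

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    discord_token: str
    stemsplit_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Return both secrets, or raise one error naming every missing variable."""
    required = ("DISCORD_TOKEN", "STEMSPLIT_API_KEY")
    # Treat empty or whitespace-only values as missing too.
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(env["DISCORD_TOKEN"].strip(), env["STEMSPLIT_API_KEY"].strip())
```

Calling load_settings() right after load_dotenv() would surface a misnamed variable before the bot ever connects.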
How the Bot Works
The flow is simple:
User runs /split → uploads audio file
→ Bot defers the interaction (buys 15 min instead of 3 seconds)
→ POSTs file to StemSplit API → gets job_id
→ Polls GET /jobs/{job_id} every 5s
→ Job completes → bot DMs user an embed with 4 download links
→ Bot replies publicly: "Done! Check your DMs."
StemSplit runs HTDemucs on GPU — a 3-minute track takes roughly 35–45 seconds. HTDemucs is the current best open-source model for stem separation, so quality is on par with what you'd get running Demucs locally, without the 4 GB model download.
Setting Up the Discord Application
In the Discord Developer Portal:
- Create a new application → add a Bot
- Under Privileged Gateway Intents, enable Message Content Intent
- Under OAuth2 → URL Generator, select scopes: bot + applications.commands
- Add permissions: Send Messages, Attach Files, Use Application Commands, Send Messages in Threads
- Copy the generated invite URL and add the bot to your server
The Full Bot (bot.py)
Here's the complete file. Read through it first, then I'll break down each section.
"""
StemSplit Discord Bot
Splits any audio file into vocals, drums, bass, and other stems.
"""
import asyncio
import os
import time
from io import BytesIO

import discord
import requests
from discord import app_commands
from dotenv import load_dotenv

load_dotenv()

DISCORD_TOKEN = os.environ["DISCORD_TOKEN"]
STEMSPLIT_KEY = os.environ["STEMSPLIT_API_KEY"]
API_BASE = "https://api.stemsplit.io/v1"
HEADERS = {"Authorization": f"Bearer {STEMSPLIT_KEY}"}
MAX_FILE_MB = 20
POLL_INTERVAL = 5   # seconds between status checks
POLL_TIMEOUT = 300  # 5 minutes max

STEM_EMOJI = {
    "vocals": "🎤",
    "drums": "🥁",
    "bass": "🎸",
    "other": "🎹",
}

class StemBot(discord.Client):
    def __init__(self) -> None:
        intents = discord.Intents.default()
        super().__init__(intents=intents)
        self.tree = app_commands.CommandTree(self)

    async def setup_hook(self) -> None:
        await self.tree.sync()

    async def on_ready(self) -> None:
        print(f"Logged in as {self.user} (id: {self.user.id})")


client = StemBot()

# ── Slash command ──────────────────────────────────────────────────────────────
@client.tree.command(name="split", description="Split an audio file into stems (vocals, drums, bass, other)")
@app_commands.describe(file="Audio file to split (.mp3, .wav, .flac, .m4a, max 20 MB)")
async def split_command(interaction: discord.Interaction, file: discord.Attachment) -> None:
    # Defer immediately: gives us 15 minutes instead of 3 seconds
    await interaction.response.defer(thinking=True)

    # Validate file size
    if file.size > MAX_FILE_MB * 1024 * 1024:
        await interaction.followup.send(
            f"❌ File too large ({file.size / 1024 / 1024:.1f} MB). Max is {MAX_FILE_MB} MB."
        )
        return

    # Validate file type
    allowed = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac"}
    ext = os.path.splitext(file.filename)[1].lower()
    if ext not in allowed:
        await interaction.followup.send(
            f"❌ Unsupported file type `{ext}`. Supported: {', '.join(sorted(allowed))}"
        )
        return

    await interaction.followup.send(f"⏳ Got it! Splitting **{file.filename}**. This takes ~35–45 seconds…")

    try:
        # Download the attachment
        audio_bytes = await file.read()

        # The helpers use blocking requests calls, so run them in the default
        # thread pool to keep the Discord event loop responsive.
        loop = asyncio.get_running_loop()

        # Submit to StemSplit API
        job_id = await loop.run_in_executor(None, submit_job, audio_bytes, file.filename)

        # Poll until done
        stems = await loop.run_in_executor(None, poll_job, job_id)

        # DM the user
        dm_channel = await interaction.user.create_dm()
        embed = build_embed(file.filename, stems)
        await dm_channel.send(embed=embed)
        await interaction.followup.send("✅ Done! Check your DMs for the stem download links.")
    except StemSplitError as e:
        await interaction.followup.send(f"❌ StemSplit API error: {e}")
    except TimeoutError:
        await interaction.followup.send("❌ Job timed out after 5 minutes. Try a shorter file.")
    except discord.Forbidden:
        # Raised when the user has DMs from server members disabled
        await interaction.followup.send("❌ I couldn't DM you. Enable DMs from server members and try again.")
    except Exception as e:
        await interaction.followup.send(f"❌ Unexpected error: {e}")

# ── StemSplit API helpers ──────────────────────────────────────────────────────
class StemSplitError(RuntimeError):
    pass


def submit_job(audio_bytes: bytes, filename: str) -> str:
    """POST the audio file to StemSplit and return the job_id."""
    res = requests.post(
        f"{API_BASE}/jobs",
        headers=HEADERS,
        files={"file": (filename, BytesIO(audio_bytes), "audio/mpeg")},
        timeout=60,
    )
    if not res.ok:
        raise StemSplitError(f"HTTP {res.status_code}: {res.text[:200]}")
    return res.json()["job_id"]


def poll_job(job_id: str) -> dict:
    """Poll GET /jobs/{job_id} every POLL_INTERVAL seconds until complete."""
    deadline = time.time() + POLL_TIMEOUT
    while time.time() < deadline:
        res = requests.get(f"{API_BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        if not res.ok:
            raise StemSplitError(f"Poll error HTTP {res.status_code}")
        data = res.json()
        status = data.get("status")
        if status == "done":
            return data.get("stems", {})
        if status == "failed":
            raise StemSplitError(data.get("error", "Job failed"))
        time.sleep(POLL_INTERVAL)
    raise TimeoutError()

def build_embed(filename: str, stems: dict) -> discord.Embed:
    """Build a Discord embed with download links for each stem."""
    embed = discord.Embed(
        title=f"🎵 Stems ready: {filename}",
        description="Your audio has been split into individual tracks. Links expire in 24 hours.",
        color=discord.Color.blurple(),
    )
    for stem_name, url in stems.items():
        emoji = STEM_EMOJI.get(stem_name, "🎵")
        embed.add_field(
            name=f"{emoji} {stem_name.capitalize()}",
            value=f"[Download {stem_name}.wav]({url})",
            inline=False,
        )
    embed.set_footer(text="Powered by StemSplit · stemsplit.io")
    return embed


# ── Run ───────────────────────────────────────────────────────────────────────
client.run(DISCORD_TOKEN)
Code Walkthrough
Deferring the interaction
The most important line in the bot:
await interaction.response.defer(thinking=True)
Discord gives you exactly 3 seconds to respond to a slash command before it shows "interaction failed." Deferring extends that to 15 minutes — enough time for the API job to complete. The thinking=True flag shows a "Bot is thinking…" indicator to the user.
Submitting the job
res = requests.post(
    f"{API_BASE}/jobs",
    headers=HEADERS,
    files={"file": (filename, BytesIO(audio_bytes), "audio/mpeg")},
)
StemSplit's REST API accepts a multipart file upload and returns a job_id immediately. The job runs asynchronously — the full API reference lives in the StemSplit REST API docs. If you've already built something with the API, the pattern here is identical to the async polling approach used in StemSplit's Python vocal remover tutorial.
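The upload above labels every file as audio/mpeg, even when the user sends a .wav or .flac. A small sketch of picking the type from the extension instead; guess_audio_mime and the AUDIO_MIME table are hypothetical names, and whether StemSplit inspects the part's Content-Type at all is an assumption:

```python
import os

# Extension-to-MIME table for the formats the bot accepts. The values are
# commonly used types; treat sending the right one as a courtesy to the
# server, since the API's handling of this header is not documented here.
AUDIO_MIME = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
    ".aac": "audio/aac",
}


def guess_audio_mime(filename: str) -> str:
    """Return a MIME type for the filename, falling back to a generic binary type."""
    ext = os.path.splitext(filename)[1].lower()
    return AUDIO_MIME.get(ext, "application/octet-stream")
```

In submit_job, the files tuple would then become (filename, BytesIO(audio_bytes), guess_audio_mime(filename)).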
Polling loop
while time.time() < deadline:
    data = requests.get(f"{API_BASE}/jobs/{job_id}", ...).json()
    if data["status"] == "done":
        return data["stems"]
    time.sleep(POLL_INTERVAL)
The loop checks every 5 seconds and exits on "done" or "failed". I'm running this in run_in_executor so it doesn't block the Discord event loop while waiting.
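To see why the executor matters, here is a self-contained sketch that needs no Discord connection. blocking_poll stands in for poll_job, and heartbeat is a hypothetical coroutine that can only keep counting if the event loop stays free while the blocking call runs:

```python
import asyncio
import time


def blocking_poll(seconds: float) -> str:
    """Stand-in for poll_job: blocks its own thread, like time.sleep in the bot."""
    time.sleep(seconds)
    return "done"


async def heartbeat(stop: asyncio.Event) -> int:
    """Count event-loop ticks; the count only advances if the loop is not blocked."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[str, int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    loop = asyncio.get_running_loop()
    # The blocking call runs in the default thread pool, so the loop stays free.
    status = await loop.run_in_executor(None, blocking_poll, 0.2)
    stop.set()
    return status, await beat


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If blocking_poll were called directly inside main, the heartbeat would not tick at all until the sleep finished, which is what would happen to Discord's gateway heartbeats in the bot.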
The result embed
When the job completes, stems is a dict of {"vocals": "https://...", "drums": "https://...", ...}. We loop over it to build Discord embed fields with download links. Each link is a pre-signed URL that expires after 24 hours.
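The embed shows stems in whatever order the response dict happens to have. A sketch of a helper that fixes the order and skips entries without a usable link; ordered_stem_fields is a hypothetical name, and the https:// check assumes the API always returns HTTPS URLs as in the example above:

```python
PREFERRED_ORDER = ("vocals", "drums", "bass", "other")
MAX_EMBED_FIELDS = 25  # Discord allows at most 25 fields per embed


def ordered_stem_fields(stems: dict[str, str]) -> list[tuple[str, str]]:
    """Return (field name, markdown link) pairs in a stable, predictable order."""
    known = [name for name in PREFERRED_ORDER if name in stems]
    extra = sorted(name for name in stems if name not in PREFERRED_ORDER)
    fields = []
    for name in known + extra:
        url = stems[name]
        # Skip entries without a usable link rather than rendering "[...](None)".
        if not isinstance(url, str) or not url.startswith("https://"):
            continue
        fields.append((name.capitalize(), f"[Download {name}.wav]({url})"))
    return fields[:MAX_EMBED_FIELDS]
```

build_embed could loop over this list and call embed.add_field(name=..., value=..., inline=False) for each pair.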
Error Handling
A few edge cases worth locking down:
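File size. Discord caps attachment size, and the cap depends on the uploader's plan and the server's boost level. 25 MB was the long-standing default for standard servers, and Discord has since lowered the free-tier limit, so check the current number. Multipart uploads to the API also add a little overhead, so keep the guard at 20 MB or lower to be safe.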
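Pulling the size and type checks into one pure function makes them easy to test without a live interaction. This is a sketch using the same 20 MB limit and extension list as bot.py; validate_upload is a hypothetical helper, and the empty-file check is an addition the bot does not have:

```python
import os
from typing import Optional

MAX_FILE_MB = 20
ALLOWED_EXTENSIONS = frozenset({".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac"})


def validate_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return a user-facing error message, or None if the upload is acceptable."""
    if size_bytes <= 0:
        return "File is empty."
    limit = MAX_FILE_MB * 1024 * 1024
    if size_bytes > limit:
        return f"File too large ({size_bytes / 1024 / 1024:.1f} MB). Max is {MAX_FILE_MB} MB."
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        shown = ext or "(none)"
        return f"Unsupported file type {shown}. Supported: {', '.join(sorted(ALLOWED_EXTENSIONS))}"
    return None
```

In split_command this would be called as validate_upload(file.filename, file.size), sending the message through interaction.followup.send when the result is not None.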
API rate limits. StemSplit's API rate-limits at the job submission level. If you're running this in a busy server, add a per-user cooldown:
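from discord import app_commands
# Add this decorator to split_command, below @client.tree.command(...).
# Exceeding the limit raises app_commands.CommandOnCooldown.
@app_commands.checks.cooldown(2, 60.0, key=lambda i: i.user.id)  # 2 splits per user per minute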
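To make the per-user cooldown idea concrete, here is a dependency-free sliding-window sketch. UserCooldown is a hypothetical class written for illustration; it is not how discord.py implements its cooldowns, and the injectable clock exists only so the behaviour can be checked without waiting:

```python
import time
from collections import defaultdict, deque
from typing import Callable


class UserCooldown:
    """Allow at most `rate` calls per user within any `per`-second window."""

    def __init__(self, rate: int, per: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.per = per
        self.clock = clock
        self._calls: dict[int, deque] = defaultdict(deque)

    def retry_after(self, user_id: int) -> float:
        """Record a call and return 0.0 if allowed, else seconds until a slot frees."""
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.per:
            calls.popleft()
        if len(calls) >= self.rate:
            return self.per - (now - calls[0])
        calls.append(now)
        return 0.0
```

A handler could call retry_after(interaction.user.id) first and reply with the wait time whenever the result is greater than zero.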
Discord interaction timeouts. A deferred interaction can only be followed up for 15 minutes; after that, followup.send fails. Typical tracks finish in under a minute, and the POLL_TIMEOUT = 300 guard gives up after 5 minutes, well inside that window, and tells the user to try a shorter file.
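The timeout guard can be written so that it never sleeps past the deadline and can be tested without real waiting. A sketch under stated assumptions: poll_until_done and JobFailed are hypothetical names, fetch stands in for the GET /jobs/{job_id} call, and the "done" and "failed" status values come from the bot above (any other status is treated as still running):

```python
import time
from typing import Callable


class JobFailed(RuntimeError):
    """Raised when the job reports a terminal failure."""


def poll_until_done(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until it reports "done" or "failed", or the deadline passes."""
    deadline = clock() + timeout
    while True:
        data = fetch()
        status = data.get("status")
        if status == "done":
            return data.get("stems", {})
        if status == "failed":
            raise JobFailed(data.get("error", "Job failed"))
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"Job still '{status}' after {timeout:.0f} s")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Using time.monotonic for the deadline avoids surprises if the system clock is adjusted while a job is running.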
Running It
Locally:
python bot.py
Or with Docker:
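# docker-compose.yml
# Assumes a requirements.txt next to bot.py listing discord.py>=2.4, requests, and python-dotenv.
services:
  stembot:
    image: python:3.12-slim
    working_dir: /app
    volumes:
      - .:/app
    # Compose does not pass a command string through a shell, so wrap it in sh -c for && to work.
    command: sh -c "pip install -r requirements.txt && python bot.py"
    env_file:
      - .env
    restart: unless-stopped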
docker-compose up -d
Wrapping Up
The bot is around 100 lines, fully async, and ready to drop into any server. The same job-submission + polling pattern works for batch processing, Flask APIs, or any other integration — StemSplit's official Discord bot guide covers additional configuration options like webhook callbacks if you want to skip polling entirely.
If you'd rather run the separation model locally without an API, the Hashnode walkthrough on running the full Demucs acapella pipeline in Python covers everything — tradeoffs included.
Questions or improvements? Drop them in the comments — I'd love to see what people build with this.