
StemSplit


How to Build a Discord Bot that Splits Audio Stems with Python (2026)

Discord has 500 million users. Most bots stream music from YouTube. This one splits an uploaded track into four stems (vocals, drums, bass, and other) and drops download links straight into your DMs.

Here's the full walkthrough: slash command, file upload, async API job, polling loop, and a result embed that would make your server's musicians happy.

What You'll Learn

  • ✅ How to create a /split slash command with discord.py 2.x app_commands
  • ✅ How to accept audio file uploads through Discord interactions
  • ✅ How to call the StemSplit REST API and poll for async job completion
  • ✅ How to DM the user individual stem download links when the job finishes
  • ✅ How to handle file size limits, API errors, and Discord's 3-second interaction timeout
  • ✅ A working docker-compose.yml to deploy it anywhere

Prerequisites

pip install "discord.py>=2.4" requests python-dotenv

You need three things:

  1. A Discord bot token: create a bot at discord.com/developers/applications and invite it with the bot and applications.commands scopes plus the Send Messages, Read Message History, Attach Files, and Use Application Commands permissions. The bot only handles slash commands, so the privileged MESSAGE CONTENT intent is optional.

  2. A StemSplit API key — grab one from the free tier at StemSplit's developer portal (5 minutes of processing, no card). The same key powers this bot and any other integration you build.

  3. A .env file in your project root:

DISCORD_TOKEN=your_discord_bot_token_here
STEMSPLIT_API_KEY=your_stemsplit_api_key_here
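Reading a missing variable with os.environ[...] raises a bare KeyError, which can be confusing on first run. A minimal sketch of a friendlier startup check follows; require_env is a hypothetical helper written for this tutorial, not part of python-dotenv.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for every name, or raise listing what is missing.

    require_env is a hypothetical helper for this tutorial, not a library call.
    """
    env = os.environ if environ is None else environ
    # Treat empty strings the same as unset variables.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

Calling require_env(["DISCORD_TOKEN", "STEMSPLIT_API_KEY"]) right after load_dotenv() would name the missing variable before the bot tries to connect.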

How the Bot Works

The flow is simple:

User runs /split → uploads audio file
  → Bot defers the interaction (buys 15 min instead of 3 seconds)
  → POSTs file to StemSplit API → gets job_id
  → Polls GET /jobs/{job_id} every 5s
  → Job completes → bot DMs user an embed with 4 download links
  → Bot replies publicly: "Done! Check your DMs."

StemSplit runs HTDemucs on GPU, and a 3-minute track takes roughly 35–45 seconds. HTDemucs is among the strongest open-source models for stem separation, so quality should be on par with running Demucs locally, without setting up PyTorch, model weights, and a GPU yourself.


Setting Up the Discord Application

In the Discord Developer Portal:

  1. Create a new application → add a Bot
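  2. Under Privileged Gateway Intents, you can enable Message Content Intent, but this bot does not need it: it only handles slash commands and runs with discord.Intents.default()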
  3. Under OAuth2 → URL Generator, select scopes: bot + applications.commands
  4. Add permissions: Send Messages, Attach Files, Use Application Commands, Send Messages in Threads
  5. Copy the generated invite URL and add the bot to your server
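If you prefer to build the invite link in code rather than with the URL Generator, the sketch below combines the scopes and permissions from the steps above. The permission values are Discord's documented bit flags; invite_url is a hypothetical helper, and the client ID in the usage note is a placeholder.

```python
from urllib.parse import urlencode

# Discord permission bit flags, as listed in Discord's permissions documentation.
SEND_MESSAGES = 1 << 11
ATTACH_FILES = 1 << 15
USE_APPLICATION_COMMANDS = 1 << 31
SEND_MESSAGES_IN_THREADS = 1 << 38


def invite_url(client_id: str) -> str:
    """Build an OAuth2 invite URL with the bot and applications.commands scopes."""
    permissions = (
        SEND_MESSAGES
        | ATTACH_FILES
        | USE_APPLICATION_COMMANDS
        | SEND_MESSAGES_IN_THREADS
    )
    query = urlencode({
        "client_id": client_id,
        "scope": "bot applications.commands",  # urlencode turns the space into "+"
        "permissions": permissions,
    })
    return f"https://discord.com/oauth2/authorize?{query}"
```

For example, invite_url("123") yields a link whose permissions parameter is the integer form of the four checkboxes selected in step 4.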

The Full Bot (bot.py)

Here's the complete file. Read through it first, then I'll break down each section.

"""
StemSplit Discord Bot
Splits any audio file into vocals, drums, bass, and other stems.
"""
import asyncio
import os
import time
from io import BytesIO

import discord
import requests
from discord import app_commands
from dotenv import load_dotenv

load_dotenv()

DISCORD_TOKEN = os.environ["DISCORD_TOKEN"]
STEMSPLIT_KEY = os.environ["STEMSPLIT_API_KEY"]

API_BASE = "https://api.stemsplit.io/v1"
HEADERS = {"Authorization": f"Bearer {STEMSPLIT_KEY}"}

MAX_FILE_MB = 20
POLL_INTERVAL = 5
POLL_TIMEOUT = 300  # 5 minutes max

STEM_EMOJI = {
    "vocals": "🎤",
    "drums": "🥁",
    "bass": "🎸",
    "other": "🎹",
}


class StemBot(discord.Client):
    def __init__(self) -> None:
        intents = discord.Intents.default()
        super().__init__(intents=intents)
        self.tree = app_commands.CommandTree(self)

    async def setup_hook(self) -> None:
        await self.tree.sync()

    async def on_ready(self) -> None:
        print(f"Logged in as {self.user} (id: {self.user.id})")


client = StemBot()


# ── Slash command ──────────────────────────────────────────────────────────────

@client.tree.command(name="split", description="Split an audio file into stems (vocals, drums, bass, other)")
@app_commands.describe(file="Audio file to split (.mp3, .wav, .flac, .m4a — max 20 MB)")
async def split_command(interaction: discord.Interaction, file: discord.Attachment) -> None:
    # Defer immediately — gives us 15 minutes instead of 3 seconds
    await interaction.response.defer(thinking=True)

    # Validate file size
    if file.size > MAX_FILE_MB * 1024 * 1024:
        await interaction.followup.send(
            f"❌ File too large ({file.size / 1024 / 1024:.1f} MB). Max is {MAX_FILE_MB} MB."
        )
        return

    # Validate file type
    allowed = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac"}
    ext = os.path.splitext(file.filename)[1].lower()
    if ext not in allowed:
        await interaction.followup.send(
            f"❌ Unsupported file type `{ext}`. Supported: {', '.join(sorted(allowed))}"
        )
        return

    await interaction.followup.send(f"⏳ Got it! Splitting **{file.filename}** — this takes ~35–45 seconds…")

    try:
        # Download the attachment
        audio_bytes = await file.read()

        # Submit to StemSplit API (blocking requests call, so run it in a worker thread)
        loop = asyncio.get_running_loop()
        job_id = await loop.run_in_executor(None, submit_job, audio_bytes, file.filename)

        # Poll until done (also blocking, so it runs in a worker thread too)
        stems = await loop.run_in_executor(None, poll_job, job_id)

        # DM the user
        dm_channel = await interaction.user.create_dm()
        embed = build_embed(file.filename, stems)
        await dm_channel.send(embed=embed)
        await interaction.followup.send("✅ Done! Check your DMs for the stem download links.")

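    except StemSplitError as e:
        await interaction.followup.send(f"❌ StemSplit API error: {e}")
    except TimeoutError:
        await interaction.followup.send("❌ Job timed out after 5 minutes. Try a shorter file.")
    except discord.Forbidden:
        # Raised when the user does not accept DMs from the bot
        await interaction.followup.send("❌ I couldn't DM you. Allow DMs from this server and run /split again.")
    except Exception as e:
        await interaction.followup.send(f"❌ Unexpected error: {e}")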


# ── StemSplit API helpers ──────────────────────────────────────────────────────

class StemSplitError(RuntimeError):
    pass


def submit_job(audio_bytes: bytes, filename: str) -> str:
    """POST the audio file to StemSplit and return the job_id."""
    res = requests.post(
        f"{API_BASE}/jobs",
        headers=HEADERS,
        files={"file": (filename, BytesIO(audio_bytes), "audio/mpeg")},
        timeout=60,
    )
    if not res.ok:
        raise StemSplitError(f"HTTP {res.status_code}: {res.text[:200]}")
    return res.json()["job_id"]


def poll_job(job_id: str) -> dict:
    """Poll GET /jobs/{job_id} every POLL_INTERVAL seconds until complete."""
    deadline = time.time() + POLL_TIMEOUT
    while time.time() < deadline:
        res = requests.get(f"{API_BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
        if not res.ok:
            raise StemSplitError(f"Poll error HTTP {res.status_code}")
        data = res.json()
        status = data.get("status")
        if status == "done":
            return data.get("stems", {})
        if status == "failed":
            raise StemSplitError(data.get("error", "Job failed"))
        time.sleep(POLL_INTERVAL)
    raise TimeoutError()


def build_embed(filename: str, stems: dict) -> discord.Embed:
    """Build a Discord embed with download links for each stem."""
    embed = discord.Embed(
        title=f"🎵 Stems ready: {filename}",
        description="Your audio has been split into individual tracks. Links expire in 24 hours.",
        color=discord.Color.blurple(),
    )
    for stem_name, url in stems.items():
        emoji = STEM_EMOJI.get(stem_name, "🎵")
        embed.add_field(
            name=f"{emoji} {stem_name.capitalize()}",
            value=f"[Download {stem_name}.wav]({url})",
            inline=False,
        )
    embed.set_footer(text="Powered by StemSplit · stemsplit.io")
    return embed


# ── Run ───────────────────────────────────────────────────────────────────────

client.run(DISCORD_TOKEN)

Code Walkthrough

Deferring the interaction

The most important line in the bot:

await interaction.response.defer(thinking=True)

Discord gives you exactly 3 seconds to respond to a slash command before it shows "interaction failed." Deferring extends that to 15 minutes — enough time for the API job to complete. The thinking=True flag shows a "Bot is thinking…" indicator to the user.

Submitting the job

res = requests.post(
    f"{API_BASE}/jobs",
    headers=HEADERS,
    files={"file": (filename, BytesIO(audio_bytes), "audio/mpeg")},
    timeout=60,  # fail instead of hanging if the upload stalls
)

StemSplit's REST API accepts a multipart file upload and returns a job_id immediately. The job runs asynchronously — the full API reference lives in the StemSplit REST API docs. If you've already built something with the API, the pattern here is identical to the async polling approach used in StemSplit's Python vocal remover tutorial.
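One detail in the snippet above: every upload is labelled "audio/mpeg", even a .wav or .flac file. Whether the API inspects that label is not stated here, so treat the following as a precaution rather than a requirement. It is a minimal sketch with a hypothetical guess_audio_mime helper that maps the bot's accepted extensions to standard MIME types.

```python
import os

# Explicit map for the extensions the bot accepts; values are standard audio MIME types.
AUDIO_MIME_TYPES = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
    ".aac": "audio/aac",
}


def guess_audio_mime(filename: str) -> str:
    """Return a MIME type for the filename, or a generic binary type if unknown."""
    ext = os.path.splitext(filename)[1].lower()
    return AUDIO_MIME_TYPES.get(ext, "application/octet-stream")
```

The result could replace the hard-coded string in the files tuple, for example (filename, BytesIO(audio_bytes), guess_audio_mime(filename)).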

Polling loop

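while time.time() < deadline:
    res = requests.get(f"{API_BASE}/jobs/{job_id}", headers=HEADERS, timeout=15)
    data = res.json()
    if data["status"] == "done":
        return data["stems"]
    if data["status"] == "failed":
        raise StemSplitError(data.get("error", "Job failed"))
    time.sleep(POLL_INTERVAL)
raise TimeoutError()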

The loop checks every 5 seconds and exits on "done" or "failed". I'm running this in run_in_executor so it doesn't block the Discord event loop while waiting.
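The same loop can be written so that it is testable without network access. The sketch below is one way to do that, not the article's implementation: fetch_status is a hypothetical callable that returns the job JSON as a dict, and the sleep and clock parameters exist only so a test can replace them. It uses time.monotonic for the deadline, which is unaffected by system clock changes.

```python
import time


def poll_until_done(fetch_status, interval=5.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports "done" or "failed", or time runs out.

    fetch_status is a hypothetical callable returning the job's JSON as a dict.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        data = fetch_status()
        status = data.get("status")
        if status == "done":
            return data.get("stems", {})
        if status == "failed":
            raise RuntimeError(data.get("error", "Job failed"))
        # Any other status is treated as "still running".
        sleep(interval)
    raise TimeoutError(f"Job did not finish within {timeout:.0f} seconds")
```

In the bot, fetch_status would wrap the requests.get call, and the function would still run inside run_in_executor because it blocks while sleeping.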

The result embed

When the job completes, stems is a dict of {"vocals": "https://...", "drums": "https://...", ...}. We loop over it to build Discord embed fields with download links. Each link is a pre-signed URL that expires after 24 hours.
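Since the stems dict comes from an external API, it can be worth normalising it before building the embed. The sketch below is an optional hardening step under stated assumptions: embed_fields is a hypothetical helper, the fixed ordering is a presentation choice, and the two limits are Discord's documented embed limits (25 fields per embed, 1024 characters per field value).

```python
STEM_ORDER = ["vocals", "drums", "bass", "other"]
MAX_FIELDS = 25          # Discord's limit on fields per embed
MAX_FIELD_VALUE = 1024   # Discord's limit on characters per field value


def embed_fields(stems: dict) -> list:
    """Return (name, value) pairs in a stable order, skipping unusable URLs."""
    # Known stems first in a fixed order, then anything else alphabetically.
    names = [n for n in STEM_ORDER if n in stems]
    names += sorted(n for n in stems if n not in STEM_ORDER)
    fields = []
    for name in names:
        url = stems[name]
        if not isinstance(url, str) or not url.startswith("https://"):
            continue  # not a link worth handing to users
        value = f"[Download {name}.wav]({url})"
        if len(value) > MAX_FIELD_VALUE:
            continue  # an over-long value would make Discord reject the embed
        fields.append((name.capitalize(), value))
    return fields[:MAX_FIELDS]
```

Each returned pair maps directly onto an embed.add_field(name=..., value=..., inline=False) call.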


Error Handling

A few edge cases worth locking down:

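File size. Discord's own upload cap depends on the uploader's plan and the server's boost level, so the 25 MB figure often quoted for standard servers may not apply to every user. The bot reads the whole attachment into memory and re-uploads it to the API, so keep the guard at 20 MB to bound memory use and upload time.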
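The size and extension checks are easier to unit-test when they live in a plain function with no Discord objects involved. A minimal sketch, assuming a hypothetical upload_problem helper that mirrors the limits used in bot.py:

```python
import os

MAX_FILE_MB = 20
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac"}


def upload_problem(filename: str, size_bytes: int):
    """Return a user-facing error string, or None when the upload is acceptable."""
    if size_bytes > MAX_FILE_MB * 1024 * 1024:
        size_mb = size_bytes / 1024 / 1024
        return f"File too large ({size_mb:.1f} MB). Max is {MAX_FILE_MB} MB."
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type `{ext}`."
    return None
```

In the command handler this would be called as upload_problem(file.filename, file.size), sending the returned string as a followup when it is not None.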

API rate limits. StemSplit's API rate-limits at the job submission level. If you're running this in a busy server, add a per-user cooldown:

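from discord import app_commands

# Stack @per_user_cooldown under @client.tree.command on split_command.
# Exceeding it raises app_commands.CommandOnCooldown, which a tree error handler can report.
per_user_cooldown = app_commands.checks.cooldown(2, 60.0, key=lambda i: i.user.id)  # 2 splits per user per minute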
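To make the "2 splits per user per minute" rule concrete, here is a library-free sketch of a per-user sliding-window limiter. UserCooldown is a hypothetical class for illustration and is not how discord.py implements its cooldowns; the clock parameter is there only so the behaviour can be tested.

```python
import time
from collections import defaultdict, deque


class UserCooldown:
    """Allow at most `rate` calls per `per` seconds for each user id."""

    def __init__(self, rate=2, per=60.0, clock=time.monotonic):
        self.rate = rate
        self.per = per
        self.clock = clock
        self._calls = defaultdict(deque)  # user_id -> timestamps of recent calls

    def retry_after(self, user_id) -> float:
        """Record a call and return 0.0, or return seconds to wait if over the limit."""
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.per:
            calls.popleft()
        if len(calls) >= self.rate:
            return self.per - (now - calls[0])
        calls.append(now)
        return 0.0
```

A handler could call retry_after(interaction.user.id) before submitting a job and reply with the wait time when the result is greater than zero.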

Discord interaction timeouts. A deferred interaction stays valid for 15 minutes. Typical tracks finish in under a minute, and the POLL_TIMEOUT = 300 guard stops polling after 5 minutes, well inside that window, so the bot can still reply and tell the user to try a shorter file.
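The arithmetic behind that margin can be written down explicitly. The sketch below uses the timeout values from bot.py; it gives a rough bound only, since requests timeouts apply per socket operation rather than to a whole request, and it ignores the time spent downloading the attachment from Discord.

```python
INTERACTION_WINDOW = 15 * 60  # seconds a deferred interaction stays usable
SUBMIT_TIMEOUT = 60           # timeout on the upload request
POLL_TIMEOUT = 300            # polling deadline
POLL_REQUEST_TIMEOUT = 15     # timeout on each status request
POLL_INTERVAL = 5             # sleep between status requests


def worst_case_seconds() -> int:
    """Rough upper bound on upload plus polling time.

    The last loop iteration can start just before the deadline, so one extra
    status request and one extra sleep are added on top of POLL_TIMEOUT.
    """
    return SUBMIT_TIMEOUT + POLL_TIMEOUT + POLL_REQUEST_TIMEOUT + POLL_INTERVAL
```

With these values the bound is 380 seconds, leaving more than 8 minutes of the 900-second window for the final followup message.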


Running It

Locally:

python bot.py

Or with Docker:

# docker-compose.yml
services:
  stembot:
    image: python:3.12-slim
    working_dir: /app
    volumes:
      - .:/app
    # requirements.txt should list discord.py>=2.4, requests, and python-dotenv
    command: sh -c "pip install -r requirements.txt && python bot.py"
    env_file:
      - .env
    restart: unless-stopped

Then start the container in the background:

docker-compose up -d

Wrapping Up

The bot fits in one file of roughly 160 lines, keeps Discord's event loop free by running the blocking HTTP calls in worker threads, and is ready to drop into any server. The same job-submission + polling pattern works for batch processing, Flask APIs, or any other integration. StemSplit's official Discord bot guide covers additional configuration options like webhook callbacks if you want to skip polling entirely.

If you'd rather run the separation model locally without an API, the Hashnode walkthrough on running the full Demucs acapella pipeline in Python covers everything — tradeoffs included.

Questions or improvements? Drop them in the comments — I'd love to see what people build with this.
