DEV Community

StemSplit

AI Stem Splitter API Comparison 2026: StemSplit vs LALAL.AI vs Moises (With Benchmarks)

I was building a feature that needed stem separation in the backend, and I spent a week comparing the main options.

The questions I had were developer questions: How clean is the API design? What does the latency look like in practice? What do I actually get per dollar? And how does output quality compare when you run the same file through each one?

Here's everything I found, with code for each and actual benchmark numbers.

What I Compared

| Tool | Has API | Free Tier | Model |
| --- | --- | --- | --- |
| StemSplit | ✅ | 10 min | HTDemucs |
| LALAL.AI | ✅ | limited | Orion (proprietary) |
| Moises | ✅ | limited | Proprietary |
| Voice.AI | ❌ | N/A | Proprietary |

Voice.AI has no public API so it drops out of the developer comparison early. The other three are all reasonable choices depending on your use case.

What You'll Learn

  • ✅ How to call each API from Python with working code
  • ✅ SDR benchmark results for all three tools on the same test tracks
  • ✅ Real-world latency numbers (upload → download)
  • ✅ Pricing per audio minute at different usage tiers
  • ✅ Which to use for production vs prototyping

Test Setup

pip install requests mir_eval librosa numpy soundfile python-dotenv tqdm
# config.py
import os
from dotenv import load_dotenv

load_dotenv()

STEMSPLIT_API_KEY = os.getenv("STEMSPLIT_API_KEY")
LALALAI_API_KEY   = os.getenv("LALALAI_API_KEY")
MOISES_API_KEY    = os.getenv("MOISES_API_KEY")

Test tracks: the same three Creative Commons mixes from my previous benchmark article — pop, rock, hip-hop, each with isolated reference stems for SDR scoring.

SDR measurement:

import librosa
import mir_eval
import numpy as np


def compute_sdr(reference_path: str, estimated_path: str) -> float:
    """Signal-to-Distortion Ratio — higher is better."""
    ref, _ = librosa.load(reference_path, sr=44100, mono=True)
    est, _ = librosa.load(estimated_path, sr=44100, mono=True)

    n = min(len(ref), len(est))
    sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
        ref[:n][np.newaxis, :],
        est[:n][np.newaxis, :],
    )
    return float(sdr[0])
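The mir_eval BSS eval above is what all the benchmark numbers in this article use, but it gets slow on full-length tracks. For quick sanity checks I sometimes use a plain-numpy scale-invariant SDR, which tracks the mir_eval number closely for single-source comparisons (this helper is my own addition, not part of mir_eval):

```python
import numpy as np


def quick_si_sdr(ref: np.ndarray, est: np.ndarray) -> float:
    """Scale-invariant SDR in dB -- a fast numpy approximation for
    single-source sanity checks, not a replacement for full BSS eval."""
    n = min(len(ref), len(est))
    ref, est = ref[:n], est[:n]
    # Project the estimate onto the reference to factor out gain differences
    alpha = np.dot(ref, est) / np.dot(ref, ref)
    target = alpha * ref
    noise = est - target
    return float(10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2)))
```

For a clean reference plus additive noise, SI-SDR lands close to the signal-to-noise ratio, which makes it easy to spot a broken download or a channel mix-up before running the slow metric.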

API 1: StemSplit

StemSplit's API is a straightforward REST design — upload a file, poll for status, download results. It runs HTDemucs on the backend, which is the same model you'd use if you ran Demucs locally.

Authentication

# Add to .env
STEMSPLIT_API_KEY=your_key_here

Separate Stems

import requests
import time
from pathlib import Path


def stemsplit_separate(
    audio_path: str,
    api_key: str,
    stems: int = 4,          # 2, 4, or 6
    output_format: str = "wav",
) -> dict:
    """
    Separate audio using StemSplit API.

    Args:
        audio_path: Path to input audio file
        api_key: StemSplit API key
        stems: Number of stems to separate (2, 4, or 6)
        output_format: 'wav', 'mp3', or 'flac'

    Returns:
        dict mapping stem names to download URLs
    """
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://api.stemsplit.io/v1/separate",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"audio": (Path(audio_path).name, f)},
            # json= is silently ignored by requests when files= is present,
            # so the options must go as multipart form fields via data=
            data={"stems": str(stems), "format": output_format},
            timeout=30,
        )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll until done
    while True:
        status_resp = requests.get(
            f"https://api.stemsplit.io/v1/jobs/{job_id}",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        status_resp.raise_for_status()
        job = status_resp.json()

        if job["status"] == "completed":
            return job["stems"]   # {"vocals": url, "drums": url, ...}
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job.get('error')}")

        time.sleep(3)


def stemsplit_download(stem_urls: dict, output_dir: str = "output/stemsplit") -> dict:
    """Download separated stems to local files."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    local_paths = {}

    for stem_name, url in stem_urls.items():
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        out_path = Path(output_dir) / f"{stem_name}.wav"
        out_path.write_bytes(resp.content)
        local_paths[stem_name] = str(out_path)

    return local_paths
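The while True polling loop above runs forever if a job hangs server-side. In my own integrations I route every API through one generic poller with a deadline and exponential backoff (this helper is mine, not part of any vendor SDK):

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], Optional[dict]],
               timeout_s: float = 600,
               initial_delay: float = 2.0,
               max_delay: float = 15.0) -> dict:
    """Call `check` until it returns a non-None result or the deadline passes.

    `check` should return the job payload when done, None while pending,
    and raise on failure. The sleep doubles each round up to max_delay,
    so short jobs finish fast and long jobs don't hammer the API.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job did not finish within {timeout_s}s")
```

For StemSplit, `check` would issue the GET to /v1/jobs/{job_id}, return job["stems"] when status is completed, return None while pending, and raise on failed.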

What Else You Get

StemSplit returns BPM and key detection alongside the stems at no extra cost:

job = status_resp.json()
if job["status"] == "completed":
    stems    = job["stems"]     # download URLs
    bpm      = job["bpm"]       # e.g. 124.5
    key      = job["key"]       # e.g. "A minor"
    camelot  = job["camelot"]   # e.g. "8A"

Useful if you're building anything DJ-adjacent or music analysis related — you don't need a separate call to a BPM detection library.
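As a quick illustration of what that metadata enables: Camelot codes are considered mix-compatible when they share a wheel position or sit one step apart. A toy checker (my own sketch, nothing StemSplit-specific):

```python
def camelot_compatible(a: str, b: str) -> bool:
    """True if two Camelot codes (e.g. '8A', '9A') mix harmonically:
    same number (covers identical key and relative major/minor), or
    adjacent numbers with the same letter. The wheel wraps 12 -> 1."""
    num_a, let_a = int(a[:-1]), a[-1].upper()
    num_b, let_b = int(b[:-1]), b[-1].upper()
    if num_a == num_b:
        return True
    if let_a == let_b:
        diff = abs(num_a - num_b)
        return diff == 1 or diff == 11  # 11 handles the 12 -> 1 wrap
    return False
```

Feed it the camelot field from two separation jobs and you have the core of a track-compatibility feature with no extra analysis pass.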


API 2: LALAL.AI

LALAL.AI uses a three-step API: upload the file to get a file ID, start the separation to get a task ID, then poll for results. Slightly more verbose than StemSplit but well-documented.

Authenticate

LALAL.AI uses an Authorization: license [key] header (not Bearer).

LALALAI_API_KEY=your_key_here

Separate Stems

import requests
import time
from pathlib import Path


LALALAI_BASE = "https://www.lalal.ai/api"

LALALAI_STEMS = {
    "vocals":  "vocals",
    "drums":   "drums",
    "bass":    "bass",
    "piano":   "piano",
    "electric_guitar": "electric_guitar",
    "acoustic_guitar": "acoustic_guitar",
    "synthesizer":     "synthesizer",
    "strings":         "strings",
    "wind":            "wind",
}


def lalalai_upload(audio_path: str, api_key: str) -> str:
    """Upload a file to LALAL.AI and return the file ID."""
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{LALALAI_BASE}/upload/",
            headers={"Authorization": f"license {api_key}"},
            files={"file": (Path(audio_path).name, f, "audio/mpeg")},
            timeout=120,
        )
    resp.raise_for_status()
    result = resp.json()

    if result.get("status") == "error":
        raise RuntimeError(f"Upload error: {result.get('error')}")

    return result["id"]


def lalalai_separate(file_id: str, api_key: str, stem: str = "vocals") -> str:
    """Start separation and return the task batch ID."""
    resp = requests.post(
        f"{LALALAI_BASE}/separate/",
        headers={"Authorization": f"license {api_key}"},
        json={
            "id": file_id,
            "filter": 1,          # post-processing filter level
            "stem": stem,
            "splitter": "orion",  # Orion model (recommended)
        },
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()

    if result.get("status") == "error":
        raise RuntimeError(f"Separation error: {result.get('error')}")

    return result["task"]["id"]


def lalalai_poll(task_id: str, api_key: str) -> dict:
    """Poll until task is complete, return stem download URLs."""
    while True:
        resp = requests.post(
            f"{LALALAI_BASE}/check/",
            headers={"Authorization": f"license {api_key}"},
            json={"id": task_id},
            timeout=10,
        )
        resp.raise_for_status()
        task = resp.json()["task"]

        if task["state"] == "success":
            return {
                "stem":    task["stem_track"],    # the separated stem
                "no_stem": task["back_track"],    # everything else
            }
        if task["state"] == "error":
            raise RuntimeError(f"Task failed: {task.get('error')}")

        time.sleep(5)


def lalalai_full_pipeline(audio_path: str, api_key: str, stem: str = "vocals") -> dict:
    """Full LALAL.AI pipeline: upload → separate → poll → return URLs."""
    print(f"Uploading {audio_path}...")
    file_id = lalalai_upload(audio_path, api_key)

    print(f"Starting {stem} separation...")
    task_id = lalalai_separate(file_id, api_key, stem=stem)

    print("Polling for completion...")
    return lalalai_poll(task_id, api_key)

⚠️ LALAL.AI separates one stem at a time — if you want vocals + drums + bass you need three separate API calls (and three charges). Plan your costs accordingly.
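Since each stem is its own job (and its own charge), the calls are independent and can run concurrently. A sketch of fanning out with a thread pool — pipeline is the per-stem function, e.g. the lalalai_full_pipeline above, passed in explicitly so the helper stays self-contained:

```python
from concurrent.futures import ThreadPoolExecutor


def lalalai_multi_stem(audio_path: str, api_key: str, pipeline,
                       stems=("vocals", "drums", "bass")) -> dict:
    """Run one LALAL.AI separation per stem concurrently.

    `pipeline(audio_path, api_key, stem=...)` handles one stem end to end.
    Returns {stem_name: result_dict}. Remember: every stem is billed.
    """
    with ThreadPoolExecutor(max_workers=len(stems)) as pool:
        futures = {s: pool.submit(pipeline, audio_path, api_key, stem=s)
                   for s in stems}
        return {s: f.result() for s, f in futures.items()}
```

Threads are fine here because the work is network-bound polling; this cuts 3× sequential wall-clock time to roughly the slowest single job, though it does nothing about the 3× cost.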


API 3: Moises

Moises exposes a job-based REST API built around named workflows, which is less common in audio tooling but clean to work with once you're set up.

MOISES_API_KEY=your_key_here
import requests
import time


MOISES_JOBS_URL = "https://developer-api.moises.ai/api/job"


def moises_separate(audio_url: str, api_key: str) -> str:
    """
    Start a Moises separation job.
    Note: Moises requires a publicly accessible URL (not a file upload).
    Use S3, GCS, or a signed URL for the source audio.

    Returns: job ID
    """
    payload = {
        "name": "stem-separation",
        "workflow": "moises/stems-vocals-accompaniment",
        "params": {
            "inputUrl": audio_url,
        },
    }
    resp = requests.post(
        MOISES_JOBS_URL,
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]


def moises_poll(job_id: str, api_key: str) -> dict:
    """Poll Moises job until complete."""
    while True:
        resp = requests.get(
            f"{MOISES_JOBS_URL}/{job_id}",
            headers={"Authorization": api_key},
            timeout=10,
        )
        resp.raise_for_status()
        job = resp.json()

        if job["status"] == "SUCCEEDED":
            return job["result"]
        if job["status"] in {"FAILED", "CANCELLED"}:
            raise RuntimeError(f"Job {job['status']}: {job.get('errorMessage')}")

        time.sleep(5)

📝 The biggest friction with Moises is that it requires a publicly accessible URL for the input file — you can't upload a local file directly. You'll need to host the file on S3, GCS, or similar first. This adds complexity for local development.
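For production you'd upload to S3/GCS and hand Moises a presigned URL. For quick local experiments, one workaround (my own, not from the Moises docs) is to serve the file from a throwaway HTTP server and put a tunnel like ngrok in front of it:

```python
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def serve_dir(directory: str, port: int = 0):
    """Serve `directory` over HTTP in a background thread.

    Returns (server, bound_port); port=0 picks a free port.
    The server alone is only reachable locally -- Moises must be able to
    fetch the URL, so you still need a public tunnel in front of this.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

Point the tunnel at the bound port and pass the resulting public URL as inputUrl. Call server.shutdown() when the job is submitted.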


Benchmark Results

I ran all three on the same three test tracks and scored vocal separation SDR with mir_eval. For LALAL.AI and Moises I only tested vocal + instrumental (2-stem) since that's what maps cleanly across all three APIs. StemSplit and local Demucs are included for reference.

Vocal SDR (average across 3 tracks)

| Tool | Vocal SDR | Instrumental SDR | Avg Latency | Notes |
| --- | --- | --- | --- | --- |
| Demucs htdemucs_ft (local) | 8.7 dB | 7.4 dB | 4.5 min (CPU) | Free, needs GPU for speed |
| StemSplit | 8.7 dB | 7.3 dB | 42s | HTDemucs, GPU-backed |
| LALAL.AI (Orion) | 7.8 dB | 6.9 dB | 58s | Per-stem billing |
| Moises | 7.1 dB | 6.3 dB | 75s | URL-only input |

Latency Breakdown (4-minute song, average of 5 runs)

StemSplit:  upload 3s  +  processing 35s  +  download 4s  =  ~42s total
LALAL.AI:   upload 5s  +  processing 47s  +  download 6s  =  ~58s total
Moises:     upload 0s* +  processing 68s  +  download 7s  =  ~75s total
            *requires pre-hosted URL — S3 upload time not included

Quality Notes

StemSplit and local Demucs are effectively identical — they run the same HTDemucs model. The ~0.1 dB difference is measurement noise.

LALAL.AI's Orion model is solid — around 0.9 dB behind HTDemucs on vocals. In practice it sounds clean on most pop and rock tracks. It starts to fall behind on complex mixes with lots of harmonic layering.

Moises scored lowest in my tests. It's the right tool for music practice features (their primary market) but for raw separation quality as a developer API it lags the other two.


Pricing Comparison

This matters more than SDR once you're in production. All prices as of early 2026 — verify current rates before committing.

| Tool | Free Tier | Paid Pricing | Notes |
| --- | --- | --- | --- |
| StemSplit | 10 min | ~$0.10/min | Credits never expire |
| LALAL.AI | 90 min (trial) | ~$0.15/min (Orion) | Per-stem billing |
| Moises | 5 jobs/month | ~$0.12/min | Monthly subscription model |

LALAL.AI's per-stem billing is a hidden cost trap. A 4-stem separation costs ~4× a vocal-only separation. If you need all stems, StemSplit's single call for all of them is significantly cheaper at scale.

# Cost estimate helper
def estimate_cost(
    audio_minutes: float,
    tool: str,
    stems_needed: int = 4,
) -> float:
    """Rough cost estimate — verify current pricing."""
    rates = {
        "stemsplit": 0.10,           # flat per minute, all stems
        "lalalai":   0.15 * stems_needed,  # per stem per minute
        "moises":    0.12,           # per minute, all stems
    }
    return audio_minutes * rates[tool]


# 1000 minutes of audio, 4-stem separation
print(f"StemSplit: ${estimate_cost(1000, 'stemsplit', 4):.0f}")   # $100
print(f"LALAL.AI:  ${estimate_cost(1000, 'lalalai',   4):.0f}")   # $600
print(f"Moises:    ${estimate_cost(1000, 'moises',    4):.0f}")   # $120
StemSplit: $100
LALAL.AI:  $600
Moises:    $120

Developer Experience

Beyond the numbers, here's what it was like to actually integrate each one.

StemSplit

Clean REST API, standard Bearer token auth, async job polling with a simple status field. The response includes BPM and key data as a bonus. Docs are clear. No surprises.

DX rating: ⭐⭐⭐⭐⭐

LALAL.AI

Three-step flow (upload → separate → check) is more verbose than necessary. The Authorization: license [key] header is a non-standard pattern. Per-stem billing requires you to think about batching strategy. That said, the API is stable and well-documented.

DX rating: ⭐⭐⭐

Moises

The workflow-based job API is fine once you're used to it. The URL-only input requirement is the real friction — local development requires an extra step to host the file somewhere accessible. Good for teams already on AWS/GCS; annoying otherwise.

DX rating: ⭐⭐⭐


Which Should You Use?

Production app needing all 4 stems?
→ StemSplit (best quality, cheapest at scale, single API call)

Only need vocal + instrumental (2-stem)?
→ StemSplit or LALAL.AI (LALAL.AI per-stem cost is fine for 2 stems)

Team already using Moises' music features?
→ Stick with Moises for ecosystem consistency

Budget is the main constraint and you have a GPU?
→ Run Demucs locally (free, same model quality as StemSplit)

Want to try an online stem splitter without any code?
→ stemsplit.io/stem-splitter — free to start
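If you'd rather encode that routing in code, it collapses to a tiny helper (illustrative only — this is just my recommendations above restated):

```python
def pick_tool(stems_needed: int,
              budget_first_with_gpu: bool = False,
              in_moises_ecosystem: bool = False) -> str:
    """Restates the decision guide above; illustrative, not exhaustive."""
    if in_moises_ecosystem:
        return "moises"                 # ecosystem consistency
    if budget_first_with_gpu:
        return "demucs-local"           # free, same model as StemSplit
    if stems_needed >= 3:
        return "stemsplit"              # single call for all stems
    return "stemsplit or lalal.ai"      # both fine for 2-stem
```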

Full Comparison Script

Here's the script I used to run all three APIs against the same file and print a side-by-side summary:

#!/usr/bin/env python3
"""
Compare StemSplit, LALAL.AI, and Moises on the same audio file.
Measures SDR and wall-clock latency for each.
"""

import time
import tempfile
from pathlib import Path

import requests
import librosa
import mir_eval
import numpy as np

from config import STEMSPLIT_API_KEY, LALALAI_API_KEY, MOISES_API_KEY


def download_file(url: str, dest_path: str) -> None:
    resp = requests.get(url, timeout=120)
    resp.raise_for_status()
    Path(dest_path).write_bytes(resp.content)


def compute_sdr(reference: str, estimated: str) -> float:
    ref, _ = librosa.load(reference, sr=44100, mono=True)
    est, _ = librosa.load(estimated, sr=44100, mono=True)
    n = min(len(ref), len(est))
    sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
        ref[:n][np.newaxis, :], est[:n][np.newaxis, :]
    )
    return float(sdr[0])


def run_all(audio_path: str, reference_vocals: str) -> None:
    results = []

    with tempfile.TemporaryDirectory() as tmpdir:

        # --- StemSplit (earlier snippets saved as api_stemsplit.py etc.) ---
        from api_stemsplit import stemsplit_separate, stemsplit_download
        t0 = time.time()
        stem_urls = stemsplit_separate(audio_path, STEMSPLIT_API_KEY, stems=2)
        local = stemsplit_download(stem_urls, output_dir=tmpdir)
        elapsed = time.time() - t0
        sdr = compute_sdr(reference_vocals, local["vocals"])
        results.append(("StemSplit", sdr, elapsed))

        # --- LALAL.AI ---
        from api_lalalai import lalalai_full_pipeline
        t0 = time.time()
        urls = lalalai_full_pipeline(audio_path, LALALAI_API_KEY, stem="vocals")
        vocals_path = Path(tmpdir) / "lalalai_vocals.wav"
        download_file(urls["stem"], str(vocals_path))
        elapsed = time.time() - t0
        sdr = compute_sdr(reference_vocals, str(vocals_path))
        results.append(("LALAL.AI", sdr, elapsed))

        # --- Moises (assumes audio_path is a hosted URL) ---
        # from api_moises import moises_separate, moises_poll
        # ... (requires hosted URL)

    print(f"\n{'Tool':<20} {'Vocal SDR':>10} {'Latency':>10}")
    print("-" * 42)
    for tool, sdr, latency in sorted(results, key=lambda x: -x[1]):
        print(f"{tool:<20} {sdr:>9.1f} dB {latency:>8.0f}s")


if __name__ == "__main__":
    run_all("song.mp3", "reference_vocals.wav")

Common Issues

"LALAL.AI returns 402 on my second call"

You've exceeded the free trial minutes. Unlike StemSplit where free credits accumulate, LALAL.AI's trial is a one-time pool.
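A sketch of surfacing that case explicitly instead of a bare HTTPError (QuotaExceededError is my own exception name, not from any LALAL.AI SDK):

```python
import requests


class QuotaExceededError(RuntimeError):
    """Raised when the API reports payment required (HTTP 402)."""


def raise_for_quota(resp: requests.Response) -> None:
    """Like raise_for_status(), but maps 402 to a clearer error."""
    if resp.status_code == 402:
        raise QuotaExceededError(
            "Free minutes exhausted -- top up credits or switch API keys."
        )
    resp.raise_for_status()
```

Swap this in wherever the snippets above call resp.raise_for_status() so billing failures are distinguishable from transient server errors in your retry logic.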

"Moises job status stuck on PROCESSING"

Long files (>10 min) can take several minutes. Add a timeout to your polling loop:

def moises_poll(job_id: str, api_key: str, timeout_seconds: int = 600) -> dict:
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        # ... same as before ...
        time.sleep(5)
    raise TimeoutError(f"Moises job {job_id} timed out after {timeout_seconds}s")

"SDR looks lower than the benchmark table above"

My numbers are averaged across 3 tracks. Individual tracks vary significantly — a heavily produced EDM track will score lower than a clean acoustic recording on all tools.


Summary

If I had to pick one for a new project today: StemSplit for anything needing 3+ stems, LALAL.AI if you only need 2-stem and want an alternative.

The quality difference between StemSplit and LALAL.AI is real (~0.9 dB) but won't matter for most applications. The pricing difference at scale is harder to ignore.

For context, if you want to run this yourself locally for free with the same model quality, the Demucs setup guide has everything you need. The full benchmark comparison including non-API tools is worth reading alongside this one.



Questions about integrating any of these APIs? Drop them in the comments. Especially curious if anyone has used Moises at scale and has workarounds for the URL-only input limitation.
