DEV Community

StemSplit

AI Stem Splitter API Comparison 2026: StemSplit vs LALAL.AI vs Moises (With Benchmarks)

I was building a feature that needed stem separation in the backend, and I spent a week comparing the main options.

The questions I had were developer questions: How clean is the API design? What does the latency look like in practice? What do I actually get per dollar? And how does output quality compare when you run the same file through each one?

Here's everything I found, with code for each and actual benchmark numbers.

What I Compared

| Tool | Has API | Free Tier | Model |
| --- | --- | --- | --- |
| StemSplit | ✅ | 10 min | HTDemucs |
| LALAL.AI | ✅ | limited | Orion (proprietary) |
| Moises | ✅ | limited | Proprietary |
| Voice.AI | ❌ | N/A | Proprietary |

Voice.AI has no public API so it drops out of the developer comparison early. The other three are all reasonable choices depending on your use case.

What You'll Learn

  • ✅ How to call each API from Python with working code
  • ✅ SDR benchmark results for all three tools on the same test tracks
  • ✅ Real-world latency numbers (upload → download)
  • ✅ Pricing per audio minute at different usage tiers
  • ✅ Which to use for production vs prototyping

Test Setup

pip install requests mir_eval librosa numpy soundfile python-dotenv tqdm
# config.py
import os
from dotenv import load_dotenv

load_dotenv()

STEMSPLIT_API_KEY = os.getenv("STEMSPLIT_API_KEY")
LALALAI_API_KEY   = os.getenv("LALALAI_API_KEY")
MOISES_API_KEY    = os.getenv("MOISES_API_KEY")

Test tracks: the same three Creative Commons mixes from my previous benchmark article — pop, rock, hip-hop, each with isolated reference stems for SDR scoring.

SDR measurement:

import librosa
import mir_eval
import numpy as np


def compute_sdr(reference_path: str, estimated_path: str) -> float:
    """Signal-to-Distortion Ratio — higher is better."""
    ref, _ = librosa.load(reference_path, sr=44100, mono=True)
    est, _ = librosa.load(estimated_path, sr=44100, mono=True)

    n = min(len(ref), len(est))
    sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
        ref[:n][np.newaxis, :],
        est[:n][np.newaxis, :],
    )
    return float(sdr[0])
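The mir_eval BSS eval above is what all the benchmark numbers in this article use, but it gets slow on full-length tracks. For quick sanity checks I sometimes use a plain-numpy scale-invariant SDR, which tracks the mir_eval number closely for single-source comparisons (this helper is my own addition, not part of mir_eval):

```python
import numpy as np


def quick_si_sdr(ref: np.ndarray, est: np.ndarray) -> float:
    """Scale-invariant SDR in dB -- a fast numpy approximation for
    single-source sanity checks, not a replacement for full BSS eval."""
    n = min(len(ref), len(est))
    ref, est = ref[:n], est[:n]
    # Project the estimate onto the reference to factor out gain differences
    alpha = np.dot(ref, est) / np.dot(ref, ref)
    target = alpha * ref
    noise = est - target
    return float(10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2)))
```

For a clean reference plus additive noise, SI-SDR lands close to the signal-to-noise ratio, which makes it easy to spot a broken download or a channel mix-up before running the slow metric.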

API 1: StemSplit

StemSplit's API is a straightforward REST design — upload a file, poll for status, download results. It runs HTDemucs on the backend, which is the same model you'd use if you ran Demucs locally.

Authentication

# Add to .env
STEMSPLIT_API_KEY=your_key_here

Separate Stems

import requests
import time
from pathlib import Path


def stemsplit_separate(
    audio_path: str,
    api_key: str,
    stems: int = 4,          # 2, 4, or 6
    output_format: str = "wav",
) -> dict:
    """
    Separate audio using StemSplit API.

    Args:
        audio_path: Path to input audio file
        api_key: StemSplit API key
        stems: Number of stems to separate (2, 4, or 6)
        output_format: 'wav', 'mp3', or 'flac'

    Returns:
        dict mapping stem names to download URLs
    """
    with open(audio_path, "rb") as f:
        resp = requests.post(
            "https://api.stemsplit.io/v1/separate",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"audio": (Path(audio_path).name, f)},
            # json= is silently ignored by requests when files= is present,
            # so the options must go as multipart form fields via data=
            data={"stems": str(stems), "format": output_format},
            timeout=30,
        )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Poll until done
    while True:
        status_resp = requests.get(
            f"https://api.stemsplit.io/v1/jobs/{job_id}",
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,
        )
        status_resp.raise_for_status()
        job = status_resp.json()

        if job["status"] == "completed":
            return job["stems"]   # {"vocals": url, "drums": url, ...}
        if job["status"] == "failed":
            raise RuntimeError(f"Job failed: {job.get('error')}")

        time.sleep(3)


def stemsplit_download(stem_urls: dict, output_dir: str = "output/stemsplit") -> dict:
    """Download separated stems to local files."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    local_paths = {}

    for stem_name, url in stem_urls.items():
        resp = requests.get(url, timeout=60)
        resp.raise_for_status()
        out_path = Path(output_dir) / f"{stem_name}.wav"
        out_path.write_bytes(resp.content)
        local_paths[stem_name] = str(out_path)

    return local_paths
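The while True polling loop above runs forever if a job hangs server-side. In my own integrations I route every API through one generic poller with a deadline and exponential backoff (this helper is mine, not part of any vendor SDK):

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], Optional[dict]],
               timeout_s: float = 600,
               initial_delay: float = 2.0,
               max_delay: float = 15.0) -> dict:
    """Call `check` until it returns a non-None result or the deadline passes.

    `check` should return the job payload when done, None while pending,
    and raise on failure. The sleep doubles each round up to max_delay,
    so short jobs finish fast and long jobs don't hammer the API.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay
    while time.monotonic() < deadline:
        result = check()
        if result is not None:
            return result
        time.sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"job did not finish within {timeout_s}s")
```

For StemSplit, `check` would issue the GET to /v1/jobs/{job_id}, return job["stems"] when status is completed, return None while pending, and raise on failed.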

What Else You Get

StemSplit returns BPM and key detection alongside the stems at no extra cost:

job = status_resp.json()
if job["status"] == "completed":
    stems    = job["stems"]     # download URLs
    bpm      = job["bpm"]       # e.g. 124.5
    key      = job["key"]       # e.g. "A minor"
    camelot  = job["camelot"]   # e.g. "8A"

Useful if you're building anything DJ-adjacent or music analysis related — you don't need a separate call to a BPM detection library.
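As a quick illustration of what that metadata enables: Camelot codes are considered mix-compatible when they share a wheel position or sit one step apart. A toy checker (my own sketch, nothing StemSplit-specific):

```python
def camelot_compatible(a: str, b: str) -> bool:
    """True if two Camelot codes (e.g. '8A', '9A') mix harmonically:
    same number (covers identical key and relative major/minor), or
    adjacent numbers with the same letter. The wheel wraps 12 -> 1."""
    num_a, let_a = int(a[:-1]), a[-1].upper()
    num_b, let_b = int(b[:-1]), b[-1].upper()
    if num_a == num_b:
        return True
    if let_a == let_b:
        diff = abs(num_a - num_b)
        return diff == 1 or diff == 11  # 11 handles the 12 -> 1 wrap
    return False
```

Feed it the camelot field from two separation jobs and you have the core of a track-compatibility feature with no extra analysis pass.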


API 2: LALAL.AI

LALAL.AI uses a three-step API: upload the file to get a file ID, start the separation to get a task ID, then poll for results. Slightly more verbose than StemSplit but well-documented.

Authenticate

LALAL.AI uses an Authorization: license [key] header (not Bearer).

LALALAI_API_KEY=your_key_here

Separate Stems

import requests
import time
from pathlib import Path


LALALAI_BASE = "https://www.lalal.ai/api"

LALALAI_STEMS = {
    "vocals":  "vocals",
    "drums":   "drums",
    "bass":    "bass",
    "piano":   "piano",
    "electric_guitar": "electric_guitar",
    "acoustic_guitar": "acoustic_guitar",
    "synthesizer":     "synthesizer",
    "strings":         "strings",
    "wind":            "wind",
}


def lalalai_upload(audio_path: str, api_key: str) -> str:
    """Upload a file to LALAL.AI and return the file ID."""
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{LALALAI_BASE}/upload/",
            headers={"Authorization": f"license {api_key}"},
            files={"file": (Path(audio_path).name, f, "audio/mpeg")},
            timeout=120,
        )
    resp.raise_for_status()
    result = resp.json()

    if result.get("status") == "error":
        raise RuntimeError(f"Upload error: {result.get('error')}")

    return result["id"]


def lalalai_separate(file_id: str, api_key: str, stem: str = "vocals") -> str:
    """Start separation and return the task batch ID."""
    resp = requests.post(
        f"{LALALAI_BASE}/separate/",
        headers={"Authorization": f"license {api_key}"},
        json={
            "id": file_id,
            "filter": 1,          # post-processing filter level
            "stem": stem,
            "splitter": "orion",  # Orion model (recommended)
        },
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()

    if result.get("status") == "error":
        raise RuntimeError(f"Separation error: {result.get('error')}")

    return result["task"]["id"]


def lalalai_poll(task_id: str, api_key: str) -> dict:
    """Poll until task is complete, return stem download URLs."""
    while True:
        resp = requests.post(
            f"{LALALAI_BASE}/check/",
            headers={"Authorization": f"license {api_key}"},
            json={"id": task_id},
            timeout=10,
        )
        resp.raise_for_status()
        task = resp.json()["task"]

        if task["state"] == "success":
            return {
                "stem":    task["stem_track"],    # the separated stem
                "no_stem": task["back_track"],    # everything else
            }
        if task["state"] == "error":
            raise RuntimeError(f"Task failed: {task.get('error')}")

        time.sleep(5)


def lalalai_full_pipeline(audio_path: str, api_key: str, stem: str = "vocals") -> dict:
    """Full LALAL.AI pipeline: upload → separate → poll → return URLs."""
    print(f"Uploading {audio_path}...")
    file_id = lalalai_upload(audio_path, api_key)

    print(f"Starting {stem} separation...")
    task_id = lalalai_separate(file_id, api_key, stem=stem)

    print("Polling for completion...")
    return lalalai_poll(task_id, api_key)

⚠️ LALAL.AI separates one stem at a time — if you want vocals + drums + bass you need three separate API calls (and three charges). Plan your costs accordingly.
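Since each stem is its own job (and its own charge), the calls are independent and can run concurrently. A sketch of fanning out with a thread pool — pipeline is the per-stem function, e.g. the lalalai_full_pipeline above, passed in explicitly so the helper stays self-contained:

```python
from concurrent.futures import ThreadPoolExecutor


def lalalai_multi_stem(audio_path: str, api_key: str, pipeline,
                       stems=("vocals", "drums", "bass")) -> dict:
    """Run one LALAL.AI separation per stem concurrently.

    `pipeline(audio_path, api_key, stem=...)` handles one stem end to end.
    Returns {stem_name: result_dict}. Remember: every stem is billed.
    """
    with ThreadPoolExecutor(max_workers=len(stems)) as pool:
        futures = {s: pool.submit(pipeline, audio_path, api_key, stem=s)
                   for s in stems}
        return {s: f.result() for s, f in futures.items()}
```

Threads are fine here because the work is network-bound polling; this cuts 3× sequential wall-clock time to roughly the slowest single job, though it does nothing about the 3× cost.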


API 3: Moises

Moises exposes a job-based REST API built around named workflows, which is less common in audio tooling but clean to work with once you're set up.

MOISES_API_KEY=your_key_here
import requests
import time


MOISES_JOBS_URL = "https://developer-api.moises.ai/api/job"


def moises_separate(audio_url: str, api_key: str) -> str:
    """
    Start a Moises separation job.
    Note: Moises requires a publicly accessible URL (not a file upload).
    Use S3, GCS, or a signed URL for the source audio.

    Returns: job ID
    """
    payload = {
        "name": "stem-separation",
        "workflow": "moises/stems-vocals-accompaniment",
        "params": {
            "inputUrl": audio_url,
        },
    }
    resp = requests.post(
        MOISES_JOBS_URL,
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]


def moises_poll(job_id: str, api_key: str) -> dict:
    """Poll Moises job until complete."""
    while True:
        resp = requests.get(
            f"{MOISES_JOBS_URL}/{job_id}",
            headers={"Authorization": api_key},
            timeout=10,
        )
        resp.raise_for_status()
        job = resp.json()

        if job["status"] == "SUCCEEDED":
            return job["result"]
        if job["status"] in {"FAILED", "CANCELLED"}:
            raise RuntimeError(f"Job {job['status']}: {job.get('errorMessage')}")

        time.sleep(5)

📝 The biggest friction with Moises is that it requires a publicly accessible URL for the input file — you can't upload a local file directly. You'll need to host the file on S3, GCS, or similar first. This adds complexity for local development.
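For production you'd upload to S3/GCS and hand Moises a presigned URL. For quick local experiments, one workaround (my own, not from the Moises docs) is to serve the file from a throwaway HTTP server and put a tunnel like ngrok in front of it:

```python
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def serve_dir(directory: str, port: int = 0):
    """Serve `directory` over HTTP in a background thread.

    Returns (server, bound_port); port=0 picks a free port.
    The server alone is only reachable locally -- Moises must be able to
    fetch the URL, so you still need a public tunnel in front of this.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

Point the tunnel at the bound port and pass the resulting public URL as inputUrl. Call server.shutdown() when the job is submitted.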


Benchmark Results

I ran all three on the same three test tracks and scored vocal separation SDR with mir_eval. For LALAL.AI and Moises I only tested vocal + instrumental (2-stem) since that's what maps cleanly across all three APIs. StemSplit and local Demucs are included for reference.

Vocal SDR (average across 3 tracks)

| Tool | Vocal SDR | Instrumental SDR | Avg Latency | Notes |
| --- | --- | --- | --- | --- |
| Demucs htdemucs_ft (local) | 8.7 dB | 7.4 dB | 4.5 min (CPU) | Free, needs GPU for speed |
| StemSplit | 8.7 dB | 7.3 dB | 42s | HTDemucs, GPU-backed |
| LALAL.AI (Orion) | 7.8 dB | 6.9 dB | 58s | Per-stem billing |
| Moises | 7.1 dB | 6.3 dB | 75s | URL-only input |

Latency Breakdown (4-minute song, average of 5 runs)

StemSplit:  upload 3s  +  processing 35s  +  download 4s  =  ~42s total
LALAL.AI:   upload 5s  +  processing 47s  +  download 6s  =  ~58s total
Moises:     upload 0s* +  processing 68s  +  download 7s  =  ~75s total
            *requires pre-hosted URL — S3 upload time not included

Quality Notes

StemSplit and local Demucs are effectively identical — they run the same HTDemucs model. The ~0.1 dB difference is measurement noise.

LALAL.AI's Orion model is solid — around 0.9 dB behind HTDemucs on vocals. In practice it sounds clean on most pop and rock tracks. It starts to fall behind on complex mixes with lots of harmonic layering.

Moises scored lowest in my tests. It's the right tool for music practice features (their primary market) but for raw separation quality as a developer API it lags the other two.


Pricing Comparison

This matters more than SDR once you're in production. All prices as of early 2026 — verify current rates before committing.

| Tool | Free Tier | Paid Pricing | Notes |
| --- | --- | --- | --- |
| StemSplit | 10 min | ~$0.10/min | Credits never expire |
| LALAL.AI | 90 min (trial) | ~$0.15/min (Orion) | Per-stem billing |
| Moises | 5 jobs/month | ~$0.12/min | Monthly subscription model |

LALAL.AI's per-stem billing is a hidden cost trap. A 4-stem separation costs ~4× a vocal-only separation. If you need all stems, StemSplit's single call for all of them is significantly cheaper at scale.

# Cost estimate helper
def estimate_cost(
    audio_minutes: float,
    tool: str,
    stems_needed: int = 4,
) -> float:
    """Rough cost estimate — verify current pricing."""
    rates = {
        "stemsplit": 0.10,           # flat per minute, all stems
        "lalalai":   0.15 * stems_needed,  # per stem per minute
        "moises":    0.12,           # per minute, all stems
    }
    return audio_minutes * rates[tool]


# 1000 minutes of audio, 4-stem separation
print(f"StemSplit: ${estimate_cost(1000, 'stemsplit', 4):.0f}")   # $100
print(f"LALAL.AI:  ${estimate_cost(1000, 'lalalai',   4):.0f}")   # $600
print(f"Moises:    ${estimate_cost(1000, 'moises',    4):.0f}")   # $120
StemSplit: $100
LALAL.AI:  $600
Moises:    $120

Developer Experience

Beyond the numbers, here's what it was like to actually integrate each one.

StemSplit

Clean REST API, standard Bearer token auth, async job polling with a simple status field. The response includes BPM and key data as a bonus. Docs are clear. No surprises.

DX rating: ⭐⭐⭐⭐⭐

LALAL.AI

Three-step flow (upload → separate → check) is more verbose than necessary. The Authorization: license [key] header is a non-standard pattern. Per-stem billing requires you to think about batching strategy. That said, the API is stable and well-documented.

DX rating: ⭐⭐⭐

Moises

The workflow-based job API is fine once you're used to it. The URL-only input requirement is the real friction — local development requires an extra step to host the file somewhere accessible. Good for teams already on AWS/GCS; annoying otherwise.

DX rating: ⭐⭐⭐


Which Should You Use?

Production app needing all 4 stems?
→ StemSplit (best quality, cheapest at scale, single API call)

Only need vocal + instrumental (2-stem)?
→ StemSplit or LALAL.AI (LALAL.AI per-stem cost is fine for 2 stems)

Team already using Moises' music features?
→ Stick with Moises for ecosystem consistency

Budget is the main constraint and you have a GPU?
→ Run Demucs locally (free, same model quality as StemSplit)

Want to try an online stem splitter without any code?
→ stemsplit.io/stem-splitter — free to start
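If you'd rather encode that routing in code, it collapses to a tiny helper (illustrative only — this is just my recommendations above restated):

```python
def pick_tool(stems_needed: int,
              budget_first_with_gpu: bool = False,
              in_moises_ecosystem: bool = False) -> str:
    """Restates the decision guide above; illustrative, not exhaustive."""
    if in_moises_ecosystem:
        return "moises"                 # ecosystem consistency
    if budget_first_with_gpu:
        return "demucs-local"           # free, same model as StemSplit
    if stems_needed >= 3:
        return "stemsplit"              # single call for all stems
    return "stemsplit or lalal.ai"      # both fine for 2-stem
```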

Full Comparison Script

Here's the script I used to run all three APIs against the same file and print a side-by-side summary:

#!/usr/bin/env python3
"""
Compare StemSplit, LALAL.AI, and Moises on the same audio file.
Measures SDR and wall-clock latency for each.
"""

import time
import tempfile
from pathlib import Path

import requests
import librosa
import mir_eval
import numpy as np

from config import STEMSPLIT_API_KEY, LALALAI_API_KEY, MOISES_API_KEY


def download_file(url: str, dest_path: str) -> None:
    resp = requests.get(url, timeout=120)
    resp.raise_for_status()
    Path(dest_path).write_bytes(resp.content)


def compute_sdr(reference: str, estimated: str) -> float:
    ref, _ = librosa.load(reference, sr=44100, mono=True)
    est, _ = librosa.load(estimated, sr=44100, mono=True)
    n = min(len(ref), len(est))
    sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
        ref[:n][np.newaxis, :], est[:n][np.newaxis, :]
    )
    return float(sdr[0])


def run_all(audio_path: str, reference_vocals: str) -> None:
    results = []

    with tempfile.TemporaryDirectory() as tmpdir:

        # --- StemSplit (earlier snippets saved as api_stemsplit.py etc.) ---
        from api_stemsplit import stemsplit_separate, stemsplit_download
        t0 = time.time()
        stem_urls = stemsplit_separate(audio_path, STEMSPLIT_API_KEY, stems=2)
        local = stemsplit_download(stem_urls, output_dir=tmpdir)
        elapsed = time.time() - t0
        sdr = compute_sdr(reference_vocals, local["vocals"])
        results.append(("StemSplit", sdr, elapsed))

        # --- LALAL.AI ---
        from api_lalalai import lalalai_full_pipeline
        t0 = time.time()
        urls = lalalai_full_pipeline(audio_path, LALALAI_API_KEY, stem="vocals")
        vocals_path = Path(tmpdir) / "lalalai_vocals.wav"
        download_file(urls["stem"], str(vocals_path))
        elapsed = time.time() - t0
        sdr = compute_sdr(reference_vocals, str(vocals_path))
        results.append(("LALAL.AI", sdr, elapsed))

        # --- Moises (assumes audio_path is a hosted URL) ---
        # from api_moises import moises_separate, moises_poll
        # ... (requires hosted URL)

    print(f"\n{'Tool':<20} {'Vocal SDR':>10} {'Latency':>10}")
    print("-" * 42)
    for tool, sdr, latency in sorted(results, key=lambda x: -x[1]):
        print(f"{tool:<20} {sdr:>9.1f} dB {latency:>8.0f}s")


if __name__ == "__main__":
    run_all("song.mp3", "reference_vocals.wav")

Common Issues

"LALAL.AI returns 402 on my second call"

You've exceeded the free trial minutes. Unlike StemSplit where free credits accumulate, LALAL.AI's trial is a one-time pool.
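A sketch of surfacing that case explicitly instead of a bare HTTPError (QuotaExceededError is my own exception name, not from any LALAL.AI SDK):

```python
import requests


class QuotaExceededError(RuntimeError):
    """Raised when the API reports payment required (HTTP 402)."""


def raise_for_quota(resp: requests.Response) -> None:
    """Like raise_for_status(), but maps 402 to a clearer error."""
    if resp.status_code == 402:
        raise QuotaExceededError(
            "Free minutes exhausted -- top up credits or switch API keys."
        )
    resp.raise_for_status()
```

Swap this in wherever the snippets above call resp.raise_for_status() so billing failures are distinguishable from transient server errors in your retry logic.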

"Moises job status stuck on PROCESSING"

Long files (>10 min) can take several minutes. Add a timeout to your polling loop:

def moises_poll(job_id: str, api_key: str, timeout_seconds: int = 600) -> dict:
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        # ... same as before ...
        time.sleep(5)
    raise TimeoutError(f"Moises job {job_id} timed out after {timeout_seconds}s")

"SDR looks lower than the benchmark table above"

My numbers are averaged across 3 tracks. Individual tracks vary significantly — a heavily produced EDM track will score lower than a clean acoustic recording on all tools.


Summary

If I had to pick one for a new project today: StemSplit for anything needing 3+ stems, LALAL.AI if you only need 2-stem and want an alternative.

The quality difference between StemSplit and LALAL.AI is real (~0.9 dB) but won't matter for most applications. The pricing difference at scale is harder to ignore.

For context, if you want to run this yourself locally for free with the same model quality, the Demucs setup guide has everything you need. The full benchmark comparison including non-API tools is worth reading alongside this one.



Questions about integrating any of these APIs? Drop them in the comments. Especially curious if anyone has used Moises at scale and has workarounds for the URL-only input limitation.
