Freqblog

Posted on • Originally published at freqblog.com

AcousticBrainz Alternative in 2026: The Honest Insider's Guide

If you were one of the developers, researchers, or hobbyists who built on AcousticBrainz, you already know the story. The Music Technology Group at Universitat Pompeu Fabra announced the shutdown on 16 February 2022, took the live API offline, and published the entire dataset as a one-time public dump. Four years later, there's still nothing exactly like it.

This post is the honest insider's view of the post-AB landscape — written by a team that uses the frozen AcousticBrainz dump in production right now, as one of four fallback layers in our music-features API. We know what works in the dump, what doesn't, and what the realistic alternatives look like in 2026.

What you actually lost

AcousticBrainz had three things going for it that nothing has fully replaced:

  1. Free, public, scriptable. No API key, no rate limit, no sign-up. GET /api/v1/<mbid>/low-level returned ~120 fields per recording. Researchers could pull millions of rows for a paper without negotiating commercial terms.
  2. MBID-keyed. Every track was identified by its MusicBrainz ID, the open community-maintained identifier. That meant data from AcousticBrainz could be joined cleanly to MusicBrainz, Discogs, ListenBrainz, lyrics databases — the whole open-data ecosystem.
  3. Crowd-contributed. Anyone could run the AcousticBrainz client on their own audio collection and submit features back. The dataset grew from real personal libraries, not a label-licensed catalog.

None of the commercial replacements has all three. Most have none.

The frozen dump — still extremely useful

This is the under-appreciated fact: the entire AcousticBrainz dataset is still freely downloadable. The official dump page hosts both the high-level (mood, genre, instrument-detection) and low-level (BPM, key, MFCCs, ~120 descriptors) tarballs as of June 2022, plus per-month deltas up to the shutdown.

What you get:

  • ~7.5M unique recordings by MBID, with one or more contributed analyses each
  • ~120 low-level descriptors per track — spectral centroid, MFCCs, rhythm features, tonal features, dynamic complexity, all the numbers Essentia's MusicExtractor outputs
  • 11 high-level classifier outputs per track — genre, mood (happy/sad/aggressive/relaxed/party), instrumentalness, danceability, etc.
  • The AcousticBrainz JSON schema — same one the live API used, so existing client code works against the dump with zero changes if you stand up your own static endpoint

Practical setup: serve the dump as a local API

The dump is a giant tarball containing one JSON file per MBID. The cleanest pattern is to extract it into SQLite and serve it via a thin wrapper:

# Loader for the low-level dump. The mood/genre/danceability columns come
# from the separate high-level dump and get filled in a second pass;
# this pass covers the low-level fields.
import json
import sqlite3
import tarfile

conn = sqlite3.connect("ab_features.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS ab_features (
        mbid TEXT PRIMARY KEY,
        bpm REAL,
        key TEXT,
        scale TEXT,
        danceability REAL,
        average_loudness REAL,
        dynamic_complexity REAL,
        onset_rate REAL,
        tuning_frequency REAL,
        mood_happy REAL,
        mood_sad REAL,
        mood_aggressive REAL,
        mood_relaxed REAL,
        mood_party REAL,
        genre TEXT,
        instrumentalness REAL,
        full_json TEXT  -- drop this column for the lean ~2.2 GB build
    )
""")

# Stdlib tarfile can't open .tar.zst directly (zstd support only landed in
# Python 3.14), so decompress first:
#   zstd -d acousticbrainz-lowlevel-features-20220623.tar.zst
with tarfile.open("acousticbrainz-lowlevel-features-20220623.tar") as tar:
    for member in tar:
        if not member.name.endswith(".json"):
            continue
        data = json.loads(tar.extractfile(member).read())
        mbid = member.name.split("/")[-1].replace(".json", "")
        conn.execute(
            """INSERT OR IGNORE INTO ab_features
               (mbid, bpm, key, scale, average_loudness,
                dynamic_complexity, onset_rate, tuning_frequency, full_json)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (
                mbid,
                data["rhythm"]["bpm"],
                data["tonal"]["key_key"],
                data["tonal"]["key_scale"],
                data["lowlevel"]["average_loudness"],
                data["lowlevel"]["dynamic_complexity"],
                data["rhythm"]["onset_rate"],
                data["tonal"]["tuning_frequency"],
                json.dumps(data),
            ),
        )
conn.commit()

Expect the resulting SQLite to land around 2.2 GB once you've extracted the columns you actually use. Full-JSON-per-row blows it up to ~50 GB; only do that if you need every descriptor.

The dump is frozen at June 2022. Nothing released after that has values. For 2024-2026 releases, you'll need a live source. Use the dump as the historical layer of a tiered system — check it first, fall back to live analysis on miss.
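A minimal sketch of that tiered lookup, assuming an ab_features.db built as above and some analyze_live callable wrapping your live analyzer (both names are illustrative, not part of any official API):

```python
import sqlite3

def get_features(conn: sqlite3.Connection, mbid: str, analyze_live):
    """Tiered lookup: frozen AB dump first, live analysis on a miss.

    `analyze_live` is any callable mapping an MBID to a feature dict
    (e.g. an Essentia worker); it only runs when the dump has no row.
    """
    row = conn.execute(
        "SELECT bpm, key, scale FROM ab_features WHERE mbid = ?", (mbid,)
    ).fetchone()
    if row is not None:
        return {"bpm": row[0], "key": row[1], "scale": row[2],
                "source": "ab_dump"}
    features = analyze_live(mbid)
    features["source"] = "live"
    return features
```

Tagging each result with its source layer makes it easy to cache live-analysis results back into the same table so every miss is paid for only once.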

The MBID problem

The dump is keyed by MBID, but most of your queries will arrive with artist + title strings, not MBIDs. Resolution is a separate problem:

  1. Hit the live MusicBrainz API: GET /ws/2/recording/?query=artist:"<artist>" AND recording:"<title>"
  2. Score the candidates — usually the first hit is right but featured-artist strings ("Mark Ronson featuring Bruno Mars") and remixes/covers cause noise
  3. If you find a match, take the MBID and look it up in your AB dump table

Realistic hit rate from name-only resolution to AB-dump features: ~50%. Half your tracks resolve cleanly; the other half miss for one of four reasons: no MBID assigned, an MBID that was never submitted to AB, a featured-artist string confusing the resolver, or a release date after June 2022.
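The resolution steps above can be sketched with nothing but the standard library. The User-Agent string is a placeholder you must replace (MusicBrainz rejects requests without a descriptive one), and the score threshold of 90 is our heuristic, not an official cutoff:

```python
import json
import urllib.parse
import urllib.request

def pick_mbid(recordings, min_score=90):
    """Pick the best candidate from a MusicBrainz recording search.

    MusicBrainz attaches a 0-100 relevance score to each hit; below
    ~90 the match is often a remix, cover, or featured-artist variant.
    """
    for rec in recordings:
        if int(rec.get("score", 0)) >= min_score:
            return rec["id"]
    return None

def resolve_mbid(artist, title):
    """Name-to-MBID resolution against the live MusicBrainz search API."""
    query = urllib.parse.urlencode({
        "query": f'artist:"{artist}" AND recording:"{title}"',
        "fmt": "json",
        "limit": 5,
    })
    req = urllib.request.Request(
        "https://musicbrainz.org/ws/2/recording/?" + query,
        # Placeholder UA -- put your app name and contact address here.
        headers={"User-Agent": "ab-migration-example/0.1 (you@example.com)"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    return pick_mbid(body.get("recordings", []))
```

Keeping the candidate-scoring step as a pure function (pick_mbid) lets you unit-test the messy featured-artist cases without hitting the network.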

The live alternatives

Five categories, ranked from "closest to AB's spirit" to "closest in functionality":

1. Build your own with Essentia

AcousticBrainz was Essentia plus crowd contributions. The same toolkit is open-source, actively maintained, and runs on a VPS. MusicExtractor takes a 30-second audio clip and returns the same ~120 fields AB stored. Caveat: you need the audio. The dump worked because users contributed their local libraries; your replacement needs an audio source. iTunes 30-second previews work but are aggressively rate-limited, and full-track licenses cost money.

2. Self-host with the dump as primary, Essentia as fallback

This is what we do at FreqBlog. ab_features.db covers ~50% of inbound name-based queries via the MBID resolution path; for misses we run Essentia on iTunes preview clips and cache the result. The two layers complement each other — the dump catches anything pre-2022 that has an MBID; Essentia catches everything else with a commercial preview. Coverage hits ~85% in practice. The remaining 15% is bootlegs, demos, and obscurities with no preview anywhere.

3. Hosted music-feature APIs

The post-AB market split into two camps:

  • Spotify-shim APIs — Musicae and similar, designed to feel like the deprecated Spotify audio_features endpoint. Field names match Spotify's vocabulary. Useful if you're a Spotify-deprecation refugee, but they don't expose AB's ~120 low-level descriptors — just the ~11 Spotify-style high-level fields.
  • Catalog-style APIs — including FreqBlog (full disclosure: ours). Pass artist + title, get back BPM, key, energy, mood-vector, the four AB low-level descriptors we backfilled (onset_rate, dynamic_complexity, tuning_frequency, average_loudness), and standard cross-link IDs. Different ergonomics from AB but covers most practical use cases.

4. Apple Music API

Apple's catalog API exposes tempo, key, timeSignature, and a few mood/genre tags. Free for developer accounts ($99/year to ship). Doesn't expose danceability, energy, valence, or anything below the high-level surface — closer to AB's high-level layer than its low-level descriptors.

5. Re-run the analysis on labelled academic datasets

For research/non-commercial work where licensing matters, the FMA, Million Song Dataset, GTZAN, and similar academic corpora ship with audio that you can run Essentia on yourself. None match AB's coverage but all are legally clean for paper-publishing.

Field-mapping table

For migrating code that previously called the AcousticBrainz live API, here's roughly how the field names translate. The frozen dump keeps the live API's JSON schema, so its field names are identical to the AB column:

AcousticBrainz field              Frozen dump   FreqBlog            Essentia (DIY)
rhythm.bpm                        same          bpm                 RhythmExtractor2013.bpm
tonal.key_key                     same          key                 KeyExtractor.key
tonal.key_scale                   same          mode                KeyExtractor.scale
highlevel.danceability            same          danceability        SVM model, deprecated
highlevel.mood_happy              same          mood_vector.happy   SVM model, deprecated
lowlevel.average_loudness         same          average_loudness    Loudness
lowlevel.dynamic_complexity       same          dynamic_complexity  DynamicComplexity
rhythm.onset_rate                 same          onset_rate          OnsetRate
tonal.tuning_frequency            same          tuning_frequency    TuningFrequency
highlevel.genre_* (multiple)      same          genre (single)      SVM models, deprecated
lowlevel.mfcc.mean[0..12]         same          not exposed         MFCC
lowlevel.spectral_* (~30 fields)  same          not exposed         various spectral algorithms

The dump is the only option if you need the deep low-level vector (~120 fields). Hosted APIs typically expose the ~10-15 fields that map cleanly to product use cases.
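Because the dump keeps the live API's JSON layout, old client code that addressed fields by dotted path can keep working against parsed dump documents with a tiny shim (the helper name here is ours, not part of any AB client):

```python
def get_ab_field(doc: dict, dotted_path: str):
    """Resolve an AcousticBrainz-style dotted path, e.g. 'rhythm.bpm'
    or 'tonal.key_key', inside a parsed dump JSON document."""
    node = doc
    for part in dotted_path.split("."):
        node = node[part]
    return node
```

This is the whole migration story for read paths: the same dotted names your code sent to /api/v1/<mbid>/low-level index straight into the per-MBID JSON files.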

What no replacement gives you back

  • The crowd contribution loop. AB grew because users ran the client on their personal libraries and submitted back. No commercial replacement has that bottom-up data flow — it's all top-down catalog ingestion.
  • The high-level mood/genre classifiers. MTG's own shutdown post acknowledged the high-level model quality wasn't reliable enough — that's part of why they killed AB. Essentia's documentation now flags those SVM models as deprecated. Reproducing them locally reproduces a known-broken system. If you need genre/mood at production-grade quality, the modern path is the Essentia Labs MusiCNN models (TensorFlow-based, much better trained), not the deprecated SVMs.
  • The MBID-keyed open-data ecosystem. AB joined cleanly to MusicBrainz/Discogs/ListenBrainz because everyone shared the MBID identifier. Commercial APIs key on their own internal IDs (or Spotify track IDs, which are now deprecated for new apps). Cross-linking is harder.

How to choose

  • Academic / research / one-shot dataset analysis → download the dump. It's free, complete through June 2022, and citable.
  • Building a product that needs MIR data on current releases → tier the dump under a live source (Essentia self-hosted, or a hosted API). Use the dump as the cheap-and-deep layer.
  • Building a quick app that just needs BPM/key/energy → a hosted catalog API (FreqBlog, Musicae, or Apple Music) is faster to integrate than running Essentia yourself.
  • Need the deep low-level vector (MFCCs, spectral centroid, etc.) on every track → only the dump or your own Essentia worker will give you those. No hosted API exposes them at retail prices.

Originally published at freqblog.com. If you've got a specific MIR migration question (resolver issues, dataset licensing, etc.), drop it in the comments.
