Three decisions behind a music-to-curator matching score

#typescript #algorithms #music #webdev

I build OTONAMI, a pitch platform that connects independent
artists with music curators — playlist editors, radio DJs, bloggers, label scouts.
At its core is a single number: how well does this track fit this curator?

The math behind that number is textbook: cosine similarity, Jaccard, a weighted
sum, nothing you can't find in a first-year course. Getting it right on real,
messy music data came down to three design decisions. Each one came from
a concrete failure, and each one is the difference between a matcher that looks
fine in a demo and one that ranks sensibly in production.

I extracted and generalized the engine into a small, typed, open-source library,
music-matching-patterns,
so the snippets below are excerpts from real, runnable code. Here are the three decisions.

The shape of the problem

A match is scored on three factors and combined with weights:

score = genreScore · w_genre  +  moodScore · w_mood  +  audioScore · w_audio

Each sub-score lands in [0, 1]. Genre tends to be the strongest predictor of
fit, mood is a secondary signal, and audio is a tie-breaker rather than a driver —
so a split that leans on genre (something like 0.55 / 0.25 / 0.20) is a reasonable
starting point. The interesting part isn't the weights. It's how each sub-score is
computed.
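A minimal sketch of the combination step, assuming the 0.55 / 0.25 / 0.20 split mentioned above and sub-scores that already lie in [0, 1]. The names `MatchWeights`, `DEFAULT_WEIGHTS`, and `combineScores` are hypothetical, not the library's API.

```typescript
// Hypothetical weight shape; the default mirrors the starting split above.
interface MatchWeights {
  genre: number;
  mood: number;
  audio: number;
}

const DEFAULT_WEIGHTS: MatchWeights = { genre: 0.55, mood: 0.25, audio: 0.2 };

// Weighted sum of three sub-scores, each expected to lie in [0, 1].
// Because these weights sum to 1, the combined score also lies in [0, 1].
function combineScores(
  genreScore: number,
  moodScore: number,
  audioScore: number,
  w: MatchWeights = DEFAULT_WEIGHTS,
): number {
  return genreScore * w.genre + moodScore * w.mood + audioScore * w.audio;
}

// A full genre match with neutral mood and audio scores:
console.log(combineScores(1, 0.5, 0.5)); // roughly 0.775 (0.55 + 0.125 + 0.1)
```

Keeping the weights in one object makes it easy to tune them later without touching how each sub-score is computed.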

Decision 1 — Genre uses recall, not Jaccard

The obvious move is Jaccard similarity over the two genre sets: intersection over
union. It's symmetric and tidy. It's also wrong here.

Picture a curator who covers ten genres: a generous, broad-taste editor. A
single-genre track that lands in one of those ten is a perfect fit for that
curator's lane. But Jaccard scores it 1 / 10 = 0.1, because the union is the
curator's entire list. The broader and more welcoming the curator, the harder
Jaccard punishes every track pitched to them. That's exactly backwards from
what you want.
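A quick sketch makes the penalty visible. The `jaccard` helper and the genre names are illustrative, not taken from the library: one single-genre track is scored against curators of growing breadth, each of which includes the track's genre.

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B|, symmetric in its two arguments.
function jaccard(a: ReadonlySet<string>, b: ReadonlySet<string>): number {
  const union = new Set([...a, ...b]);
  if (union.size === 0) return 0;
  let intersection = 0;
  for (const item of a) if (b.has(item)) intersection++;
  return intersection / union.size;
}

// A single-genre track that sits squarely inside every curator's lane.
const track = new Set(["shoegaze"]);

// Illustrative extra genres used to widen each curator's list.
const extraGenres = [
  "dream pop", "indie rock", "post-rock", "ambient", "noise pop",
  "slowcore", "psych rock", "lo-fi", "alt-pop",
];

for (const breadth of [1, 2, 5, 10]) {
  // Every curator covers "shoegaze" plus (breadth - 1) other genres.
  const curator = new Set(["shoegaze", ...extraGenres.slice(0, breadth - 1)]);
  console.log(breadth, jaccard(track, curator));
}
// 1 1
// 2 0.5
// 5 0.2
// 10 0.1
```

The track's fit never changes, yet its score falls as the curator's list grows.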

The question isn't symmetric. It's: does this track fit inside the curator's
lane? So genre is scored as recall over the track's genres — of the track's
genres, how many does the curator cover?

```typescript
export function genreScore(track: Track, curator: Curator): number {
  if (curator.openToAllGenres) return 1;

  const trackGenres = normalizeLabels(track.genres);
  if (trackGenres.length === 0) return 0.5; // see Decision 3

  // Recall: the fraction of the track's genres that the curator covers.
  const curatorGenres = new Set(normalizeLabels(curator.genres));
  const covered = trackGenres.filter((g) => curatorGenres.has(g)).length;
  return covered / trackGenres.length;
}
```

A broad curator is no longer penalized for being broad. The asymmetry of the
real-world question is now baked into the metric.
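To make the recall version runnable on its own, here is a self-contained sketch. The original snippet does not show `normalizeLabels`, so the version below (trim, lowercase, de-duplicate) is an assumption, as are the minimal `Track` and `Curator` shapes.

```typescript
// Minimal shapes for this sketch; the real types likely carry more fields.
interface Track { genres: string[] }
interface Curator { genres: string[]; openToAllGenres?: boolean }

// Hypothetical normalization: trim, lowercase, drop blanks and duplicates,
// so "Shoegaze " and "shoegaze" count as the same genre.
function normalizeLabels(labels: string[]): string[] {
  const cleaned = labels.map((l) => l.trim().toLowerCase()).filter((l) => l.length > 0);
  return [...new Set(cleaned)];
}

// Recall: the fraction of the track's genres that the curator covers.
function genreScore(track: Track, curator: Curator): number {
  if (curator.openToAllGenres) return 1;
  const trackGenres = normalizeLabels(track.genres);
  if (trackGenres.length === 0) return 0.5; // missing data stays neutral
  const curatorGenres = new Set(normalizeLabels(curator.genres));
  const covered = trackGenres.filter((g) => curatorGenres.has(g)).length;
  return covered / trackGenres.length;
}

// The broad, ten-genre curator from the Jaccard example.
const broadCurator: Curator = {
  genres: ["Shoegaze", "Dream Pop", "Indie Rock", "Post-Rock", "Ambient",
    "Noise Pop", "Slowcore", "Psych Rock", "Lo-Fi", "Alt-Pop"],
};

console.log(genreScore({ genres: ["shoegaze "] }, broadCurator)); // 1
console.log(genreScore({ genres: ["shoegaze", "techno"] }, broadCurator)); // 0.5
```

The second call shows the other side of recall: a track is only partly covered when some of its own genres fall outside the curator's lane, regardless of how long the curator's list is.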

Decision 2 — Tempo is deliberately excluded from the audio vector

Audio fit is cosine similarity over a feature vector: energy, danceability,
acousticness, instrumentalness, valence. The tempting sixth dimension is tempo.
I left it out on purpose.

Automatic BPM detection is unreliable in a way that's uniquely destructive to a
distance metric: it makes half-time and double-time errors. A slow 60 BPM
ballad routinely gets read as 120 BPM. When that doubled value lands in a vector
and you compute distance, it doesn't just add a little noise — it blows a hole in
the score. Two tracks that belong together suddenly look far apart on one axis.

In an early version, this produced a bug where about 22% of matches collapsed
toward a flat, meaningless score. When I traced it back, tempo was the culprit
nearly every time. Pulling tempo out of scoring removed an entire class of false
negatives at once.

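```typescript
// a, b: the track's and curator's audio features (see audioScore below).
// AUDIO_DIMENSIONS = energy, danceability, acousticness, instrumentalness,
// valence. Tempo is deliberately not in the list.
let dot = 0, magA = 0, magB = 0;
for (const dimension of AUDIO_DIMENSIONS) {
  const x = a[dimension];
  const y = b[dimension];
  dot += x * y;
  magA += x * x;
  magB += y * y;
}
// Cosine similarity; a zero vector scores 0 instead of dividing by zero.
return magA && magB ? dot / Math.sqrt(magA * magB) : 0;
```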

Tempo can still show up in display copy ("similar energy and tempo") — humans
read it fine. It just must never re-enter the score. Keep explanation and scoring
decoupled.
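A sketch of that separation, assuming the scored features are numbers in [0, 1] stored alongside a raw `tempo` in BPM. `AudioFeatures`, `cosineSimilarity`, and `describeMatch` are hypothetical names, and the 0.15 energy and 10 BPM thresholds in the display helper are arbitrary. The point is structural: only the scoring path iterates over `AUDIO_DIMENSIONS`, so a misread BPM cannot move the score.

```typescript
// Scored dimensions only; tempo is deliberately absent from this list.
const AUDIO_DIMENSIONS = [
  "energy", "danceability", "acousticness", "instrumentalness", "valence",
] as const;
type Dimension = (typeof AUDIO_DIMENSIONS)[number];

// Hypothetical shape: scored features plus a raw tempo kept for display only.
type AudioFeatures = Record<Dimension, number> & { tempo?: number };

// Scoring path: cosine similarity over AUDIO_DIMENSIONS, never reading tempo.
function cosineSimilarity(a: AudioFeatures, b: AudioFeatures): number {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (const dimension of AUDIO_DIMENSIONS) {
    const x = a[dimension];
    const y = b[dimension];
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  if (magA === 0 || magB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Explanation path: free to mention tempo, because nothing here feeds the score.
// The 0.15 and 10 BPM thresholds are arbitrary, illustrative values.
function describeMatch(a: AudioFeatures, b: AudioFeatures): string {
  const similarEnergy = Math.abs(a.energy - b.energy) < 0.15;
  const similarTempo =
    a.tempo !== undefined && b.tempo !== undefined && Math.abs(a.tempo - b.tempo) < 10;
  if (similarEnergy && similarTempo) return "similar energy and tempo";
  if (similarEnergy) return "similar energy";
  return "different energy";
}

const ballad: AudioFeatures = {
  energy: 0.2, danceability: 0.3, acousticness: 0.9, instrumentalness: 0.1, valence: 0.3,
  tempo: 60,
};
// The same ballad with a double-time BPM misread.
const misread: AudioFeatures = { ...ballad, tempo: 120 };

console.log(cosineSimilarity(ballad, ballad) === cosineSimilarity(ballad, misread)); // true
console.log(describeMatch(ballad, misread)); // similar energy
```

The misread only changes the wording of the explanation; the score is computed from exactly the same five numbers either way.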

Decision 3 — Missing data is neutral, never a penalty

Independent music is full of gaps. Plenty of tracks have no reliable audio
analysis. Plenty of curators never filled in their mood tags. The naive thing is
to treat a missing signal as a zero — and the naive thing quietly buries every new
or under-analyzed artist at the bottom of every ranking.
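To see the burial concretely, here is a small sketch comparing the two policies for a missing audio signal. It reuses the illustrative 0.55 / 0.25 / 0.20 weights; the sub-scores are made-up inputs, and `SubScores`, `MissingPolicy`, and `total` are hypothetical names.

```typescript
// Illustrative weights (the 0.55 / 0.25 / 0.20 starting point).
const W = { genre: 0.55, mood: 0.25, audio: 0.2 };

// `audio` is undefined when a track has no usable audio analysis.
interface SubScores { genre: number; mood: number; audio?: number }

// Two policies for a missing audio signal.
type MissingPolicy = "neutral" | "zero";

function total(s: SubScores, policy: MissingPolicy): number {
  const audio = s.audio ?? (policy === "neutral" ? 0.5 : 0);
  return s.genre * W.genre + s.mood * W.mood + audio * W.audio;
}

// Made-up inputs: a new artist with a perfect genre fit but no audio analysis,
// and an established artist with a weaker genre fit but full metadata.
const newArtist: SubScores = { genre: 1, mood: 0.5 };
const established: SubScores = { genre: 0.5, mood: 1, audio: 0.8 };

for (const policy of ["neutral", "zero"] as const) {
  const a = total(newArtist, policy);
  const b = total(established, policy);
  console.log(policy, a.toFixed(3), b.toFixed(3), a > b ? "new artist first" : "established first");
}
// neutral 0.775 0.685 new artist first
// zero 0.675 0.685 established first
```

Under the zero policy, a track with a weaker genre fit outranks the better fit purely because the better fit has thinner metadata.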

So whenever either side lacks audio, or either side lacks moods, that factor
returns a neutral 0.5. It neither helps nor hurts.

```typescript
export function audioScore(track: Track, curator: Curator): number {
  const a = track.audio;
  const b = curator.audio;
  if (!a || !b) return 0.5; // absence of a signal is not evidence of a bad fit
  // cosineSimilarity: hypothetical name for the tempo-free loop from Decision 2.
  return cosineSimilarity(a, b);
}
```

Absence of evidence is not evidence of a poor fit. Encoding that one line keeps the
newest artists — the ones a discovery platform exists to serve — from being
penalized for thin metadata.
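The same guard applies to moods. The article doesn't show how mood overlap is computed; the intro lists Jaccard among the building blocks, so the sketch below assumes Jaccard over normalized mood tags purely for illustration. `moodScore` and its signature are hypothetical. The part that matters is the order: the neutral check runs before any overlap is computed.

```typescript
// Hypothetical mood scorer: neutral when either side has no moods,
// otherwise Jaccard overlap of the normalized tag sets (an assumed metric).
function moodScore(trackMoods: string[] | undefined, curatorMoods: string[] | undefined): number {
  const normalize = (tags: string[] | undefined) =>
    new Set((tags ?? []).map((t) => t.trim().toLowerCase()).filter((t) => t.length > 0));
  const a = normalize(trackMoods);
  const b = normalize(curatorMoods);
  if (a.size === 0 || b.size === 0) return 0.5; // missing signal: neither helps nor hurts

  let intersection = 0;
  for (const tag of a) if (b.has(tag)) intersection++;
  const unionSize = a.size + b.size - intersection;
  return intersection / unionSize;
}

console.log(moodScore(["melancholic", "dreamy"], undefined)); // 0.5
console.log(moodScore(["melancholic", "dreamy"], ["Dreamy", "uplifting"])); // 0.3333333333333333
```

Note the distinction this preserves: a curator with no mood tags gets 0.5, while a curator whose tags genuinely don't overlap with the track's gets 0.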

Takeaway

None of these are clever algorithms. They're small, boring guards: recall instead
of Jaccard, one dimension removed, one neutral fallback. But each one came from
watching real rankings go wrong, and together they're most of what separates a
matcher that works from one that merely runs.

The full implementation is open source and typed end to end:
music-matching-patterns.
If you're wiring an LLM into a Next.js app, you might also like my earlier write-up
on production patterns for the Claude API in Next.js.