Three technical decisions from productizing a pattern I'd been using by hand.
For a while I'd been using ChatGPT as a music curator. Not a tool, not an app — a conversation. A system prompt describing the kind of listener I was and how I wanted to think about records. After listening to something I'd write a few sentences about how the record landed, paste them into the thread, and ask for a recommendation. It worked surprisingly well.
Two things bothered me about the setup.
First, the context window. After a few dozen albums and reflections, the thread got long enough that the model started forgetting earlier entries. The whole value of the approach was that the pattern across my reflections guided the next pick — and that pattern was the first thing to drop once the conversation grew.
Second, no structure. Free-form chat meant no browsable history, no way to see what I'd written about which album, no album metadata, no cover art, no links to open the record in Spotify or Tidal. Just a wall of text I'd scroll through to find the next recommendation.
So I spent a few weeks building the productized version — an app called Acetate. Feedback history persisted in a database and sent verbatim to the LLM on every selection call. Album cards with cover art and streaming-service links. MusicBrainz metadata verification so the LLM can't propose a record that doesn't exist.
Three technical decisions shaped how the curator actually works.
1. No taste profile. The full feedback history goes straight to the LLM.
The conventional recommender-systems approach is to compress the user into a compact representation: a preference vector, a set of tags, a collaborative-filtering embedding. The model consumes the compressed form and produces a recommendation.
I'd already learned from the ChatGPT setup that compression was the wrong move for this problem. When I boiled my reflections down to keywords or tags before feeding them back in, the next recommendation was noticeably worse than when the model saw the raw text. The specific sentences — what I actually wrote, in my own words — carried signal that no summarization preserved.
So the Gemini selector prompt in Acetate hands it everything:
- The list of past albums (title, artist, year, listening context)
- Each reflection in full, in the user's original words
- Any persistent preferences (era, geography, obscurity bands)
- An optional one-time "directed discovery" hint ("I want something closer to jazz")
The instruction asks Gemini to reason about the pattern across the notes and propose three candidate albums with explicit reasoning for each.
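A minimal sketch of what that prompt assembly could look like — the data class and field names here are illustrative, not Acetate's actual code — the key point being that reflections go in verbatim, never summarized:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    title: str
    artist: str
    year: int
    context: str     # listening context, e.g. "late night, headphones"
    reflection: str  # the user's full note, verbatim

def build_selector_prompt(entries, preferences, hint=None):
    """Assemble the full feedback history into one prompt, uncompressed."""
    lines = ["Past albums and the listener's reflections, in order:"]
    for e in entries:
        lines.append(f"- {e.title} by {e.artist} ({e.year}); context: {e.context}")
        lines.append(f"  Reflection: {e.reflection}")
    lines.append(f"Persistent preferences: {preferences}")
    if hint:
        lines.append(f"One-time direction: {hint}")
    lines.append(
        "Reason about the pattern across these notes, then propose three "
        "candidate albums with explicit reasoning for each."
    )
    return "\n".join(lines)
```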
The tradeoff is real — long feedback histories mean higher per-call token usage — but bounded. Most users stay under 100 albums, which fits comfortably in Gemini's context window even with verbose reflections.
2. MusicBrainz verification chain. Never trust LLM metadata.
ChatGPT would happily propose albums with the wrong year, by the wrong artist, or invented entirely. The model is confident about music history it half-remembers, and in a recommendation flow that confidence is a landmine — you send a user to listen to a record that doesn't exist and the whole experience breaks.
The verification chain in Acetate works like this. Gemini proposes three candidates in order of preference. The backend queries MusicBrainz for each in turn:
```python
for candidate in gemini_candidates:
    mb_match = musicbrainz.search(
        title=candidate.title,
        artist=candidate.artist,
    )
    # clean match: right title/artist, matching year, valid MusicBrainz ID
    if mb_match and mb_match.year == candidate.year and mb_match.id:
        return mb_match

# all three failed verification
return fallback.well_regarded_in_territory(user.profile)
```
The first candidate that matches cleanly — correct title + artist + year + a valid MusicBrainz ID — wins. If none of the three match, the backend falls back to a curated well-regarded record in the user's explored territory rather than trusting the LLM's unverified output.
This means Gemini can be wrong in a safe way. It still guides the taste (what kind of record fits the user right now) but it doesn't get to decide whether the record actually exists.
MusicBrainz rate-limits anonymous clients to one request per second. The backend caches aggressively (30 days for album-level data, longer for artist-level) so the verification chain usually hits cache, not the API.
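A sketch of that cache-plus-throttle wrapper, assuming a simple in-memory store (Acetate's backend presumably persists this; `fetch` stands in for the real MusicBrainz call):

```python
import time

ALBUM_TTL = 30 * 24 * 3600  # 30 days for album-level data

class MusicBrainzClient:
    def __init__(self, fetch, ttl=ALBUM_TTL, min_interval=1.0):
        self._fetch = fetch                # real network call, injected
        self._cache = {}                   # key -> (expires_at, value)
        self._ttl = ttl
        self._min_interval = min_interval  # MB allows 1 anonymous req/s
        self._last_call = 0.0

    def lookup(self, title, artist):
        key = (title.lower(), artist.lower())
        hit = self._cache.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                  # cache hit: no network, no wait
        # throttle: space real requests at least min_interval apart
        wait = self._min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        value = self._fetch(title, artist)
        self._last_call = time.monotonic()
        self._cache[key] = (time.monotonic() + self._ttl, value)
        return value
```

Injecting the fetch function keeps the throttle and cache testable without touching the network.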
3. Cache insights on state change, not TTL.
Acetate has a page called Insights that runs a separate Gemini call to produce a 250–350 word editorial essay about the user's taste. It's expensive to generate, and users revisit the page more often than they add new albums.
The naive cache is a TTL: regenerate every N hours. This is wrong for this workload, because the essay is only stale when new data has been added. If a user hasn't listened to a new album in two weeks, the essay from two weeks ago is still exactly correct — no staleness.
So the cache keys on state change instead:
```sql
CREATE TABLE insight_cache (
    user_id      UUID      PRIMARY KEY,
    album_count  INTEGER   NOT NULL,
    cached_json  JSONB     NOT NULL,
    generated_at TIMESTAMP NOT NULL
);
```
On `/insights` request:

```python
current_count = count_user_albums(user_id)
cache = get_cache(user_id)

if cache and cache.album_count == current_count:
    return cache.cached_json  # instant, zero LLM cost

fresh = gemini.generate_insights(user_id)
upsert_cache(user_id, current_count, fresh)
return fresh
```
Most visits hit cache — zero LLM cost, instant render. The generation cost is paid only when there's new material to reason about.
The edge case: a user changes their preference settings (exploration level, era range) without adding new albums. In theory the insights are stale. In practice the exploration profile shapes future recommendations more than it shapes past-pattern insights, so the simpler cache key was worth the minor inconsistency.
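If that inconsistency ever did matter, the fix would be small: fold a hash of the preference settings into the cache key alongside the album count, so a settings change also invalidates. A sketch (field names illustrative):

```python
import hashlib
import json

def insight_cache_key(album_count, prefs):
    """Composite key that changes when albums are added OR settings change.
    `prefs` is any JSON-serializable settings dict (illustrative)."""
    digest = hashlib.sha256(
        json.dumps(prefs, sort_keys=True).encode()
    ).hexdigest()[:12]
    return f"{album_count}:{digest}"
```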
Closing
Acetate is live at acetate.studio — a subscription music discovery app with an LLM curator that reads what you write. $3.99/month, 7-day free trial, no credit card required.
If you've built LLM-backed recommenders and found different patterns — or if any of the three decisions above sound wrong — I'd love to hear about it in the comments.
One album at a time.
