Audio is almost always the last thing to enter a project and the first thing users notice when it's wrong.
We have good workflows for most things now. Feature specs, design reviews, staged deploys, retrospectives. But the audio layer, especially for indie builders and small dev teams, still largely operates on vibes and last-minute decisions. Someone finds a track at 11pm before launch, it doesn't feel quite right, and it ships anyway.
I've been experimenting with a different approach. It doesn't require new tooling infrastructure, doesn't add significant overhead, and the core shift is more of a mindset than a process. Here's what it looks like.
The problem isn't finding music, it's the timing
Audio is treated as a late-stage production task. That's where most of the pain comes from.
When you pick music after the edit is locked, the UX is shipped, and the copy is frozen, you're looking for something that fits a context that was never designed with sound in mind. The result is compromise by default. You find the "least wrong" track in a stock library, drop it in, and move on.
The alternative isn't spending more time on audio. It's spending a small amount of time earlier.
What "earlier" looks like in practice
Concrete change: add one line to your feature spec or content brief.
Audio intent: [describe the emotional function, structural role, key constraints]
That's the whole intervention at the planning stage. It doesn't require a decision yet — it just forces the question onto the table before everything downstream gets locked.
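If you want a light guardrail for this, the check below is a minimal sketch of one way to do it, assuming your specs are plain text and use the exact "Audio intent:" label from the template above. The function names (`find_audio_intent`, `audio_intent_status`) are hypothetical, not part of any existing tool.

```python
import re
from typing import Optional

# Matches a line such as "Audio intent: calm, loopable, under 30 seconds".
# The "Audio intent:" label is the one from the template; everything else
# here is an illustrative assumption.
AUDIO_INTENT_RE = re.compile(
    r"^[ \t]*Audio intent:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE
)


def find_audio_intent(spec_text: str) -> Optional[str]:
    """Return the text after 'Audio intent:', or None if the line is absent."""
    match = AUDIO_INTENT_RE.search(spec_text)
    if match is None:
        return None
    return match.group(1).strip()


def audio_intent_status(spec_text: str) -> str:
    """Classify a spec as 'missing', 'placeholder', or 'present'."""
    intent = find_audio_intent(spec_text)
    if intent is None:
        return "missing"
    # An empty value or an untouched "[describe ...]" template counts as unanswered.
    if intent == "" or (intent.startswith("[") and intent.endswith("]")):
        return "placeholder"
    return "present"
```

Run against a spec, "missing" or "placeholder" means the question hasn't been asked yet. It says nothing about whether the answer is any good; that part stays a human call.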
Once the question is on the table, something interesting happens: you realise the audio spec is actually a proxy for a UX question you hadn't explicitly answered. What should users feel during this flow? What's the energy profile of this screen? If the spec can't answer those, the issue isn't audio — it's an underdefined experience.
From spec to track: where SonGo fits
Once you have a written audio intent — even a rough one — a tool like SonGo lets you generate a track from it directly. No library search, no compromising on "close enough."
You write what the track needs to do in natural language. SonGo generates one track from that description. If it's not quite right, you refine the words, not the waveform. The spec is the source; the track is the output.
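To make "the spec is the source" concrete, here is a rough sketch of keeping the intent as structured data and rendering the natural-language description from it. It does not call SonGo; the `AudioIntent` class, its field names, and the prompt wording are all assumptions for illustration, loosely following the three parts of the audio intent line.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AudioIntent:
    """Hypothetical container for the three parts of an audio intent line."""

    emotional_function: str
    structural_role: str
    constraints: Tuple[str, ...] = ()

    def to_prompt(self) -> str:
        """Render the intent as one plain-language description."""
        parts = [
            f"Emotional function: {self.emotional_function}.",
            f"Structural role: {self.structural_role}.",
        ]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints) + ".")
        return " ".join(parts)


# First pass at the intent for an onboarding flow (example values only).
first = AudioIntent(
    emotional_function="calm focus",
    structural_role="looping bed under the onboarding flow",
    constraints=("no vocals", "under 30 seconds"),
)

# Refining the words: change one field, keep the rest of the spec intact.
revised = replace(first, emotional_function="calm focus with a slight lift at the end")
```

The text from `to_prompt()` is what you would paste into the generator. Because each revision is a new value rather than an edit in place, the history of what you asked for stays easy to diff alongside the rest of the spec.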
On SonGo's paid plan, the generated track comes with commercial rights, so the same file that plays in your product can also be distributed to Spotify and Apple Music through a standard distributor. One generation session solves both the UX problem and the asset problem.
