DEV Community

SonGo


The Smallest Audio Spec That Still Changes Your Video

Try SonGo free for 3 days

You finish the edit. You lock the cut. Then you open a stock library and spend 45 minutes finding something that's "fine."

The video ships. It's fine. Not quite what you pictured, but fine.

The problem isn't the library. It's that by the time you opened it, every creative decision in the video was already made — pacing, energy, tone — and none of them considered audio. You're not choosing a soundtrack. You're patching a gap.

One small habit fixes this. It takes 90 seconds and it lives at the bottom of the brief you're already writing.


The template

Audio intent:
1. End emotion:
2. Role in this piece:
3. Hard NOs:

Three lines. Here's what each one actually means.


Line 1: End emotion

Not a genre. Not a vibe word. The specific feeling you want the viewer to have at the end of this video.

Finish this sentence: "After watching this, I want them to feel ___."

Prompts that don't work:

  • "Happy" — points at ten thousand tracks
  • "Chill" — same problem
  • "Inspiring" — the most overloaded word in stock music

Prompts that actually narrow the space:

  • "Quiet confidence — like they can start this tonight"
  • "Slightly restless, like they need to go try this right now"
  • "The specific relief of finishing something that almost broke them"

The more specific and unusual the description, the further the generation moves from the generic center. That's the goal.


Line 2: Role in this piece

Music has a structural job in every video. Writing it down forces you to be honest about what you're actually asking it to do.

Four common roles:

  • Support under VO: invisible, just holding tension, never competing with speech
  • Montage driver: the pulse of a sequence, louder and more rhythmic
  • Tone setter: heard for 3–5 seconds at the top before you say anything, then sits back
  • Breathing space: a lower-energy transition between two dense sections

One sentence is enough:

  • "Sits quietly under voiceover the whole time. Never the main character."
  • "Drives a 45-second b-roll section. Can be more present and hooky."
  • "Sets tone in the first 5 seconds, then disappears."

This single line eliminates entire categories of tracks that otherwise look fine until you try them against a real edit.


Line 3: Hard NOs

The highest-leverage line in the spec. Almost everyone skips it.

Hard NOs do two things at once: they cut what you don't want, and they push any generator away from the statistical center of whatever genre cluster you pointed at.[2][1]

Useful NOs for content creators:

  • "No vocals, no lyrics, no spoken word"
  • "No risers, whooshes, or DJ-style transitions"
  • "No obvious loop restart"
  • "No melodic hook stronger than the message I'm delivering"
  • "No 'corporate inspirational' acoustic guitar"
  • "No dynamic jumps that don't match my cuts"[3][1]

Each NO shrinks the solution space. The more specific the NOs, the more distinctive the output.
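Here is a toy Python sketch that makes "each NO shrinks the solution space" concrete. It assumes a hypothetical stock catalog where every track carries descriptive tags. The titles and tags are invented, and real libraries tag and search differently. The sketch applies the NOs one at a time and prints how many candidates survive each step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    title: str
    tags: frozenset[str]


# Hypothetical catalog: titles and tags are invented for this example.
CATALOG = [
    Track("Sunrise Pitch", frozenset({"acoustic-guitar", "corporate-inspirational", "build"})),
    Track("Night Shift", frozenset({"vocals", "heavy-bass"})),
    Track("Quiet Desk", frozenset({"ambient", "piano"})),
    Track("Drop Zone", frozenset({"riser", "drop", "heavy-bass"})),
    Track("Soft Focus", frozenset({"ambient", "pads"})),
]


def apply_hard_nos(catalog: list[Track], hard_nos: list[str]) -> list[Track]:
    """Apply each NO in turn and print how many candidates survive each step."""
    remaining = list(catalog)
    print(f"start: {len(remaining)} candidates")
    for no in hard_nos:
        # A NO removes every remaining track tagged with it.
        remaining = [t for t in remaining if no not in t.tags]
        print(f"after NO {no!r}: {len(remaining)} candidates")
    return remaining


survivors = apply_hard_nos(CATALOG, ["vocals", "heavy-bass", "corporate-inspirational"])
print([t.title for t in survivors])
```

With a generator, the NOs go into the brief text rather than a filter. The principle is the same: every exclusion removes options before you audition anything.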


What this looks like in practice

A real spec for a 5-minute tutorial — mostly talking head and screen capture:

Audio intent:
1. End emotion: Calm confidence. "I can set this up tonight."
2. Role: Under voiceover the whole time. Never the main character.
3. Hard NOs: No vocals. No heavy bass. No builds or drops.
   No "corporate inspirational" energy.

30 seconds of writing. Here's what it buys you:

With a stock library: you can dismiss most results within the first few seconds of each preview. No more scrolling through 200 similar tracks: you know exactly what you're ruling out.

With an AI generator like SonGo: you paste the spec in as a brief. SonGo generates a single track from that description, not a playlist, so you have one track to evaluate against your spec. If it misses, you fix one line and regenerate. Usually the culprit is the emotion line (too vague) or a missing hard NO.
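Keeping the spec as a small record makes "fix one line and regenerate" mechanical. You change a single field and re-render the text. This is a hypothetical Python sketch. The `AudioIntent` class and its `to_brief` method are illustrative names, not part of SonGo. The output is plain text in the same layout as the spec above, ready to paste as a brief.

```python
from dataclasses import dataclass, field, replace


@dataclass
class AudioIntent:
    """Hypothetical record for the three-line spec: end emotion, role, hard NOs."""
    end_emotion: str
    role: str
    hard_nos: list[str] = field(default_factory=list)

    def to_brief(self) -> str:
        """Render the spec in the same plain-text layout used in the article."""
        return "\n".join([
            "Audio intent:",
            f"1. End emotion: {self.end_emotion}",
            f"2. Role: {self.role}",
            f"3. Hard NOs: {' '.join(self.hard_nos)}",
        ])


tutorial = AudioIntent(
    end_emotion='Calm confidence. "I can set this up tonight."',
    role="Under voiceover the whole time. Never the main character.",
    hard_nos=["No vocals.", "No heavy bass.", "No builds or drops.",
              'No "corporate inspirational" energy.'],
)
print(tutorial.to_brief())

# "If it misses, fix one line": copy the spec with only the emotion line changed.
revised = replace(tutorial, end_emotion="Quiet confidence, like they can start this tonight.")
print(revised.to_brief())
```

`dataclasses.replace` returns a new object, so the brief that missed stays intact for comparison.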


[📸 SCREENSHOT: spot 1]

What: the SonGo interface with a filled-in brief (the three Audio intent: lines) and the waveform of the generated track.

Caption: "Thirty seconds of writing → one track built for this specific video."


How to make this a habit

Writing a brief feels like extra work. Opening a library feels like starting.

That's the trap. The fix: don't make it a separate task.

  • If you write a script → Audio intent: goes at the bottom of the script doc
  • If you use a shot list → it's a required field, same as VO notes
  • If you work in Notion or Linear → it's in the production template, not in a separate audio card

The brief gets written as part of the content process — not as an audio task that competes with it for attention.
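If your script lives in a plain-text or Markdown file, the "required field" rule can be a small check you run before the edit starts. Below is a hypothetical Python sketch. It assumes the Audio intent block uses the numbered layout from the template, sits at the bottom of the doc, and ends at the first blank line. Notion or Linear templates would need their own export step, which isn't shown.

```python
import re

# Matches a numbered spec line such as "1. End emotion: Calm confidence."
# Group 1 is the line number; group 2 is whatever follows the first colon.
SPEC_LINE = re.compile(r"^\s*([123])\.\s*[^:]+:\s*(.*)$")


def missing_audio_fields(doc: str) -> list[str]:
    """Return the numbers of Audio intent lines that are missing or left blank."""
    lines = doc.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == "Audio intent:"]
    if not starts:
        return ["1", "2", "3"]  # no Audio intent block at all
    filled = set()
    # Use the last block, since the spec is assumed to sit at the bottom of the doc.
    for line in lines[starts[-1] + 1:]:
        if not line.strip():
            break  # assume a blank line ends the block
        match = SPEC_LINE.match(line)
        if match and match.group(2).strip():
            filled.add(match.group(1))
    return [n for n in ("1", "2", "3") if n not in filled]


script = """Intro: show the finished dashboard first.

Audio intent:
1. End emotion: Calm confidence. "I can set this up tonight."
2. Role in this piece:
3. Hard NOs: No vocals. No heavy bass.
"""
print(missing_audio_fields(script))
```

Here the role line was left blank, so the check flags line 2 before anyone opens a library.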


Why this compounds

After 10–15 videos, something useful happens. Your Audio intent: entries start to repeat.

The same two or three end emotions. The same structural roles. The same NOs.

That's not laziness — that's your sonic identity clarifying itself. The audio register of your content starts to feel consistent across videos, not because you designed a "sound bible," but because you kept writing down the same honest answers to the same three questions.[8][7]

If you're generating with SonGo on a paid plan: every brief you write is also documentation for a track you own commercially. The same brief that worked for this video works for the next one with the same feel. Your brief log becomes a reusable audio library — built as a byproduct of normal content production, without a single extra session.
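The repetition described above is easy to see if you keep briefs in a simple log. Below is a hypothetical Python sketch in which each brief is a dict with the three fields and the code counts the answers that recur. It only matches identical wording, so it assumes you log answers consistently (for example, "calm confidence" every time).

```python
from collections import Counter

# Hypothetical brief log: one dict per published video.
BRIEF_LOG = [
    {"end_emotion": "calm confidence", "role": "under VO", "hard_nos": ["vocals", "heavy bass"]},
    {"end_emotion": "calm confidence", "role": "under VO", "hard_nos": ["vocals", "builds or drops"]},
    {"end_emotion": "slightly restless", "role": "montage driver", "hard_nos": ["vocals", "risers"]},
]


def recurring(log: list[dict], key: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return answers for `key` that appear at least `min_count` times, most common first."""
    counts: Counter[str] = Counter()
    for brief in log:
        value = brief[key]
        # hard_nos holds a list; the other fields hold a single string.
        counts.update(value if isinstance(value, list) else [value])
    return [(answer, n) for answer, n in counts.most_common() if n >= min_count]


for key in ("end_emotion", "role", "hard_nos"):
    print(key, recurring(BRIEF_LOG, key))
```

After a handful of videos, the recurring entries are a first draft of the sonic identity the article describes.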

