DEV Community

ShaiZadok

Posted on • Originally published at acetaggen.com

I built a 100-point prompt scorer for SUNO AI — 16 checks, open-source on npm

Why deterministic prompt scoring?

A few months ago I was using SUNO AI and kept regenerating the same song idea 20-30 times before getting something close to what I imagined. The prompt syntax felt opaque. Genre close but sub-genre missed. Mood right but vocals wrong.

Turns out SUNO's response to prompts is consistent enough that prompts can be scored with deterministic rules. So I wrote a scorer: suno-prompt-scorer on npm (MIT-licensed).

What the scorer checks — 16 signals

Each check is weighted, and the weights sum to 100%, so the total is a score from 0 to 100:

| # | Check | Category | Weight |
|---|---|---|---|
| 1 | Character limit (v4: 200, v4.5+: 1000) | style | 8% |
| 2 | Genre collisions (53 known pairs) | style | 10% |
| 3 | Weak token detection (context-aware) | style | 8% |
| 4 | Strong token reward (48 hardware/prod anchors) | style | 8% |
| 5 | Tag ordering weight 2/(1+k) | style | 8% |
| 6 | Genre in position 1 | style | 7% |
| 7 | Mood in position 2 | style | 5% |
| 8 | Per-category limits (genre 1-2, mood 1-2, instruments 2-3) | style | 8% |
| 9 | Invalid tag detection (49 known bad) | style | 8% |
| 10 | Suspicious tag detection (28 unverified) | style | 4% |
| 11 | Misclassified subgenres (100 mapped) | style | 5% |
| 12 | Bracket syntax validation | lyrics | 7% |
| 13 | Regional coherence | advanced | 4% |
| 14 | Version-specific warnings | advanced | 5% |
| 15 | Ready-package proximity to benchmarks | style | 5% |
| 16 | Bracket verbatim check (informational) | lyrics | 0% |
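A minimal sketch of how the weights in the table could combine into the 0-100 total. It assumes each check reports a fractional score between 0 and 1 and that the total is a plain weighted sum. The `CheckResult` shape and the `weightedTotal` helper are hypothetical, not the package's actual breakdown format.

```typescript
// Weights from the table, in percentage points (check #16 is informational).
const WEIGHTS: readonly number[] = [8, 10, 8, 8, 8, 7, 5, 8, 8, 4, 5, 7, 4, 5, 5, 0];

// Hypothetical result shape: each check reports the fraction of its weight earned.
interface CheckResult {
  id: number;    // 1-based check number from the table
  score: number; // fraction in [0, 1]
}

function weightedTotal(results: readonly CheckResult[]): number {
  let total = 0;
  for (const r of results) {
    if (r.id < 1 || r.id > WEIGHTS.length) {
      throw new RangeError(`unknown check id ${r.id}`);
    }
    // Clamp so no single check can contribute more than its weight.
    const clamped = Math.min(1, Math.max(0, r.score));
    total += WEIGHTS[r.id - 1] * clamped;
  }
  return Math.round(total);
}

// Because the weights sum to 100, a prompt that passes every check scores 100.
const perfect = WEIGHTS.map((_, i) => ({ id: i + 1, score: 1 }));
console.log(WEIGHTS.reduce((a, b) => a + b, 0)); // 100
console.log(weightedTotal(perfect));             // 100
```

Under this assumption, a prompt that earns half credit on genre collisions and full credit everywhere else would land at 95.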

The anchor-based philosophy

The most interesting design decision: separate core nouns, which must match a verified tag, from modifiers, which are left to creative freedom.

So [shofar blast] passes: "Shofar" is a verified instrument and "blast" is a free modifier. But [QuantumSynth breakdown] fails because it contains no verified anchor.

This preserves creativity (real producers combine real instruments in unexpected ways) while still catching hallucinated tags.

// Core nouns: genres, instruments, keys, vocal types → verbatim
// Structural: [Intro], [Verse], [Chorus], [Drop] → verbatim
// Modifiers: blast, crystalline, thundering → free
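A minimal sketch of the anchor rule. It assumes a bracketed tag passes when any contiguous run of its words matches a verified core noun, with every other word treated as a free modifier. The `VERIFIED_ANCHORS` set and the `findAnchor` helper are hypothetical illustrations. The real package uses its own, much larger knowledge base and may match differently.

```typescript
// Hypothetical sample of verified core nouns; illustrative only.
const VERIFIED_ANCHORS: ReadonlySet<string> = new Set([
  "shofar", "moog bass", "808 bass", "electropop",
  "intro", "verse", "chorus", "drop",
]);

// Returns the verified anchor found in a bracketed tag, or null if there is none.
function findAnchor(tag: string): string | null {
  const words = tag.replace(/^\[|\]$/g, "").trim().toLowerCase().split(/\s+/);
  // Try longer spans first so a multi-word anchor like "moog bass" is preferred.
  for (let len = words.length; len >= 1; len--) {
    for (let start = 0; start + len <= words.length; start++) {
      const candidate = words.slice(start, start + len).join(" ");
      if (VERIFIED_ANCHORS.has(candidate)) return candidate;
    }
  }
  return null; // no verified anchor: treat the tag as a likely hallucination
}

console.log(findAnchor("[shofar blast]"));           // shofar
console.log(findAnchor("[QuantumSynth breakdown]")); // null
```

Checking longer word runs first keeps multi-word instruments intact, so "[thundering Moog bass]" anchors on "moog bass" and leaves "thundering" as the free modifier.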

What I learned about SUNO

Building this surfaced several non-obvious findings:

  1. Position 1 carries 60-70% of the output's "DNA": the first tag dominates.
  2. Tag weight at position k (counting from 1) falls off as 2/(1+k), so position 6 carries 2/7, roughly 30%, of position 1's weight.
  3. Some modifiers are weak in isolation but strong in context. "Modern" alone is weak, while "polished modern production" is specific.
  4. V4.5+ supports 1,000 characters in the Style field, not 200. The 200-character limit applied only to V4, yet it remains a common misconception.
  5. Collision pairs aren't always obvious. "calm + aggressive" is easy to spot; "minimal + orchestral" and "whisper + powerful vocals" are less so.
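The 2/(1+k) falloff is easy to check numerically. A minimal sketch, assuming k is the 1-based tag position and tags are comma-separated; the `positionWeight` helper is hypothetical and not part of the package's API.

```typescript
// Relative weight of the tag at 1-based position k, following the 2/(1+k) rule.
// Position 1 gets weight exactly 1; later tags fall off hyperbolically.
function positionWeight(k: number): number {
  if (k < 1 || Math.floor(k) !== k) {
    throw new RangeError("k must be a 1-based integer position");
  }
  return 2 / (1 + k);
}

const style = "Electropop, 128 BPM, 808 Bass, Moog bass, Confident, Euphoric";
const tags = style.split(",").map((t) => t.trim());

tags.forEach((tag, i) => {
  const k = i + 1;
  // Expressed relative to position 1, whose weight is 1.
  console.log(`${k}. ${tag}: ${(positionWeight(k) * 100).toFixed(0)}% of position 1`);
});
// 1. Electropop: 100% ... 6. Euphoric: 29% of position 1
```

Position 6 comes out at 2/7 (about 29%), which is where the "~30%" figure comes from.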

Usage

npm install suno-prompt-scorer
import { scorePrompt } from 'suno-prompt-scorer';

const result = scorePrompt(
  "Electropop, 128 BPM, 808 Bass, Moog bass, Confident, Euphoric",
);
console.log(result.total);        // 99
console.log(result.breakdown);    // 16 checks with pct + message


Contributions welcome

The knowledge base (4,000+ verified tags across 13 categories) is where contributions help most, especially regional genres, edge cases, and emerging subgenres.


Disclosure: I'm the creator of AceTagGen. The scorer npm package is a standalone MIT-licensed extraction of the scoring engine. The web tool at acetaggen.com uses the same engine with a larger server-side knowledge base.

— Shai Zadok
