Why deterministic prompt scoring?
A few months ago I was using SUNO AI and kept regenerating the same song idea 20-30 times before getting something close to what I imagined. The prompt syntax felt opaque. Genre close but sub-genre missed. Mood right but vocals wrong.
Turns out SUNO's prompt behavior is deterministic enough to score. So I wrote a scorer: suno-prompt-scorer on npm (MIT).
What the scorer checks — 16 signals
Each check is weighted; the total is a percentage from 0 to 100:
| # | Check | Category | Weight |
|---|---|---|---|
| 1 | Character limit (v4: 200, v4.5+: 1000) | style | 8% |
| 2 | Genre collisions (53 known pairs) | style | 10% |
| 3 | Weak token detection (context-aware) | style | 8% |
| 4 | Strong token reward (48 hardware/prod anchors) | style | 8% |
| 5 | Tag ordering weight 2/(1+k) | style | 8% |
| 6 | Genre in position 1 | style | 7% |
| 7 | Mood in position 2 | style | 5% |
| 8 | Per-category limits (genre 1-2, mood 1-2, instruments 2-3) | style | 8% |
| 9 | Invalid tag detection (49 known bad) | style | 8% |
| 10 | Suspicious tag detection (28 unverified) | style | 4% |
| 11 | Misclassified subgenres (100 mapped) | style | 5% |
| 12 | Bracket syntax validation | lyrics | 7% |
| 13 | Regional coherence | advanced | 4% |
| 14 | Version-specific warnings | advanced | 5% |
| 15 | Ready-package proximity to benchmarks | style | 5% |
| 16 | Bracket verbatim check (informational, weight 0) | lyrics | — |
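To make the weighting concrete, here is a minimal sketch of how 16 weighted checks could roll up into a single 0-100 score. `CheckResult`, the field names, and the weights are my illustration, not the package's actual internals:

```typescript
// Hypothetical shape for one check's result (not the package's real API).
interface CheckResult {
  name: string;
  pct: number;    // 0-1: how well the prompt passed this check
  weight: number; // share of the total, e.g. 0.08 for 8%
}

function totalScore(checks: CheckResult[]): number {
  // Informational checks (weight 0) are reported but contribute nothing.
  const weightSum = checks.reduce((s, c) => s + c.weight, 0);
  const earned = checks.reduce((s, c) => s + c.pct * c.weight, 0);
  return Math.round((earned / weightSum) * 100);
}

const demo: CheckResult[] = [
  { name: 'char-limit', pct: 1, weight: 0.08 },
  { name: 'genre-collisions', pct: 0.5, weight: 0.1 },
  { name: 'weak-tokens', pct: 1, weight: 0.08 },
];

totalScore(demo); // 81
```

The weight-0 informational check (row 16) falls out naturally: it adds nothing to either sum, so it can appear in the breakdown without moving the total.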
The anchor-based philosophy
The most interesting design decision: separate core nouns (must be verified) from modifiers (creative freedom).
So [shofar blast] passes — "Shofar" is a verified instrument, "blast" is a free modifier. But [QuantumSynth breakdown] fails — no verified anchor.
This preserves creativity (real producers combine real instruments in unexpected ways) while catching hallucinations.
```
// Core nouns: genres, instruments, keys, vocal types → verbatim
// Structural: [Intro], [Verse], [Chorus], [Drop] → verbatim
// Modifiers: blast, crystalline, thundering → free
```
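The anchor rule can be sketched in a few lines: a bracket tag passes if at least one token is a verified core noun or structural tag, and everything else rides along as a free modifier. The tiny word lists below are placeholders standing in for the real knowledge base:

```typescript
// Placeholder word lists, not the real 4,000+ tag knowledge base.
const CORE_NOUNS = new Set(['shofar', 'moog', '808', 'sitar']);
const STRUCTURAL = new Set(['intro', 'verse', 'chorus', 'drop']);

function bracketTagPasses(tag: string): boolean {
  const tokens = tag.toLowerCase().replace(/[\[\]]/g, '').split(/\s+/);
  // Pass if any token is a verified anchor; modifiers like "blast" are free.
  return tokens.some(t => CORE_NOUNS.has(t) || STRUCTURAL.has(t));
}

bracketTagPasses('[shofar blast]');           // true: "shofar" is an anchor
bracketTagPasses('[QuantumSynth breakdown]'); // false: no verified anchor
```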
What I learned about SUNO
Building this surfaced several non-obvious findings:
- Position 1 is 60-70% of output DNA. The first tag dominates.
- Tag weight drops 2/(1+k) per position. Position 6 has ~30% of position 1's weight.
- Some modifiers are weak in isolation but strong in context. "Modern" alone is weak; "polished modern production" is specific.
- V4.5+ supports 1,000 chars in Style, not 200. The 200-char limit applied to V4 only, which is still a common misconception.
- Collision pairs aren't obvious. "calm + aggressive" is easy; "minimal + orchestral" and "whisper + powerful vocals" are less so.
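The positional decay in the second bullet is easy to check numerically. The 2/(1+k) formula is from the findings above (with k as the 1-indexed tag position); the helper names are mine:

```typescript
// Weight of the tag at 1-indexed position k, per the 2/(1+k) decay claim.
function tagWeight(position: number): number {
  return 2 / (1 + position); // position 1 → 1.0, position 6 → ~0.286
}

// How much influence a position retains relative to position 1.
function relativeToFirst(position: number): number {
  return tagWeight(position) / tagWeight(1);
}

relativeToFirst(6); // ~0.29, i.e. roughly 30% of position 1's weight
```

This is why putting the genre anywhere but first costs so much: by position 6 a tag carries under a third of the influence it would have had up front.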
Usage
```sh
npm install suno-prompt-scorer
```

```typescript
import { scorePrompt } from 'suno-prompt-scorer';

const result = scorePrompt(
  "Electropop, 128 BPM, 808 Bass, Moog bass, Confident, Euphoric"
);

console.log(result.total);     // 99
console.log(result.breakdown); // 16 checks with pct + message
```
Links
- npm: suno-prompt-scorer (MIT)
- HuggingFace demo (no install): shaizadok/suno-prompt-scorer
- Full web UI: acetaggen.com/tools/prompt-scorer
- Research blog: acetaggen.com/blog
Contributions welcome
The knowledge base (4,000+ verified tags across 13 categories) is the main area where contributions help most — especially regional genres, edge cases, and emerging subgenres.
Disclosure: I'm the creator of AceTagGen. The scorer npm package is a standalone MIT-licensed extraction of the scoring engine. The web tool at acetaggen.com uses the same engine with a larger server-side knowledge base.
— Shai Zadok