How to Tell if a Song Was Made by AI

#music #musicdistribution #brazil #indie

Suno, Udio, and Stable Audio produce tracks that can fool the human ear within seconds, but they leave spectral clues that forensic detectors identify with a reported precision above 97%. Here are the 7 most telling technical signals in 2026.

Why detecting AI music became a priority

In September 2025, Spotify announced that it had removed more than 75 million spam tracks over the preceding 12 months, a period it tied to the rise of generative AI tools. TikTok introduced an "AI-generated content" category in 2024, with reduced algorithmic reach. YouTube has required a "Created with AI" label since 2024. In 2026, being identified as pure AI has practical consequences: lower monetization, reduced reach, and a risk of removal.

For independent artists, content creators, and producers who use AI as a tool, knowing whether a track will be detected has become part of the production workflow — exactly like mixing before sending to mastering.

Signal 1 — Spectral brickwall at 14-16 kHz

Output from AI generation models almost always shows a sharp low-pass cutoff between 14 kHz and 16 kHz, which saves computational cost. In a studio recording made with a condenser microphone at a 44.1 kHz sample rate, the spectrum decays gradually up to about 22 kHz. In Suno and Udio output, it drops almost vertically after 14 kHz, a signature visible to the naked eye in a spectrogram.

Forensic detectors measure the rolloff above 14 kHz and calculate the slope of the drop. Slopes steeper than 60 dB/octave are treated as a strong indicator of AI generation.
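The slope measurement can be sketched in a few lines. This is a minimal illustration, not the method any particular detector uses: it assumes you already have a magnitude spectrum in dB (for example from an FFT of the track), and the function name, the synthetic spectra, and the use of a least-squares fit are all assumptions made for the example. Only the 14 kHz starting point and the 60 dB/octave threshold come from the text above.

```python
import math

def rolloff_slope_db_per_octave(freqs_hz, levels_db, f_start=14000.0):
    """Least-squares slope of level (dB) versus log2(frequency) above f_start.

    Returns a positive number when the spectrum falls as frequency rises.
    """
    pts = [(math.log2(f), db) for f, db in zip(freqs_hz, levels_db) if f >= f_start]
    if len(pts) < 2:
        raise ValueError("need at least two spectrum bins above f_start")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx  # dB lost per octave

# Synthetic spectra (hypothetical data): a constant dB/octave drop from 14 kHz up
freqs = [14000.0 + 250.0 * i for i in range(33)]          # 14 kHz to 22 kHz
gentle = [-12.0 * math.log2(f / 14000.0) for f in freqs]  # gradual decay
brick = [-96.0 * math.log2(f / 14000.0) for f in freqs]   # brickwall-like drop

for name, levels in (("gentle", gentle), ("brickwall", brick)):
    slope = rolloff_slope_db_per_octave(freqs, levels)
    # 60 dB/octave is the threshold quoted in the article
    print(name, round(slope, 1), "flag" if slope > 60.0 else "ok")
```

On real audio the spectrum would be averaged over many frames before fitting, since a single frame is too noisy for a stable slope.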

Signal 2 — Invisible embedded watermarks

Since 2024, most commercial AI generation tools embed inaudible watermarks in the generated file. Suno uses SunoMark (periodic phase sequence in specific bands). Stability AI uses StableAudioMark (sub-Hz modulation on the side channel).

These markers survive MP3 compression, normalization, and simple re-encoding. Forensic detectors run autocorrelation analysis on the mid and side channels and identify the pattern in seconds.
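The general idea of looking for a repeating pattern in the side channel can be sketched as follows. This is not a decoder for any real watermark: the actual schemes are not specified here, so the example plants a hypothetical 50-sample sine pattern in the side channel and recovers its period with a normalized autocorrelation. All function names and the test signal are assumptions made for illustration.

```python
import math
import random

def mid_side(left, right):
    """Split a stereo signal into mid (L+R)/2 and side (L-R)/2."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def normalized_autocorr(x, lag):
    """Autocorrelation of x at the given lag, normalized so that lag 0 gives 1."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    den = sum(v * v for v in c)
    if den == 0.0:
        return 0.0
    return sum(c[i] * c[i + lag] for i in range(len(c) - lag)) / den

def strongest_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the highest normalized autocorrelation."""
    scores = {lag: normalized_autocorr(x, lag) for lag in range(min_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical test signal: shared "music" plus a faint 50-sample pattern
# added to the left channel and subtracted from the right.
rng = random.Random(0)
music = [rng.uniform(-0.5, 0.5) for _ in range(1000)]
pattern = [0.001 * math.sin(2.0 * math.pi * i / 50.0) for i in range(1000)]
left = [m + p for m, p in zip(music, pattern)]
right = [m - p for m, p in zip(music, pattern)]

_, side = mid_side(left, right)
period, score = strongest_period(side, 20, 200)
print(period, round(score, 2))
```

The pattern is 500 times quieter than the music, yet it dominates the side channel because the shared content cancels in L minus R. Real stereo material has its own side-channel content, so a practical detector would need band filtering and much longer windows.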

Signal 3 — Extremely high spectral flatness

Music recorded by humans has dynamic variation between spectral bands — the vocal resonates at 200-3000 Hz, the guitar has body at 1-4 kHz, the kick at 60-200 Hz. Each instrument occupies its place.

In AI music, the model tends to fill the entire spectrum in a statistically uniform way. Spectral flatness (also called Wiener entropy: the ratio of the geometric mean to the arithmetic mean of the power spectrum) becomes abnormally high, usually between 0.15 and 0.30, versus 0.05 to 0.12 in human material.
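Spectral flatness is the geometric mean of the power spectrum divided by its arithmetic mean, so it sits near 1 for noise-like spectra and near 0 for spectra dominated by a few peaks. The sketch below computes it for two hypothetical power spectra. The function name and the example data are assumptions, and the absolute values you get on real tracks depend on frame size, windowing, and averaging, so the ranges quoted above should not be expected to transfer directly.

```python
import math

def spectral_flatness(power):
    """Geometric mean / arithmetic mean of a power spectrum (0 = peaky, 1 = flat)."""
    eps = 1e-12  # avoids log(0) on silent bins
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean

# Hypothetical spectra with 512 bins each
flat = [1.0] * 512                # energy spread evenly across all bins
peaky = [1e-6] * 511 + [1.0]      # almost all energy in a single bin

print(round(spectral_flatness(flat), 3))
print(round(spectral_flatness(peaky), 5))
```

Averaging in the log domain keeps the geometric mean numerically stable; multiplying hundreds of small power values directly would underflow.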

Signal 4 — Absent F0 microflutter

The human voice never stays stable at a fundamental frequency (F0). Even on a sustained note, F0 fluctuates by ±3-15 cents per second (natural microflutter). Vibrato is an amplified version of this (±20-50 cents).

In AI-generated vocals, F0 stays unnaturally rigid: it varies by less than ±1 cent even where vibrato should be present. Detectors measure the derivative of F0 over time and flag patterns that are too mechanical.
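To make the cents arithmetic concrete, here is a minimal sketch that takes an F0 track for one sustained, fully voiced note and reports its spread in cents and its mean absolute rate of change. It assumes the F0 values have already been extracted by a pitch tracker. The function names, the 100 frames-per-second rate, and the two synthetic tracks are assumptions for illustration.

```python
import math
import statistics

def to_cents(f_hz, ref_hz):
    """Interval between f_hz and ref_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def f0_flutter(f0_hz, frame_rate_hz):
    """Return (std dev in cents, mean absolute F0 derivative in cents/second)."""
    ref = statistics.median(f0_hz)
    dev = [to_cents(f, ref) for f in f0_hz]
    spread = statistics.pstdev(dev)
    deltas = [abs(b - a) for a, b in zip(dev, dev[1:])]
    rate = statistics.fmean(deltas) * frame_rate_hz
    return spread, rate

def synthetic_track(depth_cents, frames=200, frame_rate_hz=100.0, flutter_hz=5.0):
    """Hypothetical 220 Hz note whose pitch wobbles sinusoidally by depth_cents."""
    return [
        220.0 * 2.0 ** (depth_cents * math.sin(2.0 * math.pi * flutter_hz * i / frame_rate_hz) / 1200.0)
        for i in range(frames)
    ]

human_like = f0_flutter(synthetic_track(8.0), 100.0)  # ±8 cents of flutter
rigid = f0_flutter(synthetic_track(0.2), 100.0)       # ±0.2 cents, nearly frozen

print("human-like:", [round(v, 2) for v in human_like])
print("rigid:     ", [round(v, 2) for v in rigid])
```

A real track contains unvoiced frames and note transitions, so a detector would first segment sustained notes before computing these statistics.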

Signal 5 — Artificial HNR (Harmonic-to-Noise Ratio)

Human vocals have an HNR between 15 and 25 dB on stressed syllables, a balance between the harmonic component (vocal fold vibration) and noise (breath, sibilance). Trained singers reach 28 dB in fortissimo.

AI tends to generate excessive HNR: 30-40 dB throughout entire syllables, without the natural breath noise. When there is "breathing," it is synthesized statistically — the detector recognizes it by the lack of spectral modulation characteristic of the human vocal tract.
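A common way to estimate HNR is from the normalized autocorrelation r of a voiced frame at its pitch period, using HNR = 10·log10(r / (1 − r)). The sketch below applies that formula to two synthetic frames. It is a simplified illustration: it assumes the pitch period is already known, and the function names, frame length, and noise levels are assumptions chosen so the results land near the ranges discussed above.

```python
import math
import random

def hnr_db(frame, period):
    """HNR in dB from the normalized autocorrelation at one pitch period."""
    n = len(frame) - period
    a, b = frame[:n], frame[period:period + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    r = num / den
    r = min(max(r, 1e-9), 1.0 - 1e-9)  # keep the log argument finite
    return 10.0 * math.log10(r / (1.0 - r))

def voiced_frame(noise_sigma, period=100, length=2000, seed=0):
    """Hypothetical voiced frame: a unit sine plus Gaussian 'breath' noise."""
    rng = random.Random(seed)
    return [
        math.sin(2.0 * math.pi * i / period) + rng.gauss(0.0, noise_sigma)
        for i in range(length)
    ]

breathy = hnr_db(voiced_frame(0.1), 100)   # noticeable noise floor
clean = hnr_db(voiced_frame(0.01), 100)    # almost purely harmonic

print(round(breathy, 1), round(clean, 1))
```

With these settings the noisier frame lands in the range the article gives for human vocals and the cleaner one in the range it gives for AI vocals, but that is a property of the chosen noise levels, not evidence about real recordings.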

Signal 6 — Subharmonics with low energy

Male vocals generate subharmonics (components at frequencies below the fundamental) through the M1/M2 laryngeal mechanisms, especially on low notes sung in chest voice. Female vocals show fewer, but they are still present above 200 Hz.

AI almost always fails to synthesize subharmonics with realistic energy. Spectral analysis below the F0 shows abnormal "emptiness" — a strong signal of algorithmic generation.
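One simple way to quantify energy below the fundamental is to compare the power at F0/2 with the power at F0. The sketch below does this with a single-frequency DFT. It is only an illustration: it assumes F0 is known, it looks at just one subharmonic, and the function names and test signals are hypothetical.

```python
import math

def tone_power(x, freq_hz, sample_rate):
    """Power of the component at freq_hz, measured with a single-bin DFT."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(v * math.cos(w * i) for i, v in enumerate(x))
    im = sum(v * math.sin(w * i) for i, v in enumerate(x))
    return (re * re + im * im) / (len(x) ** 2)

def subharmonic_ratio_db(x, f0_hz, sample_rate):
    """Level of the F0/2 component relative to F0, in dB."""
    eps = 1e-20  # floor so an empty subharmonic band does not produce log(0)
    sub = tone_power(x, f0_hz / 2.0, sample_rate) + eps
    fund = tone_power(x, f0_hz, sample_rate) + eps
    return 10.0 * math.log10(sub / fund)

# Hypothetical 0.2 s frames at 8 kHz with a 100 Hz fundamental
sr, f0, n = 8000, 100.0, 1600
with_sub = [
    math.sin(2.0 * math.pi * f0 * i / sr) + 0.1 * math.sin(2.0 * math.pi * (f0 / 2.0) * i / sr)
    for i in range(n)
]
without_sub = [math.sin(2.0 * math.pi * f0 * i / sr) for i in range(n)]

print(round(subharmonic_ratio_db(with_sub, f0, sr), 1))
print(subharmonic_ratio_db(without_sub, f0, sr) < -100.0)
```

The frame length is chosen to hold a whole number of periods of both components, which avoids spectral leakage; real audio would need windowing.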

Signal 7 — Fingerprint embedding via pretrained networks

The final line of defense is pretrained neural networks (MERT, CLAP, EnCodec) trained on millions of human tracks. The audio is converted to an embedding of 768 to 1,024 dimensions, which is then compared against known AI and human clusters by classifiers such as LightGBM or XGBoost.

The current MERT v3 model achieves an F1 of 0.979 and an AUC of 0.997 on a hold-out set of 18,000 tracks. If precision and recall are roughly balanced, that means it identifies about 98 of every 100 AI tracks with few false positives.
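F1 is the harmonic mean of precision and recall, so a single F1 value does not pin down how many AI tracks are caught. The arithmetic below shows two hypothetical confusion-matrix outcomes that both round to an F1 of 0.979. The counts are invented for illustration, assuming 9,000 of the 18,000 hold-out tracks are AI; the source does not give the actual split.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical outcome A: balanced errors (9,000 AI tracks, 8,820 caught)
balanced = precision_recall_f1(tp=8820, fp=198, fn=180)

# Hypothetical outcome B: fewer false alarms, but more AI tracks missed
skewed = precision_recall_f1(tp=8640, fp=11, fn=360)

for name, (p, r, f1) in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(p, 3), round(r, 3), round(f1, 3))
```

In outcome A the detector catches 98 of every 100 AI tracks; in outcome B it catches 96, with the same rounded F1. Reading recall off an F1 score therefore only works when precision and recall are close to each other.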

How to test your music now

HUMANIZE combines all 7 signals in a single analysis. Upload an MP3, WAV, FLAC, or M4A file. In 3 to 15 seconds it returns one of three verdicts: Played-by-Human (PbH) with 70-99% confidence, AI, or Mix (a hybrid track).

If your track comes back as AI and you want to distribute it commercially, consider processing it through the humanization pipeline, which adds human elements (sub-percentile pitch, light time-stretch, forensic mastering) that improve aesthetic adherence and evade the algorithmic fingerprint.