Mason K

Posted on Jun 3

Two-pass loudness normalization with FFmpeg loudnorm (the right way)

#ffmpeg #video #tutorial #python

TL;DR

We normalize audio to a consistent perceived loudness using FFmpeg's loudnorm filter in two passes: pass 1 measures, pass 2 applies a linear gain to the target. We'll parse the JSON, script it, batch it with ffmpeg-normalize, and verify with ebur128. Single-pass loudnorm pumps; don't use it for VOD.

If your library has one clip that whispers and the next one that blasts, your viewers are riding the volume knob. Peak normalization won't fix it (same peak, different perceived loudness). What you want is EBU R128 loudness normalization, and FFmpeg ships the filter to do it. Let's wire it up properly.

1. Why not single-pass?

The tempting one-liner is single-pass:

# DO NOT do this for VOD: it applies dynamic compression and pumps
ffmpeg -i in.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 out.mp4

In one pass, loudnorm doesn't know what's coming, so it adjusts gain on the fly with dynamic processing. On music and dialogue, those gain changes are audible as breathing or pumping. Single-pass is for live streams; for files you process ahead of time, use two passes and a linear gain.

2. Pass 1: measure

Run loudnorm purely to analyze. Output goes to null; we only want the JSON it prints to stderr.

# bash: measure.sh
ffmpeg -hide_banner -i input.mp4 \
  -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json \
  -f null - 2> measure.log

The tail of measure.log is a JSON block:

{
  "input_i" : "-27.61",
  "input_tp" : "-9.05",
  "input_lra" : "8.40",
  "input_thresh" : "-38.10",
  "output_i" : "-16.00",
  "output_tp" : "-1.50",
  "output_lra" : "11.00",
  "normalization_type" : "dynamic",
  "target_offset" : "0.49"
}

Those input_* values are the measured loudness of your file. We feed them, together with target_offset, back into pass 2.

💡 Target choice: -16 LUFS is a sane general default. Broadcast wants -23 (EBU R128) with TP=-1. Big streaming platforms normalize playback to roughly -14. Pick per context and keep it consistent across the library.
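One way to keep the target consistent across the library is to build every filter string from a single table. This is a minimal sketch: the preset names, the LRA values, and the true-peak ceiling on the streaming row are assumptions for illustration, not values from any platform's spec.

```python
# Hypothetical preset table. Only the I values (and TP=-1 for broadcast,
# TP=-1.5/LRA=11 for the general row) come from this post; the rest are assumptions.
PRESETS = {
    "general":   {"I": -16.0, "TP": -1.5, "LRA": 11.0},
    "broadcast": {"I": -23.0, "TP": -1.0, "LRA": 11.0},  # LRA assumed
    "streaming": {"I": -14.0, "TP": -1.0, "LRA": 11.0},  # TP and LRA assumed
}

def loudnorm_args(preset: str) -> str:
    """Build the target part of a loudnorm filter string for one preset."""
    p = PRESETS[preset]
    # :g drops trailing zeros, so -16.0 is written as -16
    return f"loudnorm=I={p['I']:g}:TP={p['TP']:g}:LRA={p['LRA']:g}"

print(loudnorm_args("general"))  # loudnorm=I=-16:TP=-1.5:LRA=11
```

Append :print_format=json for pass 1, or the measured_* values for pass 2, to the string this returns.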

3. Pass 2: apply linear gain

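Pass the measured values back and set linear=true. When the file has enough headroom, that applies one constant gain across the whole file instead of moment-to-moment compression, so dynamics survive. loudnorm silently reverts to dynamic mode if that gain would push the true peak above TP, or if the measured LRA exceeds the target LRA, so check normalization_type in the pass-2 output (add print_format=json) before trusting the result.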

# bash: apply.sh
ffmpeg -hide_banner -i input.mp4 \
  -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-27.61:measured_TP=-9.05:measured_LRA=8.40:measured_thresh=-38.10:offset=0.49:linear=true \
  -c:v copy -c:a aac -b:a 192k -ar 48000 \
  output.mp4

⚠️ loudnorm resamples internally to 192 kHz. If you don't set -ar 48000 you can get a 192 kHz output you didn't want. Always pin the output sample rate.

Note -c:v copy: we're not touching the video, just re-encoding the audio track.
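FFmpeg's documentation for the linear option says the filter reverts to dynamic mode when the gain would exceed the true-peak target or when the source LRA is above the target LRA. That condition can be checked from the pass-1 numbers before running pass 2. A minimal sketch, assuming the pass-1 JSON keys shown earlier; linear_feasible is a hypothetical helper name, not part of FFmpeg.

```python
def linear_feasible(m: dict, target_i: float, target_tp: float,
                    target_lra: float) -> tuple[bool, float]:
    """Return (can_stay_linear, true_peak_after_gain) for pass-1 measurements."""
    gain = target_i - float(m["input_i"])       # constant gain needed, in dB
    peak_after = float(m["input_tp"]) + gain    # where the true peak would land
    ok = peak_after <= target_tp and float(m["input_lra"]) <= target_lra
    return ok, peak_after

# The sample measurements from pass 1 in this post
sample = {"input_i": "-27.61", "input_tp": "-9.05", "input_lra": "8.40"}
ok, peak = linear_feasible(sample, target_i=-16.0, target_tp=-1.5, target_lra=11.0)
print(ok, round(peak, 2))  # False 2.56
```

For the sample file, the +11.61 dB of gain would put the peak at about +2.56 dBTP, so loudnorm would fall back to dynamic. The options are a lower target (anything up to roughly -20 LUFS stays linear here), or accepting dynamic mode for that clip and flagging it.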

4. Script the whole thing

Hardcoding measured values is fine for one file, painful for a thousand. Here's a small wrapper that runs pass 1, parses the JSON, and runs pass 2.

# normalize.py: python 3.10+, ffmpeg 5.x+ on PATH
import json, re, subprocess, sys
from pathlib import Path

TARGET = dict(I="-16", TP="-1.5", LRA="11")

def measure(src: Path) -> dict:
    cmd = ["ffmpeg", "-hide_banner", "-i", str(src),
           "-af", f"loudnorm=I={TARGET['I']}:TP={TARGET['TP']}:LRA={TARGET['LRA']}:print_format=json",
           "-f", "null", "-"]
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    # the JSON block is the last {...} in stderr
    blob = re.search(r"\{[^{}]+\}\s*$", out.strip())
    if not blob:
        raise RuntimeError(f"no loudnorm JSON for {src}")
    return json.loads(blob.group(0))

def apply(src: Path, dst: Path, m: dict) -> None:
    af = (f"loudnorm=I={TARGET['I']}:TP={TARGET['TP']}:LRA={TARGET['LRA']}"
          f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
          f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
          f":offset={m['target_offset']}:linear=true")
    cmd = ["ffmpeg", "-hide_banner", "-y", "-i", str(src),
           "-af", af, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
           "-ar", "48000", str(dst)]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    src = Path(sys.argv[1])
    dst = src.with_name(src.stem + "_norm.mp4")
    m = measure(src)
    print(f"measured integrated loudness: {m['input_i']} LUFS -> target {TARGET['I']}")
    apply(src, dst, m)
    print(f"wrote {dst}")

$ python normalize.py whisper_clip.mp4
measured integrated loudness: -27.61 LUFS -> target -16
wrote whisper_clip_norm.mp4
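The regex in measure() assumes the JSON block is the very last thing in stderr. If ffmpeg prints anything after it, the match fails. A more forgiving approach is to scan backward for the last "{" and let the JSON decoder find where the object ends. This is a sketch; extract_loudnorm_json is a hypothetical name, and it relies on the loudnorm block being a flat object with no nested braces.

```python
import json

def extract_loudnorm_json(stderr_text: str) -> dict:
    """Return the last JSON object found in ffmpeg's stderr text."""
    decoder = json.JSONDecoder()
    start = stderr_text.rfind("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it
            obj, _end = decoder.raw_decode(stderr_text, start)
            return obj
        except json.JSONDecodeError:
            # not valid JSON from this brace; try the previous one
            start = stderr_text.rfind("{", 0, start)
    raise ValueError("no JSON object found in ffmpeg output")
```

In normalize.py this could replace the re.search and json.loads pair inside measure().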

5. Or just use ffmpeg-normalize

For batch jobs, ffmpeg-normalize wraps all of this with two-pass by default:

pip install ffmpeg-normalize
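ffmpeg-normalize input.mp4 -nt ebu -t -16 -tp -1.5 -lrt 11 -c:a aac -b:a 192k -ar 48000 -o output.mp4
# batch a folder: -of names the output folder, -ext the output extension
ffmpeg-normalize *.mp4 -nt ebu -t -16 -tp -1.5 -lrt 11 -c:a aac -b:a 192k -ar 48000 -ext mp4 -of normalized/

-nt ebu selects EBU R128 (two-pass) and -t -16 is the target. -tp and -lrt pin the true-peak ceiling and loudness-range target to the values used above, since the tool's own defaults for those differ. -o takes output file names, so use -of for a folder. With those set, you get the same result with less plumbing.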

6. Verify it worked

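Don't trust, measure. The ebur128 filter prints the integrated loudness of the output, and with peak=true it adds the true peak to the summary:

ffmpeg -hide_banner -i output.mp4 -af ebur128=peak=true -f null - 2>&1 | tail -n 15

The summary, trimmed to the integrated-loudness and true-peak sections: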

[Parsed_ebur128_0 @ ...] Summary:
  Integrated loudness:
    I:         -16.0 LUFS
    Threshold: -26.3 LUFS
  True peak:
    Peak:       -1.5 dBFS

Integrated loudness at your target, true peak under the ceiling. Run it across a handful of clips and they'll all land at the same level.
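To run that check across many clips, the summary can be parsed instead of read by eye. A minimal sketch, assuming the summary layout shown above; the 0.5 LU tolerance is an assumption, and the True peak lines only appear when ebur128 is run with peak=true, so a missing peak is treated as "not checked".

```python
import re

def parse_ebur128_summary(stderr_text: str) -> dict:
    """Pull integrated loudness and (if present) true peak from the Summary block."""
    # Keep only the text after the last "Summary:" so per-frame log lines are ignored
    summary = stderr_text.rsplit("Summary:", 1)[-1]
    num = r"(-?\d+(?:\.\d+)?)"
    i = re.search(rf"^\s*I:\s*{num} LUFS", summary, re.MULTILINE)
    peak = re.search(rf"^\s*Peak:\s*{num} dBFS", summary, re.MULTILINE)
    if i is None:
        raise ValueError("no integrated loudness in ebur128 output")
    return {"I": float(i.group(1)),
            "peak": float(peak.group(1)) if peak else None}

def on_target(s: dict, target_i: float = -16.0, ceiling: float = -1.5,
              tol: float = 0.5) -> bool:
    """True if loudness is within tol of target and the peak (if known) is under the ceiling."""
    if abs(s["I"] - target_i) > tol:
        return False
    return s["peak"] is None or s["peak"] <= ceiling
```

Feed it the stderr captured from the ebur128 run and fail the job when on_target returns False.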

⚠️ Edge case: very short or near-silent inputs can report integrated loudness near -70 LUFS (the absolute gate floor). Don't blindly amplify those to target; you'll just boost hiss. Skip or flag them.
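A guard for this case can sit between pass 1 and pass 2. This is a sketch under assumptions: should_skip is a hypothetical helper, the -50 LUFS cutoff is an arbitrary starting point to tune per library, and it also treats a missing, non-numeric, or infinite input_i as a reason to skip.

```python
import math

MIN_MEASURABLE_LUFS = -50.0  # hypothetical cutoff; tune for your content

def should_skip(m: dict, floor: float = MIN_MEASURABLE_LUFS) -> bool:
    """True if the pass-1 measurement is too quiet or unusable to normalize."""
    try:
        i = float(m["input_i"])
    except (KeyError, ValueError):
        return True  # no usable measurement at all
    # float() accepts "-inf", so silent files are caught here as well
    return math.isinf(i) or math.isnan(i) or i <= floor
```

In normalize.py, call it right after measure() and log the file instead of calling apply() when it returns True.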

7. Normalize once, mux into every rendition

If you also build an ABR ladder, you do not want to run loudnorm separately for each rung. Measure and normalize the audio once, then mux that single normalized track into every video rendition. The audio is identical across the ladder and it only got processed one time.

# bash: extract + normalize audio once
ffmpeg -hide_banner -i master.mov -vn -c:a pcm_s16le audio_raw.wav
# (run measure.sh + apply.sh on audio_raw.wav -> audio_norm.m4a, -ar 48000)

# mux the one normalized track into each silent video rendition
for r in 720 480 360; do
  ffmpeg -hide_banner -y \
    -i "video_${r}.mp4" -i audio_norm.m4a \
    -map 0:v:0 -map 1:a:0 -c:v copy -c:a copy -shortest \
    "rendition_${r}.mp4"
done

$ ffprobe -hide_banner rendition_720.mp4 2>&1 | grep Stream
  Stream #0:0: Video: h264 ... 1280x720
  Stream #0:1: Audio: aac ... 48000 Hz, stereo

Every rung now carries the same -16 LUFS audio. Switching renditions mid-playback won't change the perceived volume, which is the other half of a smooth listening experience.
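Because the audio is stream-copied, one way to confirm the rungs really share the same track is to hash the audio packets of each rendition with FFmpeg's md5 muxer and compare. A minimal sketch: the helper names are hypothetical, and it assumes -shortest did not trim the audio differently per rendition (if video durations differ, the hashes can differ even though the loudness is the same).

```python
import subprocess

def audio_md5_cmd(path: str) -> list[str]:
    """ffmpeg command that hashes the first audio stream's packets without re-encoding."""
    return ["ffmpeg", "-v", "error", "-i", path,
            "-map", "0:a:0", "-c", "copy", "-f", "md5", "-"]

def audio_md5(path: str) -> str:
    """Run the command; the md5 muxer prints a single line like MD5=<hex>."""
    out = subprocess.run(audio_md5_cmd(path), capture_output=True,
                         text=True, check=True).stdout
    return out.strip().removeprefix("MD5=")

def all_match(hashes: dict[str, str]) -> bool:
    """True when every rendition produced the same audio hash."""
    return len(set(hashes.values())) == 1

# Usage (needs ffmpeg on PATH and the rendition files from the loop above):
#   hashes = {f: audio_md5(f) for f in ("rendition_720.mp4", "rendition_480.mp4")}
#   print(all_match(hashes))
```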

What's next

Wire normalize.py into your upload pipeline as a worker step after transcode.
Pick the right target per content type: dialogue/podcast near -16, broadcast -23, loud entertainment can sit higher.
If you also build a video ladder, run loudness on the audio track once and mux it into every rendition so they all match.

The filter has been stable in FFmpeg for years (any 5.x+ has it; the 8.0 line is current), so this isn't bleeding-edge. The win is treating loudness as a fixed pipeline setting you decide once and enforce, exactly like a resolution ladder.
