TL;DR
We normalize audio to a consistent perceived loudness using FFmpeg's
loudnorm filter in two passes: pass 1 measures, pass 2 applies a linear gain to the target. We'll parse the JSON, script it, batch it with ffmpeg-normalize, and verify with ebur128. Single-pass loudnorm pumps; don't use it for VOD.
If your library has one clip that whispers and the next one blasts, your viewers are riding the volume knob. Peak normalization won't fix it (same peak, different perceived loudness). What you want is EBU R128 loudness normalization, and FFmpeg ships the filter to do it. Let's wire it up properly.
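To see why matching peaks doesn't match loudness, here is a crude sketch comparing two synthetic signals that both peak at full scale. It uses plain RMS as a stand-in for loudness, which is an assumption: real LUFS measurement adds K-weighting and gating, but the gap between the two signals is the point.

```python
import math

RATE = 48_000  # samples per second
FREQ = 1_000   # 1 kHz test tone: exactly 48 samples per cycle at this rate

def tone(seconds: float) -> list[float]:
    """Full-scale sine (peak 1.0) lasting `seconds`."""
    n = int(RATE * seconds)
    return [math.sin(2 * math.pi * FREQ * i / RATE) for i in range(n)]

def peak(x: list[float]) -> float:
    """Sample peak (absolute)."""
    return max(abs(s) for s in x)

def rms_db(x: list[float]) -> float:
    """RMS level in dBFS: a crude loudness proxy, not LUFS."""
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    return 20 * math.log10(rms)

sustained = tone(1.0)                          # 1 s of continuous tone
burst = tone(0.1) + [0.0] * int(RATE * 0.9)    # 0.1 s of tone, then 0.9 s of silence

for name, sig in (("sustained", sustained), ("burst", burst)):
    print(f"{name:9s} peak={peak(sig):.2f}  rms={rms_db(sig):.1f} dBFS")
```

Both signals peak at 1.00, so a peak normalizer leaves them alone, yet the burst's RMS sits about 10 dB below the sustained tone (roughly -13 vs -3 dBFS).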
1. Why not single-pass?
The tempting one-liner is single-pass:
# DO NOT do this for VOD: it applies dynamic compression and pumps
ffmpeg -i in.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11 out.mp4
In one pass, loudnorm can't see what's coming, so it adjusts gain on the fly with dynamic processing. On music and dialogue you can hear the level breathe (pumping). Single-pass is for live; for files you process ahead of time, use two passes and a linear gain.
2. Pass 1: measure
Run loudnorm purely to analyze. Output goes to null; we only want the JSON it prints to stderr.
# bash: measure.sh
ffmpeg -hide_banner -i input.mp4 \
-af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json \
-f null - 2> measure.log
The tail of measure.log is a JSON block:
{
"input_i" : "-27.61",
"input_tp" : "-9.05",
"input_lra" : "8.40",
"input_thresh" : "-38.10",
"output_i" : "-16.00",
"output_tp" : "-1.50",
"output_lra" : "11.00",
"normalization_type" : "dynamic",
"target_offset" : "0.49"
}
Those input_* values are the measured loudness of your file. We feed them back in pass 2.
💡 Target choice:
-16 LUFS is a sane general default. Broadcast wants -23 LUFS (EBU R128) with TP=-1. Big streaming platforms normalize playback to roughly -14 LUFS. Pick per context and keep it consistent across the library.
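One way to keep the target consistent is to define it in exactly one place and build every loudnorm string from it. A minimal sketch: the preset names and `loudnorm_args` are hypothetical, the -16/-1.5/11 and -23/-1 values come from this article, and the broadcast LRA plus the streaming TP and LRA are assumptions you should set yourself.

```python
# Hypothetical loudness presets: one table, reused by every pipeline step.
PRESETS: dict[str, dict[str, float]] = {
    "general":   {"I": -16, "TP": -1.5, "LRA": 11},
    "broadcast": {"I": -23, "TP": -1.0, "LRA": 11},  # LRA is an assumption
    "streaming": {"I": -14, "TP": -1.0, "LRA": 11},  # TP and LRA are assumptions
}

def loudnorm_args(preset: str) -> str:
    """Build the loudnorm target options (I, TP, LRA) for a named preset."""
    p = PRESETS[preset]
    # :g drops trailing zeros, so -1.0 prints as -1 and -1.5 stays -1.5
    return f"loudnorm=I={p['I']:g}:TP={p['TP']:g}:LRA={p['LRA']:g}"

print(loudnorm_args("broadcast"))
```

Passing a preset name through your job config, rather than raw numbers, makes it hard for one worker to drift to a different target.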
3. Pass 2: apply linear gain
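Pass the measured values back and set linear=true. That applies one consistent gain across the whole file instead of moment-to-moment compression, so dynamics survive. The catch: loudnorm only stays linear if the target LRA is at least the measured LRA and the gain won't push the true peak above TP. If either condition fails, it reverts to dynamic mode without erroring, so add print_format=json to pass 2 and check that normalization_type reads linear.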
# bash: apply.sh
ffmpeg -hide_banner -i input.mp4 \
-af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-27.61:measured_TP=-9.05:measured_LRA=8.40:measured_thresh=-38.10:offset=0.49:linear=true \
-c:v copy -c:a aac -b:a 192k -ar 48000 \
output.mp4
⚠️ loudnorm resamples internally to 192 kHz. If you don't set -ar 48000, you can end up with a 192 kHz output you didn't want. Always pin the output sample rate.
Note -c:v copy: we're not touching the video, just re-encoding the audio track.
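You can predict from the pass-1 numbers whether linear=true will actually hold. The FFmpeg loudnorm documentation says linear mode needs the target LRA to be at least the source LRA and the post-gain true peak to stay at or below the target TP, otherwise it reverts to dynamic. The sketch below encodes that rule as stated there; the function names are hypothetical and FFmpeg's internal check may differ in detail.

```python
def linear_gain_db(target_i: float, measured_i: float) -> float:
    """Gain (dB) a linear pass must apply to reach the target integrated loudness."""
    return target_i - measured_i

def stays_linear(m: dict, target_i: float, target_tp: float, target_lra: float) -> bool:
    """Predict whether loudnorm keeps linear=true or falls back to dynamic."""
    gain = linear_gain_db(target_i, float(m["input_i"]))
    peak_after = float(m["input_tp"]) + gain   # true peak once the gain is applied
    return peak_after <= target_tp and float(m["input_lra"]) <= target_lra

# Pass-1 values from the measure.log example
m = {"input_i": "-27.61", "input_tp": "-9.05", "input_lra": "8.40"}
gain = linear_gain_db(-16, float(m["input_i"]))
print(f"gain {gain:+.2f} dB (x{10 ** (gain / 20):.2f} amplitude), "
      f"peak after {float(m['input_tp']) + gain:+.2f} dBTP, "
      f"linear: {stays_linear(m, -16, -1.5, 11)}")
```

By this rule the sample file would not stay linear: +11.61 dB of gain lifts its -9.05 dBTP peak to about +2.56 dBTP, over the -1.5 ceiling. The highest linear target for that file would be about -20.06 LUFS (-1.5 + 9.05 - 27.61).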
4. Script the whole thing
Hardcoding measured values is fine for one file, painful for a thousand. Here's a small wrapper that runs pass 1, parses the JSON, and runs pass 2.
# normalize.py: python 3.10+, ffmpeg 5.x+ on PATH
import json, re, subprocess, sys
from pathlib import Path

TARGET = dict(I="-16", TP="-1.5", LRA="11")

def measure(src: Path) -> dict:
    """Pass 1: run loudnorm in analysis mode and return its JSON report."""
    cmd = ["ffmpeg", "-hide_banner", "-i", str(src),
           "-af", f"loudnorm=I={TARGET['I']}:TP={TARGET['TP']}:LRA={TARGET['LRA']}:print_format=json",
           "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg failed on {src}:\n{proc.stderr[-2000:]}")
    # the JSON block is the last {...} in stderr; other log lines may follow it
    blocks = re.findall(r"\{[^{}]+\}", proc.stderr)
    if not blocks:
        raise RuntimeError(f"no loudnorm JSON for {src}")
    return json.loads(blocks[-1])

def apply(src: Path, dst: Path, m: dict) -> None:
    """Pass 2: feed the measurements back and apply a linear gain."""
    af = (f"loudnorm=I={TARGET['I']}:TP={TARGET['TP']}:LRA={TARGET['LRA']}"
          f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
          f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
          f":offset={m['target_offset']}:linear=true")
    cmd = ["ffmpeg", "-hide_banner", "-y", "-i", str(src),
           "-af", af, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
           "-ar", "48000", str(dst)]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python normalize.py INPUT")
    src = Path(sys.argv[1])
    dst = src.with_name(src.stem + "_norm.mp4")
    m = measure(src)
    print(f"measured integrated loudness: {m['input_i']} LUFS -> target {TARGET['I']}")
    apply(src, dst, m)
    print(f"wrote {dst}")
$ python normalize.py whisper_clip.mp4
measured integrated loudness: -27.61 LUFS -> target -16
wrote whisper_clip_norm.mp4
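Because loudnorm can fall back to dynamic mode without failing, it's worth checking what pass 2 actually did. If you append :print_format=json to the pass-2 filter string, loudnorm prints its JSON block again at the end of the run, and the normalization_type field should read linear. A minimal sketch of that check; `normalization_type` is a hypothetical helper that, like the measure step, takes the last {...} block in stderr.

```python
import json
import re

def normalization_type(stderr: str) -> str:
    """Return loudnorm's reported normalization_type from ffmpeg's stderr text."""
    blocks = re.findall(r"\{[^{}]*\}", stderr)
    if not blocks:
        raise RuntimeError("no loudnorm JSON found; was print_format=json set?")
    return json.loads(blocks[-1])["normalization_type"]

# Usage idea inside apply(): run ffmpeg with capture_output=True, then
#   if normalization_type(proc.stderr) != "linear": flag the file for review
```

Flagging rather than failing keeps the pipeline moving while still surfacing files whose peaks were too hot for a pure gain change.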
5. Or just use ffmpeg-normalize
For batch jobs, ffmpeg-normalize wraps all of this with two-pass by default:
pip install ffmpeg-normalize
ffmpeg-normalize input.mp4 -nt ebu -t -16 -c:a aac -b:a 192k -ar 48000 -o output.mp4
# batch a folder
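ffmpeg-normalize *.mp4 -nt ebu -t -16 -c:a aac -b:a 192k -ar 48000 -ext mp4 -of normalized
-nt ebu selects EBU R128 (two-pass), -t -16 is the target, and -of sets the output folder for batch runs. Keep -c:a and -ar in the batch command too: ffmpeg-normalize defaults to PCM audio, and in EBU mode it otherwise writes 192 kHz. Same result, less plumbing.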
6. Verify it worked
Don't trust, measure. The ebur128 filter prints the integrated loudness of the output:
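# peak=true turns on true-peak measurement; output below is trimmed to the I and true-peak lines
ffmpeg -hide_banner -i output.mp4 -af ebur128=peak=true -f null - 2>&1 | sed -n '/Summary:/,$p'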
[Parsed_ebur128_0 @ ...] Summary:
Integrated loudness:
I: -16.0 LUFS
Threshold: -26.3 LUFS
True peak:
Peak: -1.5 dBFS
Integrated loudness at your target, true peak under the ceiling. Run it across a handful of clips and they'll all land at the same level.
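To check a batch instead of eyeballing one clip, you can parse the summary. A sketch that pulls the integrated loudness (I:) and true peak (Peak:) out of ebur128's stderr; it assumes the summary layout shown above (the Peak line only appears when ebur128 runs with peak=true), and the 0.5 LU tolerance is an assumption, not a standard.

```python
import re
from typing import Optional

def ebur128_summary(stderr: str) -> tuple[float, Optional[float]]:
    """Return (integrated LUFS, true peak dBFS or None) from the ebur128 Summary."""
    start = stderr.rfind("Summary:")
    if start < 0:
        raise ValueError("no ebur128 Summary in output")
    summary = stderr[start:]   # skip the per-frame "t: ..." lines that precede it
    i = re.search(r"\bI:\s+(-?[\d.]+) LUFS", summary)
    if i is None:
        raise ValueError("no integrated loudness in ebur128 summary")
    peak = re.search(r"\bPeak:\s+(-?[\d.]+) dBFS", summary)
    return float(i.group(1)), (float(peak.group(1)) if peak else None)

def on_target(stderr: str, target_i: float = -16.0, max_tp: float = -1.5,
              tol: float = 0.5) -> bool:
    """True if I is within tol LU of target and the true peak is under the ceiling."""
    i, tp = ebur128_summary(stderr)
    return abs(i - target_i) <= tol and (tp is None or tp <= max_tp)
```

Feed it the stderr from the verification command for each output file and log any file where on_target returns False.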
⚠️ Edge case: very short or near-silent inputs report integrated loudness near -70 LUFS (the gate floor). Don't blindly amplify those to target; you'll just boost hiss. Skip or flag them.
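A guard for that case can sit between the two passes, operating on the pass-1 dict that the measure step returns. A sketch, with two assumptions: the -60 LUFS cutoff is arbitrary and should be tuned for your library, and loudnorm may report fully silent input as "-inf", which Python's float() accepts.

```python
import math

SILENCE_CUTOFF_LUFS = -60.0   # hypothetical cutoff; the gate floor itself is -70

def should_skip(m: dict) -> bool:
    """Flag inputs too quiet (or too short) to normalize meaningfully."""
    i = float(m["input_i"])    # e.g. "-70.00" or "-inf" for near-silence
    return math.isnan(i) or i <= SILENCE_CUTOFF_LUFS

# In normalize.py, after m = measure(src):
#   if should_skip(m): print(f"skipping {src}: {m['input_i']} LUFS"); sys.exit(0)
```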
7. Normalize once, mux into every rendition
If you also build an ABR ladder, you do not want to run loudnorm separately for each rung. Measure and normalize the audio once, then mux that single normalized track into every video rendition. The audio is identical across the ladder and it only got processed one time.
# bash: extract + normalize audio once
ffmpeg -hide_banner -i master.mov -vn -c:a pcm_s16le audio_raw.wav
# (run measure.sh + apply.sh on audio_raw.wav -> audio_norm.m4a, -ar 48000)
# mux the one normalized track into each silent video rendition
for r in 720 480 360; do
ffmpeg -hide_banner -y \
-i "video_${r}.mp4" -i audio_norm.m4a \
-map 0:v:0 -map 1:a:0 -c:v copy -c:a copy -shortest \
"rendition_${r}.mp4"
done
$ ffprobe -hide_banner rendition_720.mp4 2>&1 | grep Stream
Stream #0:0: Video: h264 ... 1280x720
Stream #0:1: Audio: aac ... 48000 Hz, stereo
Every rung now carries the same -16 LUFS audio. Switching renditions mid-playback won't change the perceived volume, which is the other half of a smooth listening experience.
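To confirm that every rung really carries the same audio stream parameters, you can compare ffprobe's JSON for the first audio stream of each file. A sketch: ffprobe's -select_streams, -show_entries and -of json options are standard, but the helper names are hypothetical, and this checks only codec, sample rate and channel count, not loudness (use ebur128 for that).

```python
import json
import subprocess

def signature_from_json(probe_json: str) -> tuple:
    """(codec, sample_rate, channels) from ffprobe -of json output."""
    s = json.loads(probe_json)["streams"][0]   # IndexError if there is no audio stream
    return (s["codec_name"], int(s["sample_rate"]), int(s["channels"]))

def audio_signature(path: str) -> tuple:
    """Probe the first audio stream of `path` with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name,sample_rate,channels",
         "-of", "json", path],
        capture_output=True, text=True, check=True).stdout
    return signature_from_json(out)

def ladder_matches(paths: list[str]) -> bool:
    """True if every rendition's first audio stream has the same signature."""
    return len({audio_signature(p) for p in paths}) == 1

# e.g. ladder_matches([f"rendition_{r}.mp4" for r in (720, 480, 360)])
```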
What's next
- Wire normalize.py into your upload pipeline as a worker step after transcode.
- Pick the right target per content type: dialogue/podcast near -16 LUFS, broadcast -23 LUFS; loud entertainment can sit higher.
- If you also build a video ladder, run loudness on the audio track once and mux it into every rendition so they all match.
The filter has been stable in FFmpeg for years (any 5.x+ has it; the 8.0 line is current), so this isn't bleeding-edge. The win is treating loudness as a fixed pipeline setting you decide once and enforce, exactly like a resolution ladder.