Originally published at ffmpeg-micro.com
FFmpeg audio normalization trips up even experienced developers. The loudnorm filter alone has around a dozen options, a two-pass workflow that requires parsing JSON from stderr, and behavior that changes depending on whether you feed it MP4, WAV, or MKV. Most guides skip the hard parts.
This post covers three approaches to fixing audio levels with FFmpeg: simple volume scaling, broadcast-standard loudnorm normalization (EBU R128), and a cloud API that handles it without installing anything.
Quick answer: normalize audio to -16 LUFS with FFmpeg
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy output.mp4
This applies EBU R128 loudnorm normalization targeting -16 LUFS, a common target for podcasts and streaming (YouTube and Spotify use -14). The -c:v copy flag passes video through untouched so only the audio gets re-encoded.
Approach 1: Simple volume adjustment with the volume filter
The volume filter is the simplest way to change audio levels. It multiplies every sample by a fixed factor.
Halve the volume:
ffmpeg -i input.mp4 -af "volume=0.5" -c:v copy output.mp4
Double the volume:
ffmpeg -i input.mp4 -af "volume=2.0" -c:v copy output.mp4
Adjust by decibels (more precise):
ffmpeg -i input.mp4 -af "volume=6dB" -c:v copy output.mp4
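The decibel form and the multiplier form describe the same gain, related by factor = 10^(dB / 20). Here is a minimal sketch of that conversion; db_to_factor is a hypothetical helper name, not part of FFmpeg.

```shell
#!/bin/sh
# Convert a decibel change into the linear multiplier it corresponds to.
# factor = 10^(dB / 20). db_to_factor is a made-up helper, not an FFmpeg tool.
db_to_factor() {
  awk -v db="$1" 'BEGIN { printf "%.3f\n", 10 ^ (db / 20) }'
}

db_to_factor 6     # prints 1.995, so volume=6dB is nearly the same as volume=2.0
db_to_factor -6    # prints 0.501, so volume=-6dB is nearly the same as volume=0.5
```

This is why +6 dB is often described as "doubling" the amplitude: it is close, but not exact.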
The volume filter is predictable but dumb. It doesn't analyze the source audio first, so you can easily clip loud sections or amplify noise in quiet ones. For consistent levels across multiple files, you need actual normalization.
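One way to avoid clipping with a fixed gain is to measure the peak first. The sketch below assumes FFmpeg's volumedetect filter logs a line of the form "max_volume: -5.9 dB" on stderr. The helper names max_volume_db and safe_gain_db are hypothetical. Note that volumedetect reports sample peaks, not true peaks, so treat the result as an upper bound on gain rather than a guarantee.

```shell
#!/bin/sh
# Read an ffmpeg volumedetect log on stdin and print the max_volume value in dB.
# Assumes log lines shaped like: "[Parsed_volumedetect_0 @ ...] max_volume: -5.9 dB"
max_volume_db() {
  sed -n 's/.*max_volume: \(-\{0,1\}[0-9][0-9.]*\) dB.*/\1/p' | tail -n 1
}

# Largest gain in dB that keeps the measured sample peak at or below 0 dBFS.
safe_gain_db() {
  awk -v peak="$1" 'BEGIN { printf "%.1f\n", 0 - peak }'
}

# Usage (needs ffmpeg installed; not executed here):
#   peak=$(ffmpeg -i input.mp4 -af volumedetect -f null - 2>&1 | max_volume_db)
#   ffmpeg -i input.mp4 -af "volume=$(safe_gain_db "$peak")dB" -c:v copy output.mp4
```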
Approach 2: EBU R128 normalization with loudnorm
The loudnorm filter implements the EBU R128 broadcast standard. Instead of blindly scaling amplitude, it measures perceived loudness (LUFS) and adjusts dynamically.
Single-pass normalization (good enough for most cases):
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy output.mp4
What the parameters mean:
- I=-16 sets the target integrated loudness to -16 LUFS. YouTube uses -14, Spotify uses -14, podcasts typically use -16 to -19.
- TP=-1.5 sets the true peak ceiling to -1.5 dBTP. This prevents clipping on lossy codecs like AAC and MP3 that can overshoot 0 dB during decoding.
- LRA=11 sets the loudness range to 11 LU. This controls how much dynamic range the output keeps.
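If you switch targets often, the filter string can be generated instead of retyped. This is a small sketch: loudnorm_for and its preset names are hypothetical. The LUFS values come from the list above, and using -16 for podcasts is an assumption, since the quoted range is -16 to -19.

```shell
#!/bin/sh
# Print a single-pass loudnorm filter string for a named target.
# Preset names are made up for illustration; TP and LRA match the article's values.
loudnorm_for() {
  case "$1" in
    youtube|spotify) lufs=-14 ;;
    podcast)         lufs=-16 ;;   # assumption: top of the -16 to -19 range
    *) echo "unknown target: $1" >&2; return 1 ;;
  esac
  printf 'loudnorm=I=%s:TP=-1.5:LRA=11\n' "$lufs"
}

# Usage (needs ffmpeg installed; not executed here):
#   ffmpeg -i input.mp4 -af "$(loudnorm_for youtube)" -c:v copy output.mp4
```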
Two-pass normalization (more accurate):
Pass 1 - Analyze and capture stats:
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json" -f null - 2>&1 | tail -12
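Pass 2 - Apply the measured values. Copy input_i, input_tp, input_lra, input_thresh, and target_offset from the pass 1 JSON into measured_I, measured_TP, measured_LRA, measured_thresh, and offset (the numbers below are examples from one file):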
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-19.6:measured_TP=-5.9:measured_LRA=0.6:measured_thresh=-29.6:offset=0.1:linear=true" -c:v copy output.mp4
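Copying five numbers by hand gets old quickly, so the two passes can be glued together. The sketch below parses the pass 1 report with sed and builds the pass 2 filter string. It assumes the report uses quoted string values such as "input_i" : "-19.60", and the helper names loudnorm_field and pass2_filter are hypothetical. If jq is available, it would be a sturdier parser than sed.

```shell
#!/bin/sh
# Pull one quoted field out of loudnorm's JSON report (read from stdin).
loudnorm_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | tail -n 1
}

# Build the pass 2 filter string from the pass 1 report text passed as $1.
# Targets are fixed to the article's I=-16, TP=-1.5, LRA=11.
pass2_filter() (
  report=$1
  field() { printf '%s\n' "$report" | loudnorm_field "$1"; }
  printf 'loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:linear=true\n' \
    "$(field input_i)" "$(field input_tp)" "$(field input_lra)" \
    "$(field input_thresh)" "$(field target_offset)"
)

# Usage (needs ffmpeg installed; not executed here):
#   report=$(ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json" -f null - 2>&1)
#   ffmpeg -i input.mp4 -af "$(pass2_filter "$report")" -c:v copy output.mp4
```

Capturing the whole stderr stream and searching it for field names avoids depending on tail -12 lining up with the JSON block.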
Which approach should you use?
| Scenario | Best approach | Why |
|---|---|---|
| Quick fix on one file | volume filter | Simple, predictable, no analysis needed |
| Batch normalize for a platform | loudnorm single-pass | EBU R128 standard, good enough for streaming |
| Mastering or archival quality | loudnorm two-pass | Most accurate, preserves dynamics |
| Automated pipeline | Cloud FFmpeg API | No installation, scales with your workflow |
Common pitfalls
Forgetting -c:v copy. Without it, FFmpeg re-encodes the video stream as well as the audio, which can take roughly 10-50x longer and adds a generation of video quality loss.
Using volume instead of loudnorm for batch processing. The volume filter applies the same gain to every file. loudnorm measures each file and targets a consistent output level.
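A batch run with loudnorm can be as simple as a loop. Here is a hedged sketch: normalize_dir and the DRY_RUN switch are hypothetical names, and it only handles .mp4 files in a single directory. With DRY_RUN=1 it prints each command instead of running it, which is handy for checking paths before committing to a long job.

```shell
#!/bin/sh
# Normalize every .mp4 in a directory to the same target with single-pass loudnorm.
# Arguments: source dir, output dir, optional target LUFS (defaults to -16).
# Set DRY_RUN=1 to print the ffmpeg commands instead of running them.
normalize_dir() (
  src_dir=$1; out_dir=$2; target_i=${3:--16}
  for f in "$src_dir"/*.mp4; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # -nostdin stops ffmpeg from reading the terminal while inside a loop
    set -- ffmpeg -nostdin -i "$f" \
      -af "loudnorm=I=${target_i}:TP=-1.5:LRA=11" \
      -c:v copy "$out_dir/$(basename "$f")"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$*"
    else
      mkdir -p "$out_dir" && "$@"
    fi
  done
)

# Usage:
#   DRY_RUN=1; normalize_dir ./raw ./normalized -14
```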
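Setting the true peak ceiling to 0 dBTP. Lossy codecs generate intersample peaks during decoding that can exceed the encoded peak by 1-3 dB, so a 0 dBTP ceiling leaves no room for them. Keep some headroom, such as the TP=-1.5 used throughout this post.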
Running loudnorm on already-normalized audio. Double-normalizing compresses dynamics further each pass. Check levels first.
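"Check levels first" can be turned into a quick gate: measure the integrated loudness (the input_i value from a pass 1 analysis) and skip the file if it already sits near the target. This is a minimal sketch; already_at_target is a hypothetical helper and the 1 LU default tolerance is an assumption, not a standard.

```shell
#!/bin/sh
# Succeed (exit 0) when the measured loudness is within a tolerance of the target.
# Arguments: measured LUFS, target LUFS, optional tolerance in LU (assumed default: 1).
already_at_target() {
  awk -v m="$1" -v t="$2" -v tol="${3:-1}" \
    'BEGIN { d = m - t; if (d < 0) d = -d; exit !(d <= tol) }'
}

# Usage, with a measured value taken from a pass 1 report:
#   if already_at_target -16.3 -16; then echo "skip"; else echo "normalize"; fi
```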
FAQ
What LUFS level should I target for YouTube?
YouTube normalizes playback to about -14 LUFS by turning down louder uploads. Targeting -14 means YouTube won't touch your audio. -16 is only 2 LU quieter, close enough that the difference is barely audible.
Can I normalize audio without re-encoding the video?
Yes. Use -c:v copy to pass the video stream through unchanged. Only the audio gets re-encoded.
What is the difference between loudnorm and dynaudnorm?
loudnorm targets a specific LUFS level per the EBU R128 standard. dynaudnorm adjusts volume frame-by-frame and can introduce pumping artifacts. For most use cases, loudnorm is the right choice.