We had an ffmpeg normalization step at the start of our video processing pipeline. Every uploaded file got re-encoded to a standard format (H.264, 1080p, 30fps) before any AI analysis.
It seemed obviously correct. Standardize inputs, simplify downstream code, guarantee consistent behavior.
It was the most expensive mistake in our architecture.
The numbers
A user uploads 47 clips from a GoPro Hero 12. Total raw size: 12GB.
After normalization: 36GB. Three times larger. The re-encode expanded the files because our target bitrate was higher than GoPro's efficient HEVC encoding.
Processing time for normalization alone: 8 minutes. Before any actual AI work started.
Storage cost: we kept both raw and normalized copies "just in case." R2 bills tripled.
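The 3x expansion follows directly from bitrate arithmetic. A sketch with illustrative numbers (the ~40 Mbps HEVC and ~120 Mbps H.264 figures are my assumptions for the example, not measured values from our pipeline):

```python
# Back-of-the-envelope: file size ≈ average bitrate * duration / 8.
def size_gb(bitrate_mbps: float, duration_s: float) -> float:
    """Approximate file size in GB for a given average bitrate."""
    return bitrate_mbps * duration_s / 8 / 1000

footage_s = 60 * 40                  # ~40 minutes of footage, for illustration
hevc_gb = size_gb(40, footage_s)     # efficient HEVC source at ~40 Mbps
h264_gb = size_gb(120, footage_s)    # naive H.264 re-encode target at ~120 Mbps
print(round(h264_gb / hevc_gb, 1))   # → 3.0
```

Re-encoding an efficient codec at a higher target bitrate can only grow the file; the "normalized" copy carries no extra information, just extra bits.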
Why we removed it
We were building an AI video analysis pipeline using Gemini 2.5 Flash. The assumption was that Gemini needed normalized inputs.
We tested sending raw files directly: MP4, MOV, AVI, and WebM containers; HEVC and H.264 codecs; different resolutions and framerates.
Gemini handled all of them. Every format. Every resolution. No errors. No quality difference in the analysis output.
The normalization step existed because we assumed the AI needed clean inputs. It did not.
What changed
Before:
upload -> normalize (8 min, 3x storage) -> analyze -> segment -> render
After:
upload -> analyze -> segment -> render
Results:
- Storage: 36GB down to 12GB per batch (a 67% reduction)
- Processing: 8 minutes saved per import
- Complexity: one fewer stage to maintain, debug, and monitor
- R2 costs: dropped immediately
The lesson
Every pipeline has a step someone added "because it seemed right" that nobody questioned. For us it was normalization. For you it might be:
- Resizing images before sending to a vision model (most handle arbitrary sizes)
- Converting audio to WAV before analysis (most speech models accept MP3 natively)
- Transcoding video before thumbnailing (ffmpeg can extract frames from any container)
The fix is always the same: test what happens when you remove the step. If downstream still works, the step was waste.
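That removal test can be as simple as running the downstream stage on a sample set with and without the suspect step and comparing results. A minimal sketch (`safe_to_remove` and its arguments are hypothetical names, not code from our pipeline):

```python
def safe_to_remove(samples, step, downstream) -> bool:
    """Return True if downstream produces the same output whether or not
    the candidate step runs first, across a representative sample set."""
    return all(downstream(step(s)) == downstream(s) for s in samples)

# Usage idea: safe_to_remove(sample_files, normalize, analyze)
```

For non-deterministic downstream stages (like LLM analysis), swap exact equality for a domain-appropriate similarity check.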
Technical details
Our ingest stage now does metadata extraction only: hash, probe (resolution, fps, codec, duration), orientation detection, and EXIF parsing. No file transformation.
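A probe-only ingest can be a thin wrapper around ffprobe's JSON output. A sketch (the function names are mine; our actual ingest code isn't shown here):

```python
import json
import subprocess

def probe_cmd(path: str) -> list[str]:
    """ffprobe invocation that prints container + stream metadata as JSON."""
    return [
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]

def probe_metadata(path: str) -> dict:
    """Metadata-only ingest: resolution, codec, fps, duration. No transformation."""
    out = subprocess.run(probe_cmd(path), capture_output=True, check=True).stdout
    info = json.loads(out)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "codec": video["codec_name"],
        "width": video["width"],
        "height": video["height"],
        "fps": video.get("avg_frame_rate"),
        "duration": float(info["format"]["duration"]),
    }
```

This reads a few kilobytes of headers instead of decoding and re-encoding gigabytes of video.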
```python
# Before: normalize, then analyze
normalized = await ffmpeg_normalize(raw_file, target="h264_1080p_30fps")
result = await gemini_analyze(normalized)

# After: skip straight to analysis
metadata = await ffprobe_extract(raw_file)
result = await gemini_analyze(raw_file)  # Gemini handles any format
```
The raw file goes directly to Gemini. Segments get extracted on-demand from the source using seek + duration, not from pre-cut clips.
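On-demand extraction is a single ffmpeg invocation (a sketch; the helper names are mine). Placing `-ss` before `-i` makes ffmpeg seek by index rather than decoding from the start of the file, and `-c copy` avoids re-encoding entirely:

```python
import subprocess

def extract_cmd(src: str, start: float, duration: float, dest: str) -> list[str]:
    """Cut one segment from the source on demand.
    -ss before -i: fast keyframe seek instead of decoding from t=0.
    -c copy: stream copy, no re-encode."""
    return [
        "ffmpeg", "-ss", str(start), "-i", src,
        "-t", str(duration),
        "-c", "copy",
        dest,
    ]

def extract_segment(src: str, start: float, duration: float, dest: str) -> None:
    subprocess.run(extract_cmd(src, start, duration, dest), check=True)
```

One caveat: stream copy can only cut at keyframes, so segment boundaries are approximate. If you need frame-accurate cuts, re-encode just the extracted segment, which is still far cheaper than normalizing every upload.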
If you are building a video AI pipeline and your first step is re-encoding, try removing it. You might be surprised.