We had an ffmpeg normalization step at the start of our video processing pipeline. Every uploaded file got re-encoded to a standard format (H.264, 1080p, 30fps) before any AI analysis.
It seemed obviously correct. Standardize inputs, simplify downstream code, guarantee consistent behavior.
It was the most expensive mistake in our architecture.
The numbers
A user uploads 47 clips from a GoPro Hero 12. Total raw size: 12GB.
After normalization: 36GB. Three times larger. The re-encode expanded the files because our target bitrate was higher than GoPro's efficient HEVC encoding.
Processing time for normalization alone: 8 minutes. Before any actual AI work started.
Storage cost: we kept both raw and normalized copies "just in case." R2 bills tripled.
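The 3x expansion follows directly from bitrate arithmetic. A sketch with illustrative numbers (the ~40 Mbps HEVC and ~120 Mbps H.264 figures are my assumptions for the example, not measured values from our pipeline):

```python
# Back-of-the-envelope: file size ≈ average bitrate * duration / 8.
def size_gb(bitrate_mbps: float, duration_s: float) -> float:
    """Approximate file size in GB for a given average bitrate."""
    return bitrate_mbps * duration_s / 8 / 1000

footage_s = 60 * 40                  # ~40 minutes of footage, for illustration
hevc_gb = size_gb(40, footage_s)     # efficient HEVC source at ~40 Mbps
h264_gb = size_gb(120, footage_s)    # naive H.264 re-encode target at ~120 Mbps
print(round(h264_gb / hevc_gb, 1))   # → 3.0
```

Re-encoding an efficient codec at a higher target bitrate can only grow the file; the "normalized" copy carries no extra information, just extra bits.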
Why we removed it
We were building an AI video analysis pipeline using Gemini 2.5 Flash. The assumption was that Gemini needed normalized inputs.
We tested sending raw files directly: MP4, MOV, AVI, and WebM containers; HEVC and H.264 codecs; different resolutions and framerates.
Gemini handled all of them. Every format. Every resolution. No errors. No quality difference in the analysis output.
The normalization step existed because we assumed the AI needed clean inputs. It did not.
What changed
Before:
upload -> normalize (8 min, 3x storage) -> analyze -> segment -> render
After:
upload -> analyze -> segment -> render
Results:
- Storage: 36GB down to 12GB per batch (a 67% reduction)
- Processing: 8 minutes saved per import
- Complexity: one fewer stage to maintain, debug, and monitor
- R2 costs: dropped immediately
The lesson
Every pipeline has a step someone added "because it seemed right" that nobody questioned. For us it was normalization. For you it might be:
- Resizing images before sending to a vision model (most handle arbitrary sizes)
- Converting audio to WAV before analysis (most speech models accept MP3 natively)
- Transcoding video before thumbnailing (ffmpeg can extract frames from any container)
The fix is always the same: test what happens when you remove the step. If downstream still works, the step was waste.
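That removal test can be as simple as running the downstream stage on a sample set with and without the suspect step and comparing results. A minimal sketch (`safe_to_remove` and its arguments are hypothetical names, not code from our pipeline):

```python
def safe_to_remove(samples, step, downstream) -> bool:
    """Return True if downstream produces the same output whether or not
    the candidate step runs first, across a representative sample set."""
    return all(downstream(step(s)) == downstream(s) for s in samples)

# Usage idea: safe_to_remove(sample_files, normalize, analyze)
```

For non-deterministic downstream stages (like LLM analysis), swap exact equality for a domain-appropriate similarity check.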
Technical details
Our ingest stage now does metadata extraction only: hash, probe (resolution, fps, codec, duration), orientation detection, and EXIF parsing. No file transformation.
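A probe-only ingest can be a thin wrapper around ffprobe's JSON output. A sketch (the function names are mine; our actual ingest code isn't shown here):

```python
import json
import subprocess

def probe_cmd(path: str) -> list[str]:
    """ffprobe invocation that prints container + stream metadata as JSON."""
    return [
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]

def probe_metadata(path: str) -> dict:
    """Metadata-only ingest: resolution, codec, fps, duration. No transformation."""
    out = subprocess.run(probe_cmd(path), capture_output=True, check=True).stdout
    info = json.loads(out)
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    return {
        "codec": video["codec_name"],
        "width": video["width"],
        "height": video["height"],
        "fps": video.get("avg_frame_rate"),
        "duration": float(info["format"]["duration"]),
    }
```

This reads a few kilobytes of headers instead of decoding and re-encoding gigabytes of video.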
```python
# Before: normalize, then analyze
normalized = await ffmpeg_normalize(raw_file, target="h264_1080p_30fps")
result = await gemini_analyze(normalized)

# After: skip straight to analysis
metadata = await ffprobe_extract(raw_file)
result = await gemini_analyze(raw_file)  # Gemini handles any format
```
The raw file goes directly to Gemini. Segments get extracted on-demand from the source using seek + duration, not from pre-cut clips.
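On-demand extraction is a single ffmpeg invocation (a sketch; the helper names are mine). Placing `-ss` before `-i` makes ffmpeg seek by index rather than decoding from the start of the file, and `-c copy` avoids re-encoding entirely:

```python
import subprocess

def extract_cmd(src: str, start: float, duration: float, dest: str) -> list[str]:
    """Cut one segment from the source on demand.
    -ss before -i: fast keyframe seek instead of decoding from t=0.
    -c copy: stream copy, no re-encode."""
    return [
        "ffmpeg", "-ss", str(start), "-i", src,
        "-t", str(duration),
        "-c", "copy",
        dest,
    ]

def extract_segment(src: str, start: float, duration: float, dest: str) -> None:
    subprocess.run(extract_cmd(src, start, duration, dest), check=True)
```

One caveat: stream copy can only cut at keyframes, so segment boundaries are approximate. If you need frame-accurate cuts, re-encode just the extracted segment, which is still far cheaper than normalizing every upload.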
If you are building a video AI pipeline and your first step is re-encoding, try removing it. You might be surprised.