Kyle White

Posted on Apr 2

Optimizing FFmpeg for Production: Settings That Cut Processing Time by 40%

#ffmpeg #video #node #javascript

FFmpeg's default settings are designed for output quality, not processing speed. On a production server where you're processing dozens of videos per hour, the difference between the right settings and the defaults can be the difference between a 60-second encode and a 100-second encode — at scale, that's the difference between serving 60 jobs/hour or 36 jobs/hour on the same hardware.

This post covers the exact FFmpeg settings that cut processing time by 40% in the pipeline at ClipSpeedAI, with benchmarks and the reasoning behind each change.

The Baseline

Default FFmpeg encode for a 60-second 1080p clip to 1080x1920 vertical:

ffmpeg -i input.mp4 -vf "crop=607:1080:660:0,scale=1080:1920" \
  -c:v libx264 -c:a aac output.mp4

Time: ~85 seconds on a 2-core server.

Target: under 55 seconds with acceptable quality loss.

Optimization 1: Encoder Preset

The most impactful single setting. libx264's preset controls the speed/compression tradeoff:

# Default (no preset specified) = medium
# slow: best compression, slowest encode
# medium: balanced default
# fast: ~30% faster, ~5% larger file
# faster: ~50% faster, ~10% larger file
# veryfast: ~65% faster, ~15% larger file

For short-form video clips (30-90 seconds), file size isn't a significant concern. A 5-10MB clip vs a 7-12MB clip doesn't matter to the end user.

# -preset fast replaces the implicit default (medium)
ffmpeg -i input.mp4 \
  -vf "crop=607:1080:660:0,scale=1080:1920" \
  -c:v libx264 \
  -preset fast \
  -crf 23 \
  -c:a aac output.mp4

Improvement: ~28% faster (85s → 61s)
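
If the preset comes from configuration, it may be worth validating it before spawning FFmpeg. The following is a minimal sketch, not the pipeline's actual code: the helper name x264Args is hypothetical, and the list contains the preset names libx264 accepts.

```javascript
// Presets accepted by libx264, ordered fastest to slowest.
const X264_PRESETS = [
  'ultrafast', 'superfast', 'veryfast', 'faster', 'fast',
  'medium', 'slow', 'slower', 'veryslow', 'placebo'
];

// Hypothetical helper: builds the video-encoder arguments for a preset
// and rejects typos before FFmpeg is ever started.
function x264Args(preset = 'fast', crf = 23) {
  if (!X264_PRESETS.includes(preset)) {
    throw new Error(`Unknown libx264 preset: ${preset}`);
  }
  return ['-c:v', 'libx264', '-preset', preset, '-crf', String(crf)];
}

console.log(x264Args('fast').join(' '));
// prints: -c:v libx264 -preset fast -crf 23
```

The returned array can be spread into the argument list passed to execa.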

Optimization 2: Input Seeking

Placing -ss before -i vs after -i makes a massive difference for segment extraction:

# SLOW: output seeking (decodes everything from start)
ffmpeg -i input.mp4 -ss 45 -t 60 output.mp4

# FAST: input seeking (jumps to timestamp before decoding)
ffmpeg -ss 45 -i input.mp4 -t 60 output.mp4

Output seeking decodes every frame from 0 to the seek point. For a 30-minute video with a clip starting at minute 25, that's 25 minutes of decoding you're throwing away.

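When you re-encode, input seeking is still frame-accurate in current FFmpeg builds: it jumps to the nearest preceding keyframe, then decodes and discards frames up to the requested timestamp (-accurate_seek, on by default). Only stream copy (-c copy) is limited to keyframe boundaries.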

Improvement: Up to 10x faster for long source videos

// Node.js: always put -ss before -i
import { execa } from 'execa';

await execa('ffmpeg', [
  '-ss', String(startTime),   // BEFORE -i
  '-i', videoPath,
  '-t', String(duration),
  '-c:v', 'libx264',
  '-preset', 'fast',
  '-crf', '23',
  output
]);

Optimization 3: Audio Copy vs Re-encode

If the source audio is already AAC (which it is for most YouTube-downloaded MP4s), copy it instead of re-encoding:

# Re-encoding audio (unnecessary for AAC→AAC)
-c:a aac -b:a 128k

# Copy audio stream directly
-c:a copy

Caveat: Only use -c:a copy when the output container supports the source audio codec. MP4 container + AAC audio is fine. If you're burning captions with the ASS filter, you can still copy audio (the filter only touches the video stream).

Improvement: ~5-8% faster, plus slightly better audio quality (no re-encode loss).
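
One way to make the copy-or-re-encode decision automatically is to ask ffprobe for the source audio codec first. This is a sketch under assumptions: both helper names are hypothetical, and the 128k fallback simply mirrors the re-encode example above.

```javascript
// Arguments for asking ffprobe which codec the first audio stream uses.
// With these options ffprobe prints a single value such as "aac" or "opus".
function audioCodecProbeArgs(inputPath) {
  return [
    '-v', 'error',
    '-select_streams', 'a:0',
    '-show_entries', 'stream=codec_name',
    '-of', 'default=noprint_wrappers=1:nokey=1',
    inputPath
  ];
}

// Hypothetical helper: copy AAC untouched, re-encode anything else to AAC.
function audioArgsFor(codecName) {
  const codec = String(codecName).trim().toLowerCase();
  return codec === 'aac'
    ? ['-c:a', 'copy']
    : ['-c:a', 'aac', '-b:a', '128k'];
}

console.log(audioArgsFor('aac\n'));  // [ '-c:a', 'copy' ]
console.log(audioArgsFor('opus'));   // [ '-c:a', 'aac', '-b:a', '128k' ]
```

In a pipeline, the stdout of execa('ffprobe', audioCodecProbeArgs(inputPath)) would be passed to audioArgsFor and the result spread into the FFmpeg argument list.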

Optimization 4: Scale Filter with Faster Algorithm

The scale filter's default algorithm (bicubic) sits between the alternatives: lanczos is sharper but slower, and bilinear is faster but slightly softer. Pick the algorithm per use case:

# Default (bicubic)
scale=1080:1920

# For upscaling (lanczos is sharper but slower than bicubic)
scale=1080:1920:flags=lanczos

# For downscaling (bilinear is fast with minimal quality loss)
scale=1080:1920:flags=bilinear

For vertical video reframing from 1080p sources, you're upscaling a 607px-wide crop to 1080px — use lanczos for quality. For thumbnail generation where you're downscaling, use bilinear for speed.

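Improvement: 3-7% depending on content complexity. Expect the saving on downscaling paths that switch to bilinear; lanczos is a quality choice and costs slightly more than bicubic.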
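
The upscale-versus-downscale rule above can be expressed as a small helper. This is an illustrative sketch; the function name scaleFilter is hypothetical, and it treats any increase in either dimension as upscaling.

```javascript
// Hypothetical helper: choose the scaler by direction.
// Upscaling gets lanczos for sharpness, downscaling gets bilinear for speed.
function scaleFilter(srcWidth, srcHeight, dstWidth, dstHeight) {
  const upscaling = dstWidth > srcWidth || dstHeight > srcHeight;
  const flags = upscaling ? 'lanczos' : 'bilinear';
  return `scale=${dstWidth}:${dstHeight}:flags=${flags}`;
}

console.log(scaleFilter(607, 1080, 1080, 1920));
// prints: scale=1080:1920:flags=lanczos
console.log(scaleFilter(1920, 1080, 320, 180));
// prints: scale=320:180:flags=bilinear
```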

Optimization 5: Disable Unused FFmpeg Features

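By default FFmpeg selects and processes one video and one audio stream (plus a subtitle stream where the output format supports it), even when you only need one of them. Explicitly disable what you don't use: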

# If you only want video (no audio processing):
-an

# If you only want audio:
-vn

# For image/thumbnail extraction:
-frames:v 1    # stop after 1 frame
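
Put together, a thumbnail extraction might combine these flags with input seeking and the bilinear scaler. The sketch below only builds the argument array; the function name and the 320px default width are assumptions for illustration.

```javascript
// Hypothetical helper: arguments for grabbing a single downscaled frame.
function thumbnailArgs(inputPath, outputPath, atSeconds, width = 320) {
  return [
    '-ss', String(atSeconds),   // input seeking, placed before -i
    '-i', inputPath,
    '-an',                      // skip audio entirely
    '-frames:v', '1',           // stop after one video frame
    // -2 keeps the aspect ratio and rounds the height to an even number
    '-vf', `scale=${width}:-2:flags=bilinear`,
    '-y',
    outputPath
  ];
}

console.log(thumbnailArgs('input.mp4', 'thumb.jpg', 12).join(' '));
// prints: -ss 12 -i input.mp4 -an -frames:v 1 -vf scale=320:-2:flags=bilinear -y thumb.jpg
```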

For the segment extraction step (before caption burning), extract without re-encoding video at all — just cut:

# Ultra-fast segment extraction (no re-encode)
ffmpeg -ss 45 -i input.mp4 -t 60 -c copy segment.mp4

-c copy passes through both video and audio streams without re-encoding. This takes ~2 seconds regardless of segment length, vs 30-60 seconds for a full re-encode. The trade-off: you can only cut at keyframe boundaries, so the start point may be off by up to the keyframe interval (~2 seconds for most YouTube content).

For two-step pipelines (extract, then crop and encode), this is the right approach: a fast extract with -c copy, followed by a clean encode of the shorter segment.
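
The extract-then-encode approach described above could be split into two argument builders, one per FFmpeg run. This is a sketch with hypothetical function names; the encode settings are the ones used elsewhere in this post.

```javascript
// Step 1 (hypothetical helper): cut the segment without re-encoding.
// The cut starts on a keyframe, so it may begin slightly before startTime.
function extractArgs(inputPath, segmentPath, startTime, duration) {
  return [
    '-ss', String(startTime),
    '-i', inputPath,
    '-t', String(duration),
    '-c', 'copy',
    '-y', segmentPath
  ];
}

// Step 2 (hypothetical helper): crop, scale, and encode only the short segment.
function encodeSegmentArgs(segmentPath, outputPath, cropFilter) {
  return [
    '-i', segmentPath,
    '-vf', `${cropFilter},scale=1080:1920:flags=lanczos`,
    '-c:v', 'libx264', '-preset', 'fast', '-crf', '23',
    '-c:a', 'copy',
    '-movflags', '+faststart',
    '-y', outputPath
  ];
}

console.log(extractArgs('input.mp4', 'segment.mp4', 45, 60).join(' '));
// prints: -ss 45 -i input.mp4 -t 60 -c copy -y segment.mp4
```

Each array would be passed to its own execa('ffmpeg', ...) call, with the second run reading the file the first one wrote.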

Optimization 6: faststart Flag

Not a speed optimization for encoding, but critical for streaming performance:

-movflags +faststart

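This moves the moov atom (the index a player needs before it can decode anything) to the beginning of the MP4 file. Without it, the moov atom sits at the end of the file, so a browser must either download the whole file or make extra range requests before playback can start. With it, playback can begin as soon as the first bytes arrive. The cost is a short rewrite pass at the end of the encode.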

Required for any clip that users will preview or play in a web browser before downloading.
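
To confirm that an output file really has the moov atom up front, one option is to walk the top-level MP4 boxes and see which of moov and mdat comes first. The function below is a minimal sketch (the name hasFaststart is hypothetical); it expects a Node.js Buffer holding the start of the file and returns false if neither box is found in the bytes it was given.

```javascript
// Hypothetical helper: true when the top-level 'moov' box precedes 'mdat'.
// Each MP4 box starts with a 4-byte big-endian size and a 4-byte type.
function hasFaststart(buf) {
  let offset = 0;
  while (offset + 8 <= buf.length) {
    let size = buf.readUInt32BE(offset);
    const type = buf.toString('latin1', offset + 4, offset + 8);
    let headerSize = 8;
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (offset + 16 > buf.length) break;
      size = Number(buf.readBigUInt64BE(offset + 8));
      headerSize = 16;
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the file.
      size = buf.length - offset;
    }
    if (type === 'moov') return true;
    if (type === 'mdat') return false;
    if (size < headerSize) break; // malformed box, stop scanning
    offset += size;
  }
  return false;
}
```

Because the answer is decided at the first moov or mdat header, reading the first few kilobytes of the file is usually enough, provided the boxes before it (typically ftyp) fit in that window.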

The Optimized Full Command

Combining all optimizations:

// lib/ffmpeg/encode_clip.js
import { execa } from 'execa';

export async function encodeVerticalClip({
  inputPath,
  outputPath,
  startTime,
  duration,
  cropX,
  sourceWidth = 1920,
  sourceHeight = 1080,
  assSubtitlePath = null
}) {
  const cropWidth = Math.floor(sourceHeight * (9 / 16));

  const videoFilters = [
    `crop=${cropWidth}:${sourceHeight}:${cropX}:0`,
    'scale=1080:1920:flags=lanczos'
  ];

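  if (assSubtitlePath) {
    // The path is interpolated as-is; characters such as ':' or ',' would
    // need filtergraph escaping before being passed to the ass filter.
    videoFilters.push(`ass=${assSubtitlePath}`);
  }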

  const args = [
    '-ss', String(startTime),   // input seeking
    '-i', inputPath,
    '-t', String(duration),
    '-vf', videoFilters.join(','),
    '-c:v', 'libx264',
    '-preset', 'fast',
    '-crf', '23',
    '-c:a', 'copy',             // copy audio, don't re-encode
    '-movflags', '+faststart',
    '-avoid_negative_ts', 'make_zero',
    '-y',
    outputPath
  ];

  const start = Date.now();
  await execa('ffmpeg', args);

  const elapsed = Date.now() - start;
  console.log(`Encoded ${duration}s clip in ${elapsed}ms`);

  return outputPath;
}
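
When no face-detection result is available, a centered crop is a reasonable fallback for cropX. The helper below is a sketch, not part of the original module: centeredCropX is a hypothetical name, and it reuses the crop-width formula from encodeVerticalClip.

```javascript
// Hypothetical helper: x offset that centers a 9:16 crop in the source frame.
function centeredCropX(sourceWidth = 1920, sourceHeight = 1080) {
  // Same crop-width formula as encodeVerticalClip
  const cropWidth = Math.floor(sourceHeight * (9 / 16));
  return Math.max(0, Math.floor((sourceWidth - cropWidth) / 2));
}

console.log(centeredCropX(1920, 1080)); // prints: 656

// Example call (file names are placeholders):
// await encodeVerticalClip({
//   inputPath: 'input.mp4',
//   outputPath: 'clip.mp4',
//   startTime: 45,
//   duration: 60,
//   cropX: centeredCropX(1920, 1080)
// });
```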

Benchmark Summary

Setting	Change	Time Saved
preset: fast	was: medium	~28%
Input seeking	was: output seeking	Up to 10x for long videos
`-c:a copy`	was: re-encode AAC	~6%
Explicit `scale` flags	was: default (bicubic)	~4%
Combined		~40% total

Before optimizations: 85s average for a 60-second clip on a 2-core server.
After optimizations: 51s average.
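
The percentages and throughput figures quoted in this post follow from simple arithmetic, which can be checked with two small helpers (both names are illustrative):

```javascript
// Percentage of encode time removed when going from beforeSec to afterSec.
function reductionPercent(beforeSec, afterSec) {
  return ((beforeSec - afterSec) * 100) / beforeSec;
}

// Sequential jobs that fit into one hour at a given time per job.
function jobsPerHour(secondsPerJob) {
  return 3600 / secondsPerJob;
}

console.log(reductionPercent(85, 61).toFixed(1)); // prints: 28.2 (preset change alone)
console.log(reductionPercent(85, 51).toFixed(1)); // prints: 40.0 (all optimizations)
console.log(jobsPerHour(100), jobsPerHour(60));   // prints: 36 60
```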

At 10 clips per job and 20 jobs/hour, that's 200 clips/hour at 34 seconds saved each, or 6,800 seconds/hour of compute saved. That is enough to meaningfully reduce server costs or increase throughput at the same cost.

ClipSpeedAI processes every clip with these settings. The full encode pipeline described here processes a 60-second clip in approximately 45-55 seconds including face detection, caption burning, and upload — a number that only became achievable with these FFmpeg optimizations in place.

For the upstream pipeline that feeds clips into this encoder — YouTube download, Whisper transcription, and GPT-4o scoring — check out the other articles in this series. If you'd rather use the fully-optimized hosted version than implement it yourself, ClipSpeedAI puts all of these optimizations to work for you automatically.

DEV Community