Kyle White

Posted on Apr 2

Building a YouTube-to-Shorts Pipeline With Node.js and FFmpeg

#node #ffmpeg #javascript #video

If you've ever watched a long-form YouTube video and thought "this 45-second segment would kill on Shorts," you already understand the core problem this pipeline solves. Manually trimming, reformatting, and uploading clips is brutal at scale. Let's build a Node.js pipeline that automates the entire process — download, detect, crop, encode, and output a vertical-ready clip.

This is the kind of automation that powers tools like ClipSpeedAI, which does all of this with AI-driven clip selection on top.

The Pipeline Overview

Download the YouTube video with yt-dlp
Extract a time segment with FFmpeg
Detect the "action region" (or use center-crop as a fallback)
Re-encode to 9:16 vertical at 1080x1920
Write the output file

Prerequisites

npm install fluent-ffmpeg execa
pip install yt-dlp

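Make sure ffmpeg and ffprobe are on your PATH. The examples use ES module imports, so set "type": "module" in package.json or use .mjs files. On Ubuntu, install FFmpeg with: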

sudo apt install ffmpeg

Step 1: Download With yt-dlp

// downloader.js
import { execa } from 'execa';
import path from 'path';

export async function downloadVideo(youtubeUrl, outputDir) {
  const outputTemplate = path.join(outputDir, '%(id)s.%(ext)s');

  const { stdout } = await execa('yt-dlp', [
    '--format', 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[height<=1080]',
    '--merge-output-format', 'mp4',
    '--output', outputTemplate,
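    // A bare --print implies --simulate, so nothing would be downloaded.
    // after_move:filepath runs after the merge and prints the final file path.
    '--print', 'after_move:filepath',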
    youtubeUrl
  ]);

  return stdout.trim();
}

One important note: never route video downloads through proxies. The files are large and proxy bandwidth is expensive. Only proxy the metadata/info-json API calls if you need to avoid rate limits.

Step 2: Extract a Segment

// clipper.js
import ffmpeg from 'fluent-ffmpeg';

export function extractSegment(inputPath, outputPath, startTime, duration) {
  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .seekInput(startTime)
      .duration(duration)
      .outputOptions(['-c:v libx264', '-c:a aac', '-avoid_negative_ts make_zero'])
      .output(outputPath)
      .on('end', resolve)
      .on('error', reject)
      .run();
  });
}

Calling .seekInput() places -ss before -i on the command line (input seeking). This is much faster than output seeking because FFmpeg jumps to the nearest keyframe before the target timestamp instead of decoding and discarding every frame from the start of the file.
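
To make the difference concrete, here is a rough sketch of the argument order each seeking mode produces on the ffmpeg command line. buildSeekArgs is a hypothetical helper, not part of fluent-ffmpeg; it only assembles an argument array that you could hand to execa('ffmpeg', args).

```javascript
// Hypothetical helper: builds ffmpeg CLI arguments for one clip extraction.
// mode 'input'  -> -ss goes before -i (what .seekInput() emits)
// mode 'output' -> -ss goes after -i  (what .seek() emits)
function buildSeekArgs(mode, inputPath, outputPath, startTime, duration) {
  const seek = ['-ss', String(startTime)];
  const input = ['-i', inputPath];
  const rest = ['-t', String(duration), '-c:v', 'libx264', '-c:a', 'aac', outputPath];

  if (mode === 'input') return [...seek, ...input, ...rest];
  if (mode === 'output') return [...input, ...seek, ...rest];
  throw new Error(`Unknown seek mode: ${mode}`);
}

// Print both variants side by side to compare the ordering
console.log(buildSeekArgs('input', 'in.mp4', 'out.mp4', '00:01:24', 45).join(' '));
console.log(buildSeekArgs('output', 'in.mp4', 'out.mp4', '00:01:24', 45).join(' '));
```

The only difference between the two commands is where -ss sits relative to -i, and that position is what decides whether FFmpeg seeks in the container or decodes up to the start time.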

Step 3: Crop to 9:16

Here's where it gets interesting. For a 1920x1080 source, a 9:16 crop at full height is 607x1080 (1080 × 9/16 = 607.5, rounded down). Shorts expects 1080x1920, so we crop first and then scale the result up.

// cropper.js
import ffmpeg from 'fluent-ffmpeg';

export function cropToVertical(inputPath, outputPath, cropX = null) {
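  // Assumes a 1920x1080 source; probe the file with ffprobe if yours may differ
  const sourceWidth = 1920;
  const sourceHeight = 1080;
  const cropWidth = Math.floor(sourceHeight * (9 / 16)); // 607
  const maxX = sourceWidth - cropWidth; // 1313
  // Center the crop by default; clamp a supplied cropX so the window stays inside the frame
  const x = cropX !== null ? Math.min(Math.max(cropX, 0), maxX) : Math.floor(maxX / 2);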

  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .videoFilter([
        `crop=${cropWidth}:${sourceHeight}:${x}:0`,
        `scale=1080:1920:flags=lanczos`
      ])
      .outputOptions([
        '-c:v libx264',
        '-preset fast',
        '-crf 23',
        '-c:a aac',
        '-b:a 128k',
        '-movflags +faststart'
      ])
      .output(outputPath)
      .on('end', resolve)
      .on('error', reject)
      .run();
  });
}

The -movflags +faststart flag moves the moov atom to the front of the file, which is essential for streaming and preview loading in browser players.
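
The crop arithmetic from Step 3 generalizes to other landscape resolutions, which matters because the download step can return anything up to 1080p. The sketch below is one way to compute the crop window from measured dimensions. computeVerticalCrop and its centerX parameter are hypothetical names, and the real width and height would need to come from ffprobe.

```javascript
// Hypothetical helper: computes a full-height 9:16 crop window for a landscape source.
// centerX is the desired horizontal center of the crop in source pixels
// (defaults to the middle of the frame).
function computeVerticalCrop(sourceWidth, sourceHeight, centerX = sourceWidth / 2) {
  const cropWidth = Math.floor((sourceHeight * 9) / 16);
  if (cropWidth > sourceWidth) {
    throw new Error('Source is narrower than 9:16; crop the height instead');
  }

  const maxX = sourceWidth - cropWidth;
  // Clamp so the window never runs past the left or right edge
  const x = Math.min(Math.max(Math.floor(centerX - cropWidth / 2), 0), maxX);

  return {
    cropWidth,
    cropHeight: sourceHeight,
    x,
    // Ready to drop into the videoFilter array ahead of the scale filter
    filter: `crop=${cropWidth}:${sourceHeight}:${x}:0`
  };
}

console.log(computeVerticalCrop(1920, 1080).filter);
console.log(computeVerticalCrop(1280, 720).filter);
```

For a 1920x1080 input with no centerX, this yields the same 607-pixel-wide centered window as the hard-coded version in cropper.js.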

Step 4: Wire It Together

// pipeline.js
import { downloadVideo } from './downloader.js';
import { extractSegment } from './clipper.js';
import { cropToVertical } from './cropper.js';
import path from 'path';
import fs from 'fs';

const TMP = '/tmp/clips';
fs.mkdirSync(TMP, { recursive: true });

async function processYouTubeToShort(youtubeUrl, startTime, duration, cropX = null) {
  console.log('Downloading...');
  const sourcePath = await downloadVideo(youtubeUrl, TMP);

  console.log('Extracting segment...');
  const segmentPath = path.join(TMP, `segment_${Date.now()}.mp4`);
  await extractSegment(sourcePath, segmentPath, startTime, duration);

  console.log('Cropping to vertical...');
  const outputPath = path.join(TMP, `short_${Date.now()}.mp4`);
  await cropToVertical(segmentPath, outputPath, cropX);

  // Cleanup segment
  fs.unlinkSync(segmentPath);

  console.log(`Done: ${outputPath}`);
  return outputPath;
}

// Example usage
processYouTubeToShort(
  'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  '00:01:24',
  45,
  700 // crop starting at x=700
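).catch((err) => {
  // Surface download or FFmpeg failures instead of an unhandled rejection
  console.error('Pipeline failed:', err);
  process.exitCode = 1;
});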

Adding Smart Crop Detection

For a basic center-crop fallback, what we have is fine. For intelligent crop detection — like following a speaker's face — you need a secondary analysis pass. That's where integrating something like MediaPipe or a frame-by-frame face detection step comes in.

The general pattern is: run face detection on one sampled frame every N seconds, collect the bounding-box centroids, then take the median X position across the clip. The median ignores the occasional stray detection, so you get a stable crop X that doesn't jitter.

async function getStableCropX(videoPath, fps = 1) {
  // Extract keyframes at 1fps to /tmp/frames/
  // Run face detection on each frame
  // Return median face center X, scaled to source resolution
  // (implementation depends on your detection model)
}
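
The detection step depends on your model, but the median step does not. Here is a minimal sketch of that last part, assuming you already have one face-center X value per sampled frame, expressed in source pixels. stableCropX is a hypothetical helper, and the fallback to a center crop when no faces are found is an assumption rather than something the pipeline above prescribes.

```javascript
// Hypothetical helper: turns per-frame face centers into a single crop X.
// faceCentersX: face-center X positions in source pixels, one per sampled frame
function stableCropX(faceCentersX, sourceWidth, cropWidth) {
  const maxX = sourceWidth - cropWidth;
  if (faceCentersX.length === 0) {
    return Math.floor(maxX / 2); // no faces found: fall back to a center crop
  }

  // Median of the centers; copy before sorting so the caller's array is untouched
  const sorted = [...faceCentersX].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;

  // Convert the center to the crop's left edge and keep it inside the frame
  const x = Math.floor(median - cropWidth / 2);
  return Math.min(Math.max(x, 0), maxX);
}

// One outlier at 1900 does not drag the crop away from the speaker near 1000
console.log(stableCropX([900, 1000, 1900], 1920, 607));
```

The returned value can be passed straight to cropToVertical as cropX.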

Tools like ClipSpeedAI handle this detection and crop targeting automatically, which is the production-grade version of what we've built here.

Performance Notes

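Input seeking (seekInput, which puts -ss before -i) can be roughly 3-10x faster than output seeking on long videos, and the gap grows the further into the file the clip starts
preset fast vs. preset slow: roughly a 2x difference in encoding speed, with minimal quality difference at CRF 23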
For batch jobs, use a queue (Bull + Redis) rather than running these concurrently — FFmpeg is CPU-bound and concurrent jobs will thrash each other
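
Before reaching for Bull and Redis, the idea behind the last note can be sketched in-process. The limiter below is a hypothetical helper, not Bull's API: it runs an array of async job functions with at most `limit` in flight at once. It is not persistent and does not retry, so treat it as an illustration of bounded concurrency rather than a queue replacement.

```javascript
// Hypothetical in-process limiter: runs async jobs with at most `limit` in flight.
// jobs: array of functions that each return a promise
async function runWithConcurrency(jobs, limit = 1) {
  const results = new Array(jobs.length);
  let next = 0;

  async function worker() {
    while (next < jobs.length) {
      const index = next++; // claim the next job; no await between check and claim
      results[index] = await jobs[index]();
    }
  }

  // Start up to `limit` workers that pull jobs until none are left
  const workerCount = Math.min(limit, jobs.length);
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers); // rejects as soon as any job fails
  return results; // results keep the order of the input jobs
}
```

With a hypothetical clips array of { url, start, duration } objects, the call would look like runWithConcurrency(clips.map((c) => () => processYouTubeToShort(c.url, c.start, c.duration)), 1). A limit of 1 processes clips strictly one after another, which is often the right default for CPU-bound encodes on a single machine.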

What's Next

This pipeline is the foundation. From here you can layer in:

GPT-4o-based clip scoring to find the best segments automatically
Whisper-based transcription for burned-in captions
A job queue for working through dozens of videos without oversubscribing the CPU

If you want to skip building all of this yourself, ClipSpeedAI wraps the entire pipeline into a hosted API — worth checking out if you're building on top of YouTube content at scale.

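The code above is a solid starting point for single-file processing. Before relying on it in production, read the real source dimensions with ffprobe instead of assuming 1920x1080, and delete the downloaded source file once the clip is written. Wire it into a Bull queue and you've got a scalable YouTube Shorts factory.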

DEV Community