Kyle White

Building a YouTube-to-Shorts Pipeline With Node.js and FFmpeg

If you've ever watched a long-form YouTube video and thought "this 45-second segment would kill on Shorts," you already understand the core problem this pipeline solves. Manually trimming, reformatting, and uploading clips is brutal at scale. Let's build a Node.js pipeline that automates the entire process — download, detect, crop, encode, and output a vertical-ready clip.

This is the kind of automation that powers tools like ClipSpeedAI, which does all of this with AI-driven clip selection on top.

The Pipeline Overview

  1. Download the YouTube video with yt-dlp
  2. Extract a time segment with FFmpeg
  3. Detect the "action region" (or use center-crop as a fallback)
  4. Re-encode to 9:16 vertical at 1080x1920
  5. Write the output file

Prerequisites

npm install fluent-ffmpeg execa
pip install yt-dlp

Make sure ffmpeg and ffprobe are on your PATH. On Ubuntu:

sudo apt install ffmpeg
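It's worth failing fast if any of these binaries are missing before a job starts. Here's a small helper of my own (not part of the article's pipeline) that checks for them; it uses Node's built-in child_process so it stands alone:

```javascript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileP = promisify(execFile);

// Resolves true if `bin` can be executed and exits cleanly with its version flag
export async function binaryAvailable(bin, versionFlag = '-version') {
  try {
    await execFileP(bin, [versionFlag]);
    return true;
  } catch {
    return false;
  }
}

// ffmpeg/ffprobe use -version; yt-dlp uses --version
export async function checkToolchain() {
  const [ffmpeg, ffprobe, ytDlp] = await Promise.all([
    binaryAvailable('ffmpeg'),
    binaryAvailable('ffprobe'),
    binaryAvailable('yt-dlp', '--version')
  ]);
  return { ffmpeg, ffprobe, ytDlp };
}
```

Call `checkToolchain()` once at startup and bail with a clear error if anything comes back false.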

Step 1: Download With yt-dlp

// downloader.js
import { execa } from 'execa';
import path from 'path';

export async function downloadVideo(youtubeUrl, outputDir) {
  const outputTemplate = path.join(outputDir, '%(id)s.%(ext)s');

  const { stdout } = await execa('yt-dlp', [
    '--format', 'bestvideo[height<=1080][ext=mp4]+bestaudio[ext=m4a]/best[height<=1080]',
    '--merge-output-format', 'mp4',
    '--output', outputTemplate,
    // --print implies --simulate, so --no-simulate is required to actually download;
    // after_move:filepath prints the final path once merging/moving is done
    '--no-simulate',
    '--print', 'after_move:filepath',
    youtubeUrl
  ]);

  return stdout.trim();
}

One important note: never route video downloads through proxies. The files are large and proxy bandwidth is expensive. Only proxy the metadata/info-json API calls if you need to avoid rate limits.
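To illustrate, here's a sketch of proxying only the metadata fetch. `--dump-json` makes yt-dlp print the video's info JSON without downloading any media, and `--proxy` scopes the proxy to that one call. The helper names and proxy URL are placeholders of mine, and I'm using Node's child_process here so the snippet stands alone (the article's execa works the same way):

```javascript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileP = promisify(execFile);

// --dump-json implies simulate: yt-dlp prints metadata and downloads nothing
export function buildMetadataArgs(youtubeUrl, proxyUrl = null) {
  const args = ['--dump-json'];
  if (proxyUrl) args.push('--proxy', proxyUrl); // e.g. http://user:pass@host:port (placeholder)
  args.push(youtubeUrl);
  return args;
}

export async function fetchVideoInfo(youtubeUrl, proxyUrl = null) {
  const { stdout } = await execFileP('yt-dlp', buildMetadataArgs(youtubeUrl, proxyUrl));
  return JSON.parse(stdout); // title, duration, formats, etc.
}
```

The actual download call in downloader.js stays proxy-free, so the large media bytes never touch the proxy.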

Step 2: Extract a Segment

// clipper.js
import ffmpeg from 'fluent-ffmpeg';

export function extractSegment(inputPath, outputPath, startTime, duration) {
  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .seekInput(startTime)
      .duration(duration)
      .outputOptions(['-c:v libx264', '-c:a aac', '-avoid_negative_ts make_zero'])
      .output(outputPath)
      .on('end', resolve)
      .on('error', reject)
      .run();
  });
}

Applying the seek with .seekInput() puts -ss before -i (input seeking), which is much faster than output seeking: FFmpeg jumps straight to the nearest keyframe instead of decoding every frame from the start of the file up to the target timestamp.

Step 3: Crop to 9:16

Here's where it gets interesting. For a 1920x1080 source, a full-height 9:16 crop works out to 1080 × 9/16 = 607.5 pixels wide. We round up to an even 608 to keep the yuv420p chroma planes aligned. That's still not the 1080x1920 Shorts target, so after cropping we scale up.

// cropper.js
import ffmpeg from 'fluent-ffmpeg';

export function cropToVertical(inputPath, outputPath, cropX = null) {
  // Assumes a 1920x1080 source; probe actual dimensions with ffprobe if inputs vary
  const sourceWidth = 1920;
  const sourceHeight = 1080;
  const cropWidth = 608; // 1080 * 9/16 = 607.5, rounded up to an even width
  const x = cropX !== null ? cropX : Math.floor((sourceWidth - cropWidth) / 2);

  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .videoFilter([
        `crop=${cropWidth}:${sourceHeight}:${x}:0`,
        `scale=1080:1920:flags=lanczos`
      ])
      .outputOptions([
        '-c:v libx264',
        '-preset fast',
        '-crf 23',
        '-c:a aac',
        '-b:a 128k',
        '-movflags +faststart'
      ])
      .output(outputPath)
      .on('end', resolve)
      .on('error', reject)
      .run();
  });
}

The -movflags +faststart flag moves the moov atom to the front of the file, which is essential for streaming and preview loading in browser players.

Step 4: Wire It Together

// pipeline.js
import { downloadVideo } from './downloader.js';
import { extractSegment } from './clipper.js';
import { cropToVertical } from './cropper.js';
import path from 'path';
import fs from 'fs';

const TMP = '/tmp/clips';
fs.mkdirSync(TMP, { recursive: true });

async function processYouTubeToShort(youtubeUrl, startTime, duration, cropX = null) {
  console.log('Downloading...');
  const sourcePath = await downloadVideo(youtubeUrl, TMP);

  console.log('Extracting segment...');
  const segmentPath = path.join(TMP, `segment_${Date.now()}.mp4`);
  await extractSegment(sourcePath, segmentPath, startTime, duration);

  console.log('Cropping to vertical...');
  const outputPath = path.join(TMP, `short_${Date.now()}.mp4`);
  await cropToVertical(segmentPath, outputPath, cropX);

  // Cleanup segment
  fs.unlinkSync(segmentPath);

  console.log(`Done: ${outputPath}`);
  return outputPath;
}

// Example usage
processYouTubeToShort(
  'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
  '00:01:24',
  45,
  700 // crop starting at x=700
).catch(console.error);

Adding Smart Crop Detection

For a basic center-crop fallback, what we have is fine. For intelligent crop detection — like following a speaker's face — you need a secondary analysis pass. That's where integrating something like MediaPipe or a frame-by-frame face detection step comes in.

The general pattern is: run face detection on a keyframe every N seconds, collect the bounding box centroids, then compute the median X position across the clip. This gives you a stable crop X that doesn't jitter.

async function getStableCropX(videoPath, fps = 1) {
  // Extract keyframes at 1fps to /tmp/frames/
  // Run face detection on each frame
  // Return median face center X, scaled to source resolution
  // (implementation depends on your detection model)
}
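The detection call itself depends on your model, but the "stable crop X" math can be grounded with two small pure helpers. These names, and the 608px crop width, are my own assumptions, not part of the article's code:

```javascript
// Median X of the face-center samples; null when detection found nothing
export function medianX(xs) {
  if (xs.length === 0) return null;
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

// Convert a face center into a crop-left X, clamped so the 9:16
// window never runs off the edge of the frame
export function cropXFromFaceCenter(faceCenterX, sourceWidth = 1920, cropWidth = 608) {
  const x = Math.round(faceCenterX - cropWidth / 2);
  return Math.min(Math.max(x, 0), sourceWidth - cropWidth);
}
```

Feed medianX the per-frame face centers from your detector, then pass `cropXFromFaceCenter(medianCenter)` as the cropX argument to cropToVertical.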

Tools like ClipSpeedAI handle this detection and crop targeting automatically, which is the production-grade version of what we've built here.

Performance Notes

  • Input seeking (.seekInput(), which puts -ss before -i) is 3-10x faster than output seeking for long videos
  • preset fast vs preset slow: about 2x speed difference with minimal quality delta at CRF 23
  • For batch jobs, use a queue (Bull + Redis) rather than running these concurrently — FFmpeg is CPU-bound and concurrent jobs will thrash each other
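Before reaching for Bull, the "don't thrash the CPU" rule can be enforced in-process with a tiny promise pool. This is a minimal sketch of my own for small batches, not a replacement for a real queue with retries and persistence:

```javascript
// Run async job factories with at most `limit` in flight at once
export async function runWithLimit(jobs, limit) {
  const results = new Array(jobs.length);
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const i = next++; // claim the next job index synchronously
      results[i] = await jobs[i]();
    }
  }
  // Spawn `limit` workers that drain the shared job list
  await Promise.all(Array.from({ length: Math.min(limit, jobs.length) }, worker));
  return results;
}
```

Usage: pass one `() => processYouTubeToShort(...)` closure per video and a limit that matches your core budget (e.g. 2 concurrent encodes on a 4-core box).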

What's Next

This pipeline is the foundation. From here you can layer in:

  • GPT-4o-based clip scoring to find the best segments automatically
  • Whisper-based caption burning for caption overlays
  • A job queue for processing dozens of videos in parallel

If you want to skip building all of this yourself, ClipSpeedAI wraps the entire pipeline into a hosted API — worth checking out if you're building on top of YouTube content at scale.

The full code above is production-ready for single-file processing. Wire it into a Bull queue and you've got a scalable YouTube Shorts factory.
