When people ask what stack powers ClipSpeedAI, the answer is less interesting than the reasoning behind each choice. This post goes through every layer of the stack — frontend, API, processing pipeline, AI integrations, storage, and infrastructure — with the rationale for each technology decision.
The Problem Domain
ClipSpeedAI takes YouTube videos and automatically produces short-form clips (YouTube Shorts, TikTok, Reels). Each job involves:
- Downloading a multi-hundred-MB video file
- Transcribing audio (30 min video = ~5MB audio file)
- AI scoring of candidate clips
- Face detection and crop calculation
- Re-encoding to vertical format
- Generating and burning captions
This is not a simple CRUD app. The system is compute-bound, latency-tolerant (users expect async processing), and requires multiple AI/ML subsystems to cooperate.
Frontend: Next.js
Next.js for the web client. The reasons are straightforward:
- SSR for the landing/marketing pages (SEO matters)
- React for the interactive editor UI
- API routes for lightweight server-side operations
- Easy deployment to Vercel
The clip editor UI is React-heavy: a video player with timeline scrubbing, caption overlay preview, and crop position controls. The WebSocket connection for real-time progress updates lives here.
```javascript
// hooks/useJobProgress.js
import { useEffect, useState } from 'react';

export function useJobProgress(jobId) {
  const [progress, setProgress] = useState({ stage: 'queued', pct: 0 });

  useEffect(() => {
    if (!jobId) return;
    const ws = new WebSocket(`${process.env.NEXT_PUBLIC_WS_URL}/jobs/${jobId}`);
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setProgress(data);
    };
    return () => ws.close();
  }, [jobId]);

  return progress;
}
```
Backend: Node.js + Express
The API layer is Node.js with Express. This was an easy call — the team was already on JavaScript, and for an I/O-heavy API (mostly coordinating between Redis, storage, and external APIs), Node's event loop is well-suited.
Route structure:
```
POST   /api/jobs            → submit new video job
GET    /api/jobs/:id        → get job status + results
GET    /api/jobs/:id/clips  → get processed clip URLs
DELETE /api/jobs/:id        → cancel/delete job
GET    /health              → health check endpoint
WS     /jobs/:id            → WebSocket for progress updates
```
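Job submission is the only write-heavy route, and it presumably validates the YouTube URL before anything hits the queue. A sketch of such a validator (hypothetical helper, not from the real codebase):

```javascript
// Extracts a YouTube video ID from a submitted URL, or returns null
// if the URL isn't a recognizable YouTube link. Illustrative only --
// the production handler may accept more URL shapes.
export function parseYouTubeId(url) {
  try {
    const u = new URL(url);
    if (u.hostname === 'youtu.be') return u.pathname.slice(1) || null;
    if (u.hostname.endsWith('youtube.com')) return u.searchParams.get('v');
    return null;
  } catch {
    return null; // not a parseable URL at all
  }
}
```

Rejecting bad input at the API edge keeps garbage out of the download queue entirely.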
Queue Layer: BullMQ + Redis
Every processing job goes through a BullMQ queue backed by Redis. Four queues for the four pipeline stages:
```javascript
import { Queue } from 'bullmq';

const REDIS = { host: process.env.REDIS_HOST, port: 6379 };

export const queues = {
  download: new Queue('video:download', { connection: REDIS }),
  transcribe: new Queue('video:transcribe', { connection: REDIS }),
  score: new Queue('video:score', { connection: REDIS }),
  encode: new Queue('video:encode', { connection: REDIS }),
};
```
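Each stage's worker hands off to the next queue when it finishes. A tiny helper can encode that ordering (hypothetical, but consistent with the four queues above):

```javascript
// Pipeline stage order: a completed stage enqueues a job on the next one.
const STAGE_ORDER = ['download', 'transcribe', 'score', 'encode'];

// Returns the next stage name, or null when the pipeline is finished.
export function nextStage(stage) {
  const i = STAGE_ORDER.indexOf(stage);
  if (i === -1) throw new Error(`unknown stage: ${stage}`);
  return i + 1 < STAGE_ORDER.length ? STAGE_ORDER[i + 1] : null;
}
```

Keeping the chain in one place means adding a fifth stage later is a one-line change rather than a hunt through four worker files.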
Redis is also used for: API response caching, transcript result caching (avoid re-transcribing the same YouTube video), and session storage.
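For transcript caching, a deterministic cache key per video (and per model, so a model upgrade invalidates stale entries) is enough. A sketch, with a made-up key scheme:

```javascript
import { createHash } from 'node:crypto';

// Cache key for transcript reuse -- hypothetical scheme, keyed on the
// YouTube video ID plus the transcription model, so switching models
// never serves a stale transcript.
export function transcriptCacheKey(videoId, model = 'whisper-1') {
  const digest = createHash('sha256')
    .update(`${videoId}:${model}`)
    .digest('hex');
  return `transcript:${digest.slice(0, 16)}`;
}
```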
Video Processing: FFmpeg
FFmpeg is the backbone. Every video operation runs through it:
- Segment extraction
- Crop and scale to 9:16
- Caption burning (ASS subtitle files)
- Audio extraction for Whisper
- Thumbnail generation
FFmpeg is called via subprocess — fluent-ffmpeg for complex filter chains, execa for simpler invocations.
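A typical invocation builds the filter chain as an argument list and hands it to the subprocess runner. The crop math below is a simplified sketch (the real pipeline also burns captions and applies smoothing):

```javascript
// Builds FFmpeg arguments for a 9:16 vertical crop + scale.
// cropX is the left edge of the crop window in source pixels,
// typically derived from the face-detection output.
export function verticalCropArgs({ input, output, srcHeight, cropX }) {
  // Width for a 9:16 crop at the source height, forced to an even
  // number (most encoders reject odd dimensions).
  const cropW = Math.round((srcHeight * 9) / 16 / 2) * 2;
  return [
    '-i', input,
    '-vf', `crop=${cropW}:${srcHeight}:${cropX}:0,scale=1080:1920`,
    '-c:a', 'copy',
    output,
  ];
}
```

The returned array can be passed straight to execa or `child_process.execFile('ffmpeg', args)`.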
AI Layer
Three AI services in the pipeline:
OpenAI Whisper for transcription. The whisper-1 model via API. Fast, accurate, and the word-level timestamp mode is essential for caption chunk generation.
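Chunking word timestamps into caption-sized groups is straightforward once Whisper returns per-word start/end times. A simplified version (the real chunking rules are likely richer -- punctuation, line balance, etc.):

```javascript
// Groups word-level timestamps into caption chunks. Each input word is
// { word, start, end } with times in seconds, matching the shape of
// Whisper's verbose_json word entries. Hypothetical rule: start a new
// chunk whenever adding a word would exceed maxChars.
export function buildCaptionChunks(words, maxChars = 20) {
  const chunks = [];
  let current = null;
  for (const w of words) {
    if (current && (current.text + ' ' + w.word).length > maxChars) {
      chunks.push(current);
      current = null;
    }
    if (!current) {
      current = { text: w.word, start: w.start, end: w.end };
    } else {
      current.text += ' ' + w.word;
      current.end = w.end; // chunk ends when its last word ends
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```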
GPT-4o for clip scoring. Given a transcript window, GPT-4o evaluates virality potential across five dimensions and returns a composite score. Temperature 0.3, JSON response format, model gpt-4o (not mini — the quality difference is meaningful for creative scoring tasks).
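The five dimensions aren't named in this post, so the weights below are purely illustrative, but the composite is plausibly just a weighted blend of the per-dimension scores GPT-4o returns:

```javascript
// Hypothetical dimension names and weights -- illustrative only.
const WEIGHTS = { hook: 0.3, emotion: 0.2, clarity: 0.2, pacing: 0.15, payoff: 0.15 };

// Weighted composite of the per-dimension scores, rounded to 2 decimals.
export function compositeScore(dims) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += (dims[name] ?? 0) * weight;
  }
  return Math.round(total * 100) / 100;
}
```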
MediaPipe (Python) for face detection. Runs as a Python child process from the Node.js worker. MediaPipe's face detection model, sampled at 1fps, with median smoothing and dead zone stabilization.
```javascript
// ai/scoring.js - abbreviated interface
export async function scoreClipCandidates(transcriptSegments) {
  const windows = buildScoringWindows(transcriptSegments, 90, 15);
  const scores = await Promise.all(windows.map(scoreWindow));
  return scores.sort((a, b) => b.composite_score - a.composite_score);
}
```
Storage: Cloudflare R2
All processed clips and intermediate files that need to persist go to Cloudflare R2 (S3-compatible, ~10x cheaper egress than S3). Temp processing files go to /tmp on the server.
```javascript
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const r2 = new S3Client({
  region: 'auto',
  endpoint: process.env.R2_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
  },
});

export async function getPresignedDownloadUrl(key, expiresIn = 3600) {
  return getSignedUrl(r2, new GetObjectCommand({
    Bucket: process.env.R2_BUCKET,
    Key: key,
  }), { expiresIn });
}
```
Database: Supabase (PostgreSQL)
Supabase for the primary database. PostgreSQL under the hood, with Supabase's client library for auth, real-time subscriptions, and row-level security.
Job state (queued, processing, complete, failed) lives in a jobs table. Clip metadata (URLs, scores, timestamps) in a clips table. User subscriptions and usage tracking in their respective tables.
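A minimal schema consistent with that description might look like this (column names and constraints are guesses):

```sql
-- Hypothetical schema; the actual tables may differ.
create table jobs (
  id         uuid primary key default gen_random_uuid(),
  user_id    uuid references auth.users(id),
  source_url text not null,
  status     text not null default 'queued', -- queued | processing | complete | failed
  created_at timestamptz not null default now()
);

create table clips (
  id              uuid primary key default gen_random_uuid(),
  job_id          uuid references jobs(id) on delete cascade,
  url             text not null,
  composite_score numeric,
  start_sec       numeric,
  end_sec         numeric
);
```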
Infrastructure: Railway
Everything runs on Railway — the Node.js API, the worker processes, and Redis. Single Dockerfile that installs both Node.js and Python (for MediaPipe).
```toml
# railway.toml
[build]
builder = "DOCKERFILE"

[deploy]
healthcheckPath = "/health"
restartPolicyType = "ON_FAILURE"
```
Payments: Stripe
Stripe for subscription billing. Three tiers: free (3 clips/month), pro ($29/month, 100 clips), and agency ($99/month, unlimited). Stripe webhooks update the subscriptions table in Supabase when plans change.
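Mapping webhook events to subscription-table updates can be kept pure and testable. The sketch below is hypothetical and deliberately omits the signature verification (`stripe.webhooks.constructEvent`) that must happen before any event is trusted:

```javascript
// Translates a Stripe webhook event into the row update to apply to
// the subscriptions table, or null for event types we don't handle.
export function subscriptionUpdateFor(event) {
  switch (event.type) {
    case 'customer.subscription.created':
    case 'customer.subscription.updated': {
      const sub = event.data.object;
      return { stripe_subscription_id: sub.id, status: sub.status };
    }
    case 'customer.subscription.deleted':
      return { stripe_subscription_id: event.data.object.id, status: 'canceled' };
    default:
      return null;
  }
}
```

Keeping the mapping separate from the Express handler makes it trivial to unit-test against recorded event payloads.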
The Stack Summary
| Layer | Technology | Why |
|---|---|---|
| Frontend | Next.js | SSR + React |
| API | Node.js + Express | I/O-bound workloads |
| Queue | BullMQ + Redis | Async job processing |
| Video | FFmpeg | Industry standard |
| Transcription | OpenAI Whisper | Accuracy + timestamps |
| Scoring | GPT-4o | Best creative reasoning |
| Face detection | MediaPipe (Python) | Fast CPU inference |
| Storage | Cloudflare R2 | Cheap egress |
| Database | Supabase/PostgreSQL | Managed, with auth |
| Infrastructure | Railway | Simple PaaS |
| Payments | Stripe | Standard |
The full product — ClipSpeedAI — runs on this stack today. No Kubernetes, no microservices mesh, no infrastructure team. Just a well-organized monolith with a sensible queue architecture.
The lesson from building this: boring technology choices made correctly outperform exciting technology choices made prematurely. The hosted version of this stack is available at ClipSpeedAI for teams who want the output — AI-scored, captioned, vertical clips — without assembling each layer themselves.