I built an AI faceless video generator in 2 months — here's the stack

#showdev #saas #ai #nextjs

Six months ago I started Keyvello (keyvello.com) — an AI video generator that turns a prompt into a complete short-form video in 2–5 minutes. Here's the technical breakdown for fellow builders.

The problem

Faceless creators on TikTok / YouTube Shorts / Reels spend 2–4 hours per video on scripting, voiceovers, B-roll, captions, and editing. Most burn out before they post 10 videos.

The stack

Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS 4, Radix UI
Backend: Next.js Route Handlers (App Router)
DB: Supabase (Postgres + Auth + RLS)
AI: GPT-5.5 for scripts, Fal.ai for images, ElevenLabs for voices
Video: FFmpeg via fluent-ffmpeg, Sharp for image processing
Storage: Cloudflare R2 (S3-compatible)
Payments: Dodo Payments
Compute: Vercel for the app, Modal for the video pipelines
State: Zustand

The pipeline

prompt → GPT script → scene splitter → parallel(Flux images + ElevenLabs audio) → FFmpeg composition (Modal) → R2 upload → status update
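The scene splitter and the parallel step can be sketched roughly as below. This is a minimal illustration, not Keyvello's actual code: the `Scene` and `SceneAssets` types, the paragraph-based `splitIntoScenes`, and the injected `generateImage` / `generateAudio` functions are all hypothetical stand-ins for the real Fal.ai and ElevenLabs calls.

```typescript
// Hypothetical shapes; the real pipeline's types are not shown in this post.
type Scene = { index: number; text: string };
type SceneAssets = { scene: Scene; imageUrl: string; audioUrl: string };

// Stand-ins for the image and voice providers, injected so the
// fan-out logic can be exercised without network calls.
type Generators = {
  generateImage: (scene: Scene) => Promise<string>;
  generateAudio: (scene: Scene) => Promise<string>;
};

// Assumption: one scene per non-empty paragraph of the script.
function splitIntoScenes(script: string): Scene[] {
  return script
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0)
    .map((text, index) => ({ index, text }));
}

// For every scene, request the image and the voiceover concurrently,
// and run all scenes concurrently as well. Promise.all preserves input
// order, so the result lines up with the scene order for composition.
async function buildSceneAssets(
  script: string,
  gen: Generators
): Promise<SceneAssets[]> {
  const scenes = splitIntoScenes(script);
  return Promise.all(
    scenes.map(async (scene) => {
      const [imageUrl, audioUrl] = await Promise.all([
        gen.generateImage(scene),
        gen.generateAudio(scene),
      ]);
      return { scene, imageUrl, audioUrl };
    })
  );
}
```

Promise.all rejects as soon as any single call fails, which suits a "fail the whole job" policy. If partial results should survive one bad scene, Promise.allSettled is the usual alternative.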

What surprised me

Modal beats running FFmpeg on Vercel. Cold starts on Vercel functions made 60s+ videos impossible. Modal webhooks solved it.
RLS is non-negotiable from day one. Retrofitting row-level security at 1K users is painful.
Credit refunds need their own RPC. I hit a silent failure when increment_user_credits was blocked by a trigger. A dedicated add_credits RPC fixed it.
Users want templates, not raw control. I shipped a "blank canvas" mode early. Nobody used it. The 11 named templates (AI Stories, Fake Texts, Stick Animation, etc.) account for 95% of generations.
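On the credit refund point, one way to keep a failed refund from going unnoticed is to treat the RPC's returned error as fatal. The sketch below is an assumption-heavy illustration: `RpcClient` is a minimal hypothetical interface modeled on the `{ data, error }` result shape that supabase-js returns instead of throwing, and the parameter names `p_user_id` and `p_amount` are invented, since the real signature of add_credits is not shown here.

```typescript
// Minimal hypothetical client interface, modeled on the { data, error }
// result shape. It is not the actual supabase-js client type.
type RpcResult = { data: unknown; error: { message: string } | null };

interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): PromiseLike<RpcResult>;
}

// Refund credits through the dedicated add_credits RPC and throw on any
// returned error, so a blocked write cannot pass silently.
// Assumptions: credits are positive integers, and the RPC takes
// p_user_id / p_amount (both parameter names are hypothetical).
async function refundCredits(
  client: RpcClient,
  userId: string,
  amount: number
): Promise<void> {
  if (!Number.isInteger(amount) || amount <= 0) {
    throw new Error(`Invalid refund amount: ${amount}`);
  }
  const { error } = await client.rpc("add_credits", {
    p_user_id: userId,
    p_amount: amount,
  });
  if (error) {
    throw new Error(`Credit refund failed for ${userId}: ${error.message}`);
  }
}
```

Throwing here pushes the failure up to whatever handles the failed generation, where it can be logged or retried rather than leaving the user short on credits.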