Pritesh

I built an AI faceless video generator in 2 months — here's the stack

Six months ago I started Keyvello (keyvello.com) — an AI video generator that turns a prompt into a complete short-form video in 2–5 minutes. Here's the technical breakdown for fellow builders.

The problem

Faceless creators on TikTok / YouTube Shorts / Reels spend 2–4 hours per video on scripting, voiceovers, B-roll, captions, and editing. Most burn out before they post 10 videos.

The stack

  • Frontend: Next.js 16, React 19, TypeScript, Tailwind CSS 4, Radix UI
  • Backend: Next.js API Routes (App Router)
  • DB: Supabase (Postgres + Auth + RLS)
  • AI: GPT-5.5 for scripts, Fal.ai for images, ElevenLabs for voices
  • Video: FFmpeg via fluent-ffmpeg, Sharp for image processing
  • Storage: Cloudflare R2 (S3-compatible)
  • Payments: Dodo Payments
  • Compute: Vercel for the app, Modal for the video pipelines
  • State: Zustand

The pipeline

prompt → GPT script → scene splitter → parallel(Flux images + ElevenLabs audio) → FFmpeg composition (Modal) → R2 upload → status update
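
To make the fan-out step concrete, here is a minimal TypeScript sketch of the orchestration, not the actual Keyvello code. The `Stages` interface, the type names, and the paragraph-based `splitScenes` are all assumptions for illustration. The real services (script model, Fal.ai, ElevenLabs, the Modal composition job) would sit behind the injected functions.

```typescript
// Hypothetical types and stage names; none of these come from the Keyvello codebase.
type Scene = { index: number; text: string };
type SceneAssets = { scene: Scene; imageUrl: string; audioUrl: string };

// Each external service is injected so the orchestration can be exercised with stubs.
interface Stages {
  writeScript(prompt: string): Promise<string>;
  generateImage(scene: Scene): Promise<string>;
  generateAudio(scene: Scene): Promise<string>;
  compose(assets: SceneAssets[]): Promise<string>;
}

// Naive scene splitter (an assumption): one scene per non-empty paragraph.
function splitScenes(script: string): Scene[] {
  return script
    .split(/\n\s*\n/)
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .map((text, index) => ({ index, text }));
}

async function runPipeline(prompt: string, stages: Stages): Promise<string> {
  const script = await stages.writeScript(prompt);
  const scenes = splitScenes(script);

  // Image and audio requests for every scene are started concurrently.
  const assets = await Promise.all(
    scenes.map(async (scene) => {
      const [imageUrl, audioUrl] = await Promise.all([
        stages.generateImage(scene),
        stages.generateAudio(scene),
      ]);
      return { scene, imageUrl, audioUrl };
    })
  );

  // Promise.all preserves input order, so assets are already in scene order.
  return stages.compose(assets);
}
```

Because `Promise.all` rejects as soon as any request fails, a single failed image or voice call fails the whole run in this sketch. A production version would likely want per-scene retries before giving up.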

What surprised me

  1. Modal beats running FFmpeg on Vercel. Cold starts on Vercel functions made videos longer than 60 seconds impossible. Modal webhooks solved it.
  2. RLS is non-negotiable from day one. Retrofitting row-level security at 1K users is painful.
  3. Credit refunds need their own RPC. I hit a silent failure when increment_user_credits was blocked by a trigger. Use add_credits instead.
  4. Users want templates, not raw control. I shipped a "blank canvas" mode early. Nobody used it. The 11 named templates (AI Stories, Fake Texts, Stick Animation, etc.) account for 95% of generations.
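
On the credit-refund point: one way a failure like that can go unnoticed is that supabase-js resolves `rpc()` with a `{ data, error }` object instead of throwing, so an unchecked `error` is silently dropped. The sketch below is a hedged illustration of checking it explicitly. `add_credits` is the RPC named above, but its argument names (`p_user_id`, `p_amount`) and the narrow `RpcClient` interface are assumptions, not the real schema.

```typescript
// Minimal shape of the `{ data, error }` result that an rpc() call resolves to.
type RpcResult = { data: unknown; error: { message: string } | null };

// Only the part of the client this sketch needs, so a stub or thin wrapper can stand in.
interface RpcClient {
  rpc(fn: string, args: Record<string, unknown>): PromiseLike<RpcResult>;
}

// Refund credits after a failed generation.
// The argument names `p_user_id` and `p_amount` are hypothetical.
async function refundCredits(
  client: RpcClient,
  userId: string,
  amount: number
): Promise<void> {
  // Reject zero, negative, fractional, or non-finite amounts before touching the database.
  if (!isFinite(amount) || amount <= 0 || Math.floor(amount) !== amount) {
    throw new Error(`Invalid refund amount: ${amount}`);
  }

  const { error } = await client.rpc("add_credits", {
    p_user_id: userId,
    p_amount: amount,
  });

  // The call resolves even when the database rejects it, so check `error` explicitly.
  if (error) {
    throw new Error(
      `Refund of ${amount} credits for ${userId} failed: ${error.message}`
    );
  }
}
```

Throwing here means a failed refund surfaces in logs and error tracking instead of leaving the user short on credits.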

What's next

Better lipsync for the talking-avatar templates. Tighter cost controls per template tier. Affiliate program.

If you're building something in AI video, would love to compare notes — drop a comment.
