Prompt to Video App
Building AI video generation app
Tags: Remotion, Honojs, Nextjs, API
Hey there!!
Welcome to the new blog
In this blog, I'll walk through the step-by-step process of generating a video programmatically from plain English, using an AI LLM and the Remotion npm module
This is a good SaaS feature I was building for inkgest.com
Remotion is an npm SDK that lets you create animated videos with code, mainly from DOM elements such as text, images, shapes, and videos.
Last month, Remotion opened its SDK to Claude Code, letting Claude build AI-generated videos by converting plain English into Remotion-based code that renders the final output as mp4.
I like the concept because this simple technique can literally replace Adobe Premiere Pro; not at a large scale, but for small jobs it can work better than Premiere Pro.
I'll be using the OpenAI API or the OpenRouter API; you can use any other API key, and I'll explain why at the end.
npx create-video@latest
One command and your Next.js repository with the Remotion package is ready; it generates sample code as well.
But I'll be using a very simple Next.js starter repository of my own, buildsaas.dev, and following these steps to get started:
- Install the Buildsaas.dev starter repository
- Add the remotion package, install dependencies using npm install
- Add OpenRouter API key for AI LLM
The next part is again an easy one: quickly add a server-side API that asks the AI LLM to generate Remotion-based code, then use the Remotion SDK to render the video in mp4 format.
It's one server-side API; the code is given below, and I'll explain it afterwards.
/**
* Video generation — no LLM, no Firecrawl re-scraping.
*
* Pipeline:
* 1. Receive images[] + content + title from the frontend (already stored on the draft)
* 2. Generate narration audio via msedge-tts (reads first ~400 chars of draft content)
* 3. Render Remotion slideshow: one image per slide, Ken Burns effect, audio track
* 4. Upload mp4 to UploadThing
* 5. Save metadata to Firestore + backlink draft
*/
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";
import { db } from "../../../lib/config/firebase";
import { collection, addDoc, doc, updateDoc, serverTimestamp } from "firebase/firestore";
import { UTApi } from "uploadthing/server";
import { MsEdgeTTS, OUTPUT_FORMAT } from "msedge-tts";
import path from "path";
import os from "os";
import fs from "fs/promises";
const utapi = new UTApi({ token: process.env.UPLOADTHING_SECRET });
/* ── Upload any buffer to UploadThing ── */
async function uploadFile({ buffer, filename, contentType }) {
const file = new File([buffer], filename, { type: contentType });
const response = await utapi.uploadFiles(file);
if (response.error) throw new Error(response.error.message || "UploadThing upload failed");
return response.data.url;
}
/* ── TTS: read draft content aloud via Edge TTS (free, no key) ── */
async function generateAudio({ title, content }) {
// Narrate title + first meaningful paragraph of content (strip markdown)
const plainText = content
.replace(/!\[.*?\]\(.*?\)/g, "") // strip images
.replace(/\[([^\]]+)\]\([^)]+\)/g, "$1") // flatten links
.replace(/#{1,6}\s/g, "") // strip headings
.replace(/[*_`>]/g, "") // strip formatting
.replace(/\s+/g, " ")
.trim();
const narration = `${title}. ${plainText}`.slice(0, 500);
const tts = new MsEdgeTTS();
await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3);
const tempFile = path.join(os.tmpdir(), `tts-${Date.now()}.mp3`);
await tts.toFile(tempFile, narration);
const buffer = await fs.readFile(tempFile);
await fs.unlink(tempFile).catch(() => {});
return buffer;
}
/* ── Remotion: render slideshow mp4 ── */
async function renderSlideshow({ images, title, audioUrl }) {
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
await fs.mkdir(tempDir, { recursive: true });
try {
// Each image gets equal screen time; minimum 3s per slide, cap at 8s
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
const totalDuration = images.length * perSlide;
const durationInFrames = Math.ceil(totalDuration * 30);
const bundleLocation = await bundle({
entryPoint: path.join(process.cwd(), "remotion/index.ts"),
webpackOverride: (config) => config,
});
const inputProps = {
images,
title,
audioUrl: audioUrl || "",
perSlide,
};
const composition = await selectComposition({
serveUrl: bundleLocation,
id: "VideoComposition",
inputProps,
});
const outputPath = path.join(tempDir, "output.mp4");
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: "h264",
outputLocation: outputPath,
inputProps,
concurrency: 4,
frameRange: [0, durationInFrames - 1],
});
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };
} catch (err) {
await fs.rm(tempDir, { recursive: true, force: true });
throw err;
}
}
/* ── Firestore: save video doc + backlink draft ── */
async function saveVideoDoc({ videoUrl, audioUrl, title, userId, draftId }) {
const docRef = await addDoc(collection(db, "videos"), {
videoUrl,
audioUrl: audioUrl || "",
title,
userId: userId || "anonymous",
draftId: draftId || null,
createdAt: serverTimestamp(),
status: "completed",
});
if (draftId) {
try {
await updateDoc(doc(db, "drafts", draftId), {
videoUrl,
videoDocId: docRef.id,
});
} catch (e) {
console.warn("[video] Failed to backlink draft:", e?.message);
}
}
return docRef.id;
}
/* ── Handler ── */
export default async function handler(req, res) {
if (req.method !== "POST") {
return res.status(405).json({ error: "Method not allowed" });
}
try {
const { images, title, content, userId, draftId } = req.body || {};
if (!userId || typeof userId !== "string" || !userId.trim()) {
return res.status(401).json({ error: "Authentication required." });
}
if (!content || !content.trim()) {
return res.status(400).json({ error: "Content is required" });
}
if (!Array.isArray(images) || images.length === 0) {
return res.status(400).json({ error: "At least one image is required to generate a slideshow video" });
}
// Valid HTTPS image URLs only
const validImages = images
.filter((u) => typeof u === "string" && /^https?:\/\//i.test(u))
.slice(0, 15);
if (validImages.length === 0) {
return res.status(400).json({ error: "No valid image URLs found" });
}
// Step 1 — Generate narration audio from draft content (Edge TTS, free)
let audioUrl = null;
console.log("[video] Step 1: generating narration from draft content");
try {
const audioBuffer = await generateAudio({ title: title || "Draft", content });
const audioFilename = `${Date.now()}-audio.mp3`;
audioUrl = await uploadFile({ buffer: audioBuffer, filename: audioFilename, contentType: "audio/mpeg" });
console.log("[video] Audio ready:", audioUrl);
} catch (e) {
console.warn("[video] Audio skipped:", e?.message);
}
// Step 2 — Render slideshow with Remotion
console.log(`[video] Step 2: rendering slideshow (${validImages.length} images)`);
const { videoBuffer, tempDir } = await renderSlideshow({
images: validImages,
title: title || "Draft",
audioUrl,
});
// Step 3 — Upload mp4
console.log("[video] Step 3: uploading video");
const videoFilename = `${Date.now()}-${(title || "draft").replace(/[^a-z0-9]/gi, "-").slice(0, 40)}.mp4`;
const videoUrl = await uploadFile({ buffer: videoBuffer, filename: videoFilename, contentType: "video/mp4" });
// Step 4 — Save metadata + backlink draft
console.log("[video] Step 4: saving metadata");
const docId = await saveVideoDoc({ videoUrl, audioUrl, title: title || "Draft", userId, draftId });
await fs.rm(tempDir, { recursive: true, force: true });
return res.status(200).json({
success: true,
videoUrl,
audioUrl,
docId,
title: title || "Draft",
slideCount: validImages.length,
});
} catch (error) {
console.error("[video] error:", error);
return res.status(500).json({ error: error?.message || "Failed to generate video" });
}
}
bundle packages your Remotion project (the React compositions) with webpack so it can be rendered.
renderMedia and selectComposition do what their names suggest: select which composition to render, then render the media (frames, images, and audio) into the final video.
Firebase is used for persistence: the user's credentials and the mp4 link are stored in Firestore.
UploadThing is an npm SDK for storing files (mainly assets) on a server, similar to Firebase Storage or AWS S3.
The rest are standard Node.js modules, such as fs to read and write files and os to get operating-system paths like the temp directory.
Let's break it down and explain.
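As a quick aside, the markdown-stripping chain inside generateAudio can be pulled out and checked in isolation. This is a stand-alone sketch (the helper name stripMarkdown is mine, not from the codebase), using the exact same regex chain:

```javascript
// Hypothetical stand-alone version of the regex chain generateAudio uses
// to turn draft markdown into plain narration text.
function stripMarkdown(content) {
  return content
    .replace(/!\[.*?\]\(.*?\)/g, "")         // strip images
    .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1") // flatten links to their text
    .replace(/#{1,6}\s/g, "")                // strip heading markers
    .replace(/[*_`>]/g, "")                  // strip inline formatting
    .replace(/\s+/g, " ")                    // collapse whitespace
    .trim();
}

console.log(stripMarkdown("# Hi\n\nSee [docs](https://x.y) for **bold** text ![alt](img.png)"));
// "Hi See docs for bold text"
```

The handler then takes the first ~500 characters of this cleaned text as the narration script.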
/* ── Remotion: render slideshow mp4 ── */
async function renderSlideshow({ images, title, audioUrl }) {
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
await fs.mkdir(tempDir, { recursive: true });
try {
// Each image gets equal screen time; minimum 3s per slide, cap at 8s
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
const totalDuration = images.length * perSlide;
const durationInFrames = Math.ceil(totalDuration * 30);
const bundleLocation = await bundle({
entryPoint: path.join(process.cwd(), "remotion/index.ts"),
webpackOverride: (config) => config,
});
const inputProps = {
images,
title,
audioUrl: audioUrl || "",
perSlide,
};
const composition = await selectComposition({
serveUrl: bundleLocation,
id: "VideoComposition",
inputProps,
});
const outputPath = path.join(tempDir, "output.mp4");
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: "h264",
outputLocation: outputPath,
inputProps,
concurrency: 4,
frameRange: [0, durationInFrames - 1],
});
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };
} catch (err) {
await fs.rm(tempDir, { recursive: true, force: true });
throw err;
}
}
First, we render the video and write it to a temporary directory as the output in mp4 format.
Above method is a Remotion video rendering function — it takes a list of images, a title, and an optional audio URL, and renders them into an .mp4 slideshow video programmatically (server-side, no browser needed).
The 5 steps it runs through
- Creates a temp folder
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
Unique scratch space like /tmp/video-1741234567 to write files during rendering. Cleaned up on error.
- Calculates slide timing
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
Targets a ~40 second total video. Each image gets equal time, clamped between 3s minimum and 8s maximum per slide. Then converts to frames at 30fps.
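To make the arithmetic concrete, here is the same clamp as a tiny standalone helper (naming is mine, not the app's):

```javascript
// Reproduces the timing math from renderSlideshow:
// ~40s target, equal time per slide, clamped to [3s, 8s], at 30fps.
function slideTiming(imageCount, fps = 30) {
  const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(imageCount, 1))));
  const durationInFrames = Math.ceil(imageCount * perSlide * fps);
  return { perSlide, durationInFrames };
}

console.log(slideTiming(5));  // 40/5 = 8s each → { perSlide: 8, durationInFrames: 1200 }
console.log(slideTiming(15)); // floor(40/15) = 2, clamped up to 3s → { perSlide: 3, durationInFrames: 1350 }
```

The Math.max(imageCount, 1) guard prevents a divide-by-zero when the images array is empty (the handler rejects that case anyway, but the function is defensive).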
- Webpack-bundles your Remotion composition
await bundle({ entryPoint: "remotion/index.ts" })
Your VideoComposition is a React component. Remotion bundles it with webpack so headless Chromium can execute it frame by frame.
- Renders every frame with headless Chromium
await renderMedia({ codec: "h264", concurrency: 4 })
Renders up to 4 frames concurrently in headless Chromium, screenshotting every frame of the React component, then ffmpeg stitches them into an h264 .mp4.
- Returns the video as a Buffer
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };
Caller gets the raw bytes to upload to S3/Firebase Storage. They're also responsible for deleting tempDir afterwards — that's intentional.
The last part is given below.
/* ── Handler ── */
export default async function handler(req, res) {
if (req.method !== "POST") {
return res.status(405).json({ error: "Method not allowed" });
}
try {
const { images, title, content, userId, draftId } = req.body || {};
if (!userId || typeof userId !== "string" || !userId.trim()) {
return res.status(401).json({ error: "Authentication required." });
}
if (!content || !content.trim()) {
return res.status(400).json({ error: "Content is required" });
}
if (!Array.isArray(images) || images.length === 0) {
return res.status(400).json({ error: "At least one image is required to generate a slideshow video" });
}
// Valid HTTPS image URLs only
const validImages = images
.filter((u) => typeof u === "string" && /^https?:\/\//i.test(u))
.slice(0, 15);
if (validImages.length === 0) {
return res.status(400).json({ error: "No valid image URLs found" });
}
// Step 1 — Generate narration audio from draft content (Edge TTS, free)
let audioUrl = null;
console.log("[video] Step 1: generating narration from draft content");
try {
const audioBuffer = await generateAudio({ title: title || "Draft", content });
const audioFilename = `${Date.now()}-audio.mp3`;
audioUrl = await uploadFile({ buffer: audioBuffer, filename: audioFilename, contentType: "audio/mpeg" });
console.log("[video] Audio ready:", audioUrl);
} catch (e) {
console.warn("[video] Audio skipped:", e?.message);
}
// Step 2 — Render slideshow with Remotion
console.log(`[video] Step 2: rendering slideshow (${validImages.length} images)`);
const { videoBuffer, tempDir } = await renderSlideshow({
images: validImages,
title: title || "Draft",
audioUrl,
});
// Step 3 — Upload mp4
console.log("[video] Step 3: uploading video");
const videoFilename = `${Date.now()}-${(title || "draft").replace(/[^a-z0-9]/gi, "-").slice(0, 40)}.mp4`;
const videoUrl = await uploadFile({ buffer: videoBuffer, filename: videoFilename, contentType: "video/mp4" });
// Step 4 — Save metadata + backlink draft
console.log("[video] Step 4: saving metadata");
const docId = await saveVideoDoc({ videoUrl, audioUrl, title: title || "Draft", userId, draftId });
await fs.rm(tempDir, { recursive: true, force: true });
return res.status(200).json({
success: true,
videoUrl,
audioUrl,
docId,
title: title || "Draft",
slideCount: validImages.length,
});
} catch (error) {
console.error("[video] error:", error);
return res.status(500).json({ error: error?.message || "Failed to generate video" });
}
}
This is the orchestrator: it validates the request, creates the video, uploads it to storage, records the metadata in Firestore, and returns the final response object.
The client side now just needs to make an API call to the above endpoint or function to get the video URL, which we can render inside an iframe or a video HTML element.
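For example, a minimal client call might look like this (the endpoint path and field names are assumed from the handler above; buildVideoRequest is my own helper, split out so the request shape is easy to verify):

```javascript
// Hypothetical client-side helper for the slideshow endpoint shown above.
// buildVideoRequest is pure, so the payload shape can be unit-tested.
function buildVideoRequest({ images, title, content, userId, draftId }) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ images, title, content, userId, draftId }),
  };
}

async function generateAndRender(params) {
  const res = await fetch("/api/generate-video", buildVideoRequest(params));
  const json = await res.json();
  if (!res.ok) throw new Error(json.error || "Video generation failed");

  // A plain <video> tag is enough for a direct mp4 URL; no iframe needed.
  const video = document.createElement("video");
  video.src = json.videoUrl;
  video.controls = true;
  document.body.appendChild(video);
  return json;
}
```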
One question: why are we using a server-side API endpoint?
Because the client side doesn't have access to the fs module. The browser can't read files from the code repository, so it needs an API running where the file system lives. A Next.js server-side API runs on the server instead of in the browser, which gives us access to fs; most of the other modules used above could also run client-side.
If you are still confused about client-side versus server-side, ask ChatGPT to explain the difference.
Creating a video using AI
We will follow the 6-step pipeline for creating a video
POST /api/generate-video
{ prompt, userId, draftId?, style?, voiceSpeed? }
│
├─ Step 1: OpenRouter (claude-3.5-sonnet)
│ prompt → { title, narration, slides[] }
│
├─ Step 2: OpenRouter (flux-schnell)
│ slides[].imagePrompt → image URLs (parallel, fallback to placeholder)
│
├─ Step 3: Edge TTS (free, no key needed)
│ narration → MP3 → Firebase Storage → audioUrl
│ (non-fatal — video renders muted if this fails)
│
├─ Step 4: Remotion
│ images + audioUrl + captions → MP4 buffer
│
├─ Step 5: Firebase Storage
│ MP4 buffer → videoUrl
│
└─ Step 6: Firestore batch write
videos/{id} + drafts/{draftId} backlink
Below is the code for the server-side API endpoint, generate-video.
// pages/api/generate-video.js
// ─────────────────────────────────────────────────────────────────────────────
// POST /api/generate-video
//
// Body:
// prompt string required — user's natural language request
// userId string required — authenticated user ID
// draftId string optional — Firestore draft to backlink
// style string optional — "cinematic" | "minimal" | "documentary" (default: "cinematic")
// voiceSpeed number optional — TTS speed 0.5–2.0 (default: 1.0)
//
// Flow:
// 1. Validate request
// 2. OpenRouter → generate structured video plan (title + narration + image prompts)
// 3. Generate images from prompts (via OpenRouter vision-capable model or placeholder)
// 4. Edge TTS → narration audio MP3
// 5. Remotion → render MP4 slideshow
// 6. Upload audio + video to Firebase Storage
// 7. Save Firestore doc + backlink draft
// 8. Return { videoUrl, audioUrl, docId, title, slideCount, plan }
// ─────────────────────────────────────────────────────────────────────────────
import fs from "fs/promises";
import os from "os";
import path from "path";
import { initializeApp, getApps, cert } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { getStorage } from "firebase-admin/storage";
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";
// ─── Firebase Admin Init ─────────────────────────────────────────────────────
if (!getApps().length) {
initializeApp({
credential: cert({
projectId: process.env.FIREBASE_PROJECT_ID,
clientEmail: process.env.FIREBASE_CLIENT_EMAIL,
privateKey: process.env.FIREBASE_PRIVATE_KEY?.replace(/\\n/g, "\n"),
}),
storageBucket: process.env.FIREBASE_STORAGE_BUCKET,
});
}
const db = getFirestore();
const bucket = getStorage().bucket();
// ─── Constants ───────────────────────────────────────────────────────────────
const OPENROUTER_BASE = "https://openrouter.ai/api/v1";
const PLAN_MODEL = "anthropic/claude-3.5-sonnet"; // Best for structured JSON output
const IMAGE_MODEL = "black-forest-labs/flux-schnell"; // Fast image generation via OpenRouter
const MAX_SLIDES = 10;
const MIN_SLIDES = 3;
// ─── 1. OpenRouter: Generate Video Plan ──────────────────────────────────────
// Returns: { title, narration, slides: [{ imagePrompt, caption }] }
async function generateVideoPlan({ prompt, style = "cinematic" }) {
const styleGuides = {
cinematic: "dramatic lighting, wide shots, film grain, professional photography",
minimal: "clean white backgrounds, simple compositions, flat design, minimalist",
documentary: "realistic, candid, natural lighting, photojournalism style",
};
const styleHint = styleGuides[style] || styleGuides.cinematic;
const systemPrompt = `You are a video producer AI. Given a user's prompt, generate a structured video plan.
Return ONLY valid JSON — no markdown, no explanation, no backticks.
The JSON must match this exact shape:
{
"title": "string (max 60 chars)",
"narration": "string (150-300 words, will be converted to voiceover audio)",
"slides": [
{
"imagePrompt": "string (detailed image generation prompt, ${styleHint})",
"caption": "string (max 8 words, shown on screen)"
}
]
}
Rules:
- Generate between ${MIN_SLIDES} and ${MAX_SLIDES} slides
- Each imagePrompt must be highly detailed and visual (40-80 words)
- Narration should flow naturally when read aloud
- Captions should be punchy and complement the image
- Style: ${style}`;
const response = await fetch(`${OPENROUTER_BASE}/chat/completions`, {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
"Content-Type": "application/json",
"HTTP-Referer": process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000",
"X-Title": "BuildSaaS Video Generator",
},
body: JSON.stringify({
model: PLAN_MODEL,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: prompt },
],
temperature: 0.7,
max_tokens: 2000,
response_format: { type: "json_object" }, // Force JSON mode
}),
});
if (!response.ok) {
const err = await response.text();
throw new Error(`OpenRouter plan generation failed (${response.status}): ${err}`);
}
const data = await response.json();
const raw = data.choices?.[0]?.message?.content;
if (!raw) throw new Error("OpenRouter returned empty content for video plan");
let plan;
try {
plan = JSON.parse(raw);
} catch {
// Attempt to extract JSON if model wrapped it
const match = raw.match(/\{[\s\S]*\}/);
if (!match) throw new Error("Could not parse video plan JSON from OpenRouter response");
plan = JSON.parse(match[0]);
}
// Validate shape
if (!plan.title || !plan.narration || !Array.isArray(plan.slides) || plan.slides.length === 0) {
throw new Error("Video plan JSON is missing required fields (title, narration, slides)");
}
// Cap slides
plan.slides = plan.slides.slice(0, MAX_SLIDES);
return plan;
}
// ─── 2. OpenRouter: Generate Images from Prompts ─────────────────────────────
// Uses FLUX via OpenRouter's image generation endpoint.
// Falls back to a placeholder image URL if generation fails for any slide.
async function generateImages({ slides, style = "cinematic" }) {
const styleGuides = {
cinematic: "cinematic, dramatic lighting, 8k, professional photography, film still",
minimal: "minimalist, clean, flat design, white background, simple composition",
documentary: "documentary, realistic, natural lighting, photojournalism, candid",
};
const styleSuffix = styleGuides[style] || styleGuides.cinematic;
const imageUrls = await Promise.allSettled(
slides.map(async (slide, idx) => {
try {
const fullPrompt = `${slide.imagePrompt}, ${styleSuffix}`;
const response = await fetch(`${OPENROUTER_BASE}/images/generations`, {
method: "POST",
headers: {
"Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
"Content-Type": "application/json",
"HTTP-Referer": process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000",
"X-Title": "BuildSaaS Video Generator",
},
body: JSON.stringify({
model: IMAGE_MODEL,
prompt: fullPrompt,
n: 1,
size: "1280x720", // 16:9 for video
}),
});
if (!response.ok) {
throw new Error(`Image generation failed for slide ${idx + 1}: ${response.status}`);
}
const data = await response.json();
const url = data.data?.[0]?.url;
if (!url) throw new Error(`No image URL returned for slide ${idx + 1}`);
console.log(`[video] Image ${idx + 1}/${slides.length} generated`);
return url;
} catch (err) {
// Graceful fallback: use a placeholder image so video still renders
console.warn(`[video] Image ${idx + 1} failed, using placeholder:`, err.message);
return `https://placehold.co/1280x720/1a1a2e/ffffff?text=${encodeURIComponent(slide.caption || `Slide ${idx + 1}`)}`;
}
})
);
// Extract values (allSettled means we always get something)
return imageUrls.map((result, idx) =>
result.status === "fulfilled"
? result.value
: `https://placehold.co/1280x720/1a1a2e/ffffff?text=Slide+${idx + 1}`
);
}
// ─── 3. Edge TTS: Generate Narration Audio ───────────────────────────────────
// Uses Microsoft Edge TTS (free, no API key needed) via the msedge-tts npm package.
// Returns a Buffer of MP3 audio data.
async function generateAudio({ title, content, speed = 1.0 }) {
// Dynamic import — msedge-tts is ESM
const { MsEdgeTTS, OUTPUT_FORMAT } = await import("msedge-tts");
const tts = new MsEdgeTTS();
// en-US-AriaNeural is natural and works well for narration
await tts.setMetadata(
"en-US-AriaNeural",
OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3,
);
// Prepend title as a natural intro
const fullText = title ? `${title}. ${content}` : content;
// Adjust speed via SSML rate tag
const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<prosody rate="${speed >= 1 ? `+${Math.round((speed - 1) * 100)}%` : `-${Math.round((1 - speed) * 100)}%`}">
${fullText.replace(/[<>&'"]/g, (c) => ({ "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;" }[c]))}
</prosody>
</voice>
</speak>`;
const chunks = [];
const readable = await tts.toStream(ssml);
return new Promise((resolve, reject) => {
readable.on("data", (chunk) => chunks.push(chunk));
readable.on("end", () => resolve(Buffer.concat(chunks)));
readable.on("error", reject);
});
}
// ─── 4. Firebase Storage: Upload File ────────────────────────────────────────
// Uploads a Buffer and returns a public HTTPS URL.
async function uploadFile({ buffer, filename, contentType }) {
const destination = `videos/${Date.now()}-${filename}`;
const file = bucket.file(destination);
await file.save(buffer, {
metadata: { contentType },
resumable: false,
});
// Make file publicly readable
await file.makePublic();
return `https://storage.googleapis.com/${bucket.name}/${destination}`;
}
// ─── 5. Remotion: Render Slideshow MP4 ───────────────────────────────────────
async function renderSlideshow({ images, title, audioUrl, captions = [] }) {
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
await fs.mkdir(tempDir, { recursive: true });
try {
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
const totalDuration = images.length * perSlide;
const durationInFrames = Math.ceil(totalDuration * 30);
const bundleLocation = await bundle({
entryPoint: path.join(process.cwd(), "remotion/index.ts"),
webpackOverride: (config) => config,
});
const inputProps = {
images,
title,
audioUrl: audioUrl || "",
captions, // Pass captions through to composition
perSlide,
};
const composition = await selectComposition({
serveUrl: bundleLocation,
id: "VideoComposition",
inputProps,
});
const outputPath = path.join(tempDir, "output.mp4");
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: "h264",
outputLocation: outputPath,
inputProps,
concurrency: 4,
frameRange: [0, durationInFrames - 1],
});
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };
} catch (err) {
await fs.rm(tempDir, { recursive: true, force: true });
throw err;
}
}
// ─── 6. Firestore: Save Video Doc + Backlink Draft ───────────────────────────
async function saveVideoDoc({ videoUrl, audioUrl, title, userId, draftId, plan, slideCount, style }) {
const now = new Date().toISOString();
const batch = db.batch();
// Video document
const videoRef = db.collection("videos").doc();
batch.set(videoRef, {
id: videoRef.id,
title,
videoUrl,
audioUrl: audioUrl || null,
userId,
draftId: draftId || null,
slideCount,
style,
plan, // Store the full plan for reference/regeneration
status: "ready",
createdAt: now,
updatedAt: now,
});
// Backlink draft if provided
if (draftId) {
const draftRef = db.collection("drafts").doc(draftId);
batch.update(draftRef, {
videoId: videoRef.id,
videoUrl,
videoStatus: "rendered",
updatedAt: now,
});
}
await batch.commit();
return videoRef.id;
}
// ─── Main Handler ─────────────────────────────────────────────────────────────
export default async function handler(req, res) {
if (req.method !== "POST") {
return res.status(405).json({ error: "Method not allowed" });
}
// ── Validate environment ──────────────────────────────────────────────────
if (!process.env.OPENROUTER_API_KEY) {
console.error("[video] OPENROUTER_API_KEY is not set");
return res.status(500).json({ error: "Server configuration error: missing OpenRouter API key" });
}
let tempDir = null;
try {
const {
prompt,
userId,
draftId,
style = "cinematic",
voiceSpeed = 1.0,
} = req.body || {};
// ── Input validation ────────────────────────────────────────────────────
if (!userId || typeof userId !== "string" || !userId.trim()) {
return res.status(401).json({ error: "Authentication required." });
}
if (!prompt || typeof prompt !== "string" || !prompt.trim()) {
return res.status(400).json({ error: "A prompt is required to generate a video." });
}
if (prompt.trim().length < 10) {
return res.status(400).json({ error: "Prompt is too short. Provide at least 10 characters." });
}
if (prompt.trim().length > 2000) {
return res.status(400).json({ error: "Prompt is too long. Maximum 2000 characters." });
}
const validStyles = ["cinematic", "minimal", "documentary"];
const safeStyle = validStyles.includes(style) ? style : "cinematic";
const safeSpeed = Math.min(2.0, Math.max(0.5, Number(voiceSpeed) || 1.0));
console.log(`[video] Starting pipeline for user ${userId}`);
console.log(`[video] Prompt: "${prompt.slice(0, 80)}…"`);
console.log(`[video] Style: ${safeStyle} | Speed: ${safeSpeed}`);
// ── Step 1: Generate video plan via OpenRouter ──────────────────────────
console.log("[video] Step 1: generating video plan via OpenRouter");
const plan = await generateVideoPlan({ prompt: prompt.trim(), style: safeStyle });
console.log(`[video] Plan ready: "${plan.title}" — ${plan.slides.length} slides`);
// ── Step 2: Generate images for each slide ──────────────────────────────
console.log(`[video] Step 2: generating ${plan.slides.length} images`);
const imageUrls = await generateImages({ slides: plan.slides, style: safeStyle });
console.log(`[video] ${imageUrls.length} images ready`);
// ── Step 3: Generate narration audio (non-fatal) ────────────────────────
let audioUrl = null;
console.log("[video] Step 3: generating narration audio via Edge TTS");
try {
const audioBuffer = await generateAudio({
title: plan.title,
content: plan.narration,
speed: safeSpeed,
});
const audioFilename = `${Date.now()}-audio.mp3`;
audioUrl = await uploadFile({
buffer: audioBuffer,
filename: audioFilename,
contentType: "audio/mpeg",
});
console.log("[video] Audio uploaded:", audioUrl);
} catch (e) {
console.warn("[video] Audio generation skipped (non-fatal):", e?.message);
}
// ── Step 4: Render slideshow with Remotion ──────────────────────────────
console.log(`[video] Step 4: rendering MP4 (${imageUrls.length} slides)`);
const captions = plan.slides.map((s) => s.caption || "");
const result = await renderSlideshow({
images: imageUrls,
title: plan.title,
audioUrl,
captions,
});
tempDir = result.tempDir;
const { videoBuffer } = result;
console.log("[video] Render complete");
// ── Step 5: Upload MP4 to Firebase Storage ──────────────────────────────
console.log("[video] Step 5: uploading MP4");
const safeTitle = plan.title.replace(/[^a-z0-9]/gi, "-").slice(0, 40).toLowerCase();
const videoFilename = `${Date.now()}-${safeTitle}.mp4`;
const videoUrl = await uploadFile({
buffer: videoBuffer,
filename: videoFilename,
contentType: "video/mp4",
});
console.log("[video] Video uploaded:", videoUrl);
// ── Step 6: Save Firestore doc + backlink draft ─────────────────────────
console.log("[video] Step 6: saving Firestore metadata");
const docId = await saveVideoDoc({
videoUrl,
audioUrl,
title: plan.title,
userId,
draftId,
plan,
slideCount: imageUrls.length,
style: safeStyle,
});
// ── Cleanup temp dir ────────────────────────────────────────────────────
if (tempDir) {
await fs.rm(tempDir, { recursive: true, force: true });
tempDir = null;
}
console.log(`[video] Pipeline complete. docId: ${docId}`);
return res.status(200).json({
success: true,
videoUrl,
audioUrl,
docId,
title: plan.title,
narration: plan.narration,
slideCount: imageUrls.length,
style: safeStyle,
plan, // Full plan — useful for frontend to show captions/structure
});
} catch (error) {
// Cleanup temp dir on any unhandled error
if (tempDir) {
await fs.rm(tempDir, { recursive: true, force: true }).catch(() => {});
}
console.error("[video] Pipeline error:", error);
// Surface specific error types
if (error.message?.includes("OpenRouter")) {
return res.status(502).json({ error: "AI service error: " + error.message });
}
if (error.message?.includes("Remotion") || error.message?.includes("render")) {
return res.status(500).json({ error: "Video rendering failed: " + error.message });
}
return res.status(500).json({ error: error?.message || "Failed to generate video" });
}
}
// ─── Route Config ─────────────────────────────────────────────────────────────
// Required: disable Next.js body size limit for large payloads
export const config = {
api: {
bodyParser: {
sizeLimit: "10mb",
},
responseLimit: false, // Video responses can be large
externalResolver: true, // Suppress missing response warnings
},
};
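One detail in generateAudio worth isolating is the SSML prosody rate math, which maps the 0.5–2.0 voiceSpeed multiplier onto the percentage strings Edge TTS expects. Pulled out as its own function (my naming, not the app's), it behaves like this:

```javascript
// Isolated version of the prosody rate expression from generateAudio:
// speed >= 1 becomes a positive percentage, speed < 1 a negative one.
function prosodyRate(speed) {
  return speed >= 1
    ? `+${Math.round((speed - 1) * 100)}%`
    : `-${Math.round((1 - speed) * 100)}%`;
}

console.log(prosodyRate(1.0));  // "+0%"  (normal speed)
console.log(prosodyRate(1.25)); // "+25%" (a quarter faster)
console.log(prosodyRate(0.5));  // "-50%" (half speed)
```

The handler clamps voiceSpeed to [0.5, 2.0] before it ever reaches this expression, so the output stays within "-50%" to "+100%".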
Rendering Video in Browser
The last part is to render the video once it is generated and stored in the browser.
import { useState, useRef } from "react";
import { motion, AnimatePresence } from "framer-motion";
const STEPS = [
{
id: "plan",
icon: "✦",
label: "Writing video script",
sub: "Claude generating title, narration & slide prompts",
},
{
id: "images",
icon: "◈",
label: "Generating slide images",
sub: "FLUX creating visuals for each scene",
},
{
id: "audio",
icon: "◉",
label: "Synthesising narration",
sub: "Edge TTS converting script to voiceover",
},
{
id: "render",
icon: "▶",
label: "Rendering MP4",
sub: "Remotion stitching frames into video",
},
{
id: "save",
icon: "◎",
label: "Saving to library",
sub: "Uploading to storage and saving metadata",
},
];
const STYLES = [
{
id: "cinematic",
label: "Cinematic",
desc: "Dramatic · Film-grade",
icon: "🎬",
},
{ id: "minimal", label: "Minimal", desc: "Clean · Flat design", icon: "◻" },
{
id: "documentary",
label: "Documentary",
desc: "Realistic · Candid",
icon: "📷",
},
];
function usePipelineSteps() {
const [activeStep, setActiveStep] = useState(-1);
const [doneSteps, setDoneSteps] = useState([]);
const timers = useRef([]);
const start = () => {
setActiveStep(0);
setDoneSteps([]);
// Approximate wall-clock offsets (ms) for each step; cosmetic only,
// since the API doesn't stream real progress
const timings = [0, 4000, 14000, 19000, 38000];
timings.forEach((delay, idx) => {
const t = setTimeout(() => {
setActiveStep(idx);
if (idx > 0) setDoneSteps((d) => [...d, STEPS[idx - 1].id]);
}, delay);
timers.current.push(t);
});
};
const finish = () => {
timers.current.forEach(clearTimeout);
setDoneSteps(STEPS.map((s) => s.id));
setActiveStep(-1);
};
const reset = () => {
timers.current.forEach(clearTimeout);
setActiveStep(-1);
setDoneSteps([]);
};
return { activeStep, doneSteps, start, finish, reset };
}
function Spinner() {
return (
<svg
style={{ animation: "spin 1s linear infinite", width: 16, height: 16 }}
viewBox="0 0 24 24"
fill="none"
>
<style>{`@keyframes spin { to { transform: rotate(360deg); } } @keyframes pulse-dot { 0%,100%{opacity:1} 50%{opacity:0.2} }`}</style>
<circle
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="3"
strokeOpacity="0.25"
/>
<path fill="currentColor" d="M4 12a8 8 0 018-8v8H4z" />
</svg>
);
}
function PulsingDot() {
return (
<span
style={{
display: "block",
width: 8,
height: 8,
borderRadius: "50%",
background: "#10b981",
animation: "pulse-dot 1.1s ease-in-out infinite",
}}
/>
);
}
export default function VideoGenerator({ userId = "demo-user" }) {
const [prompt, setPrompt] = useState("");
const [style, setStyle] = useState("cinematic");
const [speed, setSpeed] = useState(1.0);
const [loading, setLoading] = useState(false);
const [result, setResult] = useState(null);
const [error, setError] = useState(null);
const pipeline = usePipelineSteps();
const canSubmit = prompt.trim().length >= 10 && !loading;
const handleGenerate = async () => {
if (!canSubmit) return;
setLoading(true);
setResult(null);
setError(null);
pipeline.start();
try {
const res = await fetch("/api/generate-video", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
prompt: prompt.trim(),
userId,
style,
voiceSpeed: speed,
}),
});
const data = await res.json();
if (!res.ok) throw new Error(data.error || "Generation failed");
pipeline.finish();
setResult(data);
} catch (err) {
pipeline.reset();
setError(err.message || "Something went wrong");
} finally {
setLoading(false);
}
};
return (
<div
style={{
minHeight: "100vh",
background: "#080a0d",
color: "#e8eaed",
position: "relative",
overflowX: "hidden",
}}
>
{/* Grid background */}
<div
style={{
position: "fixed",
inset: 0,
pointerEvents: "none",
zIndex: 0,
backgroundImage:
"linear-gradient(rgba(255,255,255,.022) 1px,transparent 1px),linear-gradient(90deg,rgba(255,255,255,.022) 1px,transparent 1px)",
backgroundSize: "48px 48px",
}}
/>
{/* Glow */}
<div
style={{
position: "fixed",
top: "-18%",
left: "50%",
transform: "translateX(-50%)",
width: 640,
height: 380,
borderRadius: "50%",
pointerEvents: "none",
zIndex: 0,
background:
"radial-gradient(ellipse,rgba(16,185,129,.16) 0%,transparent 70%)",
}}
/>
{/* Content */}
<div
style={{
position: "relative",
zIndex: 1,
maxWidth: 620,
margin: "0 auto",
padding: "60px 20px 80px",
}}
>
{/* ── Header ── */}
<motion.div
initial={{ opacity: 0, y: -14 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.45 }}
style={{ textAlign: "center", marginBottom: 40 }}
>
<div
style={{
display: "inline-flex",
alignItems: "center",
gap: 8,
fontSize: 10,
letterSpacing: "0.18em",
textTransform: "uppercase",
color: "rgba(16,185,129,0.8)",
border: "1px solid rgba(16,185,129,0.25)",
borderRadius: 999,
padding: "5px 16px",
background: "rgba(16,185,129,0.06)",
marginBottom: 20,
}}
>
<span
style={{
width: 6,
height: 6,
borderRadius: "50%",
background: "#10b981",
display: "inline-block",
}}
/>
AI · Remotion · Edge TTS
</div>
<h1
style={{
fontSize: 38,
fontWeight: 800,
letterSpacing: "-0.04em",
color: "#ffffff",
lineHeight: 1,
marginBottom: 12,
fontFamily: "'Syne','DM Sans',system-ui,sans-serif",
}}
>
Prompt → Video
</h1>
<p
style={{
fontSize: 13,
color: "rgba(255,255,255,0.38)",
lineHeight: 1.6,
}}
>
Describe anything. Get a narrated MP4 slideshow in ~60 seconds.
</p>
</motion.div>
{/* ── Input Card ── */}
<motion.div
initial={{ opacity: 0, y: 18 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.45, delay: 0.08 }}
style={{
borderRadius: 16,
border: "1px solid rgba(255,255,255,0.08)",
background: "rgba(255,255,255,0.03)",
backdropFilter: "blur(12px)",
overflow: "hidden",
marginBottom: 12,
}}
>
{/* Prompt */}
<div style={{ padding: "20px 20px 14px" }}>
<label
style={{
display: "block",
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.38)",
marginBottom: 10,
}}
>
Your Prompt
</label>
<textarea
value={prompt}
onChange={(e) => setPrompt(e.target.value)}
disabled={loading}
rows={4}
placeholder="e.g. A documentary about solo founders who quit their jobs to build software products alone and became successful..."
style={{
width: "100%",
background: "transparent",
border: "none",
outline: "none",
resize: "none",
fontSize: 13,
color: "rgba(255,255,255,0.82)",
lineHeight: 1.65,
fontFamily: "inherit",
opacity: loading ? 0.4 : 1,
}}
/>
<div
style={{
display: "flex",
justifyContent: "space-between",
marginTop: 6,
}}
>
<span
style={{
fontSize: 10,
color:
prompt.length > 1800 ? "#f87171" : "rgba(255,255,255,0.2)",
}}
>
{prompt.length} / 2000
</span>
{prompt.length > 0 && prompt.length < 10 && (
<span style={{ fontSize: 10, color: "rgba(251,191,36,0.7)" }}>
Need {10 - prompt.length} more chars
</span>
)}
</div>
</div>
<div
style={{
height: 1,
background: "rgba(255,255,255,0.06)",
margin: "0 20px",
}}
/>
{/* Style selector */}
<div style={{ padding: "16px 20px" }}>
<label
style={{
display: "block",
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.38)",
marginBottom: 12,
}}
>
Visual Style
</label>
<div
style={{
display: "grid",
gridTemplateColumns: "repeat(3,1fr)",
gap: 8,
}}
>
{STYLES.map((st) => (
<button
key={st.id}
onClick={() => !loading && setStyle(st.id)}
disabled={loading}
style={{
borderRadius: 12,
padding: "12px 10px",
textAlign: "left",
cursor: loading ? "not-allowed" : "pointer",
border:
style === st.id
? "1px solid rgba(16,185,129,0.5)"
: "1px solid rgba(255,255,255,0.08)",
background:
style === st.id
? "rgba(16,185,129,0.1)"
: "rgba(255,255,255,0.02)",
color:
style === st.id ? "#ffffff" : "rgba(255,255,255,0.4)",
transition: "all 0.18s",
opacity: loading ? 0.5 : 1,
}}
>
<div style={{ fontSize: 18, marginBottom: 6 }}>{st.icon}</div>
<div
style={{
fontSize: 12,
fontWeight: 600,
display: "block",
color:
style === st.id ? "#fff" : "rgba(255,255,255,0.55)",
}}
>
{st.label}
</div>
<div
style={{
fontSize: 10,
marginTop: 2,
color: "rgba(255,255,255,0.28)",
}}
>
{st.desc}
</div>
</button>
))}
</div>
</div>
<div
style={{
height: 1,
background: "rgba(255,255,255,0.06)",
margin: "0 20px",
}}
/>
{/* Voice speed */}
<div style={{ padding: "16px 20px" }}>
<div
style={{
display: "flex",
justifyContent: "space-between",
alignItems: "center",
marginBottom: 10,
}}
>
<label
style={{
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.38)",
}}
>
Voice Speed
</label>
<span style={{ fontSize: 12, fontWeight: 700, color: "#10b981" }}>
{speed.toFixed(1)}×
</span>
</div>
<input
type="range"
min={0.5}
max={2.0}
step={0.1}
value={speed}
onChange={(e) => setSpeed(parseFloat(e.target.value))}
disabled={loading}
style={{
width: "100%",
accentColor: "#10b981",
opacity: loading ? 0.4 : 1,
}}
/>
<div
style={{
display: "flex",
justifyContent: "space-between",
fontSize: 9,
color: "rgba(255,255,255,0.2)",
marginTop: 5,
}}
>
<span>0.5× slow</span>
<span>1.0× normal</span>
<span>2.0× fast</span>
</div>
</div>
{/* Generate button */}
<div style={{ padding: "0 20px 20px" }}>
<motion.button
onClick={handleGenerate}
disabled={!canSubmit}
whileTap={canSubmit ? { scale: 0.97 } : {}}
style={{
width: "100%",
padding: "14px",
borderRadius: 12,
border: "none",
background: canSubmit ? "#10b981" : "rgba(255,255,255,0.05)",
color: canSubmit ? "#000000" : "rgba(255,255,255,0.2)",
fontSize: 13,
fontWeight: 700,
fontFamily: "inherit",
cursor: canSubmit ? "pointer" : "not-allowed",
letterSpacing: "0.03em",
display: "flex",
alignItems: "center",
justifyContent: "center",
gap: 8,
transition: "all 0.18s",
}}
>
{loading ? (
<>
<Spinner />
Generating video…
</>
) : (
"Generate Video →"
)}
</motion.button>
</div>
</motion.div>
{/* ── Pipeline Progress ── */}
<AnimatePresence>
{loading && (
<motion.div
key="pipeline"
initial={{ opacity: 0, height: 0 }}
animate={{ opacity: 1, height: "auto" }}
exit={{ opacity: 0, height: 0 }}
transition={{ duration: 0.28 }}
style={{ overflow: "hidden", marginBottom: 12 }}
>
<div
style={{
borderRadius: 16,
border: "1px solid rgba(255,255,255,0.07)",
background: "rgba(255,255,255,0.02)",
padding: 20,
}}
>
<div
style={{
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.28)",
marginBottom: 18,
}}
>
Pipeline Progress
</div>
{STEPS.map((step, idx) => {
const isDone = pipeline.doneSteps.includes(step.id);
const isActive = pipeline.activeStep === idx;
return (
<motion.div
key={step.id}
initial={{ opacity: 0, x: -8 }}
animate={{ opacity: 1, x: 0 }}
transition={{ delay: idx * 0.06 }}
style={{
display: "flex",
alignItems: "flex-start",
gap: 12,
marginBottom: idx < STEPS.length - 1 ? 14 : 0,
}}
>
<div
style={{
width: 28,
height: 28,
borderRadius: 8,
flexShrink: 0,
marginTop: 2,
display: "flex",
alignItems: "center",
justifyContent: "center",
fontSize: 11,
border: isDone
? "1px solid rgba(16,185,129,0.5)"
: isActive
? "1px solid rgba(16,185,129,0.3)"
: "1px solid rgba(255,255,255,0.08)",
background: isDone
? "rgba(16,185,129,0.12)"
: isActive
? "rgba(16,185,129,0.06)"
: "transparent",
color: isDone
? "#10b981"
: isActive
? "#6ee7b7"
: "rgba(255,255,255,0.2)",
transition: "all 0.3s",
}}
>
{isDone ? "✓" : isActive ? <PulsingDot /> : step.icon}
</div>
<div>
<div
style={{
fontSize: 12,
fontWeight: 500,
marginBottom: 2,
color: isDone
? "#10b981"
: isActive
? "#ffffff"
: "rgba(255,255,255,0.25)",
transition: "color 0.3s",
}}
>
{step.label}
</div>
{isActive && (
<motion.div
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
style={{
fontSize: 10,
color: "rgba(255,255,255,0.3)",
lineHeight: 1.5,
}}
>
{step.sub}
</motion.div>
)}
</div>
</motion.div>
);
})}
</div>
</motion.div>
)}
</AnimatePresence>
{/* ── Error ── */}
<AnimatePresence>
{error && (
<motion.div
key="error"
initial={{ opacity: 0, y: 8 }}
animate={{ opacity: 1, y: 0 }}
exit={{ opacity: 0 }}
style={{
borderRadius: 14,
border: "1px solid rgba(239,68,68,0.3)",
background: "rgba(239,68,68,0.06)",
padding: 16,
marginBottom: 12,
display: "flex",
gap: 12,
}}
>
<span
style={{
color: "#f87171",
fontSize: 14,
marginTop: 1,
flexShrink: 0,
}}
>
⚠
</span>
<div>
<div
style={{
fontSize: 12,
fontWeight: 600,
color: "#f87171",
marginBottom: 4,
}}
>
Generation failed
</div>
<div
style={{
fontSize: 11,
color: "rgba(252,165,165,0.65)",
lineHeight: 1.5,
}}
>
{error}
</div>
<button
onClick={() => setError(null)}
style={{
background: "none",
border: "none",
cursor: "pointer",
fontFamily: "inherit",
fontSize: 10,
color: "rgba(248,113,113,0.6)",
marginTop: 8,
textDecoration: "underline",
padding: 0,
}}
>
Dismiss
</button>
</div>
</motion.div>
)}
</AnimatePresence>
{/* ── Result ── */}
<AnimatePresence>
{result && (
<motion.div
key="result"
initial={{ opacity: 0, y: 24, scale: 0.97 }}
animate={{ opacity: 1, y: 0, scale: 1 }}
transition={{ duration: 0.38, ease: [0.16, 1, 0.3, 1] }}
style={{
borderRadius: 16,
border: "1px solid rgba(16,185,129,0.28)",
background: "rgba(16,185,129,0.03)",
overflow: "hidden",
}}
>
{/* Header */}
<div
style={{
padding: "18px 20px 14px",
borderBottom: "1px solid rgba(255,255,255,0.06)",
}}
>
<div
style={{
display: "inline-flex",
alignItems: "center",
gap: 6,
fontSize: 9,
letterSpacing: "0.18em",
textTransform: "uppercase",
color: "rgba(16,185,129,0.75)",
border: "1px solid rgba(16,185,129,0.22)",
borderRadius: 999,
padding: "3px 10px",
marginBottom: 10,
}}
>
<span
style={{
width: 5,
height: 5,
borderRadius: "50%",
background: "#10b981",
display: "inline-block",
}}
/>
Ready · {result.slideCount} slides · {result.style}
</div>
<div
style={{
fontSize: 17,
fontWeight: 700,
color: "#ffffff",
letterSpacing: "-0.02em",
fontFamily: "'Syne',system-ui,sans-serif",
}}
>
{result.title}
</div>
</div>
{/* Video player */}
<div style={{ background: "#000" }}>
<video
src={result.videoUrl}
controls
autoPlay
style={{ width: "100%", display: "block", maxHeight: 340 }}
/>
</div>
{/* Narration */}
{result.narration && (
<div
style={{
padding: "14px 20px",
borderTop: "1px solid rgba(255,255,255,0.06)",
}}
>
<div
style={{
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.28)",
marginBottom: 8,
}}
>
Script / Narration
</div>
<p
style={{
fontSize: 11,
color: "rgba(255,255,255,0.4)",
lineHeight: 1.65,
display: "-webkit-box",
WebkitLineClamp: 4,
WebkitBoxOrient: "vertical",
overflow: "hidden",
}}
>
{result.narration}
</p>
</div>
)}
{/* Actions */}
<div
style={{
padding: "0 20px 20px",
display: "flex",
gap: 8,
flexWrap: "wrap",
alignItems: "center",
}}
>
<a
href={result.videoUrl}
download
style={{
display: "inline-flex",
alignItems: "center",
gap: 6,
fontSize: 12,
fontWeight: 600,
background: "#10b981",
color: "#000",
borderRadius: 10,
padding: "8px 16px",
textDecoration: "none",
fontFamily: "inherit",
}}
>
↓ Download MP4
</a>
{result.audioUrl && (
<a
href={result.audioUrl}
download
style={{
display: "inline-flex",
alignItems: "center",
gap: 6,
fontSize: 12,
fontWeight: 500,
border: "1px solid rgba(255,255,255,0.12)",
color: "rgba(255,255,255,0.55)",
borderRadius: 10,
padding: "8px 16px",
textDecoration: "none",
fontFamily: "inherit",
}}
>
↓ Audio MP3
</a>
)}
<button
onClick={() => {
setResult(null);
setPrompt("");
}}
style={{
marginLeft: "auto",
background: "none",
border: "none",
fontSize: 11,
color: "rgba(255,255,255,0.25)",
cursor: "pointer",
fontFamily: "inherit",
padding: "8px 4px",
}}
>
Generate another →
</button>
</div>
{/* Slide plan */}
{result.plan?.slides?.length > 0 && (
<div
style={{
borderTop: "1px solid rgba(255,255,255,0.06)",
padding: "14px 20px 20px",
}}
>
<div
style={{
fontSize: 10,
letterSpacing: "0.14em",
textTransform: "uppercase",
color: "rgba(255,255,255,0.28)",
marginBottom: 14,
}}
>
Slide Plan
</div>
{result.plan.slides.map((slide, i) => (
<div
key={i}
style={{
display: "flex",
gap: 12,
alignItems: "flex-start",
marginBottom: 10,
}}
>
<span
style={{
fontSize: 10,
color: "rgba(255,255,255,0.2)",
width: 20,
flexShrink: 0,
marginTop: 2,
fontVariantNumeric: "tabular-nums",
}}
>
{String(i + 1).padStart(2, "0")}
</span>
<div>
<div
style={{
fontSize: 12,
fontWeight: 500,
color: "rgba(255,255,255,0.68)",
}}
>
{slide.caption}
</div>
{slide.scriptLine && (
<div
style={{
fontSize: 10,
color: "rgba(255,255,255,0.28)",
marginTop: 2,
lineHeight: 1.5,
}}
>
{slide.scriptLine}
</div>
)}
</div>
</div>
))}
</div>
)}
</motion.div>
)}
</AnimatePresence>
{/* Footer */}
{!loading && !result && (
<motion.p
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
transition={{ delay: 0.55 }}
style={{
textAlign: "center",
fontSize: 10,
color: "rgba(255,255,255,0.14)",
marginTop: 28,
lineHeight: 1.8,
}}
>
Powered by OpenRouter · Remotion · Edge TTS · Firebase Storage
</motion.p>
)}
</div>
</div>
);
}
The code above renders the video once it has been created: a textarea for the prompt, a button that calls the generate-video endpoint, and a player that displays the resulting MP4, with a cosmetic pipeline-progress panel while the request runs.
One worthwhile update is to move this generate-video endpoint into a dedicated backend service, either a Node module built with Hono.js or Express.js, or a Python module built with Flask.
I would prefer Hono.js, so the same endpoint rewritten as a Hono route is given below.
// routes/generate-video.ts
// ─────────────────────────────────────────────────────────────────────────────
// POST /api/generate-video
//
// Hono.js route — drop into your existing Hono app:
// import { generateVideoRoute } from "./routes/generate-video";
// app.route("/api", generateVideoRoute);
//
// Body (JSON):
// prompt string required — user's natural language request
// userId string required — authenticated user ID
// draftId string optional — Firestore draft doc to backlink
// style string optional — "cinematic" | "minimal" | "documentary"
// voiceSpeed number optional — TTS speed 0.5–2.0 (default 1.0)
//
// Pipeline:
// 1. Validate input
// 2. OpenRouter (claude-3.5-sonnet) → video plan JSON
// 3. OpenRouter (flux-schnell) → images per slide (parallel)
// 4. Edge TTS → narration MP3 (non-fatal)
// 5. Remotion → render MP4
// 6. Firebase Storage → upload audio + video
// 7. Firestore batch → save video doc + backlink draft
// 8. Return result JSON
// ─────────────────────────────────────────────────────────────────────────────
import { Hono } from "hono";
import { env } from "hono/adapter";
import { HTTPException } from "hono/http-exception";
import { logger } from "hono/logger";
import { timing } from "hono/timing";
import fs from "fs/promises";
import os from "os";
import path from "path";
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";
// Firebase Admin — import from your shared firebase-admin init file
// e.g. lib/firebase-admin.ts (see bottom of this file for setup snippet)
import { db, bucket } from "../lib/firebase-admin";
// ─── Types ───────────────────────────────────────────────────────────────────
interface RequestBody {
prompt: string;
userId: string;
draftId?: string;
style?: "cinematic" | "minimal" | "documentary";
voiceSpeed?: number;
}
interface SlideItem {
imagePrompt: string;
caption: string;
scriptLine?: string;
}
interface VideoPlan {
title: string;
narration: string;
slides: SlideItem[];
}
// ─── Constants ───────────────────────────────────────────────────────────────
const OPENROUTER_BASE = "https://openrouter.ai/api/v1";
const PLAN_MODEL = "anthropic/claude-3.5-sonnet";
const IMAGE_MODEL = "black-forest-labs/flux-schnell";
const MAX_SLIDES = 10;
const MIN_SLIDES = 3;
const VALID_STYLES = ["cinematic", "minimal", "documentary"] as const;
const STYLE_GUIDES: Record<string, string> = {
cinematic: "dramatic lighting, wide shots, film grain, 8k, professional photography, cinematic",
minimal: "clean white backgrounds, simple compositions, flat design, minimalist, high contrast",
documentary: "realistic, candid, natural lighting, photojournalism, editorial photography",
};
// ─── Helper: log with prefix ─────────────────────────────────────────────────
const log = (tag: string, msg: string) =>
console.log(`[video:${tag}] ${msg}`);
const warn = (tag: string, msg: string) =>
console.warn(`[video:${tag}] ⚠ ${msg}`);
// ─── Step 1: OpenRouter → Video Plan ─────────────────────────────────────────
async function generateVideoPlan(
prompt: string,
style: string,
openrouterKey: string,
appUrl: string,
): Promise<VideoPlan> {
const styleHint = STYLE_GUIDES[style] || STYLE_GUIDES.cinematic;
const systemPrompt = `You are a video producer AI. Given a user prompt, output a structured video plan.
Return ONLY valid JSON — no markdown, no backticks, no explanation.
Shape:
{
"title": "string (max 60 chars)",
"narration": "string — all scriptLines joined with a space (assembled automatically)",
"slides": [
{
"imagePrompt": "string (detailed image generation prompt, 40-80 words, style: ${styleHint})",
"caption": "string (max 8 words shown on screen)",
"scriptLine": "string (1-3 sentences of narration spoken during this slide)"
}
]
}
Rules:
- Generate between ${MIN_SLIDES} and ${MAX_SLIDES} slides
- Each imagePrompt must be visually rich and detailed
- scriptLines should flow naturally as spoken narration
- Set narration = all scriptLines joined with " "
- Style tone: ${style}`;
const response = await fetch(`${OPENROUTER_BASE}/chat/completions`, {
method: "POST",
headers: {
"Authorization": `Bearer ${openrouterKey}`,
"Content-Type": "application/json",
"HTTP-Referer": appUrl,
"X-Title": "Video Generator",
},
body: JSON.stringify({
model: PLAN_MODEL,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: prompt },
],
temperature: 0.7,
max_tokens: 2500,
response_format: { type: "json_object" },
}),
});
if (!response.ok) {
const text = await response.text();
throw new HTTPException(502, {
message: `OpenRouter plan generation failed (${response.status}): ${text.slice(0, 200)}`,
});
}
const data = await response.json();
const raw = data.choices?.[0]?.message?.content as string | undefined;
if (!raw) {
throw new HTTPException(502, { message: "OpenRouter returned empty content for video plan" });
}
let plan: VideoPlan;
try {
plan = JSON.parse(raw);
} catch {
// Fallback: try to extract JSON block if model wrapped it
const match = raw.match(/\{[\s\S]*\}/);
if (!match) {
throw new HTTPException(502, { message: "Could not parse video plan JSON from OpenRouter" });
}
plan = JSON.parse(match[0]);
}
// Validate required fields
if (!plan.title || !plan.narration || !Array.isArray(plan.slides) || plan.slides.length === 0) {
throw new HTTPException(502, {
message: "Video plan JSON missing required fields (title, narration, slides)",
});
}
// Rebuild narration from scriptLines if available (more accurate)
const scriptLines = plan.slides.map(s => s.scriptLine).filter(Boolean);
if (scriptLines.length > 0) {
plan.narration = scriptLines.join(" ");
}
// Cap slides
plan.slides = plan.slides.slice(0, MAX_SLIDES);
return plan;
}
// ─── Step 2: OpenRouter → Generate Images ────────────────────────────────────
async function generateImages(
slides: SlideItem[],
style: string,
openrouterKey: string,
appUrl: string,
): Promise<string[]> {
const styleSuffix = STYLE_GUIDES[style] || STYLE_GUIDES.cinematic;
const results = await Promise.allSettled(
slides.map(async (slide, idx) => {
try {
const fullPrompt = `${slide.imagePrompt}, ${styleSuffix}`;
const response = await fetch(`${OPENROUTER_BASE}/images/generations`, {
method: "POST",
headers: {
"Authorization": `Bearer ${openrouterKey}`,
"Content-Type": "application/json",
"HTTP-Referer": appUrl,
"X-Title": "Video Generator",
},
body: JSON.stringify({
model: IMAGE_MODEL,
prompt: fullPrompt,
n: 1,
size: "1280x720",
}),
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
const data = await response.json();
const url = data.data?.[0]?.url as string | undefined;
if (!url) throw new Error("No URL in response");
log("images", `Slide ${idx + 1}/${slides.length} ✓`);
return url;
} catch (err: any) {
// Graceful fallback — video still renders with placeholder
warn("images", `Slide ${idx + 1} failed (${err?.message}), using placeholder`);
return `https://placehold.co/1280x720/0d1117/ffffff?text=${encodeURIComponent(slide.caption || `Slide ${idx + 1}`)}`;
}
}),
);
return results.map((r, idx) =>
r.status === "fulfilled"
? r.value
: `https://placehold.co/1280x720/0d1117/ffffff?text=Slide+${idx + 1}`,
);
}
// ─── Step 3: Edge TTS → Narration MP3 ────────────────────────────────────────
async function generateAudio(
title: string,
content: string,
speed: number,
): Promise<Buffer> {
// Dynamic ESM import
const { MsEdgeTTS, OUTPUT_FORMAT } = await import("msedge-tts");
const tts = new MsEdgeTTS();
await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3);
const fullText = title ? `${title}. ${content}` : content;
// Escape XML special chars for SSML
const escaped = fullText.replace(/[<>&'"]/g, (c: string) =>
({ "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;" }[c] ?? c),
);
const rateStr = speed >= 1
? `+${Math.round((speed - 1) * 100)}%`
: `-${Math.round((1 - speed) * 100)}%`;
const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<prosody rate="${rateStr}">${escaped}</prosody>
</voice>
</speak>`;
const chunks: Buffer[] = [];
const readable = await tts.toStream(ssml);
return new Promise<Buffer>((resolve, reject) => {
readable.on("data", (chunk: Buffer) => chunks.push(chunk));
readable.on("end", () => resolve(Buffer.concat(chunks)));
readable.on("error", reject);
});
}
// ─── Step 4: Remotion → Render MP4 ───────────────────────────────────────────
async function renderSlideshow(params: {
images: string[];
title: string;
audioUrl: string | null;
captions: string[];
scriptLines: string[];
}): Promise<{ videoBuffer: Buffer; tempDir: string }> {
const { images, title, audioUrl, captions, scriptLines } = params;
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
await fs.mkdir(tempDir, { recursive: true });
try {
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
const totalDuration = images.length * perSlide;
const durationInFrames = Math.ceil(totalDuration * 30);
const bundleLocation = await bundle({
entryPoint: path.join(process.cwd(), "remotion/index.ts"),
webpackOverride: (config) => config,
});
const inputProps = {
images,
title,
audioUrl: audioUrl ?? "",
captions,
scriptLines,
perSlide,
};
const composition = await selectComposition({
serveUrl: bundleLocation,
id: "VideoComposition",
inputProps,
});
const outputPath = path.join(tempDir, "output.mp4");
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: "h264",
outputLocation: outputPath,
inputProps,
concurrency: 4,
frameRange: [0, durationInFrames - 1],
});
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };
} catch (err) {
// Always clean up temp dir on render failure
await fs.rm(tempDir, { recursive: true, force: true });
throw err;
}
}
// ─── Step 5: Firebase Storage → Upload File ───────────────────────────────────
async function uploadToStorage(
buffer: Buffer,
filename: string,
contentType: string,
): Promise<string> {
const destination = `videos/${Date.now()}-${filename}`;
const file = bucket.file(destination);
await file.save(buffer, {
metadata: { contentType },
resumable: false, // fine for files <10MB; use resumable for larger
});
await file.makePublic();
return `https://storage.googleapis.com/${bucket.name}/${destination}`;
}
// ─── Step 6: Firestore → Save Video Doc + Backlink Draft ─────────────────────
async function saveVideoDoc(params: {
videoUrl: string;
audioUrl: string | null;
title: string;
userId: string;
draftId?: string;
plan: VideoPlan;
slideCount: number;
style: string;
}): Promise<string> {
const { videoUrl, audioUrl, title, userId, draftId, plan, slideCount, style } = params;
const now = new Date().toISOString();
const batch = db.batch();
// ── Video document ──────────────────────────────────────────────────────────
const videoRef = db.collection("videos").doc(); // auto-generated ID
batch.set(videoRef, {
id: videoRef.id,
title,
videoUrl,
audioUrl: audioUrl ?? null,
userId,
draftId: draftId ?? null,
slideCount,
style,
plan, // full plan stored for regeneration / display
status: "ready",
createdAt: now,
updatedAt: now,
});
// ── Backlink draft (if provided) ────────────────────────────────────────────
if (draftId) {
const draftRef = db.collection("drafts").doc(draftId);
batch.update(draftRef, {
videoId: videoRef.id,
videoUrl,
videoStatus: "rendered",
updatedAt: now,
});
}
await batch.commit();
return videoRef.id;
}
// ─── Hono Router ──────────────────────────────────────────────────────────────
export const generateVideoRoute = new Hono();
generateVideoRoute.use("*", logger());
generateVideoRoute.use("*", timing());
generateVideoRoute.post("/generate-video", async (c) => {
// Pull env vars via Hono adapter (works with Node, Bun, Cloudflare, etc.)
const {
OPENROUTER_API_KEY,
NEXT_PUBLIC_APP_URL,
APP_URL,
} = env<{
OPENROUTER_API_KEY: string;
NEXT_PUBLIC_APP_URL?: string;
APP_URL?: string;
}>(c);
const appUrl = NEXT_PUBLIC_APP_URL ?? APP_URL ?? "http://localhost:3000";
// ── Guard: API key must exist ────────────────────────────────────────────────
if (!OPENROUTER_API_KEY) {
throw new HTTPException(500, { message: "Server misconfiguration: OPENROUTER_API_KEY not set" });
}
// ── Parse + validate body ────────────────────────────────────────────────────
let body: RequestBody;
try {
body = await c.req.json<RequestBody>();
} catch {
throw new HTTPException(400, { message: "Request body must be valid JSON" });
}
const {
prompt,
userId,
draftId,
style = "cinematic",
voiceSpeed = 1.0,
} = body;
if (!userId || typeof userId !== "string" || !userId.trim()) {
throw new HTTPException(401, { message: "Authentication required." });
}
if (!prompt || typeof prompt !== "string" || !prompt.trim()) {
throw new HTTPException(400, { message: "A prompt is required to generate a video." });
}
if (prompt.trim().length < 10) {
throw new HTTPException(400, { message: "Prompt too short — minimum 10 characters." });
}
if (prompt.trim().length > 2000) {
throw new HTTPException(400, { message: "Prompt too long — maximum 2000 characters." });
}
const safeStyle = VALID_STYLES.includes(style as any) ? style : "cinematic";
const safeSpeed = Math.min(2.0, Math.max(0.5, Number(voiceSpeed) || 1.0));
log("handler", `User: ${userId} | Style: ${safeStyle} | Speed: ${safeSpeed}x`);
log("handler", `Prompt: "${prompt.trim().slice(0, 80)}…"`);
let tempDir: string | null = null;
try {
// ── Step 1: Generate video plan ────────────────────────────────────────────
log("step1", "Generating video plan via OpenRouter…");
const plan = await generateVideoPlan(prompt.trim(), safeStyle, OPENROUTER_API_KEY, appUrl);
log("step1", `Plan ready: "${plan.title}" — ${plan.slides.length} slides`);
// ── Step 2: Generate images ────────────────────────────────────────────────
log("step2", `Generating ${plan.slides.length} images…`);
const imageUrls = await generateImages(plan.slides, safeStyle, OPENROUTER_API_KEY, appUrl);
log("step2", `${imageUrls.length} images ready`);
// ── Step 3: Generate narration audio (non-fatal) ───────────────────────────
let audioUrl: string | null = null;
log("step3", "Generating narration audio via Edge TTS…");
try {
const audioBuffer = await generateAudio(plan.title, plan.narration, safeSpeed);
const audioFilename = `${Date.now()}-audio.mp3`;
audioUrl = await uploadToStorage(audioBuffer, audioFilename, "audio/mpeg");
log("step3", `Audio uploaded: ${audioUrl}`);
} catch (e: any) {
warn("step3", `Audio skipped (non-fatal): ${e?.message}`);
}
// ── Step 4: Render MP4 ─────────────────────────────────────────────────────
log("step4", `Rendering MP4 (${imageUrls.length} slides)…`);
const captions = plan.slides.map(s => s.caption ?? "");
const scriptLines = plan.slides.map(s => s.scriptLine ?? "");
const rendered = await renderSlideshow({
images: imageUrls,
title: plan.title,
audioUrl,
captions,
scriptLines,
});
tempDir = rendered.tempDir;
log("step4", "Render complete ✓");
// ── Step 5: Upload MP4 ─────────────────────────────────────────────────────
log("step5", "Uploading MP4 to Firebase Storage…");
const safeTitle = plan.title.replace(/[^a-z0-9]/gi, "-").toLowerCase().slice(0, 40);
const videoFilename = `${Date.now()}-${safeTitle}.mp4`;
const videoUrl = await uploadToStorage(rendered.videoBuffer, videoFilename, "video/mp4");
log("step5", `Video uploaded: ${videoUrl}`);
// ── Step 6: Save to Firestore ──────────────────────────────────────────────
log("step6", "Saving Firestore document…");
const docId = await saveVideoDoc({
videoUrl,
audioUrl,
title: plan.title,
userId,
draftId,
plan,
slideCount: imageUrls.length,
style: safeStyle,
});
log("step6", `Saved → docId: ${docId}`);
// ── Cleanup temp dir ───────────────────────────────────────────────────────
if (tempDir) {
await fs.rm(tempDir, { recursive: true, force: true });
tempDir = null;
}
log("handler", `Pipeline complete ✓ docId: ${docId}`);
// ── Success response ───────────────────────────────────────────────────────
return c.json({
success: true,
videoUrl,
audioUrl,
docId,
title: plan.title,
narration: plan.narration,
slideCount: imageUrls.length,
style: safeStyle,
plan,
}, 200);
} catch (err: any) {
// ── Always clean up on any failure ────────────────────────────────────────
if (tempDir) {
await fs.rm(tempDir, { recursive: true, force: true }).catch(() => {});
}
// Re-throw HTTPExceptions (already formatted)
if (err instanceof HTTPException) throw err;
// Classify other errors
console.error("[video:handler] Unhandled error:", err);
if (err?.message?.includes("OpenRouter")) {
throw new HTTPException(502, { message: `AI service error: ${err.message}` });
}
if (err?.message?.includes("render") || err?.message?.includes("Remotion")) {
throw new HTTPException(500, { message: `Video render failed: ${err.message}` });
}
throw new HTTPException(500, { message: err?.message ?? "Failed to generate video" });
}
});
// ─── Global error handler (attach to your main Hono app) ─────────────────────
// In your main app file:
//
// app.onError((err, c) => {
// if (err instanceof HTTPException) {
// return c.json({ error: err.message }, err.status);
// }
// console.error(err);
// return c.json({ error: "Internal server error" }, 500);
// });
For a backend Node.js service, firebase-admin is the npm package to use instead of the client-side firebase SDK; it authenticates with a service account and bypasses Firestore security rules.
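The Hono route above imports `db` and `bucket` from `lib/firebase-admin`, and its comments point to a setup snippet that isn't shown. Here is a minimal sketch of that init file, assuming a `FIREBASE_SERVICE_ACCOUNT` env var holding the service-account JSON and a `FIREBASE_STORAGE_BUCKET` env var with the bucket name (both variable names are my assumption; adjust to yours):

```typescript
// lib/firebase-admin.ts — shared Firebase Admin initialisation (sketch).
// Assumes FIREBASE_SERVICE_ACCOUNT contains the service-account JSON and
// FIREBASE_STORAGE_BUCKET the bucket name; rename to match your env.
import { initializeApp, cert, getApps } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { getStorage } from "firebase-admin/storage";

// Reuse an existing app if one was already initialised (hot reload safe)
const app =
  getApps()[0] ??
  initializeApp({
    credential: cert(JSON.parse(process.env.FIREBASE_SERVICE_ACCOUNT!)),
    storageBucket: process.env.FIREBASE_STORAGE_BUCKET,
  });

export const db = getFirestore(app);
export const bucket = getStorage(app).bucket();
```

The `getApps()` guard keeps the app initialised exactly once, which matters in dev servers that re-evaluate modules on reload.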
You can recreate the same endpoint with Prisma or Drizzle ORM over PostgreSQL, or with Supabase or MongoDB as the database. The choice is yours, so choose accordingly.
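As a sketch of that swap, the Firestore `videos` document written by `saveVideoDoc` maps naturally onto a table. Here is a hypothetical Drizzle ORM schema for PostgreSQL (table and column names are my own, mirroring the fields the code above stores):

```typescript
// schema.ts — hypothetical Drizzle equivalent of the "videos" Firestore doc
import { pgTable, text, integer, jsonb, timestamp } from "drizzle-orm/pg-core";

export const videos = pgTable("videos", {
  id: text("id").primaryKey(),
  title: text("title").notNull(),
  videoUrl: text("video_url").notNull(),
  audioUrl: text("audio_url"),              // nullable: the TTS step is non-fatal
  userId: text("user_id").notNull(),
  draftId: text("draft_id"),                // optional backlink to a draft
  slideCount: integer("slide_count").notNull(),
  style: text("style").notNull(),
  plan: jsonb("plan").notNull(),            // full VideoPlan for regeneration
  status: text("status").notNull().default("ready"),
  createdAt: timestamp("created_at").defaultNow(),
  updatedAt: timestamp("updated_at").defaultNow(),
});
```

Storing the plan as `jsonb` keeps the full slide structure queryable without flattening every slide into its own table.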
That's enough for today.
One more thing: I've been working on Buildsaas.dev and Inkgest.com, my two other SaaS projects. Do check out the sites and the ideas behind them, and if you find them useful, give them a try. Thanks in advance!
Cheers
Shrey