Prompt to Video App API

shrey vijayvargiya

Building an AI video generation app

Tags: Remotion, Honojs, Nextjs, API

Hey there!!

Welcome to the new blog

In this blog, I'll walk through the step-by-step process of generating a video programmatically from plain English, using an AI LLM and the Remotion npm module.

This is a SaaS feature I've been building for inkgest.com.

Remotion is an npm SDK that lets you create animated videos with code, composing them out of DOM elements such as text, images, shapes, and other videos.

Last month, Remotion opened its SDK to Claude Code, letting Claude build AI-generated videos by converting plain English into Remotion-based code whose output is rendered to an mp4.

I like the concept because this simple technique could literally replace Adobe Premiere Pro; not at large scale, but for small jobs it can work better than Premiere Pro.

I'll be using the OpenAI API or the OpenRouter API; you can use any other API key, and you'll see the reason at the end.

Remotion || OpenRouter

npx create-video@latest

One command, and your Next.js code repository with the Remotion package is ready; it generates sample code as well.

But I'll be using my own very simple Next.js starter repository, buildsaas.dev, and following these steps to get started:

  1. Install the buildsaas.dev starter repository
  2. Add the Remotion package and install dependencies with npm install
  3. Add an OpenRouter API key for the AI LLM
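Assuming the starter is a plain Next.js app, the setup would look roughly like this (the exact package list is my inference from the imports in the snippets below; adjust to your repository):

```shell
# From the root of your Next.js starter
npm install remotion @remotion/bundler @remotion/renderer msedge-tts uploadthing firebase firebase-admin

# API key for the AI LLM (used in the later, prompt-driven endpoint)
echo "OPENROUTER_API_KEY=sk-or-..." >> .env.local
```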

The next part is again an easy one: quickly add a server-side API for the AI LLM to generate the Remotion-based code, which the Remotion SDK then renders into an mp4 video.

One server-side API; the code is given below, and I'll explain it afterwards.

/**
 * Video generation — no LLM, no Firecrawl re-scraping.
 *
 * Pipeline:
 *  1. Receive images[] + content + title from the frontend (already stored on the draft)
 *  2. Generate narration audio via msedge-tts (reads first ~400 chars of draft content)
 *  3. Render Remotion slideshow: one image per slide, Ken Burns effect, audio track
 *  4. Upload mp4 to UploadThing
 *  5. Save metadata to Firestore + backlink draft
 */

import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";
import { db } from "../../../lib/config/firebase";
import { collection, addDoc, doc, updateDoc, serverTimestamp } from "firebase/firestore";
import { UTApi } from "uploadthing/server";
import { MsEdgeTTS, OUTPUT_FORMAT } from "msedge-tts";
import path from "path";
import os from "os";
import fs from "fs/promises";

const utapi = new UTApi({ token: process.env.UPLOADTHING_SECRET });

/* ── Upload any buffer to UploadThing ── */
async function uploadFile({ buffer, filename, contentType }) {
    const file = new File([buffer], filename, { type: contentType });
    const response = await utapi.uploadFiles(file);
    if (response.error) throw new Error(response.error.message || "UploadThing upload failed");
    return response.data.url;
}

/* ── TTS: read draft content aloud via Edge TTS (free, no key) ── */
async function generateAudio({ title, content }) {
    // Narrate title + first meaningful paragraph of content (strip markdown)
    const plainText = content
        .replace(/!\[.*?\]\(.*?\)/g, "") // strip images
        .replace(/\[([^\]]+)\]\([^)]+\)/g, "$1") // flatten links
        .replace(/#{1,6}\s/g, "") // strip headings
        .replace(/[*_`>]/g, "") // strip formatting
        .replace(/\s+/g, " ")
        .trim();

    const narration = `${title}. ${plainText}`.slice(0, 500);

    const tts = new MsEdgeTTS();
    await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3);

    const tempFile = path.join(os.tmpdir(), `tts-${Date.now()}.mp3`);
    await tts.toFile(tempFile, narration);
    const buffer = await fs.readFile(tempFile);
    await fs.unlink(tempFile).catch(() => {});
    return buffer;
}

/* ── Remotion: render slideshow mp4 ── */
async function renderSlideshow({ images, title, audioUrl }) {
    const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
    await fs.mkdir(tempDir, { recursive: true });

    try {
        // Each image gets equal screen time; minimum 3s per slide, cap at 8s
        const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
        const totalDuration = images.length * perSlide;
        const durationInFrames = Math.ceil(totalDuration * 30);

        const bundleLocation = await bundle({
            entryPoint: path.join(process.cwd(), "remotion/index.ts"),
            webpackOverride: (config) => config,
        });

        const inputProps = {
            images,
            title,
            audioUrl: audioUrl || "",
            perSlide,
        };

        const composition = await selectComposition({
            serveUrl: bundleLocation,
            id: "VideoComposition",
            inputProps,
        });

        const outputPath = path.join(tempDir, "output.mp4");

        await renderMedia({
            composition,
            serveUrl: bundleLocation,
            codec: "h264",
            outputLocation: outputPath,
            inputProps,
            concurrency: 4,
            frameRange: [0, durationInFrames - 1],
        });

        const videoBuffer = await fs.readFile(outputPath);
        return { videoBuffer, tempDir };
    } catch (err) {
        await fs.rm(tempDir, { recursive: true, force: true });
        throw err;
    }
}

/* ── Firestore: save video doc + backlink draft ── */
async function saveVideoDoc({ videoUrl, audioUrl, title, userId, draftId }) {
    const docRef = await addDoc(collection(db, "videos"), {
        videoUrl,
        audioUrl: audioUrl || "",
        title,
        userId: userId || "anonymous",
        draftId: draftId || null,
        createdAt: serverTimestamp(),
        status: "completed",
    });

    if (draftId) {
        try {
            await updateDoc(doc(db, "drafts", draftId), {
                videoUrl,
                videoDocId: docRef.id,
            });
        } catch (e) {
            console.warn("[video] Failed to backlink draft:", e?.message);
        }
    }

    return docRef.id;
}

/* ── Handler ── */
export default async function handler(req, res) {
    if (req.method !== "POST") {
        return res.status(405).json({ error: "Method not allowed" });
    }

    try {
        const { images, title, content, userId, draftId } = req.body || {};

        if (!userId || typeof userId !== "string" || !userId.trim()) {
            return res.status(401).json({ error: "Authentication required." });
        }
        if (!content || !content.trim()) {
            return res.status(400).json({ error: "Content is required" });
        }
        if (!Array.isArray(images) || images.length === 0) {
            return res.status(400).json({ error: "At least one image is required to generate a slideshow video" });
        }

        // Valid HTTPS image URLs only
        const validImages = images
            .filter((u) => typeof u === "string" && /^https?:\/\//i.test(u))
            .slice(0, 15);

        if (validImages.length === 0) {
            return res.status(400).json({ error: "No valid image URLs found" });
        }

        // Step 1 — Generate narration audio from draft content (Edge TTS, free)
        let audioUrl = null;
        console.log("[video] Step 1: generating narration from draft content");
        try {
            const audioBuffer = await generateAudio({ title: title || "Draft", content });
            const audioFilename = `${Date.now()}-audio.mp3`;
            audioUrl = await uploadFile({ buffer: audioBuffer, filename: audioFilename, contentType: "audio/mpeg" });
            console.log("[video] Audio ready:", audioUrl);
        } catch (e) {
            console.warn("[video] Audio skipped:", e?.message);
        }

        // Step 2 — Render slideshow with Remotion
        console.log(`[video] Step 2: rendering slideshow (${validImages.length} images)`);
        const { videoBuffer, tempDir } = await renderSlideshow({
            images: validImages,
            title: title || "Draft",
            audioUrl,
        });

        // Step 3 — Upload mp4
        console.log("[video] Step 3: uploading video");
        const videoFilename = `${Date.now()}-${(title || "draft").replace(/[^a-z0-9]/gi, "-").slice(0, 40)}.mp4`;
        const videoUrl = await uploadFile({ buffer: videoBuffer, filename: videoFilename, contentType: "video/mp4" });

        // Step 4 — Save metadata + backlink draft
        console.log("[video] Step 4: saving metadata");
        const docId = await saveVideoDoc({ videoUrl, audioUrl, title: title || "Draft", userId, draftId });

        await fs.rm(tempDir, { recursive: true, force: true });

        return res.status(200).json({
            success: true,
            videoUrl,
            audioUrl,
            docId,
            title: title || "Draft",
            slideCount: validImages.length,
        });
    } catch (error) {
        console.error("[video] error:", error);
        return res.status(500).json({ error: error?.message || "Failed to generate video" });
    }
}

The bundle function webpack-bundles the Remotion project so its composition can be rendered.

renderMedia and selectComposition do what their names suggest: selectComposition picks the video composition to use, and renderMedia renders it frame by frame into media such as an mp4.

Firebase is used for the metadata: the user ID and the final mp4 link are stored in Firestore.

UploadThing is an npm SDK for storing files on a server, mainly assets, similar to Firebase Storage or AWS S3.

The rest are simple Node.js modules, such as fs to read and write files and os to access operating-system paths like the temp directory.

Let's break it down and explain.

/* ── Remotion: render slideshow mp4 ── */
async function renderSlideshow({ images, title, audioUrl }) {
    const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
    await fs.mkdir(tempDir, { recursive: true });

    try {
        // Each image gets equal screen time; minimum 3s per slide, cap at 8s
        const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
        const totalDuration = images.length * perSlide;
        const durationInFrames = Math.ceil(totalDuration * 30);

        const bundleLocation = await bundle({
            entryPoint: path.join(process.cwd(), "remotion/index.ts"),
            webpackOverride: (config) => config,
        });

        const inputProps = {
            images,
            title,
            audioUrl: audioUrl || "",
            perSlide,
        };

        const composition = await selectComposition({
            serveUrl: bundleLocation,
            id: "VideoComposition",
            inputProps,
        });

        const outputPath = path.join(tempDir, "output.mp4");

        await renderMedia({
            composition,
            serveUrl: bundleLocation,
            codec: "h264",
            outputLocation: outputPath,
            inputProps,
            concurrency: 4,
            frameRange: [0, durationInFrames - 1],
        });

        const videoBuffer = await fs.readFile(outputPath);
        return { videoBuffer, tempDir };
    } catch (err) {
        await fs.rm(tempDir, { recursive: true, force: true });
        throw err;
    }
}

First, the function renders the video server-side and writes the output, in mp4 format, to a temporary folder.

The method above is a Remotion video-rendering function: it takes a list of images, a title, and an optional audio URL, and renders them into an .mp4 slideshow video programmatically (server-side, headlessly, with no client browser involved).

The 5 steps it runs through

  1. Creates a temp folder
const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);

Unique scratch space like /tmp/video-1741234567 to write files during rendering. Cleaned up on error.


  2. Calculates slide timing
const perSlide = Math.min(8, Math.max(3, Math.floor(40 / images.length)));

Targets a ~40 second total video. Each image gets equal time, clamped between 3s minimum and 8s maximum per slide. Then converts to frames at 30fps.
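The timing math is easy to sanity-check in isolation (the helper name here is mine, extracted from the expression above):

```javascript
// Target ~40s total; each slide's screen time is clamped between 3s and 8s.
function perSlideSeconds(imageCount) {
  return Math.min(8, Math.max(3, Math.floor(40 / Math.max(imageCount, 1))));
}

console.log(perSlideSeconds(3));  // 8  → 3 slides × 8s = 24s video (cap hit)
console.log(perSlideSeconds(10)); // 4  → 10 slides × 4s = 40s video
console.log(perSlideSeconds(15)); // 3  → clamped at the 3s minimum

// Then, at 30fps:
const durationInFrames = Math.ceil(10 * perSlideSeconds(10) * 30); // 1200 frames
console.log(durationInFrames);
```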


  3. Webpack-bundles your Remotion composition
await bundle({ entryPoint: "remotion/index.ts" })

Your VideoComposition is a React component. Remotion bundles it with webpack so headless Chromium can execute it frame by frame.
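The post never shows the composition itself, so here is a minimal sketch of what the entry file could look like. This is my assumption of its shape: the prop names simply mirror the inputProps above, and the Ken Burns effect and title/caption overlays are omitted.

```jsx
// remotion/index.tsx — registered as the bundler's entryPoint
// (realistically .tsx/.jsx since it contains JSX; adjust entryPoint to match)
import React from "react";
import {
  AbsoluteFill,
  Audio,
  Composition,
  Img,
  Sequence,
  registerRoot,
  useVideoConfig,
} from "remotion";

const VideoComposition = ({ images = [], audioUrl = "", perSlide = 5 }) => {
  const { fps } = useVideoConfig();
  const framesPerSlide = perSlide * fps;
  return (
    <AbsoluteFill style={{ backgroundColor: "black" }}>
      {images.map((src, i) => (
        <Sequence key={i} from={i * framesPerSlide} durationInFrames={framesPerSlide}>
          <Img src={src} style={{ width: "100%", height: "100%", objectFit: "cover" }} />
        </Sequence>
      ))}
      {/* title overlay and Ken Burns animation omitted for brevity */}
      {audioUrl ? <Audio src={audioUrl} /> : null}
    </AbsoluteFill>
  );
};

registerRoot(() => (
  <Composition
    id="VideoComposition"
    component={VideoComposition}
    durationInFrames={40 * 30} // trimmed at render time via frameRange
    fps={30}
    width={1280}
    height={720}
    defaultProps={{ images: [], title: "", audioUrl: "", perSlide: 5 }}
  />
));
```

The id "VideoComposition" is what selectComposition looks up against the bundle.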


  4. Renders every frame with headless Chromium
await renderMedia({ codec: "h264", concurrency: 4 })

Runs 4 parallel headless Chromium tabs that screenshot every frame of the React component, then ffmpeg stitches the frames into an h264 .mp4.


  5. Returns the video as a Node.js Buffer
const videoBuffer = await fs.readFile(outputPath);
return { videoBuffer, tempDir };

The caller gets the raw bytes to upload to UploadThing (or S3/Firebase Storage). The caller is also responsible for deleting tempDir afterwards; that's intentional.

The last part is given below.

/* ── Handler ── */
export default async function handler(req, res) {
    if (req.method !== "POST") {
        return res.status(405).json({ error: "Method not allowed" });
    }

    try {
        const { images, title, content, userId, draftId } = req.body || {};

        if (!userId || typeof userId !== "string" || !userId.trim()) {
            return res.status(401).json({ error: "Authentication required." });
        }
        if (!content || !content.trim()) {
            return res.status(400).json({ error: "Content is required" });
        }
        if (!Array.isArray(images) || images.length === 0) {
            return res.status(400).json({ error: "At least one image is required to generate a slideshow video" });
        }

        // Valid HTTPS image URLs only
        const validImages = images
            .filter((u) => typeof u === "string" && /^https?:\/\//i.test(u))
            .slice(0, 15);

        if (validImages.length === 0) {
            return res.status(400).json({ error: "No valid image URLs found" });
        }

        // Step 1 — Generate narration audio from draft content (Edge TTS, free)
        let audioUrl = null;
        console.log("[video] Step 1: generating narration from draft content");
        try {
            const audioBuffer = await generateAudio({ title: title || "Draft", content });
            const audioFilename = `${Date.now()}-audio.mp3`;
            audioUrl = await uploadFile({ buffer: audioBuffer, filename: audioFilename, contentType: "audio/mpeg" });
            console.log("[video] Audio ready:", audioUrl);
        } catch (e) {
            console.warn("[video] Audio skipped:", e?.message);
        }

        // Step 2 — Render slideshow with Remotion
        console.log(`[video] Step 2: rendering slideshow (${validImages.length} images)`);
        const { videoBuffer, tempDir } = await renderSlideshow({
            images: validImages,
            title: title || "Draft",
            audioUrl,
        });

        // Step 3 — Upload mp4
        console.log("[video] Step 3: uploading video");
        const videoFilename = `${Date.now()}-${(title || "draft").replace(/[^a-z0-9]/gi, "-").slice(0, 40)}.mp4`;
        const videoUrl = await uploadFile({ buffer: videoBuffer, filename: videoFilename, contentType: "video/mp4" });

        // Step 4 — Save metadata + backlink draft
        console.log("[video] Step 4: saving metadata");
        const docId = await saveVideoDoc({ videoUrl, audioUrl, title: title || "Draft", userId, draftId });

        await fs.rm(tempDir, { recursive: true, force: true });

        return res.status(200).json({
            success: true,
            videoUrl,
            audioUrl,
            docId,
            title: title || "Draft",
            slideCount: validImages.length,
        });
    } catch (error) {
        console.error("[video] error:", error);
        return res.status(500).json({ error: error?.message || "Failed to generate video" });
    }
}

This is the orchestrator: it validates the request, creates the video, stores it in storage and Firestore, and returns the final response object.

The client side now just needs to call the endpoint above to get the video URL, which we can render inside an iframe or a video HTML element.
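That call can be as simple as the sketch below. The endpoint path and the generateVideo helper name are my assumptions; the body fields match what the handler validates:

```javascript
// Hypothetical client-side helper for the slideshow endpoint above.
async function generateVideo({ images, title, content, userId, draftId }) {
  const res = await fetch("/api/generate-video", { // path assumed
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ images, title, content, userId, draftId }),
  });
  const data = await res.json();
  if (!res.ok) throw new Error(data.error || "Video generation failed");
  return data; // { success, videoUrl, audioUrl, docId, title, slideCount }
}
```

The returned videoUrl can then be set as the src of a `<video controls>` element.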

One question: why are we using a server-side API endpoint?

Because the client side doesn't have access to the fs module: the browser can't read the repository's files, so it needs an API to read and access the file system. A Next.js server-side API runs on the server instead of in the browser, which gives us access to fs. The @remotion/bundler and @remotion/renderer packages are likewise Node-only, so rendering also has to happen on the server.

If you are confused about the difference between client-side and server-side, ask ChatGPT or read up on it.

Creating a video using AI

We will follow a 6-step pipeline for creating the video:

POST /api/generate-video
  { prompt, userId, draftId?, style?, voiceSpeed? }
        │
        ├─ Step 1: OpenRouter (claude-3.5-sonnet)
        │          prompt → { title, narration, slides[] }
        │
        ├─ Step 2: OpenRouter (flux-schnell)
        │          slides[].imagePrompt → image URLs (parallel, fallback to placeholder)
        │
        ├─ Step 3: Edge TTS (free, no key needed)
        │          narration → MP3 → Firebase Storage → audioUrl
        │          (non-fatal — video renders muted if this fails)
        │
        ├─ Step 4: Remotion
        │          images + audioUrl + captions → MP4 buffer
        │
        ├─ Step 5: Firebase Storage
        │          MP4 buffer → videoUrl
        │
        └─ Step 6: Firestore batch write
                   videos/{id} + drafts/{draftId} backlink

Below is the code for the server-side API endpoint, generate-video.

// pages/api/generate-video.js
// ─────────────────────────────────────────────────────────────────────────────
// POST /api/generate-video
//
// Body:
//   prompt      string  required  — user's natural language request
//   userId      string  required  — authenticated user ID
//   draftId     string  optional  — Firestore draft to backlink
//   style       string  optional  — "cinematic" | "minimal" | "documentary" (default: "cinematic")
//   voiceSpeed  number  optional  — TTS speed 0.5–2.0 (default: 1.0)
//
// Flow:
//   1. Validate request
//   2. OpenRouter → generate structured video plan (title + narration + image prompts)
//   3. Generate images from prompts (via OpenRouter vision-capable model or placeholder)
//   4. Edge TTS → narration audio MP3
//   5. Remotion → render MP4 slideshow
//   6. Upload audio + video to Firebase Storage
//   7. Save Firestore doc + backlink draft
//   8. Return { videoUrl, audioUrl, docId, title, slideCount, plan }
// ─────────────────────────────────────────────────────────────────────────────

import fs from "fs/promises";
import os from "os";
import path from "path";
import { initializeApp, getApps, cert } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { getStorage } from "firebase-admin/storage";
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";

// ─── Firebase Admin Init ─────────────────────────────────────────────────────
if (!getApps().length) {
  initializeApp({
    credential: cert({
      projectId: process.env.FIREBASE_PROJECT_ID,
      clientEmail: process.env.FIREBASE_CLIENT_EMAIL,
      privateKey: process.env.FIREBASE_PRIVATE_KEY?.replace(/\\n/g, "\n"),
    }),
    storageBucket: process.env.FIREBASE_STORAGE_BUCKET,
  });
}

const db = getFirestore();
const bucket = getStorage().bucket();

// ─── Constants ───────────────────────────────────────────────────────────────
const OPENROUTER_BASE = "https://openrouter.ai/api/v1";
const PLAN_MODEL = "anthropic/claude-3.5-sonnet";       // Best for structured JSON output
const IMAGE_MODEL = "black-forest-labs/flux-schnell";   // Fast image generation via OpenRouter
const MAX_SLIDES = 10;
const MIN_SLIDES = 3;

// ─── 1. OpenRouter: Generate Video Plan ──────────────────────────────────────
// Returns: { title, narration, slides: [{ imagePrompt, caption }] }
async function generateVideoPlan({ prompt, style = "cinematic" }) {
  const styleGuides = {
    cinematic:    "dramatic lighting, wide shots, film grain, professional photography",
    minimal:      "clean white backgrounds, simple compositions, flat design, minimalist",
    documentary:  "realistic, candid, natural lighting, photojournalism style",
  };

  const styleHint = styleGuides[style] || styleGuides.cinematic;

  const systemPrompt = `You are a video producer AI. Given a user's prompt, generate a structured video plan.
Return ONLY valid JSON — no markdown, no explanation, no backticks.

The JSON must match this exact shape:
{
  "title": "string (max 60 chars)",
  "narration": "string (150-300 words, will be converted to voiceover audio)",
  "slides": [
    {
      "imagePrompt": "string (detailed image generation prompt, ${styleHint})",
      "caption": "string (max 8 words, shown on screen)"
    }
  ]
}

Rules:
- Generate between ${MIN_SLIDES} and ${MAX_SLIDES} slides
- Each imagePrompt must be highly detailed and visual (40-80 words)
- Narration should flow naturally when read aloud
- Captions should be punchy and complement the image
- Style: ${style}`;

  const response = await fetch(`${OPENROUTER_BASE}/chat/completions`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
      "HTTP-Referer": process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000",
      "X-Title": "BuildSaaS Video Generator",
    },
    body: JSON.stringify({
      model: PLAN_MODEL,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: prompt },
      ],
      temperature: 0.7,
      max_tokens: 2000,
      response_format: { type: "json_object" }, // Force JSON mode
    }),
  });

  if (!response.ok) {
    const err = await response.text();
    throw new Error(`OpenRouter plan generation failed (${response.status}): ${err}`);
  }

  const data = await response.json();
  const raw = data.choices?.[0]?.message?.content;

  if (!raw) throw new Error("OpenRouter returned empty content for video plan");

  let plan;
  try {
    plan = JSON.parse(raw);
  } catch {
    // Attempt to extract JSON if model wrapped it
    const match = raw.match(/\{[\s\S]*\}/);
    if (!match) throw new Error("Could not parse video plan JSON from OpenRouter response");
    plan = JSON.parse(match[0]);
  }

  // Validate shape
  if (!plan.title || !plan.narration || !Array.isArray(plan.slides) || plan.slides.length === 0) {
    throw new Error("Video plan JSON is missing required fields (title, narration, slides)");
  }

  // Cap slides
  plan.slides = plan.slides.slice(0, MAX_SLIDES);

  return plan;
}

// ─── 2. OpenRouter: Generate Images from Prompts ─────────────────────────────
// Uses FLUX via OpenRouter's image generation endpoint.
// Falls back to a placeholder image URL if generation fails for any slide.
async function generateImages({ slides, style = "cinematic" }) {
  const styleGuides = {
    cinematic:   "cinematic, dramatic lighting, 8k, professional photography, film still",
    minimal:     "minimalist, clean, flat design, white background, simple composition",
    documentary: "documentary, realistic, natural lighting, photojournalism, candid",
  };
  const styleSuffix = styleGuides[style] || styleGuides.cinematic;

  const imageUrls = await Promise.allSettled(
    slides.map(async (slide, idx) => {
      try {
        const fullPrompt = `${slide.imagePrompt}, ${styleSuffix}`;

        const response = await fetch(`${OPENROUTER_BASE}/images/generations`, {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
            "Content-Type": "application/json",
            "HTTP-Referer": process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000",
            "X-Title": "BuildSaaS Video Generator",
          },
          body: JSON.stringify({
            model: IMAGE_MODEL,
            prompt: fullPrompt,
            n: 1,
            size: "1280x720", // 16:9 for video
          }),
        });

        if (!response.ok) {
          throw new Error(`Image generation failed for slide ${idx + 1}: ${response.status}`);
        }

        const data = await response.json();
        const url = data.data?.[0]?.url;
        if (!url) throw new Error(`No image URL returned for slide ${idx + 1}`);

        console.log(`[video] Image ${idx + 1}/${slides.length} generated`);
        return url;
      } catch (err) {
        // Graceful fallback: use a placeholder image so video still renders
        console.warn(`[video] Image ${idx + 1} failed, using placeholder:`, err.message);
        return `https://placehold.co/1280x720/1a1a2e/ffffff?text=${encodeURIComponent(slide.caption || `Slide ${idx + 1}`)}`;
      }
    })
  );

  // Extract values (allSettled means we always get something)
  return imageUrls.map((result, idx) =>
    result.status === "fulfilled"
      ? result.value
      : `https://placehold.co/1280x720/1a1a2e/ffffff?text=Slide+${idx + 1}`
  );
}

// ─── 3. Edge TTS: Generate Narration Audio ───────────────────────────────────
// Uses Microsoft Edge TTS (free, no API key needed) via the msedge-tts npm package.
// Returns a Buffer of MP3 audio data.
async function generateAudio({ title, content, speed = 1.0 }) {
  // Dynamic import: msedge-tts ships as ESM
  const { MsEdgeTTS, OUTPUT_FORMAT } = await import("msedge-tts");

  const tts = new MsEdgeTTS();

  // en-US-AriaNeural is natural and works well for narration
  await tts.setMetadata(
    "en-US-AriaNeural",
    OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3,
  );

  // Prepend title as a natural intro
  const fullText = title ? `${title}. ${content}` : content;

  // Adjust speed via SSML rate tag
  const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
      <prosody rate="${speed >= 1 ? `+${Math.round((speed - 1) * 100)}%` : `-${Math.round((1 - speed) * 100)}%`}">
        ${fullText.replace(/[<>&'"]/g, (c) => ({ "<":"&lt;",">":"&gt;","&":"&amp;","'":"&apos;",'"':"&quot;" }[c]))}
      </prosody>
    </voice>
  </speak>`;

  const chunks = [];
  const readable = await tts.toStream(ssml);

  return new Promise((resolve, reject) => {
    readable.on("data", (chunk) => chunks.push(chunk));
    readable.on("end", () => resolve(Buffer.concat(chunks)));
    readable.on("error", reject);
  });
}

// ─── 4. Firebase Storage: Upload File ────────────────────────────────────────
// Uploads a Buffer and returns a public HTTPS URL.
async function uploadFile({ buffer, filename, contentType }) {
  const destination = `videos/${Date.now()}-${filename}`;
  const file = bucket.file(destination);

  await file.save(buffer, {
    metadata: { contentType },
    resumable: false,
  });

  // Make file publicly readable
  await file.makePublic();

  return `https://storage.googleapis.com/${bucket.name}/${destination}`;
}

// ─── 5. Remotion: Render Slideshow MP4 ───────────────────────────────────────
async function renderSlideshow({ images, title, audioUrl, captions = [] }) {
  const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
  await fs.mkdir(tempDir, { recursive: true });

  try {
    const perSlide = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
    const totalDuration = images.length * perSlide;
    const durationInFrames = Math.ceil(totalDuration * 30);

    const bundleLocation = await bundle({
      entryPoint: path.join(process.cwd(), "remotion/index.ts"),
      webpackOverride: (config) => config,
    });

    const inputProps = {
      images,
      title,
      audioUrl: audioUrl || "",
      captions,  // Pass captions through to composition
      perSlide,
    };

    const composition = await selectComposition({
      serveUrl: bundleLocation,
      id: "VideoComposition",
      inputProps,
    });

    const outputPath = path.join(tempDir, "output.mp4");

    await renderMedia({
      composition,
      serveUrl: bundleLocation,
      codec: "h264",
      outputLocation: outputPath,
      inputProps,
      concurrency: 4,
      frameRange: [0, durationInFrames - 1],
    });

    const videoBuffer = await fs.readFile(outputPath);
    return { videoBuffer, tempDir };
  } catch (err) {
    await fs.rm(tempDir, { recursive: true, force: true });
    throw err;
  }
}

// ─── 6. Firestore: Save Video Doc + Backlink Draft ───────────────────────────
async function saveVideoDoc({ videoUrl, audioUrl, title, userId, draftId, plan, slideCount, style }) {
  const now = new Date().toISOString();
  const batch = db.batch();

  // Video document
  const videoRef = db.collection("videos").doc();
  batch.set(videoRef, {
    id: videoRef.id,
    title,
    videoUrl,
    audioUrl: audioUrl || null,
    userId,
    draftId: draftId || null,
    slideCount,
    style,
    plan,          // Store the full plan for reference/regeneration
    status: "ready",
    createdAt: now,
    updatedAt: now,
  });

  // Backlink draft if provided
  if (draftId) {
    const draftRef = db.collection("drafts").doc(draftId);
    batch.update(draftRef, {
      videoId: videoRef.id,
      videoUrl,
      videoStatus: "rendered",
      updatedAt: now,
    });
  }

  await batch.commit();
  return videoRef.id;
}

// ─── Main Handler ─────────────────────────────────────────────────────────────
export default async function handler(req, res) {
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }

  // ── Validate environment ──────────────────────────────────────────────────
  if (!process.env.OPENROUTER_API_KEY) {
    console.error("[video] OPENROUTER_API_KEY is not set");
    return res.status(500).json({ error: "Server configuration error: missing OpenRouter API key" });
  }

  let tempDir = null;

  try {
    const {
      prompt,
      userId,
      draftId,
      style = "cinematic",
      voiceSpeed = 1.0,
    } = req.body || {};

    // ── Input validation ────────────────────────────────────────────────────
    if (!userId || typeof userId !== "string" || !userId.trim()) {
      return res.status(401).json({ error: "Authentication required." });
    }

    if (!prompt || typeof prompt !== "string" || !prompt.trim()) {
      return res.status(400).json({ error: "A prompt is required to generate a video." });
    }

    if (prompt.trim().length < 10) {
      return res.status(400).json({ error: "Prompt is too short. Provide at least 10 characters." });
    }

    if (prompt.trim().length > 2000) {
      return res.status(400).json({ error: "Prompt is too long. Maximum 2000 characters." });
    }

    const validStyles = ["cinematic", "minimal", "documentary"];
    const safeStyle = validStyles.includes(style) ? style : "cinematic";

    const safeSpeed = Math.min(2.0, Math.max(0.5, Number(voiceSpeed) || 1.0));

    console.log(`[video] Starting pipeline for user ${userId}`);
    console.log(`[video] Prompt: "${prompt.slice(0, 80)}…"`);
    console.log(`[video] Style: ${safeStyle} | Speed: ${safeSpeed}`);

    // ── Step 1: Generate video plan via OpenRouter ──────────────────────────
    console.log("[video] Step 1: generating video plan via OpenRouter");
    const plan = await generateVideoPlan({ prompt: prompt.trim(), style: safeStyle });
    console.log(`[video] Plan ready: "${plan.title}" — ${plan.slides.length} slides`);

    // ── Step 2: Generate images for each slide ──────────────────────────────
    console.log(`[video] Step 2: generating ${plan.slides.length} images`);
    const imageUrls = await generateImages({ slides: plan.slides, style: safeStyle });
    console.log(`[video] ${imageUrls.length} images ready`);

    // ── Step 3: Generate narration audio (non-fatal) ────────────────────────
    let audioUrl = null;
    console.log("[video] Step 3: generating narration audio via Edge TTS");
    try {
      const audioBuffer = await generateAudio({
        title: plan.title,
        content: plan.narration,
        speed: safeSpeed,
      });
      const audioFilename = `${Date.now()}-audio.mp3`;
      audioUrl = await uploadFile({
        buffer: audioBuffer,
        filename: audioFilename,
        contentType: "audio/mpeg",
      });
      console.log("[video] Audio uploaded:", audioUrl);
    } catch (e) {
      console.warn("[video] Audio generation skipped (non-fatal):", e?.message);
    }

    // ── Step 4: Render slideshow with Remotion ──────────────────────────────
    console.log(`[video] Step 4: rendering MP4 (${imageUrls.length} slides)`);
    const captions = plan.slides.map((s) => s.caption || "");
    const result = await renderSlideshow({
      images: imageUrls,
      title: plan.title,
      audioUrl,
      captions,
    });
    tempDir = result.tempDir;
    const { videoBuffer } = result;
    console.log("[video] Render complete");

    // ── Step 5: Upload MP4 to Firebase Storage ──────────────────────────────
    console.log("[video] Step 5: uploading MP4");
    const safeTitle = plan.title.replace(/[^a-z0-9]/gi, "-").slice(0, 40).toLowerCase();
    const videoFilename = `${Date.now()}-${safeTitle}.mp4`;
    const videoUrl = await uploadFile({
      buffer: videoBuffer,
      filename: videoFilename,
      contentType: "video/mp4",
    });
    console.log("[video] Video uploaded:", videoUrl);

    // ── Step 6: Save Firestore doc + backlink draft ─────────────────────────
    console.log("[video] Step 6: saving Firestore metadata");
    const docId = await saveVideoDoc({
      videoUrl,
      audioUrl,
      title: plan.title,
      userId,
      draftId,
      plan,
      slideCount: imageUrls.length,
      style: safeStyle,
    });

    // ── Cleanup temp dir ────────────────────────────────────────────────────
    if (tempDir) {
      await fs.rm(tempDir, { recursive: true, force: true });
      tempDir = null;
    }

    console.log(`[video] Pipeline complete. docId: ${docId}`);

    return res.status(200).json({
      success: true,
      videoUrl,
      audioUrl,
      docId,
      title: plan.title,
      narration: plan.narration,
      slideCount: imageUrls.length,
      style: safeStyle,
      plan, // Full plan — useful for frontend to show captions/structure
    });

  } catch (error) {
    // Cleanup temp dir on any unhandled error
    if (tempDir) {
      await fs.rm(tempDir, { recursive: true, force: true }).catch(() => {});
    }

    console.error("[video] Pipeline error:", error);

    // Surface specific error types
    if (error.message?.includes("OpenRouter")) {
      return res.status(502).json({ error: "AI service error: " + error.message });
    }
    if (error.message?.includes("Remotion") || error.message?.includes("render")) {
      return res.status(500).json({ error: "Video rendering failed: " + error.message });
    }

    return res.status(500).json({ error: error?.message || "Failed to generate video" });
  }
}

// ─── Route Config ─────────────────────────────────────────────────────────────
// Raise the request body limit and disable the response size limit for large payloads
export const config = {
  api: {
    bodyParser: {
      sizeLimit: "10mb",
    },
    responseLimit: false,       // Video responses can be large
    externalResolver: true,     // Suppress missing response warnings
  },
};
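The handler above leans on helpers defined earlier in the file (`generateVideoPlan`, `generateImages`, `generateAudio`, `renderSlideshow`, `uploadFile`, `saveVideoDoc`). If you are wiring this up yourself, `generateVideoPlan` is the OpenRouter call from Step 1. Here is a minimal sketch — the model id, system prompt, and plan schema are illustrative assumptions, not the exact ones from my code:

```javascript
// Hedged sketch of generateVideoPlan — model id, prompt, and schema are assumptions.
async function generateVideoPlan({ prompt, style }) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "anthropic/claude-3.5-sonnet",
      messages: [
        {
          role: "system",
          content:
            'Return ONLY JSON: { "title": string, "narration": string, "slides": [{ "caption": string, "imagePrompt": string }] }',
        },
        { role: "user", content: `Style: ${style}\n\n${prompt}` },
      ],
    }),
  });
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const data = await res.json();
  return parsePlan(data.choices[0].message.content);
}

// Models often wrap JSON in markdown fences — strip them before parsing,
// then validate the shape the rest of the pipeline depends on.
function parsePlan(text) {
  const cleaned = text
    .replace(/^```(?:json)?\s*/i, "")
    .replace(/```\s*$/, "")
    .trim();
  const plan = JSON.parse(cleaned);
  if (!plan.title || !Array.isArray(plan.slides) || plan.slides.length === 0) {
    throw new Error("OpenRouter returned an invalid plan");
  }
  return plan;
}
```

The validation in `parsePlan` matters because the handler maps error messages containing "OpenRouter" to a 502 — throwing early here surfaces a clean "AI service error" instead of a confusing render failure later.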
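`renderSlideshow` (Step 4) is where Remotion does the heavy lifting. A sketch using `@remotion/bundler` and `@remotion/renderer` — the `"Slideshow"` composition id, the entry point path, and the per-slide timing constants are assumptions for illustration:

```javascript
// Assumed timings — tune to taste.
const SECONDS_PER_SLIDE = 4;
const FPS = 30;

// Pure helper: total frames for a slideshow of N slides (at least one slide's worth).
function slideshowDurationInFrames(slideCount, fps = FPS, secondsPerSlide = SECONDS_PER_SLIDE) {
  return Math.max(1, slideCount) * secondsPerSlide * fps;
}

// Hedged sketch of renderSlideshow — composition id and entry point are assumptions.
async function renderSlideshow({ images, title, audioUrl, captions }) {
  const { bundle } = await import("@remotion/bundler");
  const { renderMedia, selectComposition } = await import("@remotion/renderer");
  const os = await import("os");
  const path = await import("path");
  const fs = await import("fs/promises");

  // Temp dir for the MP4 — the handler cleans this up after upload.
  const tempDir = await fs.mkdtemp(path.join(os.tmpdir(), "video-"));

  // Bundle the Remotion project into a servable webpack build.
  const serveUrl = await bundle({ entryPoint: path.resolve("./remotion/index.js") });

  const inputProps = { images, title, audioUrl, captions };
  const composition = await selectComposition({ serveUrl, id: "Slideshow", inputProps });

  const outputLocation = path.join(tempDir, "out.mp4");
  await renderMedia({
    composition: { ...composition, durationInFrames: slideshowDurationInFrames(images.length) },
    serveUrl,
    codec: "h264",
    outputLocation,
    inputProps,
  });

  return { videoBuffer: await fs.readFile(outputLocation), tempDir };
}
```

Returning `tempDir` alongside the buffer is what lets the handler's `catch` block remove the scratch directory even when the upload step fails.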

Rendering Video in Browser

The last part is rendering the video in the browser once it has been generated and uploaded to storage.

import { useState, useRef } from "react";
import { motion, AnimatePresence } from "framer-motion";

const STEPS = [
    {
        id: "plan",
        icon: "",
        label: "Writing video script",
        sub: "Claude generating title, narration & slide prompts",
    },
    {
        id: "images",
        icon: "",
        label: "Generating slide images",
        sub: "FLUX creating visuals for each scene",
    },
    {
        id: "audio",
        icon: "",
        label: "Synthesising narration",
        sub: "Edge TTS converting script to voiceover",
    },
    {
        id: "render",
        icon: "",
        label: "Rendering MP4",
        sub: "Remotion stitching frames into video",
    },
    {
        id: "save",
        icon: "",
        label: "Saving to library",
        sub: "Uploading to storage and saving metadata",
    },
];

const STYLES = [
    {
        id: "cinematic",
        label: "Cinematic",
        desc: "Dramatic · Film-grade",
        icon: "🎬",
    },
    { id: "minimal", label: "Minimal", desc: "Clean · Flat design", icon: "" },
    {
        id: "documentary",
        label: "Documentary",
        desc: "Realistic · Candid",
        icon: "📷",
    },
];

function usePipelineSteps() {
    const [activeStep, setActiveStep] = useState(-1);
    const [doneSteps, setDoneSteps] = useState([]);
    const timers = useRef([]);

    const start = () => {
        setActiveStep(0);
        setDoneSteps([]);
        // Rough per-step delays (ms). The API is a single request, so this
        // progress timeline is simulated client-side rather than reported by the server.
        const timings = [0, 4000, 14000, 19000, 38000];
        timings.forEach((delay, idx) => {
            const t = setTimeout(() => {
                setActiveStep(idx);
                if (idx > 0) setDoneSteps((d) => [...d, STEPS[idx - 1].id]);
            }, delay);
            timers.current.push(t);
        });
    };

    const finish = () => {
        timers.current.forEach(clearTimeout);
        setDoneSteps(STEPS.map((s) => s.id));
        setActiveStep(-1);
    };

    const reset = () => {
        timers.current.forEach(clearTimeout);
        setActiveStep(-1);
        setDoneSteps([]);
    };

    return { activeStep, doneSteps, start, finish, reset };
}

function Spinner() {
    return (
        <svg
            style={{ animation: "spin 1s linear infinite", width: 16, height: 16 }}
            viewBox="0 0 24 24"
            fill="none"
        >
            <style>{`@keyframes spin { to { transform: rotate(360deg); } } @keyframes pulse-dot { 0%,100%{opacity:1} 50%{opacity:0.2} }`}</style>
            <circle
                cx="12"
                cy="12"
                r="10"
                stroke="currentColor"
                strokeWidth="3"
                strokeOpacity="0.25"
            />
            <path fill="currentColor" d="M4 12a8 8 0 018-8v8H4z" />
        </svg>
    );
}

function PulsingDot() {
    return (
        <span
            style={{
                display: "block",
                width: 8,
                height: 8,
                borderRadius: "50%",
                background: "#10b981",
                animation: "pulse-dot 1.1s ease-in-out infinite",
            }}
        />
    );
}

export default function VideoGenerator({ userId = "demo-user" }) {
    const [prompt, setPrompt] = useState("");
    const [style, setStyle] = useState("cinematic");
    const [speed, setSpeed] = useState(1.0);
    const [loading, setLoading] = useState(false);
    const [result, setResult] = useState(null);
    const [error, setError] = useState(null);
    const pipeline = usePipelineSteps();

    // Mirror the server-side validation (10–2000 chars) so invalid prompts never hit the API
    const canSubmit =
        prompt.trim().length >= 10 && prompt.trim().length <= 2000 && !loading;

    const handleGenerate = async () => {
        if (!canSubmit) return;
        setLoading(true);
        setResult(null);
        setError(null);
        pipeline.start();

        try {
            const res = await fetch("/api/generate-video", {
                method: "POST",
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify({
                    prompt: prompt.trim(),
                    userId,
                    style,
                    voiceSpeed: speed,
                }),
            });
            const data = await res.json();
            if (!res.ok) throw new Error(data.error || "Generation failed");
            pipeline.finish();
            setResult(data);
        } catch (err) {
            pipeline.reset();
            setError(err.message || "Something went wrong");
        } finally {
            setLoading(false);
        }
    };

    return (
        <div
            style={{
                minHeight: "100vh",
                background: "#080a0d",
                color: "#e8eaed",
                position: "relative",
                overflowX: "hidden",
            }}
        >
            {/* Grid background */}
            <div
                style={{
                    position: "fixed",
                    inset: 0,
                    pointerEvents: "none",
                    zIndex: 0,
                    backgroundImage:
                        "linear-gradient(rgba(255,255,255,.022) 1px,transparent 1px),linear-gradient(90deg,rgba(255,255,255,.022) 1px,transparent 1px)",
                    backgroundSize: "48px 48px",
                }}
            />

            {/* Glow */}
            <div
                style={{
                    position: "fixed",
                    top: "-18%",
                    left: "50%",
                    transform: "translateX(-50%)",
                    width: 640,
                    height: 380,
                    borderRadius: "50%",
                    pointerEvents: "none",
                    zIndex: 0,
                    background:
                        "radial-gradient(ellipse,rgba(16,185,129,.16) 0%,transparent 70%)",
                }}
            />

            {/* Content */}
            <div
                style={{
                    position: "relative",
                    zIndex: 1,
                    maxWidth: 620,
                    margin: "0 auto",
                    padding: "60px 20px 80px",
                }}
            >
                {/* ── Header ── */}
                <motion.div
                    initial={{ opacity: 0, y: -14 }}
                    animate={{ opacity: 1, y: 0 }}
                    transition={{ duration: 0.45 }}
                    style={{ textAlign: "center", marginBottom: 40 }}
                >
                    <div
                        style={{
                            display: "inline-flex",
                            alignItems: "center",
                            gap: 8,
                            fontSize: 10,
                            letterSpacing: "0.18em",
                            textTransform: "uppercase",
                            color: "rgba(16,185,129,0.8)",
                            border: "1px solid rgba(16,185,129,0.25)",
                            borderRadius: 999,
                            padding: "5px 16px",
                            background: "rgba(16,185,129,0.06)",
                            marginBottom: 20,
                        }}
                    >
                        <span
                            style={{
                                width: 6,
                                height: 6,
                                borderRadius: "50%",
                                background: "#10b981",
                                display: "inline-block",
                            }}
                        />
                        AI · Remotion · Edge TTS
                    </div>
                    <h1
                        style={{
                            fontSize: 38,
                            fontWeight: 800,
                            letterSpacing: "-0.04em",
                            color: "#ffffff",
                            lineHeight: 1,
                            marginBottom: 12,
                            fontFamily: "'Syne','DM Sans',system-ui,sans-serif",
                        }}
                    >
                        Prompt → Video
                    </h1>
                    <p
                        style={{
                            fontSize: 13,
                            color: "rgba(255,255,255,0.38)",
                            lineHeight: 1.6,
                        }}
                    >
                        Describe anything. Get a narrated MP4 slideshow in ~60 seconds.
                    </p>
                </motion.div>

                {/* ── Input Card ── */}
                <motion.div
                    initial={{ opacity: 0, y: 18 }}
                    animate={{ opacity: 1, y: 0 }}
                    transition={{ duration: 0.45, delay: 0.08 }}
                    style={{
                        borderRadius: 16,
                        border: "1px solid rgba(255,255,255,0.08)",
                        background: "rgba(255,255,255,0.03)",
                        backdropFilter: "blur(12px)",
                        overflow: "hidden",
                        marginBottom: 12,
                    }}
                >
                    {/* Prompt */}
                    <div style={{ padding: "20px 20px 14px" }}>
                        <label
                            style={{
                                display: "block",
                                fontSize: 10,
                                letterSpacing: "0.14em",
                                textTransform: "uppercase",
                                color: "rgba(255,255,255,0.38)",
                                marginBottom: 10,
                            }}
                        >
                            Your Prompt
                        </label>
                        <textarea
                            value={prompt}
                            onChange={(e) => setPrompt(e.target.value)}
                            disabled={loading}
                            rows={4}
                            placeholder="e.g. A documentary about solo founders who quit their jobs to build software products alone and became successful..."
                            style={{
                                width: "100%",
                                background: "transparent",
                                border: "none",
                                outline: "none",
                                resize: "none",
                                fontSize: 13,
                                color: "rgba(255,255,255,0.82)",
                                lineHeight: 1.65,
                                fontFamily: "inherit",
                                opacity: loading ? 0.4 : 1,
                            }}
                        />
                        <div
                            style={{
                                display: "flex",
                                justifyContent: "space-between",
                                marginTop: 6,
                            }}
                        >
                            <span
                                style={{
                                    fontSize: 10,
                                    color:
                                        prompt.length > 1800 ? "#f87171" : "rgba(255,255,255,0.2)",
                                }}
                            >
                                {prompt.length} / 2000
                            </span>
                            {prompt.length > 0 && prompt.length < 10 && (
                                <span style={{ fontSize: 10, color: "rgba(251,191,36,0.7)" }}>
                                    Need {10 - prompt.length} more chars
                                </span>
                            )}
                        </div>
                    </div>

                    <div
                        style={{
                            height: 1,
                            background: "rgba(255,255,255,0.06)",
                            margin: "0 20px",
                        }}
                    />

                    {/* Style selector */}
                    <div style={{ padding: "16px 20px" }}>
                        <label
                            style={{
                                display: "block",
                                fontSize: 10,
                                letterSpacing: "0.14em",
                                textTransform: "uppercase",
                                color: "rgba(255,255,255,0.38)",
                                marginBottom: 12,
                            }}
                        >
                            Visual Style
                        </label>
                        <div
                            style={{
                                display: "grid",
                                gridTemplateColumns: "repeat(3,1fr)",
                                gap: 8,
                            }}
                        >
                            {STYLES.map((st) => (
                                <button
                                    key={st.id}
                                    onClick={() => !loading && setStyle(st.id)}
                                    disabled={loading}
                                    style={{
                                        borderRadius: 12,
                                        padding: "12px 10px",
                                        textAlign: "left",
                                        cursor: loading ? "not-allowed" : "pointer",
                                        border:
                                            style === st.id
                                                ? "1px solid rgba(16,185,129,0.5)"
                                                : "1px solid rgba(255,255,255,0.08)",
                                        background:
                                            style === st.id
                                                ? "rgba(16,185,129,0.1)"
                                                : "rgba(255,255,255,0.02)",
                                        color:
                                            style === st.id ? "#ffffff" : "rgba(255,255,255,0.4)",
                                        transition: "all 0.18s",
                                        opacity: loading ? 0.5 : 1,
                                    }}
                                >
                                    <div style={{ fontSize: 18, marginBottom: 6 }}>{st.icon}</div>
                                    <div
                                        style={{
                                            fontSize: 12,
                                            fontWeight: 600,
                                            display: "block",
                                            color:
                                                style === st.id ? "#fff" : "rgba(255,255,255,0.55)",
                                        }}
                                    >
                                        {st.label}
                                    </div>
                                    <div
                                        style={{
                                            fontSize: 10,
                                            marginTop: 2,
                                            color: "rgba(255,255,255,0.28)",
                                        }}
                                    >
                                        {st.desc}
                                    </div>
                                </button>
                            ))}
                        </div>
                    </div>

                    <div
                        style={{
                            height: 1,
                            background: "rgba(255,255,255,0.06)",
                            margin: "0 20px",
                        }}
                    />

                    {/* Voice speed */}
                    <div style={{ padding: "16px 20px" }}>
                        <div
                            style={{
                                display: "flex",
                                justifyContent: "space-between",
                                alignItems: "center",
                                marginBottom: 10,
                            }}
                        >
                            <label
                                style={{
                                    fontSize: 10,
                                    letterSpacing: "0.14em",
                                    textTransform: "uppercase",
                                    color: "rgba(255,255,255,0.38)",
                                }}
                            >
                                Voice Speed
                            </label>
                            <span style={{ fontSize: 12, fontWeight: 700, color: "#10b981" }}>
                                {speed.toFixed(1)}×
                            </span>
                        </div>
                        <input
                            type="range"
                            min={0.5}
                            max={2.0}
                            step={0.1}
                            value={speed}
                            onChange={(e) => setSpeed(parseFloat(e.target.value))}
                            disabled={loading}
                            style={{
                                width: "100%",
                                accentColor: "#10b981",
                                opacity: loading ? 0.4 : 1,
                            }}
                        />
                        <div
                            style={{
                                display: "flex",
                                justifyContent: "space-between",
                                fontSize: 9,
                                color: "rgba(255,255,255,0.2)",
                                marginTop: 5,
                            }}
                        >
                            <span>0.5× slow</span>
                            <span>1.0× normal</span>
                            <span>2.0× fast</span>
                        </div>
                    </div>

                    {/* Generate button */}
                    <div style={{ padding: "0 20px 20px" }}>
                        <motion.button
                            onClick={handleGenerate}
                            disabled={!canSubmit}
                            whileTap={canSubmit ? { scale: 0.97 } : {}}
                            style={{
                                width: "100%",
                                padding: "14px",
                                borderRadius: 12,
                                border: "none",
                                background: canSubmit ? "#10b981" : "rgba(255,255,255,0.05)",
                                color: canSubmit ? "#000000" : "rgba(255,255,255,0.2)",
                                fontSize: 13,
                                fontWeight: 700,
                                fontFamily: "inherit",
                                cursor: canSubmit ? "pointer" : "not-allowed",
                                letterSpacing: "0.03em",
                                display: "flex",
                                alignItems: "center",
                                justifyContent: "center",
                                gap: 8,
                                transition: "all 0.18s",
                            }}
                        >
                            {loading ? (
                                <>
                                    <Spinner />
                                    Generating video
                                </>
                            ) : (
                                "Generate Video →"
                            )}
                        </motion.button>
                    </div>
                </motion.div>

                {/* ── Pipeline Progress ── */}
                <AnimatePresence>
                    {loading && (
                        <motion.div
                            key="pipeline"
                            initial={{ opacity: 0, height: 0 }}
                            animate={{ opacity: 1, height: "auto" }}
                            exit={{ opacity: 0, height: 0 }}
                            transition={{ duration: 0.28 }}
                            style={{ overflow: "hidden", marginBottom: 12 }}
                        >
                            <div
                                style={{
                                    borderRadius: 16,
                                    border: "1px solid rgba(255,255,255,0.07)",
                                    background: "rgba(255,255,255,0.02)",
                                    padding: 20,
                                }}
                            >
                                <div
                                    style={{
                                        fontSize: 10,
                                        letterSpacing: "0.14em",
                                        textTransform: "uppercase",
                                        color: "rgba(255,255,255,0.28)",
                                        marginBottom: 18,
                                    }}
                                >
                                    Pipeline Progress
                                </div>
                                {STEPS.map((step, idx) => {
                                    const isDone = pipeline.doneSteps.includes(step.id);
                                    const isActive = pipeline.activeStep === idx;
                                    return (
                                        <motion.div
                                            key={step.id}
                                            initial={{ opacity: 0, x: -8 }}
                                            animate={{ opacity: 1, x: 0 }}
                                            transition={{ delay: idx * 0.06 }}
                                            style={{
                                                display: "flex",
                                                alignItems: "flex-start",
                                                gap: 12,
                                                marginBottom: idx < STEPS.length - 1 ? 14 : 0,
                                            }}
                                        >
                                            <div
                                                style={{
                                                    width: 28,
                                                    height: 28,
                                                    borderRadius: 8,
                                                    flexShrink: 0,
                                                    marginTop: 2,
                                                    display: "flex",
                                                    alignItems: "center",
                                                    justifyContent: "center",
                                                    fontSize: 11,
                                                    border: isDone
                                                        ? "1px solid rgba(16,185,129,0.5)"
                                                        : isActive
                                                            ? "1px solid rgba(16,185,129,0.3)"
                                                            : "1px solid rgba(255,255,255,0.08)",
                                                    background: isDone
                                                        ? "rgba(16,185,129,0.12)"
                                                        : isActive
                                                            ? "rgba(16,185,129,0.06)"
                                                            : "transparent",
                                                    color: isDone
                                                        ? "#10b981"
                                                        : isActive
                                                            ? "#6ee7b7"
                                                            : "rgba(255,255,255,0.2)",
                                                    transition: "all 0.3s",
                                                }}
                                            >
                                                {isDone ? "✓" : isActive ? <PulsingDot /> : step.icon}
                                            </div>
                                            <div>
                                                <div
                                                    style={{
                                                        fontSize: 12,
                                                        fontWeight: 500,
                                                        marginBottom: 2,
                                                        color: isDone
                                                            ? "#10b981"
                                                            : isActive
                                                                ? "#ffffff"
                                                                : "rgba(255,255,255,0.25)",
                                                        transition: "color 0.3s",
                                                    }}
                                                >
                                                    {step.label}
                                                </div>
                                                {isActive && (
                                                    <motion.div
                                                        initial={{ opacity: 0 }}
                                                        animate={{ opacity: 1 }}
                                                        style={{
                                                            fontSize: 10,
                                                            color: "rgba(255,255,255,0.3)",
                                                            lineHeight: 1.5,
                                                        }}
                                                    >
                                                        {step.sub}
                                                    </motion.div>
                                                )}
                                            </div>
                                        </motion.div>
                                    );
                                })}
                            </div>
                        </motion.div>
                    )}
                </AnimatePresence>

                {/* ── Error ── */}
                <AnimatePresence>
                    {error && (
                        <motion.div
                            key="error"
                            initial={{ opacity: 0, y: 8 }}
                            animate={{ opacity: 1, y: 0 }}
                            exit={{ opacity: 0 }}
                            style={{
                                borderRadius: 14,
                                border: "1px solid rgba(239,68,68,0.3)",
                                background: "rgba(239,68,68,0.06)",
                                padding: 16,
                                marginBottom: 12,
                                display: "flex",
                                gap: 12,
                            }}
                        >
                            <span
                                style={{
                                    color: "#f87171",
                                    fontSize: 14,
                                    marginTop: 1,
                                    flexShrink: 0,
                                }}
                            >
                                
                            </span>
                            <div>
                                <div
                                    style={{
                                        fontSize: 12,
                                        fontWeight: 600,
                                        color: "#f87171",
                                        marginBottom: 4,
                                    }}
                                >
                                    Generation failed
                                </div>
                                <div
                                    style={{
                                        fontSize: 11,
                                        color: "rgba(252,165,165,0.65)",
                                        lineHeight: 1.5,
                                    }}
                                >
                                    {error}
                                </div>
                                <button
                                    onClick={() => setError(null)}
                                    style={{
                                        background: "none",
                                        border: "none",
                                        cursor: "pointer",
                                        fontFamily: "inherit",
                                        fontSize: 10,
                                        color: "rgba(248,113,113,0.6)",
                                        marginTop: 8,
                                        textDecoration: "underline",
                                        padding: 0,
                                    }}
                                >
                                    Dismiss
                                </button>
                            </div>
                        </motion.div>
                    )}
                </AnimatePresence>

                {/* ── Result ── */}
                <AnimatePresence>
                    {result && (
                        <motion.div
                            key="result"
                            initial={{ opacity: 0, y: 24, scale: 0.97 }}
                            animate={{ opacity: 1, y: 0, scale: 1 }}
                            transition={{ duration: 0.38, ease: [0.16, 1, 0.3, 1] }}
                            style={{
                                borderRadius: 16,
                                border: "1px solid rgba(16,185,129,0.28)",
                                background: "rgba(16,185,129,0.03)",
                                overflow: "hidden",
                            }}
                        >
                            {/* Header */}
                            <div
                                style={{
                                    padding: "18px 20px 14px",
                                    borderBottom: "1px solid rgba(255,255,255,0.06)",
                                }}
                            >
                                <div
                                    style={{
                                        display: "inline-flex",
                                        alignItems: "center",
                                        gap: 6,
                                        fontSize: 9,
                                        letterSpacing: "0.18em",
                                        textTransform: "uppercase",
                                        color: "rgba(16,185,129,0.75)",
                                        border: "1px solid rgba(16,185,129,0.22)",
                                        borderRadius: 999,
                                        padding: "3px 10px",
                                        marginBottom: 10,
                                    }}
                                >
                                    <span
                                        style={{
                                            width: 5,
                                            height: 5,
                                            borderRadius: "50%",
                                            background: "#10b981",
                                            display: "inline-block",
                                        }}
                                    />
                                    Ready · {result.slideCount} slides · {result.style}
                                </div>
                                <div
                                    style={{
                                        fontSize: 17,
                                        fontWeight: 700,
                                        color: "#ffffff",
                                        letterSpacing: "-0.02em",
                                        fontFamily: "'Syne',system-ui,sans-serif",
                                    }}
                                >
                                    {result.title}
                                </div>
                            </div>

                            {/* Video player */}
                            <div style={{ background: "#000" }}>
                                <video
                                    src={result.videoUrl}
                                    controls
                                    autoPlay
                                    style={{ width: "100%", display: "block", maxHeight: 340 }}
                                />
                            </div>

                            {/* Narration */}
                            {result.narration && (
                                <div
                                    style={{
                                        padding: "14px 20px",
                                        borderTop: "1px solid rgba(255,255,255,0.06)",
                                    }}
                                >
                                    <div
                                        style={{
                                            fontSize: 10,
                                            letterSpacing: "0.14em",
                                            textTransform: "uppercase",
                                            color: "rgba(255,255,255,0.28)",
                                            marginBottom: 8,
                                        }}
                                    >
                                        Script / Narration
                                    </div>
                                    <p
                                        style={{
                                            fontSize: 11,
                                            color: "rgba(255,255,255,0.4)",
                                            lineHeight: 1.65,
                                            display: "-webkit-box",
                                            WebkitLineClamp: 4,
                                            WebkitBoxOrient: "vertical",
                                            overflow: "hidden",
                                        }}
                                    >
                                        {result.narration}
                                    </p>
                                </div>
                            )}

                            {/* Actions */}
                            <div
                                style={{
                                    padding: "0 20px 20px",
                                    display: "flex",
                                    gap: 8,
                                    flexWrap: "wrap",
                                    alignItems: "center",
                                }}
                            >
                                <a
                                    href={result.videoUrl}
                                    download
                                    style={{
                                        display: "inline-flex",
                                        alignItems: "center",
                                        gap: 6,
                                        fontSize: 12,
                                        fontWeight: 600,
                                        background: "#10b981",
                                        color: "#000",
                                        borderRadius: 10,
                                        padding: "8px 16px",
                                        textDecoration: "none",
                                        fontFamily: "inherit",
                                    }}
                                >
                                     Download MP4
                                </a>
                                {result.audioUrl && (
                                    <a
                                        href={result.audioUrl}
                                        download
                                        style={{
                                            display: "inline-flex",
                                            alignItems: "center",
                                            gap: 6,
                                            fontSize: 12,
                                            fontWeight: 500,
                                            border: "1px solid rgba(255,255,255,0.12)",
                                            color: "rgba(255,255,255,0.55)",
                                            borderRadius: 10,
                                            padding: "8px 16px",
                                            textDecoration: "none",
                                            fontFamily: "inherit",
                                        }}
                                    >
                                         Audio MP3
                                    </a>
                                )}
                                <button
                                    onClick={() => {
                                        setResult(null);
                                        setPrompt("");
                                    }}
                                    style={{
                                        marginLeft: "auto",
                                        background: "none",
                                        border: "none",
                                        fontSize: 11,
                                        color: "rgba(255,255,255,0.25)",
                                        cursor: "pointer",
                                        fontFamily: "inherit",
                                        padding: "8px 4px",
                                    }}
                                >
                                    Generate another 
                                </button>
                            </div>

                            {/* Slide plan */}
                            {result.plan?.slides?.length > 0 && (
                                <div
                                    style={{
                                        borderTop: "1px solid rgba(255,255,255,0.06)",
                                        padding: "14px 20px 20px",
                                    }}
                                >
                                    <div
                                        style={{
                                            fontSize: 10,
                                            letterSpacing: "0.14em",
                                            textTransform: "uppercase",
                                            color: "rgba(255,255,255,0.28)",
                                            marginBottom: 14,
                                        }}
                                    >
                                        Slide Plan
                                    </div>
                                    {result.plan.slides.map((slide, i) => (
                                        <div
                                            key={i}
                                            style={{
                                                display: "flex",
                                                gap: 12,
                                                alignItems: "flex-start",
                                                marginBottom: 10,
                                            }}
                                        >
                                            <span
                                                style={{
                                                    fontSize: 10,
                                                    color: "rgba(255,255,255,0.2)",
                                                    width: 20,
                                                    flexShrink: 0,
                                                    marginTop: 2,
                                                    fontVariantNumeric: "tabular-nums",
                                                }}
                                            >
                                                {String(i + 1).padStart(2, "0")}
                                            </span>
                                            <div>
                                                <div
                                                    style={{
                                                        fontSize: 12,
                                                        fontWeight: 500,
                                                        color: "rgba(255,255,255,0.68)",
                                                    }}
                                                >
                                                    {slide.caption}
                                                </div>
                                                {slide.scriptLine && (
                                                    <div
                                                        style={{
                                                            fontSize: 10,
                                                            color: "rgba(255,255,255,0.28)",
                                                            marginTop: 2,
                                                            lineHeight: 1.5,
                                                        }}
                                                    >
                                                        {slide.scriptLine}
                                                    </div>
                                                )}
                                            </div>
                                        </div>
                                    ))}
                                </div>
                            )}
                        </motion.div>
                    )}
                </AnimatePresence>

                {/* Footer */}
                {!loading && !result && (
                    <motion.p
                        initial={{ opacity: 0 }}
                        animate={{ opacity: 1 }}
                        transition={{ delay: 0.55 }}
                        style={{
                            textAlign: "center",
                            fontSize: 10,
                            color: "rgba(255,255,255,0.14)",
                            marginTop: 28,
                            lineHeight: 1.8,
                        }}
                    >
                        Powered by OpenRouter · Remotion · Edge TTS · Firebase Storage
                    </motion.p>
                )}
            </div>
        </div>
    );
}


The code above renders the video once it has been created: it contains an input to enter the prompt, a button that invokes the generate-video endpoint, and a player that finally renders the returned video.
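Under the hood, the Generate button simply POSTs the prompt to that endpoint. Here is a minimal sketch of that call; the `generateVideo` helper name is hypothetical, the body fields follow the Hono route shown later in this post, and the response shape (`videoUrl`, `slideCount`, etc.) is assumed from the result UI above:

```typescript
// Hypothetical helper mirroring what the Generate button does.
// Endpoint path and body fields match the Hono route in this post.
async function generateVideo(prompt: string, userId: string) {
  const res = await fetch("/api/generate-video", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, userId, style: "cinematic", voiceSpeed: 1.0 }),
  });
  if (!res.ok) throw new Error(`Generation failed: HTTP ${res.status}`);
  // Expected shape: { videoUrl, audioUrl, title, slideCount, style, narration, plan }
  return res.json();
}
```

The frontend then drops the returned `videoUrl` straight into the `<video>` element and shows the slide plan from `plan.slides`.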

One improvement is to move this generate-video API endpoint into a proper backend: a Node server using Hono.js or Express.js, or a Python one using Flask.

I would prefer doing this in Hono.js, and the equivalent code is given below

// routes/generate-video.ts
// ─────────────────────────────────────────────────────────────────────────────
// POST /api/generate-video
//
// Hono.js route — drop into your existing Hono app:
//   import { generateVideoRoute } from "./routes/generate-video";
//   app.route("/api", generateVideoRoute);
//
// Body (JSON):
//   prompt      string   required  — user's natural language request
//   userId      string   required  — authenticated user ID
//   draftId     string   optional  — Firestore draft doc to backlink
//   style       string   optional  — "cinematic" | "minimal" | "documentary"
//   voiceSpeed  number   optional  — TTS speed 0.5–2.0 (default 1.0)
//
// Pipeline:
//   1. Validate input
//   2. OpenRouter (claude-3.5-sonnet) → video plan JSON
//   3. OpenRouter (flux-schnell)       → images per slide (parallel)
//   4. Edge TTS                        → narration MP3 (non-fatal)
//   5. Remotion                        → render MP4
//   6. Firebase Storage                → upload audio + video
//   7. Firestore batch                 → save video doc + backlink draft
//   8. Return result JSON
// ─────────────────────────────────────────────────────────────────────────────

import { Hono }            from "hono";
import { env }             from "hono/adapter";
import { HTTPException }   from "hono/http-exception";
import { logger }          from "hono/logger";
import { timing }          from "hono/timing";
import fs                  from "fs/promises";
import os                  from "os";
import path                from "path";
import { bundle }          from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";

// Firebase Admin — import from your shared firebase-admin init file
// e.g. lib/firebase-admin.ts  (see bottom of this file for setup snippet)
import { db, bucket }      from "../lib/firebase-admin";

// ─── Types ───────────────────────────────────────────────────────────────────
interface RequestBody {
  prompt:     string;
  userId:     string;
  draftId?:   string;
  style?:     "cinematic" | "minimal" | "documentary";
  voiceSpeed?: number;
}

interface SlideItem {
  imagePrompt: string;
  caption:     string;
  scriptLine?: string;
}

interface VideoPlan {
  title:     string;
  narration: string;
  slides:    SlideItem[];
}

// ─── Constants ───────────────────────────────────────────────────────────────
const OPENROUTER_BASE = "https://openrouter.ai/api/v1";
const PLAN_MODEL      = "anthropic/claude-3.5-sonnet";
const IMAGE_MODEL     = "black-forest-labs/flux-schnell";
const MAX_SLIDES      = 10;
const MIN_SLIDES      = 3;

const VALID_STYLES    = ["cinematic", "minimal", "documentary"] as const;

const STYLE_GUIDES: Record<string, string> = {
  cinematic:   "dramatic lighting, wide shots, film grain, 8k, professional photography, cinematic",
  minimal:     "clean white backgrounds, simple compositions, flat design, minimalist, high contrast",
  documentary: "realistic, candid, natural lighting, photojournalism, editorial photography",
};

// ─── Helper: log with prefix ─────────────────────────────────────────────────
const log = (tag: string, msg: string) =>
  console.log(`[video:${tag}] ${msg}`);

const warn = (tag: string, msg: string) =>
  console.warn(`[video:${tag}] ⚠ ${msg}`);

// ─── Step 1: OpenRouter → Video Plan ─────────────────────────────────────────
async function generateVideoPlan(
  prompt: string,
  style: string,
  openrouterKey: string,
  appUrl: string,
): Promise<VideoPlan> {
  const styleHint = STYLE_GUIDES[style] || STYLE_GUIDES.cinematic;

  const systemPrompt = `You are a video producer AI. Given a user prompt, output a structured video plan.
Return ONLY valid JSON — no markdown, no backticks, no explanation.

Shape:
{
  "title": "string (max 60 chars)",
  "narration": "string — all scriptLines joined with a space (assembled automatically)",
  "slides": [
    {
      "imagePrompt": "string (detailed image generation prompt, 40-80 words, style: ${styleHint})",
      "caption": "string (max 8 words shown on screen)",
      "scriptLine": "string (1-3 sentences of narration spoken during this slide)"
    }
  ]
}

Rules:
- Generate between ${MIN_SLIDES} and ${MAX_SLIDES} slides
- Each imagePrompt must be visually rich and detailed
- scriptLines should flow naturally as spoken narration
- Set narration = all scriptLines joined with " "
- Style tone: ${style}`;

  const response = await fetch(`${OPENROUTER_BASE}/chat/completions`, {
    method: "POST",
    headers: {
      "Authorization":  `Bearer ${openrouterKey}`,
      "Content-Type":   "application/json",
      "HTTP-Referer":   appUrl,
      "X-Title":        "Video Generator",
    },
    body: JSON.stringify({
      model:           PLAN_MODEL,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user",   content: prompt },
      ],
      temperature:     0.7,
      max_tokens:      2500,
      response_format: { type: "json_object" },
    }),
  });

  if (!response.ok) {
    const text = await response.text();
    throw new HTTPException(502, {
      message: `OpenRouter plan generation failed (${response.status}): ${text.slice(0, 200)}`,
    });
  }

  const data = await response.json();
  const raw  = data.choices?.[0]?.message?.content as string | undefined;

  if (!raw) {
    throw new HTTPException(502, { message: "OpenRouter returned empty content for video plan" });
  }

  let plan: VideoPlan;

  try {
    plan = JSON.parse(raw);
  } catch {
    // Fallback: try to extract JSON block if model wrapped it
    const match = raw.match(/\{[\s\S]*\}/);
    if (!match) {
      throw new HTTPException(502, { message: "Could not parse video plan JSON from OpenRouter" });
    }
    plan = JSON.parse(match[0]);
  }

  // Validate required fields
  if (!plan.title || !plan.narration || !Array.isArray(plan.slides) || plan.slides.length === 0) {
    throw new HTTPException(502, {
      message: "Video plan JSON missing required fields (title, narration, slides)",
    });
  }

  // Rebuild narration from scriptLines if available (more accurate)
  const scriptLines = plan.slides.map(s => s.scriptLine).filter(Boolean);
  if (scriptLines.length > 0) {
    plan.narration = scriptLines.join(" ");
  }

  // Cap slides
  plan.slides = plan.slides.slice(0, MAX_SLIDES);

  return plan;
}

// ─── Step 2: OpenRouter → Generate Images ────────────────────────────────────
async function generateImages(
  slides: SlideItem[],
  style: string,
  openrouterKey: string,
  appUrl: string,
): Promise<string[]> {
  const styleSuffix = STYLE_GUIDES[style] || STYLE_GUIDES.cinematic;

  const results = await Promise.allSettled(
    slides.map(async (slide, idx) => {
      try {
        const fullPrompt = `${slide.imagePrompt}, ${styleSuffix}`;

        const response = await fetch(`${OPENROUTER_BASE}/images/generations`, {
          method: "POST",
          headers: {
            "Authorization": `Bearer ${openrouterKey}`,
            "Content-Type":  "application/json",
            "HTTP-Referer":  appUrl,
            "X-Title":       "Video Generator",
          },
          body: JSON.stringify({
            model:  IMAGE_MODEL,
            prompt: fullPrompt,
            n:      1,
            size:   "1280x720",
          }),
        });

        if (!response.ok) {
          throw new Error(`HTTP ${response.status}`);
        }

        const data = await response.json();
        const url  = data.data?.[0]?.url as string | undefined;

        if (!url) throw new Error("No URL in response");

        log("images", `Slide ${idx + 1}/${slides.length} ✓`);
        return url;
      } catch (err: any) {
        // Graceful fallback — video still renders with placeholder
        warn("images", `Slide ${idx + 1} failed (${err?.message}), using placeholder`);
        return `https://placehold.co/1280x720/0d1117/ffffff?text=${encodeURIComponent(slide.caption || `Slide ${idx + 1}`)}`;
      }
    }),
  );

  return results.map((r, idx) =>
    r.status === "fulfilled"
      ? r.value
      : `https://placehold.co/1280x720/0d1117/ffffff?text=Slide+${idx + 1}`,
  );
}

// ─── Step 3: Edge TTS → Narration MP3 ────────────────────────────────────────
async function generateAudio(
  title: string,
  content: string,
  speed: number,
): Promise<Buffer> {
  // Dynamic ESM import (the npm package exposing MsEdgeTTS is "msedge-tts")
  const { MsEdgeTTS, OUTPUT_FORMAT } = await import("msedge-tts");

  const tts = new MsEdgeTTS();
  await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3);

  const fullText = title ? `${title}. ${content}` : content;

  // Escape XML special chars for SSML
  const escaped = fullText.replace(/[<>&'"]/g, (c: string) =>
    ({ "<": "&lt;", ">": "&gt;", "&": "&amp;", "'": "&apos;", '"': "&quot;" }[c] ?? c),
  );

  const rateStr = speed >= 1
    ? `+${Math.round((speed - 1) * 100)}%`
    : `-${Math.round((1 - speed) * 100)}%`;

  const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
      <prosody rate="${rateStr}">${escaped}</prosody>
    </voice>
  </speak>`;

  const chunks: Buffer[] = [];
  const readable = await tts.toStream(ssml);

  return new Promise<Buffer>((resolve, reject) => {
    readable.on("data",  (chunk: Buffer) => chunks.push(chunk));
    readable.on("end",   () => resolve(Buffer.concat(chunks)));
    readable.on("error", reject);
  });
}

// ─── Step 4: Remotion → Render MP4 ───────────────────────────────────────────
async function renderSlideshow(params: {
  images:   string[];
  title:    string;
  audioUrl: string | null;
  captions: string[];
  scriptLines: string[];
}): Promise<{ videoBuffer: Buffer; tempDir: string }> {
  const { images, title, audioUrl, captions, scriptLines } = params;

  const tempDir = path.join(os.tmpdir(), `video-${Date.now()}`);
  await fs.mkdir(tempDir, { recursive: true });

  try {
    const perSlide        = Math.min(8, Math.max(3, Math.floor(40 / Math.max(images.length, 1))));
    const totalDuration   = images.length * perSlide;
    const durationInFrames = Math.ceil(totalDuration * 30);

    const bundleLocation = await bundle({
      entryPoint:     path.join(process.cwd(), "remotion/index.ts"),
      webpackOverride: (config) => config,
    });

    const inputProps = {
      images,
      title,
      audioUrl:    audioUrl ?? "",
      captions,
      scriptLines,
      perSlide,
    };

    const composition = await selectComposition({
      serveUrl:   bundleLocation,
      id:         "VideoComposition",
      inputProps,
    });

    const outputPath = path.join(tempDir, "output.mp4");

    await renderMedia({
      composition,
      serveUrl:        bundleLocation,
      codec:           "h264",
      outputLocation:  outputPath,
      inputProps,
      concurrency:     4,
      frameRange:      [0, durationInFrames - 1],
    });

    const videoBuffer = await fs.readFile(outputPath);
    return { videoBuffer, tempDir };
  } catch (err) {
    // Always clean up temp dir on render failure
    await fs.rm(tempDir, { recursive: true, force: true });
    throw err;
  }
}

// ─── Step 5: Firebase Storage → Upload File ───────────────────────────────────
async function uploadToStorage(
  buffer:      Buffer,
  filename:    string,
  contentType: string,
): Promise<string> {
  const destination = `videos/${Date.now()}-${filename}`;
  const file        = bucket.file(destination);

  await file.save(buffer, {
    metadata:  { contentType },
    resumable: false,           // fine for files <10MB; use resumable for larger
  });

  await file.makePublic();

  return `https://storage.googleapis.com/${bucket.name}/${destination}`;
}

// ─── Step 6: Firestore → Save Video Doc + Backlink Draft ─────────────────────
async function saveVideoDoc(params: {
  videoUrl:   string;
  audioUrl:   string | null;
  title:      string;
  userId:     string;
  draftId?:   string;
  plan:       VideoPlan;
  slideCount: number;
  style:      string;
}): Promise<string> {
  const { videoUrl, audioUrl, title, userId, draftId, plan, slideCount, style } = params;
  const now   = new Date().toISOString();
  const batch = db.batch();

  // ── Video document ──────────────────────────────────────────────────────────
  const videoRef = db.collection("videos").doc();   // auto-generated ID

  batch.set(videoRef, {
    id:         videoRef.id,
    title,
    videoUrl,
    audioUrl:   audioUrl ?? null,
    userId,
    draftId:    draftId ?? null,
    slideCount,
    style,
    plan,                       // full plan stored for regeneration / display
    status:     "ready",
    createdAt:  now,
    updatedAt:  now,
  });

  // ── Backlink draft (if provided) ────────────────────────────────────────────
  if (draftId) {
    const draftRef = db.collection("drafts").doc(draftId);
    batch.update(draftRef, {
      videoId:     videoRef.id,
      videoUrl,
      videoStatus: "rendered",
      updatedAt:   now,
    });
  }

  await batch.commit();

  return videoRef.id;
}

// ─── Hono Router ──────────────────────────────────────────────────────────────
export const generateVideoRoute = new Hono();

generateVideoRoute.use("*", logger());
generateVideoRoute.use("*", timing());

generateVideoRoute.post("/generate-video", async (c) => {
  // Pull env vars via Hono adapter (works with Node, Bun, Cloudflare, etc.)
  const {
    OPENROUTER_API_KEY,
    NEXT_PUBLIC_APP_URL,
    APP_URL,
  } = env<{
    OPENROUTER_API_KEY:   string;
    NEXT_PUBLIC_APP_URL?: string;
    APP_URL?:             string;
  }>(c);

  const appUrl = NEXT_PUBLIC_APP_URL ?? APP_URL ?? "http://localhost:3000";

  // ── Guard: API key must exist ────────────────────────────────────────────────
  if (!OPENROUTER_API_KEY) {
    throw new HTTPException(500, { message: "Server misconfiguration: OPENROUTER_API_KEY not set" });
  }

  // ── Parse + validate body ────────────────────────────────────────────────────
  let body: RequestBody;

  try {
    body = await c.req.json<RequestBody>();
  } catch {
    throw new HTTPException(400, { message: "Request body must be valid JSON" });
  }

  const {
    prompt,
    userId,
    draftId,
    style      = "cinematic",
    voiceSpeed = 1.0,
  } = body;

  if (!userId || typeof userId !== "string" || !userId.trim()) {
    throw new HTTPException(401, { message: "Authentication required." });
  }

  if (!prompt || typeof prompt !== "string" || !prompt.trim()) {
    throw new HTTPException(400, { message: "A prompt is required to generate a video." });
  }

  if (prompt.trim().length < 10) {
    throw new HTTPException(400, { message: "Prompt too short — minimum 10 characters." });
  }

  if (prompt.trim().length > 2000) {
    throw new HTTPException(400, { message: "Prompt too long — maximum 2000 characters." });
  }

  const safeStyle = VALID_STYLES.includes(style as any) ? style : "cinematic";
  const safeSpeed = Math.min(2.0, Math.max(0.5, Number(voiceSpeed) || 1.0));

  log("handler", `User: ${userId} | Style: ${safeStyle} | Speed: ${safeSpeed}x`);
  log("handler", `Prompt: "${prompt.trim().slice(0, 80)}…"`);

  let tempDir: string | null = null;

  try {
    // ── Step 1: Generate video plan ────────────────────────────────────────────
    log("step1", "Generating video plan via OpenRouter…");
    const plan = await generateVideoPlan(prompt.trim(), safeStyle, OPENROUTER_API_KEY, appUrl);
    log("step1", `Plan ready: "${plan.title}" — ${plan.slides.length} slides`);

    // ── Step 2: Generate images ────────────────────────────────────────────────
    log("step2", `Generating ${plan.slides.length} images…`);
    const imageUrls = await generateImages(plan.slides, safeStyle, OPENROUTER_API_KEY, appUrl);
    log("step2", `${imageUrls.length} images ready`);

    // ── Step 3: Generate narration audio (non-fatal) ───────────────────────────
    let audioUrl: string | null = null;
    log("step3", "Generating narration audio via Edge TTS…");

    try {
      const audioBuffer   = await generateAudio(plan.title, plan.narration, safeSpeed);
      const audioFilename = `${Date.now()}-audio.mp3`;
      audioUrl = await uploadToStorage(audioBuffer, audioFilename, "audio/mpeg");
      log("step3", `Audio uploaded: ${audioUrl}`);
    } catch (e: any) {
      warn("step3", `Audio skipped (non-fatal): ${e?.message}`);
    }

    // ── Step 4: Render MP4 ─────────────────────────────────────────────────────
    log("step4", `Rendering MP4 (${imageUrls.length} slides)…`);
    const captions    = plan.slides.map(s => s.caption    ?? "");
    const scriptLines = plan.slides.map(s => s.scriptLine ?? "");

    const rendered = await renderSlideshow({
      images: imageUrls,
      title:  plan.title,
      audioUrl,
      captions,
      scriptLines,
    });

    tempDir = rendered.tempDir;
    log("step4", "Render complete ✓");

    // ── Step 5: Upload MP4 ─────────────────────────────────────────────────────
    log("step5", "Uploading MP4 to Firebase Storage…");
    const safeTitle     = plan.title.replace(/[^a-z0-9]/gi, "-").toLowerCase().slice(0, 40);
    const videoFilename = `${Date.now()}-${safeTitle}.mp4`;
    const videoUrl      = await uploadToStorage(rendered.videoBuffer, videoFilename, "video/mp4");
    log("step5", `Video uploaded: ${videoUrl}`);

    // ── Step 6: Save to Firestore ──────────────────────────────────────────────
    log("step6", "Saving Firestore document…");
    const docId = await saveVideoDoc({
      videoUrl,
      audioUrl,
      title:      plan.title,
      userId,
      draftId,
      plan,
      slideCount: imageUrls.length,
      style:      safeStyle,
    });
    log("step6", `Saved → docId: ${docId}`);

    // ── Cleanup temp dir ───────────────────────────────────────────────────────
    if (tempDir) {
      await fs.rm(tempDir, { recursive: true, force: true });
      tempDir = null;
    }

    log("handler", `Pipeline complete ✓ docId: ${docId}`);

    // ── Success response ───────────────────────────────────────────────────────
    return c.json({
      success:    true,
      videoUrl,
      audioUrl,
      docId,
      title:      plan.title,
      narration:  plan.narration,
      slideCount: imageUrls.length,
      style:      safeStyle,
      plan,
    }, 200);

  } catch (err: any) {
    // ── Always clean up on any failure ────────────────────────────────────────
    if (tempDir) {
      await fs.rm(tempDir, { recursive: true, force: true }).catch(() => {});
    }

    // Re-throw HTTPExceptions (already formatted)
    if (err instanceof HTTPException) throw err;

    // Classify other errors
    console.error("[video:handler] Unhandled error:", err);

    if (err?.message?.includes("OpenRouter")) {
      throw new HTTPException(502, { message: `AI service error: ${err.message}` });
    }

    if (err?.message?.includes("render") || err?.message?.includes("Remotion")) {
      throw new HTTPException(500, { message: `Video render failed: ${err.message}` });
    }

    throw new HTTPException(500, { message: err?.message ?? "Failed to generate video" });
  }
});

// ─── Global error handler (attach to your main Hono app) ─────────────────────
// In your main app file:
//
//   app.onError((err, c) => {
//     if (err instanceof HTTPException) {
//       return c.json({ error: err.message }, err.status);
//     }
//     console.error(err);
//     return c.json({ error: "Internal server error" }, 500);
//   });
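The `renderSlideshow` helper the route calls in step 4 is defined elsewhere in the project. As a rough sketch, a helper like it can be built on Remotion's server-side rendering APIs (`@remotion/bundler` and `@remotion/renderer`). The `Slideshow` composition id, the props shape, and the entry-point path below are assumptions for illustration, not values from the post:

```typescript
import path from "node:path";
import os from "node:os";
import fs from "node:fs/promises";
import { bundle } from "@remotion/bundler";
import { renderMedia, selectComposition } from "@remotion/renderer";

interface SlideshowProps {
  images: string[];
  title: string;
  audioUrl: string | null;
  captions: string[];
  scriptLines: string[];
}

// Bundle the Remotion project, pick a composition, render it to MP4,
// and hand the caller both the file buffer and the temp dir to clean up.
async function renderSlideshow(props: SlideshowProps) {
  const tempDir = await fs.mkdtemp(path.join(os.tmpdir(), "video-"));
  const outputLocation = path.join(tempDir, "out.mp4");

  // Entry point that registers the compositions (assumed path).
  const serveUrl = await bundle({
    entryPoint: path.join(process.cwd(), "remotion", "index.ts"),
  });

  // "Slideshow" must match the id given to <Composition> in that entry point.
  const composition = await selectComposition({
    serveUrl,
    id: "Slideshow",
    inputProps: props,
  });

  await renderMedia({
    composition,
    serveUrl,
    codec: "h264",
    outputLocation,
    inputProps: props,
  });

  const videoBuffer = await fs.readFile(outputLocation);
  return { videoBuffer, tempDir };
}
```

Returning `tempDir` instead of deleting it here is deliberate: the route above owns cleanup, so it can also remove the directory when a later step (upload, Firestore save) fails.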

For backend Node.js, `firebase-admin` is the npm package you should use instead of the client-side `firebase` SDK: the admin SDK runs with service-account credentials and is built for trusted server environments.
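For reference, here is a minimal sketch of what the `uploadToStorage` helper used in the route could look like. In the real app the bucket would come from `firebase-admin` (`import { getStorage } from "firebase-admin/storage"; const bucket = getStorage().bucket();`); in this sketch the bucket is typed structurally and passed in as a parameter, a variation on the route's signature that keeps the helper easy to unit-test:

```typescript
// Structural slice of a Cloud Storage bucket/file, mirroring the
// firebase-admin (Google Cloud Storage) File API: save, makePublic, publicUrl.
interface UploadFile {
  save(data: Buffer, options: { contentType: string }): Promise<void>;
  makePublic(): Promise<unknown>;
  publicUrl(): string;
}

interface UploadBucket {
  file(name: string): UploadFile;
}

export async function uploadToStorage(
  bucket: UploadBucket,
  buffer: Buffer,
  filename: string,
  contentType: string
): Promise<string> {
  // Namespace uploads under videos/ so audio and video assets stay grouped.
  const file = bucket.file(`videos/${filename}`);
  await file.save(buffer, { contentType });
  await file.makePublic(); // public read access; use signed URLs if you need auth
  return file.publicUrl();
}
```

Because the bucket is injected, the same function works against the real `firebase-admin` bucket in production and a stub in tests.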

You can recreate the same endpoint with a different storage layer: PostgreSQL or MongoDB behind Prisma or Drizzle ORM, or Supabase. The choice is yours, so pick whatever fits your stack.
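For example, swapping the Firestore-backed `saveVideoDoc` for a Prisma-backed version might look like this. The `Video` model name and its fields are assumptions; in the real app `db` would be the generated `PrismaClient`, typed structurally here so the function can be tested with a stub:

```typescript
interface NewVideo {
  videoUrl: string;
  audioUrl: string | null;
  title: string;
  userId: string;
  draftId?: string;
  slideCount: number;
  style: string;
}

// Structural slice of PrismaClient: db.video.create({ data }) resolves
// to the inserted row, including its generated id.
interface VideoDb {
  video: {
    create(args: { data: NewVideo }): Promise<{ id: string }>;
  };
}

// Prisma-based replacement for the Firestore saveVideoDoc helper.
export async function saveVideoDoc(db: VideoDb, video: NewVideo): Promise<string> {
  const row = await db.video.create({ data: video });
  return row.id; // the route only needs the new record's id
}
```

The route above stays unchanged either way, since it only consumes the returned id.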

That's enough for today.

One more thing: I've been working on Buildsaas.dev and Inkgest.com, two of my SaaS ideas. Do check out the sites, and if you find them useful, give them a try. Thanks in advance!

Cheers

Shrey
