Static images were working fine. Users were happy. Revenue was growing. So naturally, I decided to break everything.
I run RepoClip, a SaaS that turns GitHub repos into promotional videos. The pipeline analyzes code with Gemini, generates scene images, adds AI narration, and renders the final video with Remotion. The previous article covered switching the image model from FLUX.2 to Nano Banana 2 — a 6.7x cost increase that turned out to be noise.
This time, I went bigger. Instead of still images, I wanted each scene to be an AI-generated video clip. Five 5-second clips stitched together with narration. The kind of output that makes people say "wait, AI made this?"
The model: Kling 3.0 Pro via Fal.ai's queue API.
The result: it works — beautifully. But getting there nearly broke me.
Why Video Clips?
The still-image pipeline was solid. Nano Banana 2 produces gorgeous frames. But promo videos with static images and Ken Burns zoom feel like... slideshows. Because they are.
AI video generation has matured enough that 5-second clips look cinematic. Camera pans, particle effects, dynamic lighting — things you'd need After Effects for a year ago. For a tool that generates promotional content, this changes the value proposition entirely.
The plan was simple: add a "Video Short" content mode alongside the existing image mode. Five scenes, five 5-second AI video clips, same narration pipeline, same Remotion renderer.
Simple plan. Five production incidents.
The Cost Math
Before writing code, I ran the numbers. Kling 3.0 Pro costs $0.224 per second on Fal.ai. Each clip is 5 seconds.
| | Image Mode | Video Short |
|---|---|---|
| Scenes | 6 | 5 |
| Visual cost | $0.08 × 6 = $0.48 | $1.12 × 5 = $5.60 |
| TTS cost | ~$0.10 | ~$0.08 |
| Render cost | ~$0.05 | ~$0.05 |
| Total per video | ~$0.63 | ~$5.73 |
| RepoClip credits | 10 | 100 |
At $0.01/credit (Starter plan pricing), video mode generates $1.00 revenue per video against $5.73 cost. That's underwater.
But this is a premium feature. At Pro plan pricing ($0.004/credit × 100 = $0.40), it's even more underwater. The economics only work at scale with Agency plan users, or as a differentiator that drives subscription upgrades.
I decided to launch it anyway. The quality gap between a slideshow and cinematic AI clips is the kind of thing that converts free users to paid. Sometimes the feature pays for itself indirectly.
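For anyone sanity-checking the table above, here's the same math as a quick sketch. The Kling price is Fal.ai's list price; the per-credit prices are RepoClip's own plan pricing:

```typescript
// Back-of-envelope unit economics for Video Short mode, using the figures above.
const KLING_PER_SECOND = 0.224; // $ per generated second (Fal.ai list price)
const CLIP_SECONDS = 5;
const SCENES = 5;

const visualCost = KLING_PER_SECOND * CLIP_SECONDS * SCENES; // $5.60
const totalCost = visualCost + 0.08 /* TTS */ + 0.05 /* render */; // ~$5.73

const CREDITS_CHARGED = 100;
const starterRevenue = 0.01 * CREDITS_CHARGED; // $1.00 on Starter
const proRevenue = 0.004 * CREDITS_CHARGED; // $0.40 on Pro

console.log(
  `cost=$${totalCost.toFixed(2)}, ` +
  `starter margin=$${(starterRevenue - totalCost).toFixed(2)}, ` +
  `pro margin=$${(proRevenue - totalCost).toFixed(2)}`
);
```

Negative margins on both plans, and worse on Pro, which is exactly why this only makes sense as an upgrade driver.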
Incident #1: The 5-Minute Wall
First deploy. First test. Video generation starts, reaches "Generating Assets"... and stays there for 15 minutes. Then the whole pipeline dies silently.
The culprit: Vercel's serverless function timeout.
My existing image pipeline used fal.subscribe() — a convenience method that submits a job and long-polls until completion:
```typescript
// This worked for images (7-30 seconds per generation)
const result = await fal.subscribe("fal-ai/nano-banana-2", {
  input: { prompt, aspect_ratio: "16:9" },
});
```
fal.subscribe blocks the HTTP connection, waiting for the result. For images that generate in 30 seconds, this is fine. For video clips that take 18 minutes, it's a death sentence.
Vercel Pro plan has a hard limit: 300 seconds (5 minutes) per function invocation. The function gets killed mid-poll. No error. No cleanup. Just gone.
The Architecture Fix: Submit-Sleep-Collect
The solution was to stop waiting. Instead of long-polling inside a single function call, I split the work into three phases:
```
Submit all jobs (fast, <10s)
        ↓
step.sleep("15m")  ← Inngest manages this, zero Vercel resources
        ↓
Collect results (fast HTTP GETs)
```
This is where Inngest saved the architecture. Inngest step functions let you step.sleep() between operations. During the sleep, no Vercel function is running. No compute. No timeout risk. Inngest wakes your function up after the sleep and runs the next step as a fresh HTTP request.
```typescript
// Phase 1: Submit all video jobs to fal.ai queue (<10 seconds)
const videoJobs = await step.run("submit-video-jobs", async () => {
  return await submitVideoJobs(videoConfig.scenes, aspectRatio);
});

// Phase 2: Sleep while fal.ai processes (no Vercel resources used)
await step.sleep("wait-for-videos", "15m");

// Phase 3: Collect results (fast HTTP GETs)
let remainingJobs = videoJobs;
for (let attempt = 0; attempt < 10; attempt++) {
  const { completed, pending } = await step.run(
    `collect-video-results-${attempt}`,
    async () => collectVideoResults(remainingJobs)
  );
  // ... handle completed/pending
  if (pending.length === 0) break;
  remainingJobs = pending; // only re-poll jobs that are still running
  await step.sleep(`wait-${attempt}`, "2m");
}
```
The key insight: each step.run() is a separate Vercel function invocation. As long as each individual step completes within 5 minutes, the overall pipeline can run for hours.
Incident #2: The 18-Minute Surprise
After implementing the submit-sleep-collect pattern, I deployed and tested. The pipeline submitted 5 jobs, slept for 3 minutes, then checked results.
All 5 clips: still processing.
I waited. Checked again at 5 minutes. Still processing. 8 minutes. Still processing.
I started a timer and polled manually with curl:
```bash
# Submit a test job
curl -X POST "https://queue.fal.run/fal-ai/kling-video/v2/master/text-to-video" \
  -H "Authorization: Key $FAL_KEY" \
  -d '{"prompt":"Futuristic data visualization dashboard","duration":"5"}'

# Poll every 2 minutes...
# IN_PROGRESS... IN_PROGRESS... IN_PROGRESS...
# 18 minutes later: COMPLETED
```
Kling 3.0 Pro takes ~18 minutes per clip. The documentation says 2-5 minutes; reality was nearly four times the upper bound.
The fix: increase the initial sleep from 3 minutes to 15 minutes, set max collect attempts to 10 with 2-minute intervals. Total maximum wait: ~35 minutes. Not elegant, but reliable.
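The worst case falls straight out of those two numbers:

```typescript
// Worst-case wall-clock wait for the collect phase with the new timings.
const initialSleepMin = 15; // sleep before the first collect attempt
const attempts = 10;        // max collect attempts
const intervalMin = 2;      // sleep between attempts

// All attempts failing means every sleep runs: 15 + 10 × 2 = 35 minutes.
const maxWaitMin = initialSleepMin + attempts * intervalMin;
console.log(maxWaitMin); // 35
```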
Incident #3: Same Bug, Different Step
Video clips finally generate. Five beautiful 5-second clips. Pipeline moves to "Rendering Video" — Remotion Lambda stitches the clips with narration into a final MP4.
Status stays at "Rendering Video" for 30 minutes. Then silence.
Same root cause. The pollRenderProgress() function was a polling loop inside a single step.run():
```typescript
// This loop runs for up to 10 minutes — kills the Vercel function
for (let i = 0; i < 200; i++) {
  const progress = await getRenderProgress(renderId, bucketName);
  if (progress.done) return progress.outputFile;
  await new Promise((r) => setTimeout(r, 3000)); // 3s × 200 = 10 min
}
```
Same fix. Split into check-sleep-retry:
```typescript
await step.sleep("wait-for-render", "5m");

for (let attempt = 0; attempt < 10; attempt++) {
  const result = await step.run(`check-render-${attempt}`, async () => {
    return await checkRenderProgress(renderId, bucketName);
  });
  if (result) { videoUrl = result; break; }
  await step.sleep(`wait-render-${attempt}`, "1m");
}
```
The pattern is universal: never long-poll inside a serverless function. If the operation takes more than a minute, use an external orchestrator.
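The two fixes above share a shape, so the pattern can be factored out. Here's a hedged sketch (not RepoClip's actual code): `step.run()` and `step.sleep()` are real Inngest APIs, but the helper and the `check` callback are hypothetical names for illustration:

```typescript
// Minimal shape of Inngest's step tools, as used by the helper below.
type StepTools = {
  run<R>(id: string, fn: () => Promise<R>): Promise<R>;
  sleep(id: string, duration: string): Promise<void>;
};

// Generic check-sleep-retry: poll a slow external job without ever
// holding a serverless function open.
async function pollAsSteps<T>(
  step: StepTools,
  name: string,
  check: () => Promise<T | null>, // returns the result, or null if still pending
  { attempts = 10, interval = "2m" } = {}
): Promise<T | null> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    // Each check runs as its own short-lived function invocation.
    const result = await step.run(`${name}-check-${attempt}`, check);
    if (result !== null) return result;
    // Zero compute while sleeping; Inngest re-invokes the function afterwards.
    await step.sleep(`${name}-wait-${attempt}`, interval);
  }
  return null; // attempts exhausted; caller decides how to fail
}
```

Both the video-collect loop and the render-progress loop are instances of this: only the `check` function and the timings differ.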
Incident #4: The Invisible Bundle
Three deploys later. Video clips generate. Rendering starts. Remotion Lambda times out after 15 minutes with no output.
This time the code was fine. The problem was deployment.
Remotion Lambda runs a pre-bundled React application on AWS. I'd added <OffthreadVideo> to the scene component to render video clips:
```tsx
{scene.videoUrl ? (
  <OffthreadVideo
    src={scene.videoUrl}
    style={{ width: "100%", height: "100%", objectFit: "cover" }}
  />
) : (
  <Img src={scene.imageUrl} style={{...}} />
)}
```
But I'd only deployed the Vercel app. The Remotion Lambda bundle on S3 was still the old version — no OffthreadVideo, no video clip support. Lambda was trying to render scenes with undefined video components and silently failing.
The fix:
```bash
# Redeploy the Remotion bundle to S3
npx remotion lambda sites create \
  --region us-east-1 \
  --site-name repoclip \
  remotion/src/Root.tsx

# Also upgrade Lambda resources for video decoding
npx remotion lambda functions deploy \
  --memory 3008 \
  --timeout 900 \
  --disk 4096
```
Two deployment targets. Two separate deploy processes. I now have a checklist.
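The checklist can also be mechanized. A sketch of hypothetical `package.json` scripts (the script names are my own, not RepoClip's actual setup) that chain both deploys:

```json
{
  "scripts": {
    "deploy:web": "vercel deploy --prod",
    "deploy:remotion": "npx remotion lambda sites create --region us-east-1 --site-name repoclip remotion/src/Root.tsx",
    "deploy:all": "npm run deploy:web && npm run deploy:remotion"
  }
}
```

With `deploy:all` as the default path, the Remotion bundle can't silently drift behind the Vercel app.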
Incident #5: The BGM Credit Trap
Pipeline works end-to-end. Video clips render. I enable background music (ElevenLabs Music API) and run a test.
```
[bgm_generation] ElevenLabs Music API error (401): insufficient_credits
```
The BGM for a 35-second video consumed ~4,000 ElevenLabs credits — nearly all of my monthly quota on the Creator plan. One BGM generation cost more than 400 TTS narrations.
Two fixes:
1. Graceful fallback — BGM failure no longer kills the pipeline:
```typescript
const bgm = bgmEnabled
  ? await step.run("generate-bgm", async () => {
      try {
        return await withRetry("bgm_generation", () =>
          generateBGM(projectId, videoConfig, bgmDuration)
        );
      } catch (error) {
        console.warn("BGM failed, continuing without:", error);
        return null; // Video generates fine without BGM
      }
    })
  : null;
```
2. Price adjustment — BGM addon went from 5 credits to 20 credits. The original 5-credit price was set before I knew the actual API cost. Always validate pricing against real usage data.
The Pipeline Today
```
User submits GitHub URL
        ↓
[Inngest] Fetch code → Gemini analysis → Generate video config
        ↓
[Inngest] Submit 5 video jobs to fal.ai queue (< 10 seconds)
        ↓
[Inngest] step.sleep("15m") — zero compute cost
        ↓
[Inngest] Collect completed clips (retry up to 10× at 2-min intervals)
        ↓
[Inngest] Generate TTS narrations in parallel (< 30 seconds)
        ↓
[Inngest] Trigger Remotion Lambda render
        ↓
[Inngest] step.sleep("5m") — zero compute cost
        ↓
[Inngest] Check render progress (retry up to 10× at 1-min intervals)
        ↓
Final MP4 with AI video clips, narration, and BGM
```
Total wall-clock time: ~30 minutes. Total Vercel compute: under 2 minutes.
What I Learned
1. Serverless timeouts are the #1 constraint for AI pipelines. Not cost. Not quality. Not rate limits. The hard timeout wall shapes your entire architecture. Design for it from day one, not after your fifth production incident.
2. The submit-sleep-collect pattern is essential. If you're calling any AI API that takes more than 60 seconds, you need an orchestrator that can sleep without consuming resources. Inngest, Temporal, AWS Step Functions — pick one.
3. "2-5 minutes" in AI API docs means "maybe 18 minutes." Always measure actual latency with your real workload before setting timeouts. Published benchmarks are best-case scenarios.
4. Two deployment targets means two deploy checklists. When your rendering engine runs on separate infrastructure (Lambda, GPU workers, etc.), code changes require deploying to both. Automate this or you'll forget.
5. Make expensive features fail gracefully. BGM, video clips, 4K rendering — anything with high API costs should degrade, not crash. Users would rather have a video without music than no video at all.
The Numbers After Launch
| | Image Mode | Video Short |
|---|---|---|
| Generation time | ~8 min | ~30 min |
| API cost per video | ~$0.63 | ~$5.73 |
| RepoClip price | 10 credits | 100 credits |
| Quality | Slideshow with Ken Burns | Cinematic AI clips |
| Production incidents | 0 | 5 |
Was it worth five incidents? Look at the output and judge for yourself: repoclip.io/gallery
The Stack
- Framework: Next.js 16 (App Router) + TypeScript
- Orchestration: Inngest (the hero of this story)
- Code Analysis: Gemini 2.5 Flash
- Image Generation: Nano Banana 2 via Fal.ai
- Video Generation: Kling 3.0 Pro via Fal.ai
- Narration: OpenAI TTS
- Background Music: ElevenLabs Music API
- Video Rendering: Remotion Lambda (AWS)
- Database/Auth/Storage: Supabase
- Deployment: Vercel
Try It
Want to see what Kling 3.0 Pro produces in a real pipeline? Try it on your own repo: repoclip.io
Select "Video Short" mode when creating a video. The free tier gives you 2 videos/month. Fair warning: it takes ~30 minutes, but the result is worth the wait.
Questions for the community:
- Have you hit serverless timeout walls with AI APIs? How did you solve it?
- What orchestration tool do you use for long-running AI pipelines?
Drop a comment or find me on GitHub.
Five production incidents, three architectural rewrites, and one pricing mistake. Sometimes the best way to learn a platform's limits is to exceed them — repeatedly.