Hey everyone! 👋 I spent the last 3 months building Mimoir AI, a platform that restores old photos with AI and generates people's life stories as narrated documentaries. Here's the journey, the wins, and the parts I'd do differently.
The Origin Story
I found a box of my grandparents' old photos in the attic. Most were too faded or damaged to see clearly. Looking at a picture of my grandmother from the 1960s — face completely washed out — I thought: "There's probably an AI model that could fix this now."
That one photo sparked the entire project.
The Stack (Why These Tools)
- Next.js 14 (App Router) — needed a fast way to ship, Vercel's serverless is perfect for early-stage MVPs
- Supabase — wanted Auth out-of-the-box, didn't want to manage PG myself
- Google Gemini 3.1 Flash Image — image-in-image models are still underrated; Gemini's understanding of semantics crushed it compared to chaining multiple specialized models
- ElevenLabs TTS — best naturalness-to-cost ratio I found
- FFmpeg on Vercel Serverless — this one... I'll explain. Deep breath.
- Cloudflare R2 — cheap object storage, way better DX than AWS S3
Phase 1: Photo Restoration (Week 1-2)
Started simple. Upload a photo → pass to Gemini → get back a restored image.
The hardest part was understanding Gemini's image API:
// This is what worked
const body = JSON.stringify({
  contents: [{
    parts: [
      { inline_data: { mime_type: "image/jpeg", data: imgB64 } },
      { text: "Restore this old photo: fix damage, enhance clarity, preserve original colors" },
    ],
  }],
  generationConfig: {
    responseModalities: ["IMAGE", "TEXT"],
  },
});

const res = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent?key=${apiKey}`,
  { method: "POST", headers: { "Content-Type": "application/json" }, body }
);
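The other half of the job is pulling the restored image back out of the response. The model can return a mix of text and image parts, so I scan for the first inline image — a minimal sketch, assuming the v1beta REST response shape (`candidates` → `content.parts` → `inline_data`):

```typescript
interface Part {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

// Scan the first candidate's parts for an inline image.
// Returns the base64 payload, or null if the model only returned text.
function extractImageB64(response: {
  candidates?: { content: { parts: Part[] } }[];
}): string | null {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const p of parts) {
    if (p.inline_data?.data) return p.inline_data.data;
  }
  return null;
}
```

Returning null (instead of throwing) lets the caller decide whether a text-only response is a retryable failure or a content refusal.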
The results were... really good. Not perfect, but definitely good enough that my grandma cried when she saw her restored wedding photo again.
Phase 2: The Life Score Questionnaire (Week 3-4)
Built a 60-second quiz that scores across 5 dimensions:
- Life Experience (25%)
- Life Challenges (20%)
- Life Growth (20%)
- Life Impact (20%)
- Life Freedom (15%)
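The scoring itself is just a weighted sum of the five dimensions. A minimal sketch — the weights come straight from the percentages above, and the assumption that each dimension is scored 0–100 is mine:

```typescript
// Dimension weights from the questionnaire (sum to 1.0).
const WEIGHTS = {
  experience: 0.25,
  challenges: 0.2,
  growth: 0.2,
  impact: 0.2,
  freedom: 0.15,
} as const;

// Each dimension scored 0-100; the composite is the weighted sum,
// so it also lands on a 0-100 scale.
function lifeScore(scores: Record<keyof typeof WEIGHTS, number>): number {
  return (Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]).reduce(
    (sum, k) => sum + scores[k] * WEIGHTS[k],
    0
  );
}
```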
Used OpenAI GPT-4o-mini at first, then switched to Gemini 2.5 Flash for cost. The cost savings are substantial — like $0.003 per generation vs $0.008.
Phase 3: Video Generation — The Serverless Nightmare (Week 5-8)
"Let's generate a documentary video from photos and a script." Sounds simple.
It's not.
The FFmpeg-on-Vercel Saga
Problem 1: Binary Compatibility
Error: ffmpeg binary not compatible with platform
ffmpeg-static downloads the binary at build time, but Vercel's build caching is wonky and can hand you a binary for the wrong platform. Switched to @ffmpeg-installer/ffmpeg, but then...
Problem 2: Ancient FFmpeg Version
Every package I tried bundled an ancient ffmpeg version (4.3 era). Missing filters:
- xfade transitions don't exist → had to use concat
- ASS subtitle rendering needs libass → switched to mov_text
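For the record, the concat fallback amounts to building a filtergraph by hand: fade each still in and out, then concat the clips, instead of cross-fading between them. A sketch of the graph builder — the fade duration and label names are illustrative, not the exact production values:

```typescript
// Build an ffmpeg filtergraph that concatenates N image inputs,
// substituting per-clip fade in/out for the unavailable xfade filter.
function buildConcatGraph(clipCount: number, clipDur: number, fade = 0.5): string {
  const labels: string[] = [];
  const chains: string[] = [];
  for (let i = 0; i < clipCount; i++) {
    chains.push(
      `[${i}:v]fade=t=in:st=0:d=${fade},` +
        `fade=t=out:st=${clipDur - fade}:d=${fade}[v${i}]`
    );
    labels.push(`[v${i}]`);
  }
  // Join all faded clips into one video stream.
  chains.push(`${labels.join("")}concat=n=${clipCount}:v=1:a=0[out]`);
  return chains.join(";");
}
```

The result goes into `-filter_complex`, with `[out]` mapped as the output video stream. It doesn't look as smooth as a true cross-fade, but it runs on ffmpeg 4.x.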
Problem 3: Audio Padding
// This doesn't work on serverless ffmpeg
apad=whole_dur=30
// Error: Option 'whole_dur' not found
// Solution: use bare apad + output truncation
// ffmpeg ... -apad ... -t 30 output.mp4
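In argument form, the workaround is just a bare `apad` audio filter plus an output-side `-t` cap. A sketch of the arg builder (file names illustrative):

```typescript
// Build ffmpeg args that pad audio to the video length without
// `whole_dur`: `apad` pads with silence indefinitely, and `-t`
// truncates the combined output at the target duration.
function padAudioArgs(input: string, durSec: number, output: string): string[] {
  return [
    "-i", input,
    "-af", "apad",        // pad audio with silence (no whole_dur option)
    "-t", String(durSec), // truncate output at the target length
    output,
  ];
}
```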
Problem 4: Execution Time
A 3-minute 1080p video took 250+ seconds to encode. Vercel serverless maxes at 300 seconds (Pro plan). Cutting it close.
What I learned:
- Serverless is not great for video. Use it for the orchestration, not the heavy lifting.
- Testing locally with Docker didn't catch these issues (different ffmpeg version locally vs on Vercel).
- Next time, I'd use a dedicated worker (AWS Batch, Railway, Google Cloud Run) for encoding.
Current Solution (Keeps Costs Down)
- Photos limited to 1080p resolution
- Max 3-minute videos
- Sequential processing (not parallel) to stay under memory limits
- Pre-calculated frame counts to avoid surprises
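The frame-count pre-calculation is simple but earns its keep: at a fixed fps you know the total encode size before spawning ffmpeg, so over-budget jobs get rejected up front instead of timing out at second 299. A sketch, assuming 30 fps and the 3-minute cap above:

```typescript
const FPS = 30;
const MAX_SECONDS = 180; // 3-minute video cap

// Sum clip durations, reject jobs over the cap before encoding
// starts, and return the total frame count for progress tracking.
function totalFrames(clipDurations: number[]): number {
  const seconds = clipDurations.reduce((s, d) => s + d, 0);
  if (seconds > MAX_SECONDS) {
    throw new Error(`Video would run ${seconds}s; cap is ${MAX_SECONDS}s`);
  }
  return Math.round(seconds * FPS);
}
```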
Phase 4: The Comparison Feature (Week 9+)
Let two users compare their Life Scores side-by-side. This was straightforward:
function compareLifeMaps(a: LifeMap, b: LifeMap): ComparisonResult {
  const dimensions = [
    { label: "Life Experience", valueA: a.experience, valueB: b.experience },
    // ... 4 more
  ].map((d) => ({ ...d, diff: Math.abs(d.valueA - d.valueB) }));
  const avgDiff = dimensions.reduce((s, d) => s + d.diff, 0) / dimensions.length;
  const similarity = Math.max(0, 100 - avgDiff);
  return { dimensions, similarity, insights: generateInsights(...) };
}
Probably the most fun part to build — seeing two people's life profiles side-by-side is genuinely cool.
Lessons Learned
What Worked Well
- Gemini's image model — semantic understanding beats specialized model chains
- Free tier testing — Gemini gives 250 free API calls/day, perfect for development
- Supabase — Auth + database + real-time without touching DevOps
- Ship early — had users testing by week 2
What I'd Change
- Skip serverless video encoding. Just use a job queue. It's cheaper and less headache.
- Plan for state management earlier. Ended up rebuilding my state layer twice.
- Don't overthink the free tier. I spent a week optimizing for free users before having any paying users.
The Numbers (So Far)
- Time: 3 months part-time
- Cost per Life Map generation: ~$0.0003 (Gemini API + storage)
- Cost per photo restoration: ~$0.001
- Cost per documentary video: ~$0.02 (ElevenLabs + computing)
- Free tier: 3 generations/month per user, enough to try everything
What's Next
Building in public now. If it gains traction, the roadmap is:
- Family documentaries (generate a video from multiple people's stories)
- Print-on-demand photo books
- Integration with Instagram Stories (viral angle)
- Podcast-style audio narratives
Try It
If you've got old photos you want to see restored or want to generate your own life documentary, give it a shot: https://www.mimoir-ai.com (free tier, no credit card)
Would love to hear what people think. And if you've built something similar or run into the FFmpeg-on-serverless problem, drop a comment below 👇
Shameless plug: If you liked this, follow for more indie shipping updates. Building in public, one commit at a time.