Hey everyone! 👋 I spent the last 3 months building Mimoir AI, a platform that restores old photos with AI and generates people's life stories as narrated documentaries. Here's the journey, the wins, and the parts I'd do differently.
The Origin Story
I found a box of my grandparents' old photos in the attic. Most were too faded or damaged to see clearly. Looking at a picture of my grandmother from the 1960s — face completely washed out — I thought: "There's probably an AI model that could fix this now."
That one photo sparked the entire project.
The Stack (Why These Tools)
- Next.js 14 (App Router) — needed a fast way to ship, Vercel's serverless is perfect for early-stage MVPs
- Supabase — wanted Auth out-of-the-box, didn't want to manage PG myself
- Google Gemini 3.1 Flash Image — image-in-image models are still underrated; Gemini's understanding of semantics crushed it compared to chaining multiple specialized models
- ElevenLabs TTS — best naturalness-to-cost ratio I found
- FFmpeg on Vercel Serverless — this one... I'll explain. Deep breath.
- Cloudflare R2 — cheap object storage, way better DX than AWS S3
Phase 1: Photo Restoration (Week 1-2)
Started simple. Upload a photo → pass to Gemini → get back a restored image.
The hardest part was understanding Gemini's image API:
// This is what worked
const body = JSON.stringify({
  contents: [{
    parts: [
      { inline_data: { mime_type: "image/jpeg", data: imgB64 } },
      { text: "Restore this old photo: fix damage, enhance clarity, preserve original colors" },
    ],
  }],
  generationConfig: {
    responseModalities: ["IMAGE", "TEXT"],
  },
});

const res = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent?key=${apiKey}`,
  { method: "POST", headers: { "Content-Type": "application/json" }, body }
);
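The other half of the job is pulling the restored image back out of the response. The model can return a mix of text and image parts, so I scan for the first inline image — a minimal sketch, assuming the v1beta REST response shape (`candidates` → `content.parts` → `inline_data`):

```typescript
interface Part {
  text?: string;
  inline_data?: { mime_type: string; data: string };
}

// Scan the first candidate's parts for an inline image.
// Returns the base64 payload, or null if the model only returned text.
function extractImageB64(response: {
  candidates?: { content: { parts: Part[] } }[];
}): string | null {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const p of parts) {
    if (p.inline_data?.data) return p.inline_data.data;
  }
  return null;
}
```

Returning null (instead of throwing) lets the caller decide whether a text-only response is a retryable failure or a content refusal.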
The results were... really good. Not perfect, but definitely good enough that my grandma cried when she saw her restored wedding photo again.
Phase 2: The Life Score Questionnaire (Week 3-4)
Built a 60-second quiz that scores across 5 dimensions:
- Life Experience (25%)
- Life Challenges (20%)
- Life Growth (20%)
- Life Impact (20%)
- Life Freedom (15%)
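The scoring itself is just a weighted sum of the five dimensions. A minimal sketch — the weights come straight from the percentages above, and the assumption that each dimension is scored 0–100 is mine:

```typescript
// Dimension weights from the questionnaire (sum to 1.0).
const WEIGHTS = {
  experience: 0.25,
  challenges: 0.2,
  growth: 0.2,
  impact: 0.2,
  freedom: 0.15,
} as const;

// Each dimension scored 0-100; the composite is the weighted sum,
// so it also lands on a 0-100 scale.
function lifeScore(scores: Record<keyof typeof WEIGHTS, number>): number {
  return (Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]).reduce(
    (sum, k) => sum + scores[k] * WEIGHTS[k],
    0
  );
}
```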
Used OpenAI GPT-4o-mini at first, then switched to Gemini 2.5 Flash for cost. The cost savings are substantial — like $0.003 per generation vs $0.008.
Phase 3: Video Generation — The Serverless Nightmare (Week 5-8)
"Let's generate a documentary video from photos and a script." Sounds simple.
It's not.
The FFmpeg-on-Vercel Saga
Problem 1: Binary Compatibility
Error: ffmpeg binary not compatible with platform
ffmpeg-static downloads the binary at build time, but Vercel's build caching is wonky and can hand you a binary for the wrong platform. Switched to @ffmpeg-installer/ffmpeg, but then...
Problem 2: Ancient FFmpeg Version
Every package I tried bundled an ancient ffmpeg version (4.3 era). Missing filters:
- xfade transitions don't exist → had to use concat
- ASS subtitle rendering needs libass → switched to mov_text
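For the record, the concat fallback amounts to building a filtergraph by hand: fade each still in and out, then concat the clips, instead of cross-fading between them. A sketch of the graph builder — the fade duration and label names are illustrative, not the exact production values:

```typescript
// Build an ffmpeg filtergraph that concatenates N image inputs,
// substituting per-clip fade in/out for the unavailable xfade filter.
function buildConcatGraph(clipCount: number, clipDur: number, fade = 0.5): string {
  const labels: string[] = [];
  const chains: string[] = [];
  for (let i = 0; i < clipCount; i++) {
    chains.push(
      `[${i}:v]fade=t=in:st=0:d=${fade},` +
        `fade=t=out:st=${clipDur - fade}:d=${fade}[v${i}]`
    );
    labels.push(`[v${i}]`);
  }
  // Join all faded clips into one video stream.
  chains.push(`${labels.join("")}concat=n=${clipCount}:v=1:a=0[out]`);
  return chains.join(";");
}
```

The result goes into `-filter_complex`, with `[out]` mapped as the output video stream. It doesn't look as smooth as a true cross-fade, but it runs on ffmpeg 4.x.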
Problem 3: Audio Padding
// This doesn't work on serverless ffmpeg
apad=whole_dur=30
// Error: Option 'whole_dur' not found
// Solution: use bare apad + output truncation
// ffmpeg ... -apad ... -t 30 output.mp4
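In argument form, the workaround is just a bare `apad` audio filter plus an output-side `-t` cap. A sketch of the arg builder (file names illustrative):

```typescript
// Build ffmpeg args that pad audio to the video length without
// `whole_dur`: `apad` pads with silence indefinitely, and `-t`
// truncates the combined output at the target duration.
function padAudioArgs(input: string, durSec: number, output: string): string[] {
  return [
    "-i", input,
    "-af", "apad",        // pad audio with silence (no whole_dur option)
    "-t", String(durSec), // truncate output at the target length
    output,
  ];
}
```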
Problem 4: Execution Time
A 3-minute 1080p video took 250+ seconds to encode. Vercel serverless maxes at 300 seconds (Pro plan). Cutting it close.
What I learned:
- Serverless is not great for video. Use it for the orchestration, not the heavy lifting.
- Testing locally with Docker didn't catch these issues (different ffmpeg version locally vs on Vercel).
- Next time, I'd use a dedicated worker (AWS Batch, Railway, Google Cloud Run) for encoding.
Current Solution (Keeps Costs Down)
- Photos limited to 1080p resolution
- Max 3-minute videos
- Sequential processing (not parallel) to stay under memory limits
- Pre-calculated frame counts to avoid surprises
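The frame-count pre-calculation is simple but earns its keep: at a fixed fps you know the total encode size before spawning ffmpeg, so over-budget jobs get rejected up front instead of timing out at second 299. A sketch, assuming 30 fps and the 3-minute cap above:

```typescript
const FPS = 30;
const MAX_SECONDS = 180; // 3-minute video cap

// Sum clip durations, reject jobs over the cap before encoding
// starts, and return the total frame count for progress tracking.
function totalFrames(clipDurations: number[]): number {
  const seconds = clipDurations.reduce((s, d) => s + d, 0);
  if (seconds > MAX_SECONDS) {
    throw new Error(`Video would run ${seconds}s; cap is ${MAX_SECONDS}s`);
  }
  return Math.round(seconds * FPS);
}
```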
Phase 4: The Comparison Feature (Week 9+)
Let two users compare their Life Scores side-by-side. This was straightforward:
function compareLifeMaps(a: LifeMap, b: LifeMap): ComparisonResult {
  const dimensions = [
    { label: "Life Experience", valueA: a.experience, valueB: b.experience },
    // ... 4 more
  ].map((d) => ({ ...d, diff: Math.abs(d.valueA - d.valueB) }));
  const avgDiff = dimensions.reduce((s, d) => s + d.diff, 0) / dimensions.length;
  const similarity = Math.max(0, 100 - avgDiff);
  return { dimensions, similarity, insights: generateInsights(...) };
}
Probably the most fun part to build — seeing two people's life profiles side-by-side is genuinely cool.
Lessons Learned
What Worked Well
- Gemini's image model — semantic understanding beats specialized model chains
- Free tier testing — Gemini gives 250 free API calls/day, perfect for development
- Supabase — Auth + database + real-time without touching DevOps
- Ship early — had users testing by week 2
What I'd Change
- Skip serverless video encoding. Just use a job queue. It's cheaper and less headache.
- Plan for state management earlier. Ended up rebuilding my state layer twice.
- Don't overthink the free tier. I spent a week optimizing for free users before having any paying users.
The Numbers (So Far)
- Time: 3 months part-time
- Cost per Life Map generation: ~$0.0003 (Gemini API + storage)
- Cost per photo restoration: ~$0.001
- Cost per documentary video: ~$0.02 (ElevenLabs + computing)
- Free tier: 3 generations/month per user, enough to try everything
What's Next
Building in public now. If it gains traction, the roadmap is:
- Family documentaries (generate a video from multiple people's stories)
- Print-on-demand photo books
- Integration with Instagram Stories (viral angle)
- Podcast-style audio narratives
Try It
If you've got old photos you want to see restored or want to generate your own life documentary, give it a shot: https://www.mimoir-ai.com (free tier, no credit card)
Would love to hear what people think. And if you've built something similar or run into the FFmpeg-on-serverless problem, drop a comment below 👇
Shameless plug: If you liked this, follow for more indie shipping updates. Building in public, one commit at a time.