DEV Community

kafraid
How I Built an AI Photo Restoration Tool with Next.js & Gemini in 3 Months

Hey everyone! 👋 I spent the last 3 months building Mimoir AI, a platform that restores old photos with AI and generates people's life stories as narrated documentaries. Here's the journey, the wins, and the parts I'd do differently.

The Origin Story

I found a box of my grandparents' old photos in the attic. Most were too faded or damaged to see clearly. Looking at a picture of my grandmother from the 1960s — face completely washed out — I thought: "There's probably an AI model that could fix this now."

That one photo sparked the entire project.

The Stack (Why These Tools)

  • Next.js 14 (App Router): needed a fast way to ship, and Vercel's serverless functions are perfect for early-stage MVPs
  • Supabase: wanted auth out of the box and didn't want to manage Postgres myself
  • Google Gemini 3.1 Flash Image: image-to-image models are still underrated; Gemini's semantic understanding crushed it compared to chaining multiple specialized models
  • ElevenLabs TTS: best naturalness-to-cost ratio I found
  • FFmpeg on Vercel Serverless: this one... I'll explain. Deep breath.
  • Cloudflare R2: cheap object storage, way better DX than AWS S3

Phase 1: Photo Restoration (Week 1-2)

Started simple. Upload a photo → pass to Gemini → get back a restored image.

The hardest part was understanding Gemini's image API:

// This is what worked
const body = JSON.stringify({
  contents: [{
    parts: [
      { inline_data: { mime_type: "image/jpeg", data: imgB64 } },
      { text: "Restore this old photo: fix damage, enhance clarity, preserve original colors" },
    ],
  }],
  generationConfig: {
    responseModalities: ["IMAGE", "TEXT"],
  },
});

const res = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent?key=${apiKey}`,
  { method: "POST", headers: { "Content-Type": "application/json" }, body }
);
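The request above is only half the story: the restored image comes back base64-encoded inside the response parts. Here is a minimal sketch of pulling it out, assuming the usual `candidates[].content.parts[]` response shape. The function name `extractRestoredImage` and the interfaces are mine, not from the app, and accepting both camelCase and snake_case keys is a defensive assumption.

```typescript
// Only the fields this sketch needs. The REST API returns camelCase keys;
// snake_case is accepted too as a defensive assumption.
interface InlineImage {
  mimeType?: string;
  mime_type?: string;
  data: string; // base64-encoded image bytes
}

interface GeminiPart {
  text?: string;
  inlineData?: InlineImage;
  inline_data?: InlineImage;
}

interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

// Returns the first image part, or null if the model replied with text only.
function extractRestoredImage(
  json: GeminiResponse
): { mimeType: string; base64: string } | null {
  for (const candidate of json.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const img = part.inlineData ?? part.inline_data;
      if (img?.data) {
        return {
          mimeType: img.mimeType ?? img.mime_type ?? "application/octet-stream",
          base64: img.data,
        };
      }
    }
  }
  return null;
}
```

With the `res` from the fetch call, this would be used as `extractRestoredImage(await res.json())`. A `null` result is worth handling explicitly, since the request asks for both IMAGE and TEXT and a text-only reply is possible.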

The results were... really good. Not perfect, but definitely good enough that my grandma cried when she saw her restored wedding photo again.

Phase 2: The Life Score Questionnaire (Week 3-4)

Built a 60-second quiz that scores across 5 dimensions:

  • Life Experience (25%)
  • Life Challenges (20%)
  • Life Growth (20%)
  • Life Impact (20%)
  • Life Freedom (15%)
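To make the weighting concrete, here is a rough sketch of how the five weights could combine into one score. It assumes each dimension is scored 0-100 and that the final score is a plain weighted average. The field names other than `experience` are hypothetical, and the real scoring may differ.

```typescript
// Weights in percent, taken from the list above; they sum to 100.
const WEIGHTS = {
  experience: 25,
  challenges: 20,
  growth: 20,
  impact: 20,
  freedom: 15,
} as const;

type Dimension = keyof typeof WEIGHTS;
type DimensionScores = Record<Dimension, number>; // each assumed to be 0-100

function lifeScore(scores: DimensionScores): number {
  let total = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    // Clamp so a bad input can't push the result outside 0-100.
    const clamped = Math.min(100, Math.max(0, scores[dim]));
    total += WEIGHTS[dim] * clamped;
  }
  return total / 100;
}

// Example: (25*80 + 20*60 + 20*70 + 20*50 + 15*90) / 100 = 69.5
const example = lifeScore({ experience: 80, challenges: 60, growth: 70, impact: 50, freedom: 90 });
```

Keeping the weights as integer percentages and dividing once at the end avoids most floating-point drift.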

Used OpenAI GPT-4o-mini at first, then switched to Gemini 2.5 Flash for cost. The savings are substantial: roughly $0.003 per generation on Gemini vs $0.008 on GPT-4o-mini.

Phase 3: Video Generation — The Serverless Nightmare (Week 5-8)

"Let's generate a documentary video from photos and a script." Sounds simple.

It's not.

The FFmpeg-on-Vercel Saga

Problem 1: Binary Compatibility

Error: ffmpeg binary not compatible with platform

ffmpeg-static downloads at build time, but Vercel's caching is wonky. Switched to @ffmpeg-installer/ffmpeg but then...

Problem 2: Ancient FFmpeg Version
Every package I tried bundled an old FFmpeg build (4.3 era). Missing filters:

  • xfade transitions don't exist → had to use concat
  • ASS subtitle rendering needs libass → switched to mov_text
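For the concat fallback, here is a sketch of the kind of argument list that produces a hard-cut slideshow without xfade. The `Slide` type and `buildConcatArgs` name are made up for illustration, and it assumes every image has already been scaled to the same resolution, since the concat filter expects matching frame sizes.

```typescript
interface Slide {
  path: string;    // image file, assumed pre-scaled to a common resolution
  seconds: number; // how long the slide stays on screen
}

function buildConcatArgs(slides: Slide[], output: string): string[] {
  if (slides.length === 0) throw new Error("need at least one slide");

  const args: string[] = ["-y"];
  for (const s of slides) {
    // -loop 1 turns a still image into a video stream; -t caps its duration.
    args.push("-loop", "1", "-t", String(s.seconds), "-i", s.path);
  }

  // "[0:v][1:v]...concat=n=N:v=1:a=0[v]" joins the inputs back to back.
  const labels = slides.map((_, i) => `[${i}:v]`).join("");
  const filter = `${labels}concat=n=${slides.length}:v=1:a=0[v]`;

  args.push("-filter_complex", filter, "-map", "[v]", "-pix_fmt", "yuv420p", output);
  return args;
}
```

The returned array is meant to be passed to `child_process.spawn` along with the FFmpeg binary path. The trade-off is purely visual: hard cuts instead of crossfades.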

Problem 3: Audio Padding

// This doesn't work on serverless ffmpeg
apad=whole_dur=30
// Error: Option 'whole_dur' not found

// Solution: use bare apad + output truncation
// ffmpeg ... -apad ... -t 30 output.mp4
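Spelled out as an argument list, the workaround could look something like this. It's a sketch that uses the filter form (`-af apad`) with no options and lets `-t` cut the output at the target length. The helper name `buildPaddedAudioArgs` is hypothetical.

```typescript
// Pads (or truncates) audio to an exact length without apad's whole_dur option.
function buildPaddedAudioArgs(input: string, output: string, seconds: number): string[] {
  if (!(seconds > 0)) throw new Error("seconds must be positive");
  return [
    "-y",
    "-i", input,
    "-af", "apad",         // bare apad: append silence indefinitely
    "-t", String(seconds), // output option: stop writing at this duration
    output,
  ];
}
```

Because bare apad never ends on its own, the `-t` on the output side is what makes the command terminate; leaving it off would produce an endless stream of silence.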

Problem 4: Execution Time
A 3-minute 1080p video took 250+ seconds to encode. Vercel serverless functions max out at 300 seconds on the Pro plan, so that was cutting it close.
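Some quick back-of-the-envelope math on that budget, as a sketch. It assumes encode time scales linearly with video length (which it only roughly does) and treats 250 seconds as the observed figure even though the real number was "250+". The function names are mine.

```typescript
// Figures from the paragraph above: a 180 s video took about 250 s to encode.
const OBSERVED_VIDEO_SECONDS = 180;
const OBSERVED_ENCODE_SECONDS = 250;
const FUNCTION_LIMIT_SECONDS = 300;

// Encode seconds needed per second of output, assuming linear scaling.
const encodeRatio = OBSERVED_ENCODE_SECONDS / OBSERVED_VIDEO_SECONDS; // about 1.39

function estimateEncodeSeconds(videoSeconds: number): number {
  return videoSeconds * encodeRatio;
}

// Longest video that fits once a safety margin (a fraction of the limit) is reserved.
function maxVideoSeconds(safetyMargin: number): number {
  const budget = FUNCTION_LIMIT_SECONDS * (1 - safetyMargin);
  return budget / encodeRatio;
}
```

With no margin the ceiling works out to 216 seconds of video; reserving 20% of the limit for cold starts and uploads drops it to about 173 seconds, which is already under the 3-minute mark.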

What I learned:

  • Serverless is not great for video. Use it for the orchestration, not the heavy lifting.
  • Testing locally with Docker didn't catch these issues (different ffmpeg version locally vs on Vercel).
  • Next time, I'd use a dedicated worker (AWS Batch, Railway, Google Cloud Run) for encoding.
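One way to catch the local-vs-deployed mismatch earlier is to check which filters the deployed binary actually has before rendering anything. Here is a sketch that parses the text printed by `ffmpeg -hide_banner -filters`. The helper names are hypothetical, and it only detects missing filters, not missing options on a filter that exists (so it would not have caught the `whole_dur` issue).

```typescript
// Collects filter names from the output of `ffmpeg -hide_banner -filters`.
function listFilters(filtersOutput: string): Set<string> {
  const names = new Set<string>();
  for (const line of filtersOutput.split("\n")) {
    const tokens = line.trim().split(/\s+/);
    // Filter rows look like: "<flags> <name> <in>-><out> <description>".
    // Legend rows such as "T.. = Timeline support" fail the arrow check.
    if (tokens.length >= 3 && /^[AVN|]+->[AVN|]+$/.test(tokens[2])) {
      names.add(tokens[1]);
    }
  }
  return names;
}

// Returns the required filters that the binary does not provide.
function missingFilters(filtersOutput: string, required: string[]): string[] {
  const available = listFilters(filtersOutput);
  return required.filter((name) => !available.has(name));
}
```

Running this once from a throwaway route on the deployed function, with something like `["xfade", "subtitles"]` as the required list, would surface the gaps in minutes instead of mid-render.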

Current Solution (Keeps Costs Down)

  • Photos limited to 1080p resolution
  • Max 3-minute videos
  • Sequential processing (not parallel) to stay under memory limits
  • Pre-calculated frame counts to avoid surprises
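Those limits are easy to enforce up front, before any encoding starts. A minimal sketch follows, with a few assumptions of mine: "1080p" is treated as a 1920x1080 bounding box, the frame rate is 30 fps (the post doesn't state one), and the helper names are invented.

```typescript
const MAX_WIDTH = 1920;        // 1080p bounding box
const MAX_HEIGHT = 1080;
const MAX_VIDEO_SECONDS = 180; // 3-minute cap
const FPS = 30;                // assumed frame rate

// Scales a photo down to fit the 1080p box, preserving aspect ratio. Never upscales.
function fitWithin1080p(width: number, height: number): { width: number; height: number } {
  const scale = Math.min(1, MAX_WIDTH / width, MAX_HEIGHT / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Total frames for a list of per-slide durations; throws if the video is over the cap.
function planFrameCount(slideSeconds: number[]): number {
  const total = slideSeconds.reduce((sum, s) => sum + s, 0);
  if (total > MAX_VIDEO_SECONDS) {
    throw new Error(`video is ${total}s, limit is ${MAX_VIDEO_SECONDS}s`);
  }
  return Math.ceil(total * FPS);
}
```

Rejecting an oversized job here costs nothing, while discovering it 250 seconds into an encode costs a full function invocation.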

Phase 4: The Comparison Feature (Week 9+)

Let two users compare their Life Scores side-by-side. This was straightforward:

function compareLifeMaps(a: LifeMap, b: LifeMap): ComparisonResult {
  const dimensions = [
    { label: "Life Experience", valueA: a.experience, valueB: b.experience },
    // ... 4 more
  ];

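  // Mean absolute gap per dimension (the entries carry no precomputed diff field)
  const avgDiff =
    dimensions.reduce((s, d) => s + Math.abs(d.valueA - d.valueB), 0) / dimensions.length;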
  const similarity = Math.max(0, 100 - avgDiff);

  return { dimensions, similarity, insights: generateInsights(...) };
}

Probably the most fun part to build — seeing two people's life profiles side-by-side is genuinely cool.

Lessons Learned

What Worked Well

  1. Gemini's image model — semantic understanding beats specialized model chains
  2. Free tier testing — Gemini gives 250 free API calls/day, perfect for development
  3. Supabase — Auth + database + real-time without touching DevOps
  4. Ship early — had users testing by week 2

What I'd Change

  1. Skip serverless video encoding. Just use a job queue. It's cheaper and less of a headache.
  2. Plan for state management earlier. Ended up rebuilding my state layer twice.
  3. Don't overthink the free tier. I spent a week optimizing for free users before having any paying users.

The Numbers (So Far)

  • Time: 3 months part-time
  • Cost per Life Map generation: ~$0.0003 (Gemini API + storage)
  • Cost per photo restoration: ~$0.001
  • Cost per documentary video: ~$0.02 (ElevenLabs + computing)
  • Free tier: 3 free generations/month to try everything

What's Next

Building in public now. If it gains traction, the roadmap is:

  • Family documentaries (generate a video from multiple people's stories)
  • Print-on-demand photo books
  • Integration with Instagram Stories (viral angle)
  • Podcast-style audio narratives

Try It

If you've got old photos you want to see restored or want to generate your own life documentary, give it a shot: https://www.mimoir-ai.com (free tier, no credit card)

Would love to hear what people think. And if you've built something similar or run into the FFmpeg-on-serverless problem, drop a comment below 👇


Shameless plug: If you liked this, follow for more indie shipping updates. Building in public, one commit at a time.
