This is a submission for the Built with Google Gemini: Writing Challenge
What I Built with Google Gemini
What if every faded photograph in your grandmother's attic could tell its own story?
That question led me to build the Historical Photo/Video Narrator — an app where you upload a historical photo or video, and Google Gemini narrates the history behind it. Not a caption. Not a label. A full narrative — the era, the context, what was likely happening just outside the frame. Then, if you want to go further, a "Re-imagine" feature lets you edit the photo with a text prompt. Want to see that 1920s street scene on a sunny day? Add color to a black-and-white portrait? Type it and Gemini generates it.
I built this in September 2025 for the Google AI Studio Multimodal Challenge. It placed well. I deployed it to Cloud Run and moved on to the next project.
That was six months ago. The app is still running.
Most challenge submissions are about something built last weekend. This one is about what happens after the hackathon high wears off — what breaks, what surprises you, and what you learn about a model by living with it for half a year.
Two Gemini models power the two core features:
- gemini-2.5-flash handles narrative generation. It takes an image or video, analyzes the visual content, and produces a historically grounded narrative. It doesn't just describe what it sees — it infers era, cultural context, and significance. A built-in text-to-speech feature lets you listen to the story instead of reading it.
- gemini-2.5-flash-image (more on that name in a moment) powers the Re-imagine feature. Pass it a source image and a text prompt, and it generates an edited version. This is the part that makes users linger — the loop of "learn the story, then reimagine the scene" turns passive viewing into active exploration.
The UX flow is intentionally simple: Upload → Narrate → Listen → Re-imagine → Save locally. Everything stays in the browser. No accounts, no sign-ups, no data collection.
Demo
Try it yourself — upload any historical photo and let Gemini tell you what it sees.
Source: AI Studio Project
What I Learned
Six months of running an AI-powered app taught me things that the first weekend of building it never could. Here are the lessons that stuck.
1. Your app will break without you touching a single line of code
One day the Re-imagine feature stopped working. No deployment. No code change. Nothing on my end.
The model name had changed upstream. What was gemini-2.5-flash-image-preview became gemini-2.5-flash-image. My API calls were hitting a name that no longer existed.
In traditional software, you pin your dependency versions. If Express 4.18.2 works, you lock it and it works forever. With Gemini model endpoints, there's no equivalent — you're always referencing a name that the provider can change. When they rename it, your production app breaks silently.
I fixed it in minutes once I found the cause. But the lesson is deeper: in AI-powered apps, your dependencies aren't just packages in a lockfile. They're model endpoints that can change names, behavior, or capabilities without warning. If you're building anything meant to last longer than a hackathon, you need monitoring on your AI calls, not just your infrastructure.
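One cheap defense is to stop hard-coding a single model name and instead try an ordered list of aliases, so an upstream rename degrades to a fallback attempt instead of a silent failure. This is an illustrative sketch, not the app's actual code; the exception type and alias list are assumptions.

```python
# Try each model name in order until one resolves. An upstream rename then
# surfaces as "fell back to the old alias" rather than a silently broken feature.

MODEL_ALIASES = [
    "gemini-2.5-flash-image",          # current name
    "gemini-2.5-flash-image-preview",  # the pre-rename name, kept as a fallback
]

class ModelGone(Exception):
    """Raised when a model name no longer resolves (e.g. the API returns 404)."""

def generate_with_fallback(call, model_names):
    """call(model_name) -> result. Returns (name_used, result) from the first
    alias that resolves; raises only if every alias fails."""
    errors = {}
    for name in model_names:
        try:
            return name, call(name)
        except ModelGone as exc:
            errors[name] = exc  # record the failure and try the next alias
    raise RuntimeError(f"no model name resolved: {sorted(errors)}")
```

Logging which alias actually served the request gives you exactly the monitoring signal the lesson calls for: the day the primary name starts failing, you find out from your logs, not from your users.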
2. Where your project lives matters more than you think
When I first built this, AI Studio saved projects to Google Drive. My project was a file in my Drive folder. That worked fine — for me.
Then AI Studio changed. Projects are now saved within AI Studio itself, not Drive. This sounds like a minor infrastructure detail, but it broke something important: sharing.
During the original challenge, many builders had the same problem. Their project was saved to a private Drive folder. They'd share the AI Studio link, but viewers couldn't access it because the underlying Drive file was private. The AI Studio link worked, but the project behind it didn't.
The lesson: platform infrastructure decisions you didn't make can affect your users. When you build on a platform, you inherit not just its features but also its migrations. Keeping an app alive for months means riding through those changes.
3. Gemini thinks faster than it plans
When I first started using Gemini as a development tool — not just as the engine inside my app, but as a coding assistant while building it — I noticed a pattern. It would jump straight into writing code before fully understanding the problem. Ask it to implement a feature and it would produce something immediately, confidently, and sometimes wrong in ways that took longer to debug than writing it from scratch.
Over the past six months, this has noticeably improved. Gemini now takes a beat before responding — thinking through the approach, considering edge cases, then writing the code. It plans before it executes. The difference is real. Earlier interactions felt like pair-programming with someone who typed faster than they thought. Now it feels like working with someone who reads the whole ticket before opening their editor.
The impulse to start coding before understanding the problem is the most expensive habit in software. Watching an AI model learn to break that habit was a strange kind of validation.
4. Prompt engineering is maintenance, not a one-time task
My system prompt hasn't changed a word since September: "You are a historian and captivating storyteller..."
But the model's interpretation of that prompt has subtly shifted as the underlying model improved. The narrations in February 2026 are richer and more nuanced than the narrations from September 2025, even though my prompt hasn't changed. The model is doing more with the same instruction.
This is actually a compliment to Gemini's backward compatibility — my prompt didn't break. But it also means the output of your app changes over time even when your code doesn't. For a history narrator, that's a good thing — better narrations for free. For an app where output consistency matters (medical, legal, financial), this invisible drift could be a problem.
The takeaway: if you're building with AI for production, you need to think about prompt behavior the way you think about database migrations — something that requires monitoring and occasional adjustment, not a set-and-forget configuration.
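In practice, that monitoring can be as simple as a "canary" check: run the same fixed photo through the same prompt on a schedule, compute a few cheap metrics on the narration, and flag a review when they move outside tolerance. A minimal sketch of the idea, with metric choices and thresholds as illustrative assumptions:

```python
# Canary check for silent output drift: compare cheap shape metrics of the
# latest narration against a stored baseline from a known-good run.

def metrics(narration: str) -> dict:
    """Reduce a narration to a few comparable numbers and flags."""
    words = narration.lower().split()
    return {
        "word_count": len(words),
        "mentions_era": any(w.strip(".,") in {"era", "century", "decade"}
                            for w in words),
    }

def drifted(baseline: dict, current: dict, length_tolerance: float = 0.5) -> bool:
    """True if the narration's shape moved outside tolerance and needs review."""
    if baseline["mentions_era"] != current["mentions_era"]:
        return True
    low = baseline["word_count"] * (1 - length_tolerance)
    high = baseline["word_count"] * (1 + length_tolerance)
    return not (low <= current["word_count"] <= high)
```

This won't catch subtle changes in tone, but it catches the drastic ones (narrations suddenly half as long, or no longer anchored to an era) without any extra API cost beyond one scheduled call.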
5. Deploying to Cloud Run was the easy part. Leaving it running was the lesson.
The app has been live on Cloud Run for six months with zero maintenance. No patches, no restarts, no scaling interventions. Cloud Run's scale-to-zero means the app costs essentially nothing when no one is using it, and spins up in seconds when someone visits.
I've shipped side projects on custom VPS setups, on Heroku, on various platforms. Most of them are dead now — killed by expired SSL certificates, crashed processes nobody restarted, or hosting bills I forgot to pay.
This one survived because I chose boring infrastructure. Cloud Run isn't exciting. It doesn't require Kubernetes knowledge or a DevOps pipeline. But it's the reason this app is still answering requests six months later instead of returning a 502 page.
The real lesson: for side projects, pick the deployment that requires the least ongoing attention. The best infrastructure is the kind you forget about.
Google Gemini Feedback
They asked for candor. Here it is.
The Good
Multimodal understanding is genuinely impressive. This isn't image labeling — Gemini infers historical context, era, cultural significance, and narrative arc from a single photograph. I've uploaded obscure family photos from the 1940s Caucasus region and gotten narrations that correctly identified the approximate era, clothing styles, and regional context. That's not pattern matching. That's understanding.
Flash model speed makes AI feel invisible. The narration generates fast enough that users don't feel like they're "waiting for AI." They click, and the story appears. That speed is the difference between a tool people use once and a tool people explore.
AI Studio as a prototyping environment is unmatched for getting to a deployed app quickly. I went from concept to deployed Cloud Run app in a single focused session. The integration between the IDE, the model playground, and the deployment pipeline is seamless.
Image editing quality is creative, not just mechanical. The Re-imagine feature doesn't just apply filters. It generates genuinely new interpretations of scenes. Users asked to "add color to this black-and-white photo" don't get colorization — they get a reimagined scene that feels alive.
The Bad
Model name changes in production are a real problem. gemini-2.5-flash-image-preview becoming gemini-2.5-flash-image broke my app with no warning, no deprecation period, and no migration guide. For hackathon projects, this is a shrug. For anything meant to stay running, it's a trust issue. I'd love to see semantic versioning or at least alias support for model endpoints — let the old name forward to the new one for 90 days.
AI Studio project storage migration caused real confusion. The shift from Google Drive storage to AI Studio-native storage happened without a clear heads-up to builders who had deployed apps. Sharing flows broke. The fix was simple once understood, but discovering it required digging.
The Honest
Running this app for six months has given me a perspective most builders don't get. The short version: Gemini is a strong foundation for building real things. The multimodal capabilities are not a gimmick. The speed is production-ready. The deployment story via Cloud Run is the best I've seen in the AI space.
Looking Forward
Six months with Gemini has made me genuinely excited about where these models are headed. The multimodal understanding is already strong — I'm looking forward to seeing what comes next with longer context windows, better video processing, and the continued improvements to image generation. If the tooling around production stability catches up (version pinning, deprecation notices, changelogs), Gemini becomes a no-brainer foundation for any AI-powered project. I'll be building with it again.
Every photo has a story. Most of them die with the people who were there. Tools like this won't bring those people back, but they can make sure the stories don't disappear with the photographs.
Try it: Historical Photo/Video Narrator
