DEV Community


This AI Tells the Story Behind Any Historical Photo or Video

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built the Historical Photo/Video Narrator, an interactive applet designed to bring the past to life. This tool allows users to upload historical photos and videos to generate rich, AI-powered narratives that uncover the stories hidden within the frames.

But it doesn't stop at storytelling. The applet also features a powerful "Re-imagine" function. After learning about the context of an image (or capturing a specific frame from a video), users can edit the photo using simple text prompts. Want to see what that 1920s street scene would look like on a sunny day? Or add a splash of color to a black-and-white portrait? The Historical Narrator makes it possible, creating a unique bridge between historical appreciation and creative expression.

The core experience is about transforming passive consumption of historical media into an active, engaging, and educational journey, with all creations saved locally in the browser for future viewing.
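The "saved locally in the browser" part can be sketched roughly like this. This is a minimal illustration, not the applet's actual code: `Creation`, `saveCreation`, and the storage key are hypothetical names, and the record shape is my guess at what such an app would persist.

```typescript
// Illustrative sketch of local persistence (assumed names, not the app's real API).
// A Storage-like interface keeps the logic testable outside the browser;
// window.localStorage satisfies it at runtime.

interface Creation {
  id: string;
  narrative: string;
  imageDataUrl: string; // original or re-imagined image as a data URL
  createdAt: number;
}

type StorageLike = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

const STORAGE_KEY = "historical-narrator-creations";

function saveCreation(store: StorageLike, creation: Creation): Creation[] {
  const existing: Creation[] = JSON.parse(store.getItem(STORAGE_KEY) ?? "[]");
  const updated = [creation, ...existing]; // newest first
  store.setItem(STORAGE_KEY, JSON.stringify(updated));
  return updated;
}
```

In the browser you would simply pass `window.localStorage` as the store; the gallery view then reads the same key back on load.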

Demo

Full Video Demo

To showcase the full video processing and frame-capture capabilities, here is a short video of the project in action:

Here’s a walkthrough of the experience:

1. Upload Your Media: The app starts with a clean, simple interface for uploading an image or a video file.

2. Generate the Narrative: Once a photo is uploaded, Gemini analyzes the visual content and generates a compelling historical narrative. Users can even listen to the story using the built-in text-to-speech feature.

3. Capture & Re-imagine: For videos, you can pause and capture a specific frame. For any image or captured frame, you can then enter a text prompt to modify it.

4. View the Result: The app presents the original and the newly generated image side-by-side, instantly showing the power of your creative direction combined with AI.
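Step 3 (capturing a frame and handing it to the model) can be sketched as follows. These are assumed helper names, not the applet's actual implementation: `captureFrame` is browser-only (it draws the paused video onto an offscreen canvas), and `dataUrlToInlineData` converts the resulting data URL into the `{ mimeType, data }` shape the Gemini API expects for inline images.

```typescript
// Split a base64 data URL into the mimeType/base64 pair used for inline image parts.
function dataUrlToInlineData(dataUrl: string): { mimeType: string; data: string } {
  const match = dataUrl.match(/^data:(.+?);base64,(.*)$/);
  if (!match) throw new Error("Expected a base64-encoded data URL");
  return { mimeType: match[1], data: match[2] };
}

// Browser-only: grab the frame the user paused on as a JPEG data URL.
function captureFrame(video: HTMLVideoElement): string {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  return canvas.toDataURL("image/jpeg", 0.9); // "data:image/jpeg;base64,..."
}
```

The captured frame then flows through the same "Re-imagine" path as a directly uploaded image.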

Source Code
Link to Google AI Studio

How I Used Google AI Studio

Google AI Studio was the backbone of this project, allowing me to rapidly prototype and deploy a sophisticated multimodal application. I leveraged two key Gemini models:

  1. gemini-2.5-flash: I chose this model for the core narrative generation due to its incredible speed and powerful multimodal understanding. By providing it with an image or video file and a carefully crafted system prompt ("You are a historian and captivating storyteller..."), I could reliably generate high-quality, context-aware narratives that truly enhance the source media.

  2. gemini-2.5-flash-image-preview: This model is the engine behind the "Re-imagine" feature. Its image editing capabilities are phenomenal. The API was straightforward to implement; I passed the source image and the user's text prompt to the model, configuring the response to ensure it returned an edited image. This allowed for an intuitive and powerful creative tool within the app.
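Reading the edited image back out of the response might look like the sketch below. The nesting (`candidates` → `content` → `parts`, with `inlineData` carrying base64 image bytes) follows the public Gemini API response shape; the helper and interface names are mine, not the applet's:

```typescript
// Minimal response types covering only the fields this sketch touches.
interface InlinePart {
  inlineData?: { mimeType: string; data: string };
  text?: string;
}
interface GenerateResponse {
  candidates?: { content?: { parts?: InlinePart[] } }[];
}

// Return the first inline image in the response, or null if the model
// replied with text only (e.g. a refusal).
function extractEditedImage(
  response: GenerateResponse
): { mimeType: string; data: string } | null {
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData) return part.inlineData;
  }
  return null;
}
```

On the request side, my setup sent the source image part together with the user's prompt and asked for image output via the response configuration, so the model would return an edited image rather than a description of one.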

The entire development and deployment process was streamlined through Google AI Studio, making it possible to go from concept to a fully functional, deployed applet efficiently.

Multimodal Features

The applet is built around two core multimodal functionalities that work in tandem to create a cohesive user experience.

  1. Multimodal Understanding (Media-to-Text): The primary feature is the app's ability to interpret visual media (images/videos) and translate that understanding into descriptive text. This is more than just object detection; it's about context, atmosphere, and historical inference.

    • Why it enhances the user experience: It adds a profound layer of depth and discovery. A static, silent photo is transformed into a gateway to a potential story, making history feel immediate and accessible. It turns a simple gallery viewer into an educational and storytelling tool.
  2. Multimodal Generation (Image + Text-to-Image): The "Re-imagine" feature allows for creative input on top of the historical analysis. It takes two distinct modalities—an existing image and a new text prompt from the user—and merges them to generate a completely new visual artifact.

    • Why it enhances the user experience: This fosters a deeper, more personal connection with the media. After learning the story behind a photo, the user is invited to become part of the creative process. This interactive loop of "learn, then create" is incredibly engaging and provides a unique way to explore history and "what if" scenarios visually.

Top comments (2)

Prema Ananda

Hey there!
Long time no see!
Really nice, clean, and straightforward project!
Meanwhile, I've overcomplicated mine so much that now I'm not even sure if I'll manage to finish it... 😅

Nikoloz Turazashvili (@axrisi)

hey!
haha, yes, I was busy raising capital for my startup :)
now I've found some time on weekends to do something I enjoy.

thanks for the kind words. Looking forward to seeing your submission. <3