DEV Community

AI Bug Slayer 🐞


Ancient Feelings using nano banana

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built Ancient Echoes, a web application that acts as a personal time machine for your imagination. It's an AI-powered portal that allows you to generate and edit stunning, evocative images of ancient worlds.

The core idea was to create more than just an image generator; I wanted to build an experience. It's for the history enthusiast who wants to see the Library of Alexandria in its prime, the writer needing inspiration for a scene in ancient Rome, or anyone who simply wants to play with the aesthetics of the past. Ancient Echoes solves the "blank canvas" problem by instantly providing a collection of visual "artifacts" based on a simple text description. From there, you can act as a digital archaeologist, refining and altering these images until they perfectly match the vision in your mind.

Key features include:

  • Batch Image Generation: Summon multiple unique, ancient-themed images from a single prompt.
  • Intuitive Multimodal Editing: Select any image and use simple text commands to modify it (add objects, change the weather, alter materials, and more).
  • Persistent Gallery: All your creations are automatically saved to your browser's local storage, creating a personal gallery of your journeys into the past.
  • Thematic UI: The entire interface, from the parchment-like background to the elegant typography, is designed to immerse you in the ancient world.
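The Persistent Gallery feature comes down to serializing the gallery to JSON and writing it under a single key. Here is a minimal sketch of that idea, not the app's actual code: the `GalleryItem` shape, the `ancient-echoes-gallery` key and the helper names are hypothetical. `KeyValueStore` covers just the `getItem`/`setItem` methods of the browser's `Storage` interface, so `window.localStorage` can be passed in directly, and `MemoryStore` stands in for it outside the browser.

```typescript
// Minimal sketch of a persistent gallery (assumed design, not the app's source).
// KeyValueStore is the getItem/setItem subset of the browser Storage API,
// so window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical shape of one gallery entry.
interface GalleryItem {
  id: string;
  prompt: string;
  dataUrl: string; // e.g. "data:image/png;base64,..."
  createdAt: number; // epoch milliseconds
}

// Hypothetical storage key.
const GALLERY_KEY = "ancient-echoes-gallery";

// Read the saved gallery; return an empty list if nothing is stored
// or the stored value is not a JSON array.
function loadGallery(store: KeyValueStore): GalleryItem[] {
  const raw = store.getItem(GALLERY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as GalleryItem[]) : [];
  } catch {
    return [];
  }
}

// Put the new items in front of the existing ones and write the whole list back.
// Returns false if the write throws (e.g. the storage quota is exceeded).
function saveToGallery(store: KeyValueStore, items: GalleryItem[]): boolean {
  const updated = [...items, ...loadGallery(store)];
  try {
    store.setItem(GALLERY_KEY, JSON.stringify(updated));
    return true;
  } catch {
    return false;
  }
}

// In-memory stand-in for localStorage, useful outside the browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In the browser, a call such as `saveToGallery(window.localStorage, newItems)` after each generation or edit would keep the gallery in sync. Local storage is limited to a few megabytes per origin in most browsers and base64 images are large, so `setItem` can throw once the quota is reached; the sketch reports that as `false` instead of crashing.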

Demo

You can try out the live version of the applet here:
Link to Deployed Applet

Here's a walkthrough of how it works:

1. Generating Your First Vision:
Simply describe a scene in the prompt box. The app takes your words and generates a set of unique, sepia-toned images.
(A screenshot of the UI with the prompt "A bustling marketplace in ancient Rome" and the gallery filled with four generated images would go here.)

2. Editing a Masterpiece:
Click on any image to open the editor. Here, you can provide a new prompt to alter the image. For example, after generating a Roman marketplace, you could ask it to "add a golden chariot in the foreground."
(A video or GIF showing the modal opening, the user typing the edit prompt, and the image transforming to include the chariot would be perfect here.)

3. Your Personal Gallery:
All your generated and edited images are saved, ready for you to revisit anytime.
(A screenshot showing a gallery filled with a variety of generated and edited ancient images would go here.)

How I Used Google AI Studio

This applet is powered entirely by the multimodal capabilities of the Gemini API, which I integrated using the @google/genai library. Google AI Studio was instrumental in prototyping and testing my prompts to achieve the desired "ancient photo" aesthetic.

I leveraged two key models:

  1. imagen-4.0-generate-001: This powerful model is the engine for the initial image generation. By crafting a detailed base prompt (An ancient photo, vintage style, sepia tone...) and appending the user's input, I was able to consistently generate high-quality images that fit the app's theme. The ability to request multiple images at once (numberOfImages: 4) is key to the app's core experience of providing a variety of creative starting points.

  2. gemini-2.5-flash-image-preview: This is where the multimodal magic truly happens. For the editing feature, I send a generateContent request containing both the existing image (as a base64 string) and the user's new text prompt. This model's ability to understand the context of both the image and the text allows for incredible, iterative creative control.
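The generation step from item 1 might look roughly like this. It is a sketch under assumptions rather than the author's code: the post elides most of the base prompt, so `BASE_PROMPT` holds only the visible fragment, the ". " joiner is a guess, and `buildPrompt`, `toDataUrls` and `generateScenes` are hypothetical names. The client is typed structurally with only the fields the sketch touches (`models.generateImages` and `generatedImages[].image.imageBytes`), so a `GoogleGenAI` instance from `@google/genai` should fit, while a fake client can be used for testing.

```typescript
// Sketch of the Imagen generation call (assumed structure, not the app's source).
// Only the SDK fields used here are typed; in the app, pass a GoogleGenAI
// instance from @google/genai as `client`.
interface GeneratedImage {
  image?: { imageBytes?: string; mimeType?: string };
}
interface GenerateImagesResponse {
  generatedImages?: GeneratedImage[];
}
interface ImageGenClient {
  models: {
    generateImages(params: {
      model: string;
      prompt: string;
      config?: { numberOfImages?: number };
    }): Promise<GenerateImagesResponse>;
  };
}

// Visible fragment of the base prompt from the post; the rest is elided there.
const BASE_PROMPT = "An ancient photo, vintage style, sepia tone";

// Append the user's scene description to the thematic base prompt (assumed format).
function buildPrompt(userInput: string): string {
  return `${BASE_PROMPT}. ${userInput.trim()}`;
}

// Turn the response into data URLs an <img> element can display.
// Entries without image bytes are skipped; PNG is assumed if no MIME type is given.
function toDataUrls(response: GenerateImagesResponse): string[] {
  const urls: string[] = [];
  for (const g of response.generatedImages ?? []) {
    const bytes = g.image?.imageBytes;
    if (bytes) {
      urls.push(`data:${g.image?.mimeType ?? "image/png"};base64,${bytes}`);
    }
  }
  return urls;
}

// Request four images for one prompt, as described in the post.
async function generateScenes(client: ImageGenClient, userInput: string): Promise<string[]> {
  const response = await client.models.generateImages({
    model: "imagen-4.0-generate-001",
    prompt: buildPrompt(userInput),
    config: { numberOfImages: 4 },
  });
  return toDataUrls(response);
}
```

Returning data URLs keeps each result ready for an `<img src>` attribute and for the local-storage gallery.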

Multimodal Features

Ancient Echoes is fundamentally a multimodal application, using the synergy between text and images to create an intuitive and powerful user experience.

The primary multimodal feature is the Image + Text Editing Capability. When a user wants to edit an image, the application doesn't just generate a new one from a text prompt. Instead, it sends two distinct modalities of data to the Gemini API:

  • Modality 1: Image Data (The existing picture the user wants to change).
  • Modality 2: Text Data (The user's instructions, e.g., "make the sky stormy").

The gemini-2.5-flash-image-preview model processes this combined input and produces a new image that is a direct modification of the original rather than a fresh generation. This turns the creative process into a conversation: you're not just giving commands; you're collaborating with the AI on an existing piece of art. Each result can be refined with another short instruction, which is far more natural and engaging than starting from scratch with a perfectly detailed prompt every time. It allows for discovery, refinement, and a genuine sense of co-creation.
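Here is a sketch of how the two modalities could be packed into a single `generateContent` request. Again, this is an illustration under assumptions, not the app's source: it assumes the image being edited is held as a base64 data URL, the helper names `parseDataUrl`, `firstImage` and `editImage` are hypothetical, and the interfaces are minimal structural stand-ins for the `@google/genai` request and response types. The image goes in as an `inlineData` part next to a `text` part holding the instruction, and the first image part of the first candidate is returned.

```typescript
// Sketch of the image + text edit request (assumed structure, not the app's source).
// Minimal structural stand-ins for the SDK types this sketch reads.
interface Part {
  text?: string;
  inlineData?: { mimeType?: string; data?: string };
}
interface GenerateContentResponse {
  candidates?: { content?: { parts?: Part[] } }[];
}
interface ContentClient {
  models: {
    generateContent(params: {
      model: string;
      contents: { role: string; parts: Part[] }[];
    }): Promise<GenerateContentResponse>;
  };
}

// Split "data:image/png;base64,AAAA" into its MIME type and base64 payload.
function parseDataUrl(dataUrl: string): { mimeType: string; data: string } {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error("Expected a base64 data URL");
  return { mimeType: match[1], data: match[2] };
}

// Return the first image part of the first candidate as a data URL,
// or null if the response contains no image part.
function firstImage(response: GenerateContentResponse): string | null {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    const inline = part.inlineData;
    if (inline?.data) {
      return `data:${inline.mimeType ?? "image/png"};base64,${inline.data}`;
    }
  }
  return null;
}

// Send both modalities in one request: the existing image and the instruction.
async function editImage(
  client: ContentClient,
  imageDataUrl: string,
  instruction: string,
): Promise<string | null> {
  const { mimeType, data } = parseDataUrl(imageDataUrl);
  const response = await client.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: [
      {
        role: "user",
        parts: [{ inlineData: { mimeType, data } }, { text: instruction }],
      },
    ],
  });
  return firstImage(response);
}
```

Since `editImage` returns a data URL in the same format it accepts, its output can be fed straight back in with the next instruction, which is the iterative loop described above. It returns `null` when the response has no image part, so the UI can keep showing the previous image in that case.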
