This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built the Progressive Story Maker, an interactive web application that transforms storytelling into a collaborative, choice-driven experience with an AI.
The app solves the problem of "writer's block" and creative inertia by turning narrative creation into an engaging game. It begins by generating the first sentence of a story in a user-selected genre (Medieval Fantasy, Modern Mystery, or Kiddish Adventure). Within this sentence, key words are highlighted. When the user clicks a word, it becomes the creative prompt for the Gemini API, which then generates the next paragraph of the story. This new paragraph has its own clickable keywords, allowing the user to continuously guide the narrative down unique, branching paths. The result is an endless story machine that empowers users to co-create completely original tales simply by making a series of simple, intuitive choices.
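As a rough sketch of how the highlighting step could work (this is not the app's actual source; the names `Segment` and `splitIntoSegments` are hypothetical), the frontend can split each paragraph into plain and clickable pieces by searching for the first verbatim occurrence of each keyword:

```typescript
// Hypothetical types and names, for illustration only.
interface Segment {
  text: string;
  clickable: boolean;
}

// Split a paragraph into ordered segments, marking the first verbatim
// occurrence of each keyword as clickable.
function splitIntoSegments(paragraph: string, keywords: string[]): Segment[] {
  const hits = keywords
    .filter((k) => k.length > 0)
    .map((k) => ({ start: paragraph.indexOf(k), length: k.length }))
    .filter((h) => h.start >= 0) // drop keywords that are not in the text
    .sort((a, b) => a.start - b.start || b.length - a.length);

  const segments: Segment[] = [];
  let cursor = 0;
  for (const hit of hits) {
    if (hit.start < cursor) continue; // skip keywords overlapping an earlier one
    if (hit.start > cursor) {
      segments.push({ text: paragraph.slice(cursor, hit.start), clickable: false });
    }
    const end = hit.start + hit.length;
    segments.push({ text: paragraph.slice(hit.start, end), clickable: true });
    cursor = end;
  }
  if (cursor < paragraph.length) {
    segments.push({ text: paragraph.slice(cursor), clickable: false });
  }
  return segments;
}
```

Each clickable segment can then be rendered as a button or link whose click handler sends that word back to the model as the next prompt. Joining the segment texts in order reproduces the original paragraph.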
Demo
Here is the link to the live demo: https://progressive-story-maker-598974168521.us-west1.run.app
How I Used Google AI Studio
While the final application interacts directly with the Gemini API via its SDK, Google AI Studio was an indispensable tool during the development and prototyping phases.
Prompt Engineering & Refinement: I used the AI Studio playground extensively to design and test the prompts that power the application. It provided a rapid feedback loop for crafting instructions that could reliably generate compelling story segments and, most importantly, extract exact, verbatim keywords from the generated text. This was crucial for ensuring the frontend could always find and highlight the interactive words.
Structured Output (JSON Mode): AI Studio was instrumental in defining and validating the responseSchema for the Gemini API calls. By experimenting in the studio, I settled on a simple JSON structure ({ "paragraph": "...", "keywords": [...] }). Constraining the response to this schema makes the data received from the API predictable and consistently formatted, which removes the need for fragile string parsing and significantly reduces potential runtime errors.
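To make the shape concrete, here is a minimal sketch of what such a schema and a client-side check might look like. The schema object is written with the plain type names the Gemini API uses for schemas and is my assumption about the shape, not the app's actual definition; `storySegmentSchema`, `StorySegment`, and `parseStorySegment` are hypothetical names. The schema constrains the structure, while "every keyword appears verbatim in the paragraph" is a content property, so the sketch checks that part on the client:

```typescript
// Hypothetical names; an illustration under the assumptions stated above.
interface StorySegment {
  paragraph: string;
  keywords: string[];
}

// Assumed schema shape for { "paragraph": "...", "keywords": [...] }.
const storySegmentSchema = {
  type: "OBJECT",
  properties: {
    paragraph: { type: "STRING" },
    keywords: { type: "ARRAY", items: { type: "STRING" } },
  },
  required: ["paragraph", "keywords"],
};

// Parse the model's JSON text and keep only keywords found verbatim
// in the paragraph, so the UI can always locate and highlight them.
function parseStorySegment(raw: string): StorySegment {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Response is not a JSON object");
  }
  const { paragraph, keywords } = data as Record<string, unknown>;
  if (typeof paragraph !== "string") {
    throw new Error("Missing or invalid 'paragraph'");
  }
  if (!Array.isArray(keywords)) {
    throw new Error("Missing or invalid 'keywords'");
  }
  const verbatim = keywords.filter(
    (k): k is string => typeof k === "string" && k.length > 0 && paragraph.includes(k),
  );
  return { paragraph, keywords: verbatim };
}
```

Dropping non-verbatim keywords rather than failing keeps the story moving even if the model occasionally paraphrases a word.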
Model Selection: I used AI Studio to evaluate different models, ultimately selecting gemini-2.5-flash for its optimal balance of speed, creativity, and cost-effectiveness, which is essential for a real-time, interactive user experience like this one.
Multimodal Features
The current version of the Progressive Story Maker focuses on a polished unimodal (text-to-text) experience so that the core narrative mechanic is seamless and engaging.
However, the application was architected as a strong foundation for future multimodal expansion.
Image-Driven Prompts (Image-to-Text): To introduce multimodal input, users would be able to upload an image instead of clicking a keyword. The contents of the prompt sent to Gemini would then include both the image and a text instruction like, "Continue the story based on this image." This would allow users to introduce completely new visual concepts into the narrative, giving them an even more powerful way to guide the AI's creativity and making the experience truly multimodal.
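A minimal sketch of how that combined prompt might be assembled is below. It assumes the image is already base64-encoded and uses the `inlineData` / `text` part shape that the Gemini API accepts for mixed image and text input; the `Part` type and `buildImageContinuationParts` function are hypothetical names, and including the story so far in the text part is my own assumption rather than something the app does today:

```typescript
// Hypothetical helper; mirrors the image + text parts described above.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

function buildImageContinuationParts(
  imageBase64: string,
  mimeType: string,
  storySoFar: string,
): Part[] {
  if (!mimeType.startsWith("image/")) {
    throw new Error(`Unsupported MIME type: ${mimeType}`);
  }
  return [
    // The uploaded image, passed inline as base64 data.
    { inlineData: { mimeType, data: imageBase64 } },
    // The text instruction, with the story so far for context (assumption).
    {
      text: `Story so far:\n${storySoFar}\n\nContinue the story based on this image.`,
    },
  ];
}
```

The returned parts would be passed as the contents of the same generation call the keyword flow uses, so the response could keep the same paragraph-and-keywords structure.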