This is a submission for the Google AI Studio Multimodal Challenge
What I Built
MyStoryTime Comics is an immersive web app designed to make reading fun and personal for young children. It solves the challenge of engaging kids in reading by transforming them from passive consumers to active co-creators of their own stories.
The app features two distinct portals:
A Creator Dashboard: This is the creative hub where a parent or guardian can provide a simple story idea (e.g., "A story about my son's favorite teddy bear exploring the jungle"), supply a list of "difficult words" for their child to learn, and even upload a photo of the actual teddy bear. The app then uses AI to generate a complete, illustrated comic book from these inputs.
A Comics Dashboard: This is a vibrant, kid-friendly library displaying all the generated comics. Children can browse their personalized collection and dive into a rich reading experience with features like AI-powered read-aloud and interactive word definitions.
The goal is to enhance reading skills and vocabulary by creating stories that are not just engaging, but deeply personal and meaningful to the child.
Demo
- Live App: https://mystorytime-comics-422630072437.us-west1.run.app/
- Video Demo: https://www.loom.com/share/55744070cb4e4b1b9f5dac72f82683ee?sid=ef3f2856-b44e-4503-84c4-fa17e1c4ff97
How I Used Google AI Studio
I leveraged the Gemini API, accessible through the Google AI platform, as the core engine for MyStoryTime Comics. The entire creative pipeline, from character design to final illustration, is powered by Gemini models via the `@google/genai` SDK.
- `gemini-2.5-flash` for Scripting & Language: I used this model for all text-based generation. To ensure a predictable story structure, I prompted it to return a structured JSON object containing the comic's title and an array of panels, each with a `sceneDescription` for the image model and `dialogue` for the reader. This model also powers the kid-friendly dictionary, providing simple definitions for challenging words on demand. (A sketch of this call follows the list.)
- `gemini-2.5-flash-image-preview` for Illustration: This powerful text-to-image model was used to bring the stories to life visually. It generated both the vibrant, eye-catching comic covers and the detailed 2x2 multi-panel pages, interpreting the `sceneDescription` for each panel to create cohesive and narratively consistent artwork.
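To give a feel for the scripting step, here is a minimal sketch of a structured-JSON call with the `@google/genai` SDK. The helper name `generateScript`, the prompt wording, and the simplified schema are illustrative assumptions, not the app's exact code:

```typescript
import { GoogleGenAI, Type } from "@google/genai";

// Assumes GEMINI_API_KEY is available; in the real app this would run server-side.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

interface Panel {
  sceneDescription: string; // passed to the image model
  dialogue: string;         // shown to (and read aloud for) the child
}

interface ComicScript {
  title: string;
  panels: Panel[];
}

// Hypothetical helper: turns a story idea and vocabulary list into a comic script.
async function generateScript(storyIdea: string, difficultWords: string[]): Promise<ComicScript> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents:
      `Write a children's comic script for: ${storyIdea}. ` +
      `Naturally include these vocabulary words: ${difficultWords.join(", ")}.`,
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          title: { type: Type.STRING },
          panels: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                sceneDescription: { type: Type.STRING },
                dialogue: { type: Type.STRING },
              },
            },
          },
        },
      },
    },
  });
  // The model returns JSON text matching the schema above.
  return JSON.parse(response.text ?? "{}") as ComicScript;
}
```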
Multimodal Features
The app is built around a core multimodal workflow that seamlessly blends user inputs (text and images) with AI-generated content (text, structured JSON and images).
Image-to-Text Character Design: This is the app's cornerstone multimodal feature. A parent can upload a photo of their child's favorite toy or pet. The app sends this image, along with the text-based story idea, to the Gemini API. The model's task is not to edit the image, but to generate a detailed textual description of the character based on the photo (e.g., "A cheerful blue teddy bear with soft, worn fur and a small red bow tie"). This generated description becomes the "character sheet" that ensures the main character looks consistent across every panel of the comic.
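A minimal sketch of how this image-to-text step might look, assuming a hypothetical `describeCharacter` helper and a simplified prompt; the app's actual wording may differ:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical helper: turns an uploaded photo plus the story idea into a
// reusable "character sheet" description for later image prompts.
async function describeCharacter(
  photoBase64: string,
  mimeType: string,
  storyIdea: string
): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType, data: photoBase64 } },
          {
            text:
              "Describe this toy or pet as a comic-book character in 2-3 sentences " +
              `(appearance, colors, distinguishing features) for this story idea: ${storyIdea}`,
          },
        ],
      },
    ],
  });
  return response.text ?? "";
}
```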
Text-to-Image Storytelling: The entire visual narrative is a text-to-image process. The AI-generated script, which includes detailed scene descriptions, is fed page-by-page into the gemini-2.5-flash-image-preview model. This turns the structured text into a fully illustrated story, from the cover to the final page.
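A sketch of the page-rendering call, under the assumption that each page prompt combines the character sheet with the panel scene descriptions and that the generated image is returned as an inline-data part (the helper name `renderPage` is hypothetical):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical helper: renders one 2x2 comic page and returns base64 image data.
async function renderPage(
  characterSheet: string,
  sceneDescriptions: string[]
): Promise<string | undefined> {
  const prompt =
    `Draw a 2x2 comic page in a bright, kid-friendly style. Main character: ${characterSheet}. ` +
    sceneDescriptions.map((s, i) => `Panel ${i + 1}: ${s}`).join(" ");

  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: prompt,
  });

  // The generated image arrives as an inlineData part alongside any text parts.
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData?.data) return part.inlineData.data;
  }
  return undefined;
}
```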
Interactive Reading Experience: While reading, a child can tap on a highlighted "difficult word." This triggers a text-to-text API call to get a simple definition, which is then spoken aloud using the browser's text-to-speech capabilities, creating an interactive and educational loop.
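A small sketch of that loop, assuming a hypothetical `defineAndSpeak` handler wired to the word tap; the definition call uses `gemini-2.5-flash` and the speech uses the browser's built-in Web Speech API:

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical handler for tapping a highlighted word: fetch a kid-friendly
// definition, then read it aloud with the browser's speech synthesis.
async function defineAndSpeak(word: string): Promise<void> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Explain the word "${word}" to a 6-year-old in one short, friendly sentence.`,
  });

  const definition = response.text ?? `Let's look up "${word}" together!`;
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(definition));
}
```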
This chained, multimodal approach creates a uniquely personal experience, transforming a real-world object (a toy or a pet) into the hero of a digital, AI-illustrated adventure, making reading magical.