Comic Book Movie Creator 🎬

Lalo Morales

An innovative web application that empowers anyone, especially children and creatives, to bring their stories to life as a personalized, multimodal "motion comic."

What I Built

The Comic Book Movie Creator solves two major challenges in AI-powered creation: the overwhelming "blank canvas" problem and the difficulty of maintaining visual consistency. It replaces creative friction with a fun, guided 6-step journey.

Users start with a simple spark of an idea, provided via text, voice, or even a drawing, and the app works with them to develop a consistent character, outline a story, generate a full 16-page comic book, and finally, animate key scenes into a finished video, complete with AI-generated narration.

It's an end-to-end "idea-to-premiere" pipeline that showcases the power of chaining multiple Google AI modalities into one seamless, creative experience.

Demo

Live App URL: https://comic-book-movie-creator-421841157537.us-west1.run.app

GitHub Repository: https://github.com/lalomorales22/comic-book-movie-creator

Check out the full video demonstration of the app in action below!

YouTube Link: https://youtu.be/6e4NRPcsCz0

How I Used Google AI Studio

Google AI Studio was the command center for this project's development and is the ideal environment for running it.

Prompt Engineering and Prototyping

I used AI Studio's playground extensively to design and test the complex chain of prompts required for the 6-step journey. Each step's prompt was carefully crafted to take the output of the previous step as its input, ensuring a cohesive flow. For example, the approved character description from Step 2 becomes a critical part of the prompt for generating the 16 comic panels in Step 4.
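To make that chaining concrete, here is a simplified sketch of the idea in TypeScript (the function name and prompt wording are illustrative, not the app's exact prompts):

```typescript
// Illustrative sketch of step-to-step prompt chaining; the real app's
// prompt text and step structure may differ.
function buildPanelPrompt(
  characterSheet: string, // approved character description from Step 2
  storyOutline: string,   // story outline produced in Step 3
  panelNumber: number,    // 1..16
): string {
  return [
    `You are illustrating panel ${panelNumber} of 16 in a comic book.`,
    `Main character (must look identical in every panel): ${characterSheet}`,
    `Story outline: ${storyOutline}`,
    `Render this panel in a consistent comic-book art style.`,
  ].join("\n");
}
```

Because the exact same character description is injected into all 16 panel prompts, the image model sees one canonical reference every single time.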

Multimodal Model Integration

This project leverages a suite of powerful Google models, and AI Studio was perfect for experimenting with them:

  • Gemini 2.5 Flash: Chosen for its incredible speed and text capabilities. It powers the initial idea processing, the real-time storyboard chat, and the generation of all story text, ensuring the user experience is fluid and interactive (a minimal call sketch follows this list).

  • Gemini 2.5 Flash Image: Used for all image generation tasks. Its quality and ability to adhere to detailed prompts were essential for creating the consistent Character Model Sheet and the 16 unique comic panels.

  • Veo 2.0: This state-of-the-art model is the magic behind Step 5. I used it to generate the 5-10 second video clips, bringing the user's static comic panels to life with dynamic animation.
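Here is the promised call sketch, assuming the @google/genai TypeScript SDK; the prompt, the userIdea variable, and the GEMINI_API_KEY environment variable name are my assumptions, not the app's exact code:

```typescript
import { GoogleGenAI } from "@google/genai";

// Assumed setup: the API key is injected via an environment variable.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const userIdea = "a shy robot who learns to paint"; // example input

// Step 1 style call: Gemini 2.5 Flash turns the raw idea into story text.
const premise = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: `Turn this idea into a short, kid-friendly story premise: ${userIdea}`,
});

console.log(premise.text);
```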

Streamlined Deployment

The project is configured to be run directly from Google AI Studio, which seamlessly handles API key management via environment variables, making it incredibly easy for others to clone the repository and run it themselves.

Multimodal Features

The Comic Book Movie Creator is multimodal at its core, weaving together different AI capabilities to enhance the user experience at every step.

Flexible Creative Input (Speech/Image/Text → Text)

The journey begins with true multimodal flexibility. A child can upload a drawing, a writer can type a paragraph, and a storyteller can simply speak their idea. This accommodates different creative styles and makes the app accessible to a wider audience.
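As a rough sketch of how a drawing and a text instruction can ride in the same request (again assuming the @google/genai SDK; the prompt and variable names are illustrative):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// In the browser this would come from a file input or a canvas export.
declare const drawingBase64: string;

// Image part + text part in one multimodal request.
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: [
    { inlineData: { mimeType: "image/png", data: drawingBase64 } },
    { text: "Describe the hero in this drawing so it can become a comic-book character." },
  ],
});

console.log(response.text);
```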

Consistent Character Generation (Text → Image)

The "Character Lab" is a key feature that solves a common AI problem. By first generating a definitive "Character Model Sheet" from the user's initial idea and getting approval, we ensure the main character looks consistent across all 16 comic panels, creating a believable and professional-looking story.

From Panel to Motion (Image + Text → Video)

This is the app's "wow" factor. The system takes a static, AI-generated comic panel (image) and its associated story text and uses Veo to create a short, animated video clip. This transforms the final product from a simple slideshow into a genuine "motion comic."
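Here is a minimal sketch of that image-plus-text Veo call, including the long-running-operation polling it needs (the model ID, field names, and polling interval are assumptions on my part):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

declare const panelPngBase64: string; // one of the 16 generated panels
declare const panelText: string;      // that panel's story text

// Veo jobs are long-running: start the operation, then poll until done.
let operation = await ai.models.generateVideos({
  model: "veo-2.0-generate-001",
  prompt: `Animate this comic panel as a short scene: ${panelText}`,
  image: { imageBytes: panelPngBase64, mimeType: "image/png" },
});

while (!operation.done) {
  await new Promise((resolve) => setTimeout(resolve, 10_000)); // wait ~10s
  operation = await ai.operations.getVideosOperation({ operation });
}

const videoUri = operation.response?.generatedVideos?.[0]?.video?.uri;
console.log("Clip ready:", videoUri);
```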

Automated Narration (Text → Speech)

In the final step, the generated story script for each panel is converted into audio using the browser's Web Speech API. This adds a final layer of modality, creating an immersive audio-visual experience where the user can watch and listen to the story they created.
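Because speech synthesis ships with the browser, this last step needs no API key at all; a minimal usage sketch:

```typescript
// Standard Web Speech API (browser only); panelText is whatever script
// was generated for the panel currently on screen.
function narratePanel(panelText: string): void {
  const utterance = new SpeechSynthesisUtterance(panelText);
  utterance.rate = 0.95; // slightly slower pacing for storytelling
  window.speechSynthesis.cancel(); // stop any narration still playing
  window.speechSynthesis.speak(utterance);
}
```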

Top comments (4)

Ranjan Dailata • Edited

Great concept. I love your creative solution.

Sorry, one issue and a suggestion.

I'm getting an error on video generation: "Failed to generate video."

It would be great to package all the images into an ebook and let the system generate a PDF or other formats creators need to publish and market their work.

Lalo Morales

Thanks! Yeah dude, I had to turn it off and update it. I had no idea I would wake up to such a large bill from Google! haha

Lalo Morales

**Update:** added a field where you can add your own Gemini API key. I can't afford $120 a day in API fees!
