*This is a submission for the Google AI Studio Multimodal Challenge*
💡 Inspiration
I've always wanted to make comics that capture my chaotic imagination - but the drawing, erasing, and starting again is 👀 such a drag! AI didn't help much either - create, frustrate, regenerate, repeat - and it still couldn't get my vibe 🌈 ... 👀 even more of a drag!
Well, that was until ✨ Gemini nano banana (gemini-2.5-flash-image-preview)!
I was so blown away by its editing capabilities, especially when working with multi-image, multimodal inputs, that I couldn't allow my lazy self to procrastinate anymore!
So, here are some quick links:
🧩 Multimodal app architecture
✨ Multimodal capabilities I implemented
What I built
SONICS.ai 🪄 is a Google-AI ✨ powered creative suite 🧠 🎬 📚 🎞️ that transforms a user's simple idea into a fully realized, multi-sensory, character-consistent comic book experience with podcast playback. It lets users add their flavours/vibes 🌈 to every aspect of comic creation - from storyline to characters to scenes to dialogues to text styles - all in natural language.
The best part? You don't need to be good at drawing! AI solves that for you in ⚡ minutes! You can bring your creativity to life without losing your patience to back-and-forth regeneration chasing that perfect shot!
You can use SONICS for a variety of use cases - from bedtime story podcasts to full production-ready comics with playback. Bring your stories to life - your style!
Let your imagination go wild !
Demo
My project in action
0:00 Intro
0:10 🧠 Story Conception
0:20 🎬 Character/ Cast Design
0:53 🎞️ Comic Panel Creation
1:24 📚 Comic preview
1:34 🎧 Audio preview
1:47 🎥 Play the Comic that speaks your Style
Note: Due to billing constraints, I couldn't deploy my app, so this video demo 👆 shows my project in action.
How I Used Google AI Studio
This app was built entirely in Google AI Studio - ⚡ vibe-coded from scratch
👀 as you could have guessed by now, given my lazy vibes!
I started with a simple idea prompt and kept adding features by guiding the AI through the pain points I had faced when vibe-creating comics with my flavour.
The multimodal capabilities I implemented ...
Multimodal Capabilities
| Input | Output | Models ✨ | Features 🚀 |
| --- | --- | --- | --- |
| Text | Image | gemini-2.5-flash-image-preview, imagen | Quality character and scene background generation; text-editor based updates |
| Image + Text | Text | gemini-2.5-flash | Automatic character description updates for natural-language character edits |
| Image (mask) + Image + Text | Image | gemini-2.5-flash-image-preview | Precise edits to characters/scenes, dialogue corrections, text styling, positional edits, detail improvement |
| Multiple Images + Text | Composite image with rendered text | gemini-2.5-flash-image-preview | Comic scene panel generation ensuring character consistency across scenes, dialogue accuracy, scene quality |
Multimodal Features
The specific multimodal functionalities 🚀 I built and why they enhance the user experience 👤 (UX) ...
Composite scene panels 🎞️
✨ imagen · gemini-2.5-flash-image-preview · gemini-2.5-flash
🚀 The comic panels are created through an intelligent composition logic that combines the multimodal capabilities of these models: the final panel image is composed from inputs - scene background, character images, and scripts - that were themselves generated by one of these models.
👤 This ensures character consistency, dialogue accuracy, and scene quality across comic scenes.
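The composition step above can be sketched roughly as a single multi-image call to the image model. This is a hypothetical illustration using the google-genai Python SDK - the prompt wording, function names, and file handling are my assumptions, not the app's actual code:

```python
from __future__ import annotations

def build_panel_prompt(scene_desc: str, dialogues: list[str]) -> str:
    """Combine the scene description and dialogue lines into one instruction."""
    lines = "\n".join(f"- {d}" for d in dialogues)
    return (
        "Compose a single comic panel. Use the first image as the scene "
        "background and the remaining images as the characters.\n"
        f"Scene: {scene_desc}\n"
        f"Render these speech bubbles exactly:\n{lines}"
    )

def generate_panel(background_path: str, character_paths: list[str],
                   scene_desc: str, dialogues: list[str]) -> bytes:
    """Send background + character images + script in one request; return panel bytes."""
    # Third-party imports are kept local so the pure helper above
    # works without the SDK installed.
    from google import genai
    from PIL import Image

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    images = [Image.open(background_path)] + [Image.open(p) for p in character_paths]
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[build_panel_prompt(scene_desc, dialogues), *images],
    )
    # Return the first image part of the response as raw bytes.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return part.inline_data.data
    raise RuntimeError("model returned no image")
```

Sending everything in one request (rather than compositing locally) is what lets the model keep the characters consistent with their reference images while rendering the dialogue text into the scene.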
Flavour edits 🌈
✨ gemini-2.5-flash-image-preview · gemini-2.5-flash
🚀 These enable precise, surgical edits to scenes, characters, dialogues, and styles by leveraging masking.
Users can simply describe their edits in natural language (with or without masking).
Image edits are also auto-propagated to their related strategic texts, such as character descriptions, to keep everything consistent downstream.
👤 This helps users avoid regenerating images from scratch back and forth, which was really frustrating whenever a small style or error correction was needed. Users can add their vibes/flavours/styles to a scene in natural language without worrying about inconsistency.
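A masked flavour edit boils down to sending the original image, an optional mask image, and the natural-language instruction together. Again, this is a hypothetical sketch with the google-genai Python SDK - the prompt text and helper names are illustrative assumptions:

```python
from __future__ import annotations

def build_edit_prompt(instruction: str, masked: bool) -> str:
    """Turn a natural-language edit request into a model instruction."""
    scope = ("Only change the region covered by the mask image; "
             "keep everything else pixel-identical. " if masked else "")
    return f"{scope}Edit request: {instruction}"

def edit_image(image_path: str, instruction: str,
               mask_path: str | None = None) -> bytes:
    """Apply a natural-language edit, optionally restricted to a masked region."""
    # Third-party imports are kept local so the pure helper above
    # works without the SDK installed.
    from google import genai
    from PIL import Image

    client = genai.Client()
    contents = [build_edit_prompt(instruction, mask_path is not None),
                Image.open(image_path)]
    if mask_path:
        contents.append(Image.open(mask_path))
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=contents,
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            return part.inline_data.data
    raise RuntimeError("model returned no image")
```

The mask is just another image in the request; the instruction text tells the model to treat it as the editable region, which is what makes small, targeted corrections possible without a full regeneration.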
🎉
Acknowledgement
Google AI Studio ⚡ is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hours.
But as you could have guessed 👀 Parkinson's law took most of the time!
gemini-2.5-flash-image-preview ✨ (Gemini nano-banana) is the star of my whole idea. Thanks to nano banana, I was able to create a consistent-character comic experience and solve the back-and-forth regeneration and vibe-check problem for vibe-comic enthusiasts.
imagen ✨ helped me create beautiful backgrounds for the comic scenes, which were then fully realised using the composite logic.
gemini-2.5-flash ✨ was used for prompt engineering the inputs to the other models, for auto-updating descriptions, and for optimising the deliverables.
Thank you!
It was a fun and great experience!
👀 Definitely not a drag!
