This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built StoryWeaver AI, a multimodal storytelling web application powered by Google Gemini 2.5 Flash.
The app lets anyone input text, an image, or audio (individually or combined) and instantly transforms it into an engaging 300–400 word creative story with a short narration script.
The goal is simple: to make storytelling more accessible, fun, and creative by blending traditional story crafting with cutting-edge AI capabilities.
Built with Flask + TailwindCSS and deployed on AWS EC2 with a custom domain and HTTPS, StoryWeaver AI provides a smooth, secure, and visually appealing experience.
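Under the hood, the output format (a 300–400 word story plus a narration script) is pinned down by the instruction sent to the model. The exact prompt isn't reproduced in this post; the constant below is only a sketch of what it could look like, and its name and wording are assumptions:

```python
# Hypothetical prompt constant; the app's real prompt may differ.
STORY_PROMPT = (
    "You are a creative storyteller. Using the provided text, image, "
    "and/or audio as inspiration, write an engaging story of roughly "
    "300-400 words, followed by a short narration script suitable "
    "for reading the story aloud."
)
```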
Demo
🎥 YouTube Walkthrough:
🚀 Live App → https://story.praveshsudha.com
🧑‍💻 Full Source Code → Dev.to Challenges repo by Pravesh Sudha (navigate to the google-studio-challenge directory)
📸 Screenshots
How I Used Google AI Studio
I used Google AI Studio with the Gemini 2.5 Flash model to handle multimodal input. By integrating the API into my Flask backend, I was able to process different forms of content:
- Text prompts are directly turned into narrative-rich stories.
- Image inputs are interpreted, and the AI builds a story inspired by visual details.
- Audio inputs are analyzed, and the context is woven into a creative narrative.
This combination makes the app versatile and fun: users are free to interact with it however they like.
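For context, here is a minimal sketch of how such an endpoint can be wired up with the google-genai Python SDK. The route name, form-field names, and inline prompt are illustrative assumptions, not the app's exact code:

```python
from flask import Flask, request, jsonify
from google import genai
from google.genai import types

app = Flask(__name__)
client = genai.Client()  # picks up GEMINI_API_KEY from the environment

@app.post("/generate")  # hypothetical route name
def generate_story():
    # Gather whichever modalities the user supplied; all are optional.
    parts = ["Write an engaging 300-400 word story with a short narration script."]
    if text := request.form.get("text"):
        parts.append(text)
    if image := request.files.get("image"):
        parts.append(types.Part.from_bytes(data=image.read(), mime_type=image.mimetype))
    if audio := request.files.get("audio"):
        parts.append(types.Part.from_bytes(data=audio.read(), mime_type=audio.mimetype))

    # A single generate_content call accepts any mix of text, image, and audio parts.
    response = client.models.generate_content(model="gemini-2.5-flash", contents=parts)
    return jsonify({"story": response.text})
```

Sending the image and audio inline as raw bytes with their MIME types works well for small uploads; much larger files would need the Gemini Files API instead.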
Multimodal Features
The standout feature is that users aren't restricted to just one form of input. They can:
- Provide just text for a direct storytelling experience.
- Provide an image to get a narrative based on visuals.
- Provide audio for stories generated from sound-based input.
- Or combine all three for richer, more context-aware responses (see the request sketch below).
This flexibility showcases the true strength of Gemini's multimodal capabilities, turning it into more than just a text generator; it becomes a storytelling partner.
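A combined request to the hypothetical endpoint sketched earlier could look like this (the endpoint path and field names are the same assumptions as above):

```python
import requests

# All three modalities in a single request (field names match the sketch above).
resp = requests.post(
    "https://story.praveshsudha.com/generate",  # assumed endpoint path
    data={"text": "A lighthouse keeper finds a map that glows at night."},
    files={
        "image": open("coastline.jpg", "rb"),
        "audio": open("waves.mp3", "rb"),
    },
)
print(resp.json()["story"])
```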
Why It Matters
For centuries, stories have been humanity's default way of sharing ideas, culture, and imagination. From cave paintings to epics, from bedtime tales to novels, stories shape how we learn, dream, and connect.
But creating stories isn't always easy for everyone. That's where AI helps. With StoryWeaver AI, anyone (whether a child imagining a dragon, a student preparing for class, or a casual dreamer) can bring their ideas to life instantly.
By blending human creativity with AI's multimodal understanding, we're expanding the ways people can express themselves.
Conclusion
StoryWeaver AI is my way of showing how AI and storytelling can beautifully merge. With the power of Google Gemini 2.5 Flash, this project highlights how multimodal inputs can enrich experiences beyond plain text.
✨ Try it out here: https://story.praveshsudha.com
I hope this inspires you to imagine what's possible when we combine AI and creativity. After all, "If you can think it, you can build it!"
🌐 Connect with me:
- 🐙 GitHub: Pravesh-Sudha
- 💼 LinkedIn: Pravesh Sudha
- 🐦 Twitter/X: @praveshstwt
- 📺 YouTube: @pravesh-sudha