✨ Mind Weaver
This is a submission for the Google AI Studio Multimodal Challenge.
What I Built
I built Mind Weaver, an app designed to catch those incomplete thoughts, fleeting dreams, and half-formed stories lingering at the edge of your mind, and weave them into a rich, multimodal reality. Have you ever had an idea you couldn't quite put into words, like "a library where the books whisper secrets" or "a city powered by forgotten memories"? Mind Weaver takes that spark and instantly fleshes it out into both a captivating short story and a beautifully designed poem.
But it doesn't stop at text. Mind Weaver transforms the generated poem into a unique, shareable piece of visual art: a downloadable "poem card" with elegant fonts and stunning gradient backgrounds. To complete the experience, it can also narrate the story aloud, giving voice to the ideas you've been wanting to explore. It's a tool for anyone who wants to see their nascent ideas blossom into something tangible and beautiful.
🚀 Demo
Try it yourself: ✨ Mind Weaver
🎥 Video Demo
📸 Screenshots
⚒️ How It Works
- A user types a simple idea, a fleeting thought, or a dream fragment into the text box.
- They can select their preferred language for the output.
- With a click of the "Weave My Thought" button, the Mind Weaver begins its work!
- In moments, a story appears, ready to be read.
- Simultaneously, a stylized poem is generated and displayed on a beautiful card.
- The user can then customize the poem's font, download the card as an image, or share it directly.
- For the story, they can press the "Narrate" button and choose a voice to hear their creation read aloud.
How I Used Google AI Studio
Google AI Studio was the engine behind this entire project. I leveraged the Gemini 2.5 Flash model for its incredible speed and creative prowess, which is perfect for generating high-quality text on the fly.
The core of the integration lies in two parallel API calls to the Gemini API using the `@google/genai` SDK:
- Story Generation: I send a prompt like: `Write a short, imaginative story in [selected language] based on this input: [user's idea]`
- Poem Generation: A similar prompt is sent for the poem: `Write a beautiful, creative poem in [selected language] based on this input: [user's idea]`
To create a fluid and responsive user experience, I used the `generateContentStream` method. This allows the story and poem to appear on the screen token-by-token as they are being generated, rather than forcing the user to wait for the entire response. It makes the app feel alive and incredibly fast.
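As a rough sketch of this setup (identifiers like `buildPrompt` and `weave` are illustrative, not the app's actual code, and the exact SDK call shape is my assumption based on the `@google/genai` package), the two streaming calls might look like:

```javascript
// Build the language-aware prompt described above.
function buildPrompt(kind, language, idea) {
  const form = kind === "story"
    ? "short, imaginative story"
    : "beautiful, creative poem";
  return `Write a ${form} in ${language} based on this input: ${idea}`;
}

// Stream one generation, calling onToken as each chunk arrives.
// The SDK is loaded lazily so the pure helper above stays usable on its own.
async function weave(kind, language, idea, onToken) {
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const stream = await ai.models.generateContentStream({
    model: "gemini-2.5-flash",
    contents: buildPrompt(kind, language, idea),
  });
  for await (const chunk of stream) {
    if (chunk.text) onToken(chunk.text); // append token-by-token to the UI
  }
}

// Story and poem run in parallel:
// Promise.all([weave("story", lang, idea, showStory), weave("poem", lang, idea, showPoem)]);
```

Because both calls stream independently, the story and poem panels fill in at the same time rather than one after the other.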
Multimodal Features
The true magic of Mind Weaver lies in how it combines different modes of content to create something truly special.
1. Text-to-Image (via HTML & Canvas)
This is the app's most unique multimodal feature. Instead of using a dedicated image generation model, it creates a visual representation of the generated poem programmatically. Here’s the flow:
- Gemini generates the poem text.
- The frontend code dynamically creates a `<div>` element, styling it with a randomly selected gradient background and a user-chosen font.
- The popular `html2canvas` library then captures this styled `<div>` and converts it into a high-quality PNG image.
This approach creates a unique piece of "word art" from the text, turning the AI's poetic output into a tangible, beautiful image that can be saved and shared.
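A minimal sketch of that capture-and-download step could look like this (the gradient values and function names are illustrative, not the app's actual assets; it assumes `html2canvas` is already loaded on the page):

```javascript
// Illustrative gradient palette for the poem card background.
const GRADIENTS = [
  "linear-gradient(135deg, #667eea, #764ba2)",
  "linear-gradient(135deg, #f093fb, #f5576c)",
  "linear-gradient(135deg, #4facfe, #00f2fe)",
];

// Pick a random gradient; the random value is injectable for testing.
function pickGradient(random = Math.random()) {
  return GRADIENTS[Math.floor(random * GRADIENTS.length)];
}

// Style the card, capture it with html2canvas, and trigger a PNG download.
async function downloadPoemCard(cardElement, font) {
  cardElement.style.background = pickGradient();
  cardElement.style.fontFamily = font;
  const canvas = await html2canvas(cardElement); // renders the styled <div>
  const link = document.createElement("a");
  link.download = "poem-card.png";
  link.href = canvas.toDataURL("image/png");
  link.click(); // browser saves the captured card as an image
}
```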
2. Text-to-Speech (Narration)
To add an auditory dimension, I integrated the browser's native Web Speech API.
- Once the story is generated, the "Narrate" button becomes active.
- The app fetches a list of available voices from the user's browser and populates a dropdown menu, allowing for personalization.
- When the user clicks "Narrate," the API reads the generated story aloud, transforming the written word into a spoken-word performance.
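The narration flow above can be sketched with the standard Web Speech API (function names here are illustrative):

```javascript
// Pick the voice matching the user's dropdown choice, falling back to
// the first available voice. Pure helper: works on any { name, lang } array.
function pickVoice(voices, preferredName) {
  return voices.find((v) => v.name === preferredName) ?? voices[0] ?? null;
}

// Speak the generated story with the chosen voice (browser-only).
function narrate(storyText, preferredName) {
  const voices = window.speechSynthesis.getVoices();
  const utterance = new SpeechSynthesisUtterance(storyText);
  const voice = pickVoice(voices, preferredName);
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
}
```

One caveat worth noting: `getVoices()` can return an empty list until the browser fires its `voiceschanged` event, so real code typically repopulates the voice dropdown inside that event handler.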
3. Language and Voice Personalization
To make the experience truly global and personal, Mind Weaver incorporates:
- Multilingual Generation: Users can choose to generate both the story and poem in various languages, including English, Spanish, French, German, Japanese, and Hindi. This is achieved by dynamically adjusting the prompt sent to the Gemini API.
- Diverse Narration Voices: The text-to-speech feature connects to the browser's native Web Speech API, populating a dropdown with all available system voices. This allows users to hear their stories read in a wide variety of accents and languages, adding another layer of personalization to the auditory experience.
This combination of generated text, generated imagery, and synthesized speech creates a rich, multi-sensory experience from a single user prompt, showcasing the versatile and creative potential of combining Gemini with other web technologies.