This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built AppWeaver AI.
It’s a web application designed to bridge the gap between imagination and tangible design.
Have you ever had a brilliant idea for a mobile app but got stuck trying to visualize it? AppWeaver AI solves that exact problem.
It empowers anyone—from seasoned developers to aspiring entrepreneurs—to generate stunning, high-fidelity mobile app mockups simply by describing their vision in plain text.
No Figma, no Sketch, no complex design tools. Just your words.
The app doesn't just create a single screen; it generates an entire user flow, from onboarding to the profile page, giving you a holistic view of your concept. It’s not just a tool; it's your personal AI design partner, ready to iterate and refine with you.
Demo:
Here’s a quick walkthrough of how AppWeaver AI brings an idea to life.
- The Spark of an Idea: A user starts by typing a prompt, like "a minimalist language learning app with a clean, Duolingo-inspired aesthetic." They can also choose how many initial screens they want, from 3 to 10.
- AI-Powered Weaving: The app then generates a series of high-resolution app screens in a 9:16 aspect ratio, complete with a brief description for each one. The designs appear in a sleek, scrollable gallery.
- Iterate and Refine: This is where the magic happens. The user can click "Edit" on any design. A modal pops up, allowing them to type in changes. For example: "Change the primary button color to electric blue and add an illustration of a book."
- The Final Polish: The AI processes the image and the text prompt, returning a newly edited design. The user can download their creations at any point, ready for presentations, pitch decks, or developer handoffs.
How I Used Google AI Studio
Google AI Studio was the engine behind this entire project. I leveraged the `@google/genai` SDK to orchestrate a sophisticated, multi-step AI workflow.

- `gemini-2.5-flash` for Structured Data Generation: The first step isn't generating an image directly. I use `gemini-2.5-flash` to interpret the user's simple prompt and expand it into a structured JSON array. Each object in the array contains a thoughtful `description` for a specific app screen and a highly detailed `imagePrompt` tailored for an image model. This ensures a logical user flow and creative, diverse visuals (see the first sketch after this list).
- `imagen-4.0-generate-001` for Initial Design Creation: The detailed prompts generated by Flash are then fed into `imagen-4.0-generate-001`. This model's power in creating high-quality, coherent images is perfect for producing the initial set of app designs with a consistent 9:16 aspect ratio (see the second sketch below).
- `gemini-2.5-flash-image-preview` (Nano Banana) for Editing: The interactive editing feature is powered by the groundbreaking Nano Banana model. It takes the existing design (image) and the user's edit request (text) as inputs and generates a new, modified design. This is the core of the app's multimodal power.
Multimodal Features
AppWeaver AI is built on a foundation of two key multimodal interactions that create a seamless and powerful user experience.
- Text-to-Image Generation (The Concept Phase): This is the initial creative spark. The user provides a text prompt, and the application returns a series of images. This classic multimodal capability allows for the rapid visualization of an abstract idea, turning words into concrete designs instantly.
- Image + Text -> Image + Text (The Iteration Phase): This is where AppWeaver AI truly shines and becomes a collaborative tool. A user selects an image they want to change and provides a new text prompt describing the modification (see the sketch at the end of this section).
  - The model, `gemini-2.5-flash-image-preview`, understands the context of the existing image and the instructions in the new text.
  - It then outputs a new, edited image that reflects the requested changes.
  - It often provides a new text description as well, confirming the changes it made.
This iterative loop is incredibly powerful. It transforms the user from a passive prompter into an active director of the design process, allowing for nuanced control and refinement that wouldn't be possible with simple text-to-image generation alone.
It’s a true conversation between the user and the AI, using both language and visuals.
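Here's a rough sketch of one turn of that conversation using the `@google/genai` SDK; the helper name and variables are illustrative assumptions, not the app's actual implementation.

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// One round of the iteration loop: an existing design (base64 PNG) plus an edit
// request goes in; an edited design plus an optional confirmation note comes out.
async function editScreen(designBase64: string, instruction: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: designBase64 } },
      { text: instruction },
    ],
  });

  let editedImage = "";
  let note = "";

  // The response can interleave image parts and text parts; collect both.
  for (const part of response.candidates?.[0]?.content?.parts ?? []) {
    if (part.inlineData?.data) editedImage = part.inlineData.data;
    if (part.text) note += part.text;
  }

  return { editedImage, note };
}

// Example edit request, as in the demo above:
// await editScreen(currentDesignBase64,
//   "Change the primary button color to electric blue and add an illustration of a book.");
```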