This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built PixelForge 3D, a creative partner for game developers and 3D artists.
Imagine you're designing a new game. You need a legendary sword.
Instead of spending hours sketching or modeling basic concepts, you just type...
"A mythical sword glowing with arcane energy."
In moments, PixelForge 3D doesn't just give you one image.
It gives you ten unique, high-quality concepts.
Each one is from a different angle, with a different artistic description, ready for your game.
A front view, a top-down view, a close-up on the glowing runes... you name it.
But it doesn't stop there. See a design you almost love?
Just click "Edit" and type, "Make the glow electric blue and add cracks to the blade."
PixelForge 3D seamlessly edits the asset for you.
It's designed to solve a real problem: breaking through creative blocks and accelerating the asset conceptualization process from hours to minutes.
Demo
Here is a link to the live applet:
Link to Deployed Applet Would Go Here
And here’s a glimpse into the creative workflow.
First, you describe your vision.
Simple text is all you need. We even provide suggestions to get you started!
Next, the AI forges ten unique concepts for you.
You get a whole grid of ideas, complete with varied angles and detailed descriptions.
Finally, you refine and perfect your asset.
A simple modal lets you use text to make powerful edits to any image you choose.
How I Used Google AI Studio
Google AI Studio was my command center for bringing this app to life. The core idea was to create a pipeline of multimodal capabilities.
- **Orchestrating Concepts with `gemini-2.5-flash`**: I used AI Studio to perfect a prompt that asks Gemini Flash to act as a creative director. I instructed it to take a user's prompt and generate a structured JSON object containing ten unique `angle` and `description` pairs. This was the blueprint for our asset generation (a simplified sketch of this step follows the list).
- **Forging Assets with `imagen-4.0-generate-001`**: With the JSON blueprint, I then programmatically create ten new, more detailed prompts for Imagen 4. Each prompt combines the user's original idea with the unique angle and description from Gemini Flash. This is how we get such rich variety in the output.
- **Refining with `gemini-2.5-flash-image-preview` (Nano Banana)**: For the editing feature, I leveraged the powerful image-and-text understanding of Nano Banana. I prototyped in AI Studio how the model would interpret an input image alongside a text instruction to generate a new, modified image. This confirmed that the intuitive "select and describe" editing flow was possible.
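To make that concrete, here is a simplified sketch of the concept-generation step, assuming the `@google/genai` TypeScript SDK; the prompt wording, helper name, and schema details are illustrative rather than the applet's exact code:

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Ask Gemini Flash to act as a creative director and return a
// structured list of ten { angle, description } concept pairs.
async function generateConcepts(userPrompt: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `You are a creative director for a game studio. For the asset "${userPrompt}", propose ten distinct concepts, each with a camera angle and a short artistic description.`,
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.ARRAY,
        items: {
          type: Type.OBJECT,
          properties: {
            angle: { type: Type.STRING },
            description: { type: Type.STRING },
          },
          required: ["angle", "description"],
        },
      },
    },
  });
  // The response text is JSON matching the schema above.
  return JSON.parse(response.text ?? "[]") as {
    angle: string;
    description: string;
  }[];
}
```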
Multimodal Features
PixelForge 3D is built on two core multimodal experiences that work in harmony.
1. The Text-to-Concept-Array-to-Image-Gallery Flow
This is the heart of the initial generation.
It's more than just text-to-image. It's a multi-step creative process.
- Input: User provides a single text prompt.
- Processing:
  - `gemini-2.5-flash` interprets the text and outputs structured data (JSON): a list of 10 creative concepts.
  - The application then uses this data to generate 10 distinct images with `imagen-4.0-generate-001`.
- Output: A full gallery of 10 images.
Why it's better: This provides immense creative leverage. It transforms one simple idea into a board of possibilities, helping users discover designs they might not have thought of on their own. It automates brainstorming.
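As a rough sketch of how the fan-out from concepts to gallery can be wired up (again assuming the `@google/genai` SDK and the hypothetical `generateConcepts` helper above; the prompt template is illustrative):

```ts
// Turn each { angle, description } concept into its own Imagen prompt
// and render all ten images in parallel.
async function forgeGallery(
  userPrompt: string,
  concepts: { angle: string; description: string }[]
) {
  return Promise.all(
    concepts.map(async ({ angle, description }) => {
      const response = await ai.models.generateImages({
        model: "imagen-4.0-generate-001",
        prompt: `${userPrompt}. ${description}. Rendered from a ${angle} view, high-detail game concept art.`,
        config: { numberOfImages: 1 },
      });
      // imageBytes is a base64-encoded image string.
      return response.generatedImages?.[0]?.image?.imageBytes ?? "";
    })
  );
}
```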
2. The Image-and-Text-to-Image Editing Loop
This is what makes the app truly interactive and powerful.
- Input: User provides an image (by clicking "Edit") and text (by typing their changes).
- Processing: `gemini-2.5-flash-image-preview` takes both the existing visual data and the new text instructions into account.
- Output: A new image that reflects the requested changes.
Why it's better: This creates an intuitive, iterative design cycle. Instead of starting over with a new prompt, users can collaborate with the AI, refining the generated assets with natural language. It makes the creative process feel less like a command and more like a conversation.
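Here's a stripped-down sketch of that editing loop, again assuming the `@google/genai` SDK (the helper name and the way the edited image is read back are illustrative, not the applet's exact code):

```ts
import { GoogleGenAI, Modality } from "@google/genai";

// Send the selected image plus the user's instruction, then pull the
// first image part out of the model's multimodal response.
async function editAsset(base64Png: string, instruction: string) {
  const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: {
      parts: [
        { inlineData: { mimeType: "image/png", data: base64Png } },
        { text: instruction },
      ],
    },
    config: { responseModalities: [Modality.IMAGE, Modality.TEXT] },
  });
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((part) => part.inlineData);
  return imagePart?.inlineData?.data; // base64 of the edited image
}
```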