This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built PixelForge 3D, a creative partner for game developers and 3D artists.
Imagine you're designing a new game. You need a legendary sword.
Instead of spending hours sketching or modeling basic concepts, you just type...
"A mythical sword glowing with arcane energy."
In moments, PixelForge 3D doesn't just give you one image.
It gives you ten unique, high-quality concepts.
Each one is from a different angle, with a different artistic description, ready for your game.
A front view, a top-down view, a close-up on the glowing runes... you name it.
But it doesn't stop there. See a design you almost love?
Just click "Edit" and type, "Make the glow electric blue and add cracks to the blade."
PixelForge 3D seamlessly edits the asset for you.
It's designed to solve a real problem: breaking through creative blocks and accelerating the asset conceptualization process from hours to minutes.
Demo
Here is a link to the live applet:
Link to Deployed Applet Would Go Here
And here’s a glimpse into the creative workflow.
First, you describe your vision.
Simple text is all you need. We even provide suggestions to get you started!
Next, the AI forges ten unique concepts for you.
You get a whole grid of ideas, complete with varied angles and detailed descriptions.
Finally, you refine and perfect your asset.
A simple modal lets you use text to make powerful edits to any image you choose.
How I Used Google AI Studio
Google AI Studio was my command center for bringing this app to life. The core idea was to create a pipeline of multimodal capabilities.
- **Orchestrating Concepts with `gemini-2.5-flash`**: I used AI Studio to perfect a prompt that asks Gemini Flash to act as a creative director. I instructed it to take a user's prompt and generate a structured JSON object containing ten unique `angle` and `description` pairs. This was the blueprint for our asset generation (a simplified sketch of this step follows the list).
- **Forging Assets with `imagen-4.0-generate-001`**: With the JSON blueprint, I then programmatically create ten new, more detailed prompts for Imagen 4. Each prompt combines the user's original idea with the unique angle and description from Gemini Flash. This is how we get such rich variety in the output.
- **Refining with `gemini-2.5-flash-image-preview` (Nano Banana)**: For the editing feature, I leveraged the powerful image-and-text understanding of Nano Banana. I prototyped in AI Studio how the model would interpret an input image alongside a text instruction to generate a new, modified image. This confirmed that the intuitive "select and describe" editing flow was possible.
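To make that concrete, here is a simplified sketch of the concept-generation step, assuming the `@google/genai` TypeScript SDK; the prompt wording, helper name, and schema details are illustrative rather than the applet's exact code:

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Ask Gemini Flash to act as a creative director and return a
// structured list of ten { angle, description } concept pairs.
async function generateConcepts(userPrompt: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `You are a creative director for a game studio. For the asset "${userPrompt}", propose ten distinct concepts, each with a camera angle and a short artistic description.`,
    config: {
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.ARRAY,
        items: {
          type: Type.OBJECT,
          properties: {
            angle: { type: Type.STRING },
            description: { type: Type.STRING },
          },
          required: ["angle", "description"],
        },
      },
    },
  });
  // The response text is JSON matching the schema above.
  return JSON.parse(response.text ?? "[]") as {
    angle: string;
    description: string;
  }[];
}
```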
Multimodal Features
PixelForge 3D is built on two core multimodal experiences that work in harmony.
1. The Text-to-Concept-Array-to-Image-Gallery Flow
This is the heart of the initial generation.
It's more than just text-to-image. It's a multi-step creative process.
- Input: User provides a single text prompt.
- Processing:
  - `gemini-2.5-flash` interprets the text and outputs structured data (JSON): a list of 10 creative concepts.
  - The application then uses this data to generate 10 distinct images with `imagen-4.0-generate-001`.
- Output: A full gallery of 10 images.
Why it's better: This provides immense creative leverage. It transforms one simple idea into a board of possibilities, helping users discover designs they might not have thought of on their own. It automates brainstorming.
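As a rough sketch of how the fan-out from concepts to gallery can be wired up (again assuming the `@google/genai` SDK and the hypothetical `generateConcepts` helper above; the prompt template is illustrative):

```ts
// Turn each { angle, description } concept into its own Imagen prompt
// and render all ten images in parallel.
async function forgeGallery(
  userPrompt: string,
  concepts: { angle: string; description: string }[]
) {
  return Promise.all(
    concepts.map(async ({ angle, description }) => {
      const response = await ai.models.generateImages({
        model: "imagen-4.0-generate-001",
        prompt: `${userPrompt}. ${description}. Rendered from a ${angle} view, high-detail game concept art.`,
        config: { numberOfImages: 1 },
      });
      // imageBytes is a base64-encoded image string.
      return response.generatedImages?.[0]?.image?.imageBytes ?? "";
    })
  );
}
```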
2. The Image-and-Text-to-Image Editing Loop
This is what makes the app truly interactive and powerful.
- Input: User provides an image (by clicking "Edit") and text (by typing their changes).
- Processing: `gemini-2.5-flash-image-preview` takes both the existing visual data and the new text instructions into account.
- Output: A new image that reflects the requested changes.
Why it's better: This creates an intuitive, iterative design cycle. Instead of starting over with a new prompt, users can collaborate with the AI, refining the generated assets with natural language. It makes the creative process feel less like a command and more like a conversation.
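Here's a stripped-down sketch of that editing loop, again assuming the `@google/genai` SDK (the helper name and the way the edited image is read back are illustrative, not the applet's exact code):

```ts
import { GoogleGenAI, Modality } from "@google/genai";

// Send the selected image plus the user's instruction, then pull the
// first image part out of the model's multimodal response.
async function editAsset(base64Png: string, instruction: string) {
  const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-image-preview",
    contents: {
      parts: [
        { inlineData: { mimeType: "image/png", data: base64Png } },
        { text: instruction },
      ],
    },
    config: { responseModalities: [Modality.IMAGE, Modality.TEXT] },
  });
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const imagePart = parts.find((part) => part.inlineData);
  return imagePart?.inlineData?.data; // base64 of the edited image
}
```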