This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built GenoCraft, a sleek and futuristic web application designed to spark creativity for artists, designers, and storytellers.
At its core, GenoCraft is an AI-powered concept generator. It tackles the "blank canvas problem" by transforming simple text ideas into rich, multimodal DNA profiles for imaginary organisms.
Instead of just getting a single image, users receive a set of three unique variations, each with:
- A stunning, AI-generated visual of the DNA helix.
- A compelling, imaginative title.
- A descriptive paragraph hinting at a potential business or scientific application.
It's a tool for turning a flicker of an idea into a tangible, visual, and narrative starting point. Imagine a game designer creating new alien species, a writer visualizing a key plot device, or a branding expert developing a biotech company's identity—GenoCraft is their launchpad.
Demo
Check out the GenoCraft app in action!
Here’s a glimpse of the user journey:
Step 1: The Idea Spark
A user selects a base organism and provides a simple, creative prompt.
Step 2: AI-Powered Creation
The app generates three distinct, visually stunning variations of the concept, complete with titles and detailed descriptions.
Step 3: Refine and Edit
The user can then select any profile and use natural language to perform powerful image edits, like changing colors or adding new elements.
How I Used Google AI Studio
GenoCraft is powered entirely by the Gemini API, orchestrated through the @google/genai library. I leveraged a suite of models to create a seamless, multimodal experience.
- gemini-2.5-flash for Text and Logic: This model is the creative brain of the operation. I use it for all text-based tasks, most crucially by providing it a responseSchema. This lets me request complex, structured JSON output in a single API call and receive consistently formatted titles, descriptions, and even unique image prompts for each variation. One call covers all three variations, and the app can read the result with JSON.parse instead of scraping free-form text.
- imagen-4.0-generate-001 for Image Generation: This model is the artist. It takes the detailed text prompts generated by gemini-2.5-flash and renders the abstract DNA visualizations that form the core of the app's output. Its ability to interpret both artistic and scientific concepts is key.
- gemini-2.5-flash-image-preview for Image Editing: This model powers the "magic" editing feature. It takes an existing image and a text prompt (e.g., "make it more purple") and returns a modified image, making the creative process truly iterative.
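The structured-output step, and the hand-off of its image prompts to Imagen, can be sketched roughly as follows. This is a minimal, dependency-free TypeScript sketch under stated assumptions, not GenoCraft's actual code: the field names (title, description, imagePrompt), the prompt wording, and the helpers buildVariationsRequest, parseVariations, and buildImageRequest are all hypothetical. The request objects are shaped for the @google/genai methods ai.models.generateContent and ai.models.generateImages; with the SDK you would normally write the schema types as Type.ARRAY, Type.OBJECT, and Type.STRING, while plain strings are used here so the sketch runs without the package.

```typescript
// One generated concept. Field names are assumptions for this sketch.
interface Variation {
  title: string;
  description: string;
  imagePrompt: string;
}

// Response schema in the Gemini API's schema format. With @google/genai you would
// normally use Type.ARRAY / Type.OBJECT / Type.STRING; plain strings keep this dependency-free.
const variationsSchema = {
  type: "ARRAY",
  items: {
    type: "OBJECT",
    properties: {
      title: { type: "STRING" },
      description: { type: "STRING" },
      imagePrompt: { type: "STRING" },
    },
    required: ["title", "description", "imagePrompt"],
  },
};

// Hypothetical helper: builds the argument object for ai.models.generateContent.
function buildVariationsRequest(organism: string, idea: string) {
  return {
    model: "gemini-2.5-flash",
    contents:
      `Invent three distinct DNA profiles for an imaginary organism based on a ${organism}. ` +
      `Creative idea: ${idea}. For each, write a title, a description hinting at a business ` +
      `or scientific application, and a detailed prompt for an abstract DNA helix image.`,
    config: {
      responseMimeType: "application/json",
      responseSchema: variationsSchema,
    },
  };
}

// The schema fixes the fields but not the item count, so both are checked defensively.
function parseVariations(text: string): Variation[] {
  const data: unknown = JSON.parse(text);
  if (!Array.isArray(data) || data.length !== 3) {
    throw new Error("expected a JSON array of exactly three variations");
  }
  return data.map((item: unknown, i: number): Variation => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`variation ${i} is not an object`);
    }
    const { title, description, imagePrompt } = item as Record<string, unknown>;
    if (
      typeof title !== "string" ||
      typeof description !== "string" ||
      typeof imagePrompt !== "string"
    ) {
      throw new Error(`variation ${i} is missing a string field`);
    }
    return { title, description, imagePrompt };
  });
}

// Hypothetical helper: builds the argument object for ai.models.generateImages.
function buildImageRequest(v: Variation) {
  return {
    model: "imagen-4.0-generate-001",
    prompt: v.imagePrompt,
    config: { numberOfImages: 1 },
  };
}
```

With a client created via new GoogleGenAI({ apiKey }), the flow would be roughly: pass buildVariationsRequest(...) to ai.models.generateContent, feed response.text to parseVariations, then call ai.models.generateImages(buildImageRequest(v)) for each variation and read the base64 image from generatedImages[0].image.imageBytes. Keeping the request builders and parser as plain functions makes them easy to test without a network call.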
Multimodal Features
Multimodality isn't just a feature in GenoCraft; it's the entire foundation. The app thrives on the interplay between text and images.
- Conceptual Text-to-Image Generation: The primary user flow is a perfect example of multimodality. The user's text prompt is first interpreted and expanded by gemini-2.5-flash into richer, more detailed image prompts, which imagen-4.0-generate-001 then uses to create the final images. It's a two-step process where one model creatively directs the other.
- Text-Guided Image Editing: The edit functionality is a powerful demonstration of multimodal input. The user provides both an image (the DNA profile they want to change) and text (their desired edit). gemini-2.5-flash-image-preview interprets both inputs together and produces a new image that incorporates the change. The result is a fluid editing experience that feels like a conversation with a creative partner.
- Synchronized Content Generation: The app generates and presents image and text pairs that are contextually linked. The title and description for each DNA profile aren't generic; the AI writes them to match the visual it also helped create. This keeps the final output cohesive and immersive.
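Text-guided editing sends the image and the instruction together in a single request. Below is a minimal sketch, assuming the image is held as a base64 PNG string; buildEditRequest, extractEditedImage, editProfileImage, and the small local response types are hypothetical stand-ins, not the SDK's own types. The client is injected through a minimal interface that is intended to match an @google/genai client's models.generateContent (that compatibility is an assumption here), so the logic can also be exercised with a fake client and no network access.

```typescript
// Simplified local stand-ins for the parts of a Gemini response this sketch reads.
// These are assumptions modelled on the documented response shape, not SDK imports.
interface InlineData {
  mimeType?: string;
  data?: string; // base64-encoded image bytes
}
interface Part {
  text?: string;
  inlineData?: InlineData;
}
interface EditResponse {
  candidates?: { content?: { parts?: Part[] } }[];
}

// Hypothetical helper: one request carrying both modalities, the image part and the instruction.
function buildEditRequest(imageBase64: string, mimeType: string, instruction: string) {
  return {
    model: "gemini-2.5-flash-image-preview",
    contents: [
      { inlineData: { mimeType, data: imageBase64 } },
      { text: instruction },
    ],
  };
}

// The model may return text parts alongside the image, so scan for the first image part.
function extractEditedImage(res: EditResponse): InlineData {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    const img = part.inlineData;
    if (img && img.data) {
      return img;
    }
  }
  throw new Error("response contained no image part");
}

// Minimal client shape this sketch depends on (hypothetical interface).
interface EditClient {
  models: {
    generateContent(req: ReturnType<typeof buildEditRequest>): Promise<EditResponse>;
  };
}

// Hypothetical end-to-end edit: send image + instruction, return the edited image data.
async function editProfileImage(
  client: EditClient,
  imageBase64: string,
  instruction: string,
): Promise<InlineData> {
  const res = await client.models.generateContent(
    buildEditRequest(imageBase64, "image/png", instruction),
  );
  return extractEditedImage(res);
}
```

Injecting the client is a design choice for testability; in a browser app, the returned data and mimeType could be combined into a data: URL to display the edited profile.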
Thanks for checking out my project!