ANIRUDDHA ADAK

AI Itasha Studio

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built the AI Itasha Studio! 🚗✨

It's a creative web application for every car enthusiast and pop-culture fan out there.

Have you ever imagined your favorite anime character, video game hero, or custom artwork plastered on the side of a sleek sports car? That's an "Itasha," a Japanese term for cars decorated with fictional characters.

Traditionally, creating these designs is a complex and expensive process, requiring graphic design skills and specialized software.

My applet solves this problem.

It provides a super simple, incredibly fun experience:

  1. You upload any image you love.
  2. The AI gets to work.
  3. Instantly, you get a photorealistic image of your design wrapped beautifully onto a car.

It bridges the gap between imagination and reality, making car customization design accessible to everyone.

Demo

You can try the applet live right here!

➡️ Link to Deployed Applet

Here’s a walkthrough of how it works:

Step 1: The Clean & Simple UI
Our starting point is a bold, clean interface. The focus is entirely on the user's creative journey.


Step 2: Upload Your Theme
A user uploads an image. Here, we're using a cool piece of futuristic character art.

Step 3: The AI Works Its Magic!
With a single click, the AI generates the final product. Notice how it intelligently wraps the artwork around the car's body, maintaining proper lighting, shadows, and perspective.


How I Used Google AI Studio

Google AI Studio was the heart of this project.

The entire backend logic is powered by the Gemini API, specifically the gemini-2.5-flash-image-preview model. This model is an absolute powerhouse for this kind of task.

My workflow looked like this:

  1. Prototyping in the Studio: Before writing a single line of code, I spent time in Google AI Studio. I experimented with different combinations of images and text prompts. This allowed me to rapidly test my core concept. I could see what worked and what didn't, especially for prompt engineering.

  2. Refining the Prompt: I learned that a simple prompt wasn't enough. Through trial and error in the Studio, I crafted a more detailed prompt that instructs the AI to act as an expert vehicle artist. I specified that it needed to create a full-body vinyl wrap, follow the car's contours, and maintain photorealism. This was key to getting high-quality results.
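To make that concrete, here's a sketch of a prompt in that spirit. The exact wording below is illustrative, not my production prompt, but it carries the same key instructions (expert vehicle artist, full-body vinyl wrap, contours, photorealism):

```typescript
// Illustrative prompt assembled from the instructions described above.
// The precise phrasing is a placeholder, not the exact production prompt.
const WRAP_PROMPT: string = [
  "You are an expert vehicle artist specializing in Itasha designs.",
  "Using the second image as the theme, create a full-body vinyl wrap",
  "for the car in the first image. Follow the car's contours,",
  "preserve the original lighting, shadows, and perspective,",
  "and maintain photorealism in the final render.",
].join(" ");
```

A detailed, role-based prompt like this consistently outperformed one-liners such as "put this art on the car" in my Studio experiments.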

  3. Implementation: Once I had a winning formula, I implemented it in the application code. The frontend gathers the user's image, combines it with my pre-selected base car image and the refined prompt, and sends it all to the Gemini API.
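The assembly step can be sketched roughly like this. The part shapes mirror the Gemini API's `inlineData`/`text` parts; the function name and base64 inputs are placeholders of my own, not code from the app:

```typescript
// Sketch of assembling the multimodal request: two images plus the
// refined text prompt, in the order the model should read them.
type Part =
  | { inlineData: { mimeType: string; data: string } }
  | { text: string };

function buildContents(
  baseCarB64: string, // pre-selected base car image, base64-encoded
  themeB64: string,   // user-uploaded theme image, base64-encoded
  prompt: string      // the refined instruction prompt
): Part[] {
  return [
    { inlineData: { mimeType: "image/png", data: baseCarB64 } }, // the canvas
    { inlineData: { mimeType: "image/png", data: themeB64 } },   // the theme
    { text: prompt },                                            // the director
  ];
}
// These parts are then sent to the Gemini API (e.g. via the @google/genai
// SDK's models.generateContent) with model "gemini-2.5-flash-image-preview".
```

Keeping the base car image first and the theme second lets the prompt refer to them unambiguously as "the first image" and "the second image."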

Multimodal Features

This applet is a celebration of true multimodal AI. It's not just about text-to-image; it's about a conversation between different types of data.

The core multimodal functionality works by combining three distinct inputs:

  • 🖼️ Input Image #1: A static, pre-defined base image of a white sports car. This acts as our canvas.
  • 🎨 Input Image #2: A dynamic, user-uploaded image. This provides the theme, color palette, and artistic style.
  • ✍️ Input Text Prompt: A carefully crafted set of instructions that tells the model how to merge the two images. It's the director, guiding the AI's creative process.

The output is a single, stunning image 🖼️ that represents the fusion of all three inputs.
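On the frontend, the generated image comes back as inline data that needs to be turned into something a browser can display. Here's a hedged sketch of that last step; the response shape below mirrors the Gemini API's candidates/content/parts structure, but treat the interfaces as my own simplification rather than the exact SDK types:

```typescript
// Simplified stand-ins for the Gemini response shape (an assumption,
// not the official SDK type definitions).
interface InlinePart { inlineData?: { mimeType: string; data: string }; text?: string }
interface GenResponse { candidates: { content: { parts: InlinePart[] } }[] }

// Find the first image part and convert its base64 payload into a
// data URL that an <img> element can render directly.
function firstImageDataUrl(res: GenResponse): string | null {
  for (const part of res.candidates[0]?.content.parts ?? []) {
    if (part.inlineData) {
      return `data:${part.inlineData.mimeType};base64,${part.inlineData.data}`;
    }
  }
  return null; // no image part in the response
}
```

Returning `null` instead of throwing lets the UI show a friendly "try again" message when the model responds with text only.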

Why does this enhance the user experience?

The answer is expressiveness and precision.

  • Without the theme image, a user would have to describe their desired design in words alone. This is incredibly difficult and often leads to generic results. How do you describe a specific art style or a character's intricate design in just a few sentences?
  • With the theme image, the user provides a rich, dense source of visual information. The AI can analyze the colors, shapes, and overall vibe far better than any text description could convey.

This multimodal approach lets users say, "Don't just listen to my words; *look at this* and create something amazing from it." It's a more intuitive, powerful, and deeply personal way to create.
