This post is my submission for the DEV Education Track: Build Apps with Google AI Studio.
I set out to build DoodleMates, an app that turns any photo and personality traits into a unique 3D doodle creature.
The core functionality relies on a single multimodal API call. The key prompt I crafted leverages both image and text inputs:
"Analyze the image’s aesthetic and colors, then generate a detailed 3D doodle-style creature sticker that reflects '[User’s Personality Notes]' and matches the image’s style."
I used the Studio's multimodal capabilities and its prompt-engineering interface to iterate rapidly on the visual style and consistency.
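For readers who want to see the shape of that call, here is a minimal sketch using the Python `google-genai` SDK. The SDK choice, model name, file paths, and sample notes are my assumptions for illustration, not DoodleMates' actual code:

```python
# Minimal sketch of the single multimodal call (image + text in, image out).
# Assumed for illustration: the google-genai SDK, an image-capable Gemini
# model, and the file paths / sample notes below.
from PIL import Image
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

photo = Image.open("user_photo.jpg")                 # the user's shared photo
notes = "playful, loves rainy days, collects mugs"   # the user's personality notes

prompt = (
    "Analyze the image's aesthetic and colors, then generate a detailed "
    f"3D doodle-style creature sticker that reflects '{notes}' "
    "and matches the image's style."
)

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",  # assumed model name
    contents=[photo, prompt],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The generated DoodleMate comes back as inline image data in the response.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("doodlemate.png", "wb") as f:
            f.write(part.inline_data.data)
```

Note how the image and the text travel in the same `contents` list: that one call does the analysis and the generation together.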
Demo
Here is a quick look at the user experience, from input to output:
Input: The user shares a photo and simple text notes.
Output: The generated, custom DoodleMate.
My Experience
Working through the Google AI Studio track offered several key takeaways and surprises:
💡 What I Learned
True Multimodal Simplicity: I was surprised by how elegantly the model handles inputs that are fundamentally different (an image and a block of text) and processes them into a unified, creative output (a new image). I didn't need separate APIs for image analysis and generation.
Prompt as Code: The process truly felt like "prompt engineering." Tweaking words like "3D sticker," "whimsical," or "charming" acted like visual parameters, letting me refine the product's aesthetic without touching any traditional code (see the sketch below).
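To make that concrete, here is a hypothetical sketch (not the app's actual code) that pulls the style words out as parameters, so changing the aesthetic is just changing a string:

```python
# Hypothetical illustration: style words treated as tweakable "visual parameters".
STYLE = "3D doodle-style"   # try "whimsical" or "charming" here
FORM = "creature sticker"

def build_prompt(notes: str) -> str:
    """Assemble the generation prompt from the style parameters and user notes."""
    return (
        "Analyze the image's aesthetic and colors, then generate a detailed "
        f"{STYLE} {FORM} that reflects '{notes}' and matches the image's style."
    )

print(build_prompt("shy, loves stargazing"))
```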
🤯 What Was Surprising
Speed of Prototyping: I went from a simple concept to a functional core engine for a highly custom image-to-image application in under an hour. Being able to test the API directly in the Studio environment made iterating on the perfect prompt incredibly fast. That kind of rapid development is a game-changer for solo developers.
If you're looking for a quick, creative project, using Google AI Studio for multimodal tasks is the perfect way to turn pixels into personality!