This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built a game called Guess My Drawing and it does exactly what it sounds like - except the AI is watching.
Here’s how it works:
- First, Gemini generates 4 original images for you. All square, all simple enough to draw (in theory).
- You pick one and draw it on the canvas, but here’s the twist: you only get one line.
- You have 10 seconds and the timer will judge you.
- Then Gemini tries to guess which image you picked - and explains why.
✅ If it guesses correctly, you score a point.
❌ If it guesses wrong or can't figure out what you drew (because the timer ran out or your sketch resembled existential spaghetti), the AI scores instead.
It's you versus the model for 10 rounds.
And just when you think you've got it down, the images get harder. Every 3 rounds the prompts get more complex and abstract.
(You think drawing a banana is hard? Try drawing "an anxious goblin in baroque style" in one stroke.)
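The round loop above can be sketched in a few lines. This is a hypothetical TypeScript sketch (the names and tier labels are mine, not the app's): the player scores when the AI guesses right, the AI scores otherwise, and the prompt difficulty bumps up one tier every 3 rounds.

```typescript
// Hypothetical difficulty tiers; the post only says prompts get harder every 3 rounds.
type Difficulty = "easy" | "medium" | "hard" | "expert";

function difficultyForRound(round: number): Difficulty {
  const tiers: Difficulty[] = ["easy", "medium", "hard", "expert"];
  // Rounds 1-3 -> tier 0, 4-6 -> tier 1, 7-9 -> tier 2, 10 -> tier 3.
  return tiers[Math.min(Math.ceil(round / 3), tiers.length) - 1];
}

interface Score {
  player: number;
  ai: number;
}

// A correct AI guess gives the player a point; anything else scores for the AI.
function applyRoundResult(score: Score, aiGuessedCorrectly: boolean): Score {
  return aiGuessedCorrectly
    ? { ...score, player: score.player + 1 }
    : { ...score, ai: score.ai + 1 };
}
```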
Demo
⚠️ Note: the game is not optimized for mobile; it works best on desktop.
Live game link: https://guess-my-drawing-ai-467116247258.us-west1.run.app
Github link: https://github.com/olgazju/guess_my_drawing_ai
I recorded 3 short demo runs:
- I win (AI guessed correctly)
- AI wins (AI guessed incorrectly)
- AI gives up (my masterpiece was too avant-garde, apparently)
How I Used Google AI Studio
The entire game was developed directly in Google AI Studio, including both the image generation and sketch analysis logic. I used the built-in code editor to handle everything, from prompt design to response parsing, without needing to switch to any external tools.
At the start of each round the app sends 4 short prompts, such as “A smiling dinosaur wearing sunglasses, Pixar style”, to the gemini-2.5-flash-image-preview model via generateContent. The model returns four base64-encoded PNGs, which are shown to the player as selectable options.
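The round-start step can be sketched as two small helpers. This is an illustrative TypeScript sketch, not the app's exact code: the request shape matches the generateContent call named above, and the response shape assumes the @google/genai-style layout (candidates → content → parts → inlineData), with the helper names being mine.

```typescript
// Build one generateContent request per image prompt (model name from the post).
function buildImageRequest(prompt: string) {
  return { model: "gemini-2.5-flash-image-preview", contents: prompt };
}

// Pull base64 PNG payloads out of a generateContent-style response object.
function extractBase64Images(response: {
  candidates?: { content?: { parts?: { inlineData?: { data?: string } }[] } }[];
}): string[] {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts
    .map((p) => p.inlineData?.data)
    .filter((d): d is string => typeof d === "string");
}
```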
After the player draws their selected image in a single line, the sketch is sent back to Gemini with a follow-up prompt that asks it to identify which of the four images the drawing most closely resembles and explain why. The model responds with a guessed index and a short explanation, which are then shown in-game.
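Turning the model's reply into a guessed index plus explanation might look like the sketch below. The JSON reply format is an assumption on my part (the post only says the model returns an index and a short explanation); unparseable or out-of-range output is treated as the AI giving up.

```typescript
interface GuessResult {
  guessIndex: number; // 0-3, or -1 when the model gives up
  explanation: string;
}

// Assumed reply format: {"guess": 2, "reason": "the long neck suggests a dinosaur"}.
function parseGuess(modelText: string): GuessResult {
  try {
    // Models sometimes wrap JSON in markdown fences; strip them before parsing.
    const cleaned = modelText.replace(/```(?:json)?/g, "").trim();
    const parsed = JSON.parse(cleaned);
    const idx = Number(parsed.guess);
    return {
      guessIndex: Number.isInteger(idx) && idx >= 0 && idx <= 3 ? idx : -1,
      explanation: typeof parsed.reason === "string" ? parsed.reason : "",
    };
  } catch {
    // Non-JSON output: keep the raw text as the explanation and count it as a pass.
    return { guessIndex: -1, explanation: modelText.trim() };
  }
}
```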
Multimodal Features
This project uses two core multimodal capabilities from Gemini:
Text-to-Image
The game uses the gemini-2.5-flash-image-preview model to generate four images at the beginning of each round. Each image is created from a short, structured prompt and returned as a base64-encoded PNG. The images are displayed in a grid for the player to choose from.
Image-to-Text
After the player draws their chosen image, the sketch is converted to base64 and sent to the gemini-2.5-flash model with a prompt asking it to identify which of the four options the sketch most closely resembles. The response includes the guessed index and a short explanation, which are displayed as part of the round results.
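The "sketch converted to base64" step is a small detail worth showing. In the browser, canvas.toDataURL("image/png") yields a string like "data:image/png;base64,&lt;data&gt;", while the Gemini API's inlineData part expects only the raw base64 payload, so the prefix has to be stripped first. The helper name below is mine, a minimal sketch of that conversion:

```typescript
// Convert a canvas data URL into an inlineData part for generateContent.
// Base64 never contains commas, so splitting on the first comma is safe.
function dataUrlToInlinePart(dataUrl: string) {
  const [header, data] = dataUrl.split(",", 2);
  const mimeType = header.match(/^data:(.*?);base64$/)?.[1] ?? "image/png";
  return { inlineData: { mimeType, data } };
}
```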
This loop - from prompt to image to sketch to AI guess - creates a fast, interactive rhythm that makes the game feel alive. You're not just clicking buttons or watching the AI perform; you're actively trying to communicate with it using a single line. Sometimes it gets you immediately. Sometimes it completely misreads your masterpiece. But that unpredictability is part of what makes the experience fun - it feels like a conversation, just in pictures.
Want to play?
Come draw something terrible. I promise Gemini won’t judge. Much.