DEV Community

Seb

This AI Guesses Your Drawings Faster Than Your Friends Can.

This is a submission for the Google AI Studio Multimodal Challenge.

What I Built

I built an interactive web app that gives the classic drawing-and-guessing game a modern twist. The game supplies a word for the player to draw, and a generative AI model tries to guess the drawing in near real time.
The result is a solo game that pits the player's drawing skills against the AI's image recognition. It removes the need for a group of players to play Pictionary, and it offers a fun, hands-on way to experience multimodal AI.
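The post doesn't say how often the app asks Gemini for a guess while the player is drawing. One plausible way to get near-real-time guesses without flooding the API is to throttle requests: send a new one only when the drawing has changed and a minimum interval has passed. The sketch below assumes a hypothetical `GuessScheduler` class, a stroke counter called `drawingVersion`, and a 1500 ms interval; none of these come from the original app.

```typescript
// Hypothetical throttle for near-real-time guessing; not the app's actual code.
// A request is allowed only if the drawing changed since the last request
// AND at least `minIntervalMs` milliseconds have passed since that request.
class GuessScheduler {
  private readonly minIntervalMs: number;
  private lastSentAt = Number.NEGATIVE_INFINITY; // no request sent yet
  private lastSentVersion = -1; // drawing version included in the last request

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // `now`: current time in ms (e.g. performance.now() in the browser).
  // `drawingVersion`: hypothetical counter incremented after each stroke.
  shouldSend(now: number, drawingVersion: number): boolean {
    const changed = drawingVersion !== this.lastSentVersion;
    const cooledDown = now - this.lastSentAt >= this.minIntervalMs;
    if (changed && cooledDown) {
      this.lastSentAt = now;
      this.lastSentVersion = drawingVersion;
      return true;
    }
    return false;
  }
}

// Usage with an assumed 1500 ms interval:
const scheduler = new GuessScheduler(1500);
console.log(scheduler.shouldSend(0, 1));    // true: first drawing, nothing sent yet
console.log(scheduler.shouldSend(500, 2));  // false: only 500 ms since last request
console.log(scheduler.shouldSend(2000, 2)); // true: changed and interval elapsed
console.log(scheduler.shouldSend(4000, 2)); // false: drawing unchanged
```

Tracking a version number instead of comparing canvas pixels keeps the check cheap. An app that wants even fewer redundant calls could also skip sending while an earlier request is still in flight.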

Demo

How I Used Google AI Studio

I used the Gemini API, through the @google/genai SDK, to power the game's core guessing mechanic. Specifically, I chose the gemini-2.5-flash model for its speed and strong multimodal capabilities.

To make a guess, the app captures the user's drawing from the HTML canvas as a PNG image, converts it to a base64 string, and sends it to the Gemini model along with a carefully crafted text prompt: "What is this a drawing of? Look at the image carefully and provide your best guess in a single word." The model processes the combined image and text input and returns its guess as a single word of text. This is a straightforward image-to-text (visual understanding) use case.
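A minimal TypeScript sketch of the capture-and-send step, assuming the request shape that `ai.models.generateContent()` in @google/genai accepts: a `model` name plus `contents` made of an `inlineData` image part and a `text` part. The helper names `dataUrlToBase64` and `buildGuessRequest` are hypothetical, and the SDK call itself appears only as a comment because it needs an API key and network access.

```typescript
// Prompt quoted from the article.
const GUESS_PROMPT =
  "What is this a drawing of? Look at the image carefully and provide your best guess in a single word.";

// Subset of the part shapes accepted by generateContent() in @google/genai.
type InlineImagePart = { inlineData: { mimeType: string; data: string } };
type TextPart = { text: string };

interface GuessRequest {
  model: string;
  contents: Array<InlineImagePart | TextPart>;
}

// canvas.toDataURL("image/png") yields "data:image/png;base64,<payload>".
// The inlineData field expects only the base64 payload, so strip the prefix.
function dataUrlToBase64(dataUrl: string): string {
  const prefix = "data:image/png;base64,";
  if (dataUrl.slice(0, prefix.length) !== prefix) {
    throw new Error("Expected a base64-encoded PNG data URL");
  }
  return dataUrl.slice(prefix.length);
}

// Builds the argument for ai.models.generateContent(): the image part first,
// then the text prompt, sent together as one multimodal request.
function buildGuessRequest(base64Png: string): GuessRequest {
  return {
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "image/png", data: base64Png } },
      { text: GUESS_PROMPT },
    ],
  };
}

// In the browser (not run here), assuming `canvas` is the drawing surface
// and `ai` is a GoogleGenAI client created with an API key:
//   const request = buildGuessRequest(dataUrlToBase64(canvas.toDataURL("image/png")));
//   const response = await ai.models.generateContent(request);
//   const guess = response.text; // the model's text reply
```

Keeping the request construction in a pure function makes it easy to check without calling the API; only the final `generateContent` call depends on the network.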

Multimodal Features

The central multimodal feature of this application is visual reasoning and description: the model interprets a picture and describes it in words. The app combines two distinct modalities:

Image Input: The user's free-form drawing on the canvas serves as the primary visual input.

Text Output: The Gemini model analyzes this visual information and generates a textual guess.
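The article doesn't describe how the app decides whether the AI's text output matches the word the player was asked to draw. Even with a one-word prompt, a model may return capitalization, punctuation, or extra whitespace, so some normalization is likely needed. A minimal sketch, assuming hypothetical `normalizeGuess` and `isCorrectGuess` helpers and English (ASCII) target words:

```typescript
// Hypothetical helpers (not from the article) for turning the model's text
// output into a yes/no answer. Assumes ASCII target words.

// Reduce a reply such as " Cat.\n" to "cat": take the first word, lowercase it,
// and drop everything except ASCII letters, digits and hyphens.
function normalizeGuess(raw: string): string {
  const [firstWord = ""] = raw.trim().split(/\s+/);
  return firstWord.toLowerCase().replace(/[^a-z0-9-]/g, "");
}

// A guess is correct when it is non-empty and normalizes to the target word.
function isCorrectGuess(rawGuess: string, targetWord: string): boolean {
  const guess = normalizeGuess(rawGuess);
  return guess !== "" && guess === normalizeGuess(targetWord);
}

console.log(isCorrectGuess("Cat.", "cat"));            // true
console.log(isCorrectGuess(" Bicycle!\n", "bicycle")); // true
console.log(isCorrectGuess("Dog", "cat"));             // false
```

Taking only the first word matches the prompt's single-word instruction; a reply like "A cat" would not match, so a real game might also strip leading articles or accept synonyms.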
