Harish Kotra (he/him)

Technical Deep Dive: Building Artie

In this post, we'll explore how to build a multi-agent, AI-powered drawing and guessing game using Google's Gemini 2.5 Flash Image and Gemini 3 Flash models. We'll cover the architecture, the personality-driven image generation, and the multimodal guessing logic.

The Vision

The goal was to create a "Showdown" where distinct AI agents with unique artistic personalities could play a drawing and guessing game. Each agent would have its own:

  • Drawing Style: From abstract and messy to pixel-perfect and geometric.
  • Guessing Personality: Influencing how they interpret visual data.

The Architecture

The application is a React-based SPA that orchestrates a two-phase flow between two different AI models:

  1. Drawer Phase (Gemini 2.5 Flash Image):
    • Takes a word (e.g., "Elephant") and an agent's style (e.g., "Abstract").
    • Generates an Artie-style image that reflects that personality.
  2. Guesser Phase (Gemini 3 Flash):
    • Takes the generated image as input.
    • Analyzes the visual data and attempts to guess the word.
    • Uses the agent's personality to add flavor to the guess (e.g., "It looks like a messy elephant!").
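The two phases above can be sketched as a single round function. This is a minimal sketch, not the project's actual code: `AgentLike`, `DrawFn`, `GuessFn`, `RoundResult`, and `playRound` are hypothetical names, and the two model calls are passed in as functions so the flow can be run without network access. Only `drawingStyle` and `personality` correspond to fields that appear in the article's prompts.

```typescript
// Hypothetical minimal agent shape for this sketch.
interface AgentLike {
  name: string;
  drawingStyle: string; // used by the drawer phase
  personality: string; // used by the guesser phase
}

// The model calls are injected; in the real app they would wrap generateContent.
type DrawFn = (word: string, drawer: AgentLike) => Promise<string>; // resolves to base64 image data
type GuessFn = (imageBase64: string, guesser: AgentLike) => Promise<string>;

interface RoundResult {
  word: string;
  drawer: string;
  guesser: string;
  guess: string;
  correct: boolean;
}

async function playRound(
  word: string,
  drawer: AgentLike,
  guesser: AgentLike,
  draw: DrawFn,
  guess: GuessFn,
): Promise<RoundResult> {
  // Drawer phase: word and style in, image out.
  const imageBase64 = await draw(word, drawer);
  // Guesser phase: image in, free-text guess out.
  const rawGuess = await guess(imageBase64, guesser);
  // Simple case-insensitive comparison; a fuller sanitizer would also strip punctuation.
  const correct = rawGuess.trim().toLowerCase() === word.trim().toLowerCase();
  return { word, drawer: drawer.name, guesser: guesser.name, guess: rawGuess, correct };
}
```

Injecting the two calls keeps the round logic independent of any particular model name, which also makes it easy to swap in stubbed drawers and guessers during development.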

Flow Diagram

Architecture Diagram

Personality-Driven Image Generation

The secret sauce is in the prompt engineering. We don't just ask for a "drawing of a cat." We ask for a "drawing of a cat in a messy, abstract style with vibrant colors."

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash-image',
  contents: {
    parts: [{ 
      text: `A drawing of a ${word}. Style: ${drawer.drawingStyle}. 
             Artie style, white background, bold lines. 
             Make it look like it was drawn by a human with this personality.` 
    }]
  },
  config: {
    imageConfig: { aspectRatio: "1:1" }
  }
});

Specifying aspectRatio: "1:1" requests a square image, which matches the square drawing area in the UI.
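The guessing phase needs the generated image as base64 data, so the image has to be pulled out of the drawer's response first. The helper below is a rough sketch of that step, assuming the response follows the candidates / content / parts layout with an `inlineData` part, as Gemini responses generally do. `extractImage`, `toDataUrl`, and the local interface names are hypothetical and only describe the fields this sketch reads.

```typescript
// Minimal structural types covering only what this helper reads (assumed shape).
interface InlineDataLike {
  mimeType?: string;
  data?: string;
}
interface PartLike {
  text?: string;
  inlineData?: InlineDataLike;
}
interface ImageResponseLike {
  candidates?: { content?: { parts?: PartLike[] } }[];
}

// Returns the first inline image payload, or null when no image part is present.
function extractImage(
  response: ImageResponseLike,
): { mimeType: string; data: string } | null {
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      const data = part.inlineData?.data;
      if (data) {
        // Fall back to PNG when the MIME type is missing (an assumption of this sketch).
        return { mimeType: part.inlineData?.mimeType ?? "image/png", data };
      }
    }
  }
  return null;
}

// A data URL lets an <img> element render the result directly.
function toDataUrl(image: { mimeType: string; data: string }): string {
  return `data:${image.mimeType};base64,${image.data}`;
}
```

Returning `null` instead of throwing leaves it to the caller to decide whether a text-only response should trigger a retry or end the round.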

Multimodal Guessing Logic

For the guessing phase, we use Gemini 3 Flash, a multimodal model that accepts images and text in the same request. We send the base64-encoded image along with a prompt that defines the guesser's personality.

const response = await ai.models.generateContent({
  model: 'gemini-3-flash-preview',
  contents: {
    parts: [
      { inlineData: { mimeType: 'image/png', data: base64Data } },
      { text: `You are playing a drawing and guessing game. What is this a drawing of? 
               The answer is a single word. Your personality is ${guesser.personality}. 
               If the drawing is bad, you might guess something wrong but related to your personality.` }
    ]
  }
});

We then sanitize the output to extract a single word for comparison with the target word.
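The article doesn't show the sanitizing step, so here is one possible sketch. `sanitizeGuess` and `isCorrectGuess` are hypothetical names, and the rule of keeping the last word when the model replies with a full sentence is an assumption chosen for replies like "It looks like a messy elephant!". Non-ASCII letters are dropped in this simplified version.

```typescript
// Reduce a free-form model reply to a single lowercase word.
function sanitizeGuess(raw: string): string {
  const words = raw
    .toLowerCase()
    .replace(/[^a-z\s]/g, " ") // replace punctuation, digits, and emoji with spaces
    .split(/\s+/)
    .filter(Boolean);
  // Assumption: if the model ignores the single-word instruction, the last word is the noun.
  return words[words.length - 1] ?? "";
}

// Compare the sanitized guess to the target word; an empty guess never counts as correct.
function isCorrectGuess(rawGuess: string, target: string): boolean {
  const guess = sanitizeGuess(rawGuess);
  return guess !== "" && guess === sanitizeGuess(target);
}
```

With this rule, `sanitizeGuess("It looks like a messy elephant!")` yields `"elephant"`, so a personality-flavored reply can still be scored against the target word.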

State Management & UI

We used React state and refs to manage the game flow, timers, and logs. The UI follows a "Technical Dashboard" aesthetic, styled with Tailwind CSS and built from these pieces:

  • Bento Grids: To organize the scoreboard, canvas, and guesses.
  • Framer Motion: For smooth transitions and a "live" feel.
  • Lucide Icons: For clear visual representation of each agent.
  • Tabbed Navigation: A new multi-tab system (Game, History, Agents) to separate gameplay from data exploration.
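To make the game flow concrete, here is a small sketch of the phases a round might move through. The article manages this with React state and refs; this sketch swaps in a plain reducer function instead (the kind that could be handed to React's `useReducer`), purely because it is easy to read and test in isolation. `Phase`, `GameState`, `GameEvent`, and `gameReducer` are all hypothetical names.

```typescript
type Phase = "idle" | "drawing" | "guessing" | "result";

interface GameState {
  phase: Phase;
  log: string[];
}

type GameEvent =
  | { type: "ROUND_STARTED"; word: string }
  | { type: "IMAGE_READY" }
  | { type: "GUESS_RECEIVED"; guess: string }
  | { type: "RESET" };

const initialState: GameState = { phase: "idle", log: [] };

// Pure transition function: returns a new state and never mutates the old one.
function gameReducer(state: GameState, event: GameEvent): GameState {
  switch (event.type) {
    case "ROUND_STARTED":
      return { phase: "drawing", log: [...state.log, `Drawing "${event.word}"...`] };
    case "IMAGE_READY":
      // Ignore stale events that arrive outside the drawing phase.
      if (state.phase !== "drawing") return state;
      return { phase: "guessing", log: [...state.log, "Image ready, guessing..."] };
    case "GUESS_RECEIVED":
      if (state.phase !== "guessing") return state;
      return { phase: "result", log: [...state.log, `Guess: ${event.guess}`] };
    case "RESET":
      return initialState;
    default:
      return state;
  }
}
```

The phase guards mean a late response from a previous round cannot push the UI into the wrong phase, which is the kind of issue timers and async model calls tend to cause.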

The Agent Personality Engine

One of the most recent updates was the expansion of our "Agent Personality Engine." We didn't want the agents to just be names and colors; we wanted them to have character.

We added:

  • Traits: Categorical descriptors that influence how a user perceives the agent's "thinking."
  • Signature Moves: Named artistic techniques (like "The Scribble Storm") that give context to their unique drawing styles.
  • Fun Facts: Narrative flavor that builds a world around these digital artists.

This was implemented by expanding the Agent interface and creating a dedicated Agents Section with interactive profile modals. This allows users to understand why an agent drew something a certain way, bridging the gap between raw AI output and human-like creativity.
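An expanded Agent interface along these lines might look like the sketch below. The exact field names in the repository may differ: `drawingStyle` and `personality` come from the prompts shown earlier, while `id`, `color`, `traits`, `signatureMoves`, `funFacts`, the sample agent "Scribbles", and the `profileSummary` helper are assumptions made for illustration.

```typescript
interface Agent {
  id: string;
  name: string;
  color: string;
  drawingStyle: string; // interpolated into the drawer prompt
  personality: string; // interpolated into the guesser prompt
  traits: string[]; // categorical descriptors shown in the profile
  signatureMoves: string[]; // named artistic techniques
  funFacts: string[]; // narrative flavor for the profile modal
}

// Hypothetical sample agent; only "The Scribble Storm" is taken from the article.
const scribbles: Agent = {
  id: "scribbles",
  name: "Scribbles",
  color: "#f97316",
  drawingStyle: "messy, abstract, vibrant colors",
  personality: "impulsive and overconfident",
  traits: ["impulsive", "bold"],
  signatureMoves: ["The Scribble Storm"],
  funFacts: ["Has never drawn a straight line on purpose."],
};

// Builds the one-line summary a profile modal could show.
function profileSummary(agent: Agent): string {
  const move = agent.signatureMoves[0] ?? "none";
  return `${agent.name}: ${agent.traits.join(", ")}. Signature move: ${move}.`;
}
```

Keeping the profile fields as plain data means the same object can feed both the prompts and the profile UI.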

Challenges & Solutions

  • Image Generation Latency: We added a "Drawing..." state with a spinner to keep the user engaged.
  • Guessing Accuracy: By providing the agent's personality in the prompt, we made the guesses more "human-like" and sometimes hilariously wrong, which adds to the fun.
  • Responsive Canvas: We used Tailwind's aspect-square utility to keep the drawing area square across different screen sizes.

Future Roadmap

  • Human vs. AI: Let users draw and have the AI guess.
  • Real-time Voice: Use Gemini's TTS to have agents "speak" their guesses.
  • Persistent Leaderboards: Using Firebase to track scores across sessions.

Artie demonstrates that AI isn't just for productivity; it can be creative, competitive, and entertaining. By combining different models and personality-driven prompts, we can create distinctive digital experiences.

Example Output

GitHub Repo: https://github.com/harishkotra/Artie
