This is a submission for the Google AI Studio Multimodal Challenge.
What I Built
I built Dungeon Master, an interactive, text-based fantasy role-playing game that recreates the classic Dungeons & Dragons experience with generative AI. Because the story is generated on the fly, every session is unique and the adventure is endlessly replayable.
At its core, a Gemini model acts as the Dungeon Master, generating a rich, evolving narrative in real time based on the player's typed commands. It describes immersive environments, introduces non-player characters (NPCs), presents challenging scenarios, and tracks the player's stats, such as health, behind the scenes.
What sets this experience apart is its multimodal approach: when the AI describes a new location or character, the applet generates and displays matching art, a map for the location or a portrait for the character. This turns a purely text-based adventure into a visual one and makes the world feel tangible and alive.
Demo
How I Used Google AI Studio
I leveraged two powerful models from the Gemini API to create this multimodal experience:
1. Gemini 2.5 Flash (gemini-2.5-flash): This model is the brain of the Dungeon Master.
- System Instruction: I provided a detailed system instruction that defines the AI's persona as a "masterful Dungeon Master." The prompt also sets strict formatting rules, instructing the AI to wrap specific descriptions in tags such as [MAP_DESCRIPTION: ...], [NPC_DESCRIPTION: ...], and [HEALTH: ...]. The application logic depends on this structured output: the tags tell the code when to generate an image and when to update the player's health.
- Streaming Chat: I used the chat.sendMessageStream method to receive the AI's response as a stream. The Dungeon Master's text appears incrementally as it is generated, mimicking the cadence of a real person telling a story.
2. Imagen 4 (imagen-4.0-generate-001): This model is the artist that brings the Dungeon Master's words to life.
- Dynamic Prompting: The application code parses the streamed text from Gemini 2.5 Flash in real time. When it detects a [MAP_DESCRIPTION: ...] or [NPC_DESCRIPTION: ...] tag, it extracts the descriptive text inside it.
- Image Generation: The extracted text is then used to build a detailed prompt for Imagen 4, which generates a high-quality, fantasy-style image of the scene or character just described in the narrative.
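The tag-detection step can be sketched as follows. This is a minimal TypeScript sketch, not the applet's actual code: the class name `StreamingTagParser` is hypothetical, and it assumes tags appear exactly as `[NAME: text]` with no square brackets inside the description. Because a streamed chunk can end in the middle of a tag, the parser holds back any unclosed `[` until a later chunk completes it, so partial tags never reach the story text.

```typescript
// Tag names come from the post; descriptions are ASSUMED to contain no "[" or "]".
type TagName = "MAP_DESCRIPTION" | "NPC_DESCRIPTION" | "HEALTH";

interface ParsedTag {
  name: TagName;
  value: string;
}

interface ChunkResult {
  text: string; // narrative text that is safe to display now, with tags removed
  tags: ParsedTag[]; // tags completed by this chunk
}

// Hypothetical helper: feeds streamed chunks in, gets display text and tags out.
class StreamingTagParser {
  private buffer = "";

  push(chunk: string): ChunkResult {
    this.buffer += chunk;
    // A regex literal evaluated here is a fresh object, so lastIndex starts at 0.
    const pattern = /\[(MAP_DESCRIPTION|NPC_DESCRIPTION|HEALTH):\s*([^\]]*)\]/g;
    const tags: ParsedTag[] = [];
    let text = "";
    let consumed = 0;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(this.buffer)) !== null) {
      text += this.buffer.slice(consumed, match.index); // narrative before the tag
      tags.push({ name: match[1] as TagName, value: match[2].trim() });
      consumed = pattern.lastIndex;
    }
    // An unclosed "[" may be the start of a tag split across chunks: hold it back.
    const rest = this.buffer.slice(consumed);
    const open = rest.indexOf("[", rest.lastIndexOf("]") + 1);
    text += open === -1 ? rest : rest.slice(0, open);
    this.buffer = open === -1 ? "" : rest.slice(open);
    return { text, tags };
  }

  // Call once the stream ends; returns held-back text that never became a tag.
  flush(): string {
    const rest = this.buffer;
    this.buffer = "";
    return rest;
  }
}
```

In the applet, the text of each chunk yielded by `chat.sendMessageStream` would be passed to `push`; the returned `text` is appended to the story, and each tag goes to the image or stats logic. Calling `flush` when the stream ends recovers any stray `[` that was held back. Bracketed asides that are not tags, such as `[in Elvish]`, pass through as ordinary text.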
Multimodal Features
The core multimodal feature of AI Dungeon Master is the interplay between generated text and generated images. This isn't a story with static illustrations: both the narrative and the visuals are created from scratch in direct response to the player's actions.
- Text-to-Image Pipeline: The player's text command starts a chain reaction. It prompts a text response from the language model, and the tagged descriptions in that response ([MAP_DESCRIPTION: ...] and [NPC_DESCRIPTION: ...]) in turn become the prompts for the image model.
- Enhanced Immersion: Pairing the two modalities makes each scene more memorable. Reading about a "gnarled dwarf blacksmith with a fiery beard" is one thing; seeing a unique, AI-generated portrait of him moments later makes the encounter far more immersive. Exploring a "sun-dappled enchanted forest" feels more real when a map of that very forest is on your screen.
- Infinite Visual Variety: Because both the story and the images are generated on the fly, no two adventures are alike, narratively or visually. Every playthrough features its own maps and characters, which creates a feedback loop of discovery: players know the world is being created just for them.
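The rest of the pipeline, from parsed tags to pictures and stats, might look like the sketch below. It is hypothetical throughout: `applyTags`, `buildImagePrompt`, `GameState`, and the prompt wording are illustrations rather than the applet's code, and the injected `ImageGenerator` function stands in for whatever wraps the Imagen 4 (`imagen-4.0-generate-001`) request, so the sketch runs without network access. It also assumes a [HEALTH: ...] value begins with an integer.

```typescript
// Tag names come from the post; everything else is a hypothetical sketch.
type ImageTagName = "MAP_DESCRIPTION" | "NPC_DESCRIPTION";
type TagName = ImageTagName | "HEALTH";

interface ParsedTag {
  name: TagName;
  value: string;
}

// Hypothetical: in the real applet this would wrap an Imagen 4 request and
// resolve to something displayable, such as a data URL.
type ImageGenerator = (prompt: string) => Promise<string>;

interface GameState {
  health: number;
  mapImage?: string;
  portraitImage?: string;
}

// Hypothetical prompt templates; the applet's actual wording is not published.
function buildImagePrompt(kind: ImageTagName, description: string): string {
  const subject =
    kind === "MAP_DESCRIPTION" ? "top-down fantasy map" : "fantasy character portrait";
  return `A detailed ${subject}, painterly high-fantasy style. ${description}`;
}

// Applies one turn's tags: HEALTH updates stats, description tags become image prompts.
async function applyTags(
  state: GameState,
  tags: ParsedTag[],
  generateImage: ImageGenerator,
): Promise<GameState> {
  const next: GameState = { ...state }; // never mutate the caller's state
  for (const tag of tags) {
    if (tag.name === "HEALTH") {
      // ASSUMES the value starts with an integer, e.g. "90" or "90/100".
      const hp = Number.parseInt(tag.value, 10);
      if (!Number.isNaN(hp)) next.health = hp;
      continue;
    }
    const kind = tag.name; // narrowed to ImageTagName by the check above
    try {
      const image = await generateImage(buildImagePrompt(kind, tag.value));
      if (kind === "MAP_DESCRIPTION") next.mapImage = image;
      else next.portraitImage = image;
    } catch {
      // A failed image request should not stall the story; keep the previous picture.
    }
  }
  return next;
}
```

Injecting the generator keeps the game logic testable with a fake, and catching failures means a failed image request leaves the previous picture in place instead of interrupting the narrative. Images are requested one at a time here for simplicity and deterministic ordering; running them concurrently would cut latency when a single turn introduces both a location and a character.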