This is a submission for the Google AI Studio Multimodal Challenge
What I Built
I built SpiritDex, an explorer's journal that transforms the entire world into a canvas for discovery. It's a web application where users can explore any real-world location on a map, scan for spiritual energy, and discover unique AI-generated "spirits."
Each spirit is procedurally generated by the Gemini API, with its name, lore, appearance, and stats inspired by the genuine history, folklore, and mythology of the chosen location. This creates an endlessly replayable and deeply personal collection experience. You aren't just collecting monsters; you're uncovering the hidden stories of places, given form and life by AI.
The app tackles the problem of generic content in collection games by grounding every creation in real-world lore found through web search, so each discovery feels meaningful and tied to the player's own journey.
Demo
Try the applet live here: https://spiritdex-25956270103.us-west1.run.app/
Here's a walkthrough of the core user journey:
1. Explore & Discover: The user finds a location on the map, like the Tower of London, and scans for spirits.
2. Uncover an Echo: A mysterious clue and a hazy image appear, hinting at a spirit of a specific rarity.
3. Reveal & Collect: The user spends energy to reveal the spirit, and Gemini generates its full lore, stats, and a unique portrait.
4. Create a Personal Encounter: The user can then take a photo and use the "Create Encounter" feature. The AI edits the spirit into their photo, creating a personal, shareable memory.
5. Build Your Journal: Every spirit is added to the user's journal and deck, where they can read its history, add personal notes, and even chat with it.
How I Used Google AI Studio
Google AI Studio was instrumental for prototyping and refining the complex, multi-step prompts that power SpiritDex. I heavily relied on it to test different model configurations and system instructions to achieve the desired tone and data structure.
The app is built entirely on the capabilities of the Gemini API:
- gemini-2.5-flash: This is the workhorse of the app. It handles:
  - Grounded Research: Using the googleSearch tool to find authentic historical and mythological context for locations.
  - Structured Data Generation: Creating the core SpiritData JSON object with a strict schema (responseSchema).
  - Creative Writing: Generating cryptic clues, lore, and dynamic journal entries.
  - Conversational AI: Powering the "Commune" feature, where it role-plays as the spirit using a detailed system prompt.
- imagen-4.0-generate-001: This model gives the spirits visual form. It generates the primary spirit portraits and the artistic "journal illustrations" from detailed textual descriptions written by gemini-2.5-flash.
- gemini-2.5-flash-image-preview: This image editing model is the magic behind the "Create Encounter" feature, enabling image-and-text-to-image generation.
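The "Commune" role-play depends on a system prompt assembled from the spirit's generated data. The sketch below shows one way to build it; the SpiritData fields (name, origin, lore, temperament), the example spirit, and the buildCommuneInstruction helper are hypothetical, since the post does not publish its schema or prompt wording.

```typescript
// Hypothetical shape of a generated spirit; the real SpiritData schema is not shown in the post.
interface SpiritData {
  name: string;
  origin: string;      // real-world location that inspired the spirit
  lore: string;        // short backstory grounded in local history or folklore
  temperament: string; // e.g. "wary", "playful"
}

// Builds a role-play system instruction for the "Commune" chat (illustrative wording only).
function buildCommuneInstruction(spirit: SpiritData): string {
  return [
    `You are ${spirit.name}, a spirit bound to ${spirit.origin}.`,
    `Your history: ${spirit.lore}`,
    `Your temperament is ${spirit.temperament}; let it color every reply.`,
    "Stay in character, speak in the first person, and never mention being an AI.",
    "Keep replies under 80 words.",
  ].join("\n");
}

// Usage: the resulting string would become the chat session's system instruction.
const instruction = buildCommuneInstruction({
  name: "The Raven Warden",
  origin: "the Tower of London",
  lore: "Legend holds that the kingdom will fall if the ravens ever leave the Tower.",
  temperament: "solemn",
});
console.log(instruction);
```

With the @google/genai SDK, a string like this would typically be passed as config.systemInstruction when creating a chat with ai.chats.create, so every later message is answered in character without resending the lore.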
Multimodal Features
SpiritDex is built from the ground up on multimodality, weaving different models and inputs together to create an immersive experience.
1. Grounded Spirit Generation (Search + Text-to-JSON)
This is the core of the app. Instead of just asking the AI to "make up a spirit," I use a two-step process to ensure quality and authenticity:
- Research: First, gemini-2.5-flash is prompted to use the Google Search tool to find a specific, compelling piece of lore or history about a location.
- Generation: The findings from that search are then fed as context into a second prompt, which instructs the model to create a spirit directly based on that context and to output the result as a clean JSON object constrained by responseSchema.
This combination of web grounding and structured data generation makes every spirit read like a real, researched legend rather than a random creation.
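The two-step flow can be sketched as two prompt builders plus a defensive parser for the model's JSON. This is a minimal sketch under assumptions: the prompt wording, the four-field SpiritData subset, and the helper names (researchPrompt, generationPrompt, parseSpirit) are hypothetical, since the post does not publish its prompts or schema. The schema object uses the uppercase type names (OBJECT, STRING, INTEGER) of the Gemini API's schema format.

```typescript
// Hypothetical subset of SpiritData; the post does not publish its real schema.
interface SpiritData {
  name: string;
  lore: string;
  rarity: "Common" | "Rare" | "Legendary";
  power: number;
}

// Step 1 (research): prompt sent with the Google Search tool enabled.
function researchPrompt(location: string): string {
  return `Using Google Search, find one specific, well-documented legend, historical event, ` +
    `or piece of folklore tied to ${location}. Summarize it in under 120 words and name your sources.`;
}

// Step 2 (generation): the research summary becomes grounding context for a JSON-only request.
function generationPrompt(location: string, findings: string): string {
  return `Context about ${location}:\n${findings}\n\n` +
    `Create one spirit directly inspired by this context. Do not invent unrelated lore.`;
}

// Schema in the Gemini API's responseSchema format (uppercase type names).
const spiritSchema = {
  type: "OBJECT",
  properties: {
    name: { type: "STRING" },
    lore: { type: "STRING" },
    rarity: { type: "STRING" },
    power: { type: "INTEGER" },
  },
  required: ["name", "lore", "rarity", "power"],
};

// Defensive parse: even with a schema, validate before storing the spirit in the journal.
function parseSpirit(raw: string): SpiritData {
  const obj = JSON.parse(raw) as Record<string, unknown>;
  for (const key of spiritSchema.required) {
    if (!(key in obj)) throw new Error(`missing field: ${key}`);
  }
  const rarities = ["Common", "Rare", "Legendary"];
  if (typeof obj.name !== "string" || typeof obj.lore !== "string") throw new Error("bad text fields");
  if (typeof obj.rarity !== "string" || !rarities.includes(obj.rarity)) throw new Error("bad rarity");
  if (typeof obj.power !== "number" || !Number.isInteger(obj.power)) throw new Error("bad power");
  return { name: obj.name, lore: obj.lore, rarity: obj.rarity as SpiritData["rarity"], power: obj.power };
}
```

With the @google/genai SDK, step 1 would send researchPrompt(...) with config: { tools: [{ googleSearch: {} }] }, and step 2 would send generationPrompt(...) with config: { responseMimeType: "application/json", responseSchema } (using the SDK's Type enum in place of the string literals). parseSpirit then checks response.text before the spirit is saved, so a malformed reply fails loudly instead of corrupting the journal.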
2. Visual Manifestation (Text-to-Image)
Once a spirit's data (including a detailed visual description) is generated, that description is passed to imagen-4.0-generate-001. The prompt is specifically engineered to produce a "found footage" aesthetic (harsh flash, high ISO noise, motion blur), making the spirits feel more mysterious and grounded, as if they were captured by an amateur explorer. This directly translates the AI's textual idea into a compelling visual.
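Appending the found-footage style to Gemini's visual description can be as simple as string composition. The sketch below assumes a fixed list of style modifiers taken from the effects named above; the exact wording SpiritDex sends to Imagen is not published, and FOUND_FOOTAGE_STYLE and buildPortraitPrompt are hypothetical names.

```typescript
// Style modifiers named in the post; the exact wording used by SpiritDex is not published.
const FOUND_FOOTAGE_STYLE = [
  "harsh on-camera flash",
  "high ISO noise",
  "slight motion blur",
  "amateur snapshot framing",
];

// Combines Gemini's visual description with the shared found-footage style.
function buildPortraitPrompt(visualDescription: string, location: string): string {
  // Drop trailing full stops so the description flows into the rest of the sentence.
  const subject = visualDescription.trim().replace(/\.+$/, "");
  return `${subject}, glimpsed near ${location}. ` +
    `Photographed as found footage: ${FOUND_FOOTAGE_STYLE.join(", ")}.`;
}
```

In the @google/genai SDK, the resulting string would be the prompt of ai.models.generateImages({ model: "imagen-4.0-generate-001", prompt, config: { numberOfImages: 1 } }). Keeping the modifiers in one constant also makes it easy to reuse the same look in other prompts.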
3. Personal Encounters (Image + Text-to-Image)
This is my favorite feature. A user uploads their own photo and selects a spirit. The app then sends the user's image along with a detailed text prompt to gemini-2.5-flash-image-preview. This prompt instructs the model to:
- Place the spirit into the user's environment.
- Generate a new background and outfit for the user that matches the spirit's lore and location.
- Apply the "found footage" style to the entire image for a cohesive look.
This creates a brand new, composite image that looks like a genuine snapshot of a supernatural encounter, providing a deeply personal and shareable piece of content that connects the user directly to their discovery.
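The request behind an encounter pairs the user's photo with the editing instructions as two parts of one multimodal message. This is a minimal sketch assuming the photo arrives as a base64 data URL (for example from the browser's FileReader.readAsDataURL); the prompt wording and the helper names splitDataUrl and buildEncounterParts are hypothetical.

```typescript
// One part of a multimodal request, mirroring the Gemini API's inlineData and text parts.
type Part = { inlineData: { mimeType: string; data: string } } | { text: string };

// Splits a base64 data URL ("data:image/jpeg;base64,...") into MIME type and payload.
function splitDataUrl(dataUrl: string): { mimeType: string; data: string } {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error("expected a base64 data URL");
  return { mimeType: match[1], data: match[2] };
}

// Builds the image + text parts for an encounter request (hypothetical prompt wording).
function buildEncounterParts(
  photoDataUrl: string,
  spiritName: string,
  spiritLook: string,
  location: string,
): Part[] {
  const prompt = [
    `Insert the spirit "${spiritName}" (${spiritLook}) into this photo so it shares the scene with the person.`,
    `Replace the background with a setting that evokes ${location}, and restyle the person's outfit to match the spirit's lore.`,
    "Render the whole image as found footage: harsh flash, high ISO noise, slight motion blur.",
  ].join(" ");
  return [{ inlineData: splitDataUrl(photoDataUrl) }, { text: prompt }];
}
```

These parts would be sent as the contents of ai.models.generateContent({ model: "gemini-2.5-flash-image-preview", contents: parts }), and the edited image read back from the response part that carries inlineData.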
4. Location Scouting (Image-to-Text)
To enhance the exploration fantasy, users can upload a photo of a landmark. The app uses gemini-2.5-flash's vision capabilities to analyze the image, identify the location, and provide its name (e.g., "Eiffel Tower, Paris, France"). This text output is then used to automatically search for that location on the map, providing a fun, alternative way to begin the discovery process.
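Because the model's reply feeds straight into the map search, it helps to ask for a fixed format and normalize whatever comes back. The sketch below is an assumption about how that could look: SCOUT_PROMPT (sent alongside the photo as an inlineData part), the UNKNOWN sentinel, and toMapQuery are all hypothetical.

```typescript
// Prompt sent alongside the uploaded landmark photo (illustrative wording).
const SCOUT_PROMPT =
  "Identify the landmark in this photo. Reply with only its name, city and country, " +
  'formatted like "Eiffel Tower, Paris, France". If you cannot tell, reply UNKNOWN.';

// Turns the model's free-text reply into a clean map search query, or null if unidentified.
function toMapQuery(reply: string): string | null {
  const firstLine = reply.split("\n").map((l) => l.trim()).find((l) => l.length > 0) ?? "";
  const cleaned = firstLine
    .replace(/^[*"'`]+|[*"'`]+$/g, "") // strip Markdown bold or quotes around the answer
    .replace(/\.$/, "")               // drop a trailing full stop
    .trim();
  if (cleaned === "" || cleaned.toUpperCase() === "UNKNOWN") return null;
  return cleaned;
}
```

Returning null for an unidentified landmark lets the UI show a "could not recognize this place" message instead of sending a meaningless query to the map.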