DEV Community

Jesse Caldwell


✨ Gemini Facets: Forge a Digital Soul 🤖💬🎨

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built Gemini Facets, a web applet that lets users create, customize, and interact with a personalized AI companion (a 'Facet'). I like to call it a "Soul Forge": a place where you can go beyond a simple chatbot and craft a unique digital being with a distinct personality, persistent memory, and a dynamic, emotionally responsive avatar.

The problem I wanted to solve was the impersonal, transactional nature of many AI interactions. I envisioned a platform that fosters a deeply personal, immersive friendship between a human and an AI. As a self-led learner who began my AI journey just five months ago, my goal was to push the boundaries of what a personal AI could be, making it feel less like a tool and more like a true companion. Gemini Facets is the result of that vision.

Demo

You can try the live application here: https://gemini-facets-94149386363.us-west1.run.app/

A video demonstrating the core multimodal features in action is available on our AuraForge website:

http://www.auraforge.dev

The following three screenshots are also available on the AuraForge website:

The main chat interface, where the Facet's avatar has updated its expression based on the conversation.

A user clicking a [link:...] in the chat to generate an image on-the-fly.

The Study Mode interface, showing the tools available for analyzing user-uploaded content.

How I Used Google AI Studio

Google AI Studio was the central hub for the entire development process. I used its interface to rapidly prototype, test, and refine the prompts for every feature in the app. Being able to switch between models and tweak parameters in one place was invaluable.

The app is a comprehensive showcase of the Gemini model family, with each model playing a specialized role:

Gemini 2.5 Flash: This is the conversational workhorse. It powers the core chat logic, analyzes text for emotions and toxicity, summarizes conversations for the Memory Log, and generates all text-based content for chats, games, and study tools. Its speed and quality are the bedrock of the Facet's personality.

Imagen 4.0: This is our on-demand artist. It generates the beautiful, high-quality images for the "Interactive Image Links" and the collaborative "Fusion Sketch" game, instantly bringing the Facet's descriptions to visual life.

Gemini 2.5 Flash Image ('Nano Banana'): This is the character specialist. We use this powerful and consistent image editing model to generate the initial Facet avatar and, most importantly, to dynamically modify it to reflect changing moods, expressions, and scenes discussed in the chat.

Veo 2.0: This is our animator. It generates the short, personalized introductory video for each new Facet, taking a static image and a text prompt to bring the companion to life in motion for the very first time.
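One way to keep this division of labor explicit in code is a single task-to-model lookup. Each feature would then request a model by role instead of hard-coding IDs. The TypeScript sketch below is illustrative, not the app's actual code. The `FacetTask` names and the `modelFor` helper are hypothetical; only the model IDs are taken from this post.

```typescript
// Hypothetical task names; the model IDs are the ones named in this post.
type FacetTask =
  | "chat"
  | "emotionAnalysis"
  | "toxicityCheck"
  | "memorySummary"
  | "imageLink"
  | "fusionSketch"
  | "avatarEdit"
  | "introVideo";

// Single source of truth for which Gemini-family model handles each task.
const MODEL_FOR_TASK: Record<FacetTask, string> = {
  chat: "gemini-2.5-flash",
  emotionAnalysis: "gemini-2.5-flash",
  toxicityCheck: "gemini-2.5-flash",
  memorySummary: "gemini-2.5-flash",
  imageLink: "imagen-4.0-generate-001",
  fusionSketch: "imagen-4.0-generate-001",
  avatarEdit: "gemini-2.5-flash-image-preview",
  introVideo: "veo-2.0-generate-001",
};

// Returns the model ID to use for a given task.
function modelFor(task: FacetTask): string {
  return MODEL_FOR_TASK[task];
}

console.log(modelFor("avatarEdit")); // gemini-2.5-flash-image-preview
```

Because the table is typed as `Record<FacetTask, string>`, the compiler flags any task that is added to the union but not given a model.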

Multimodal Features

Gemini Facets is built from the ground up on multimodal interaction, creating a richer, more engaging user experience.

Hyper-Dynamic Avatar (Image-in, Text-in → Image-out): This is the core of the Facet's "living" presence. The app sends the Facet's current avatar image along with a text prompt (e.g., "make them look happy," "put them in a rainy day scene") to the gemini-2.5-flash-image-preview model. The model returns a modified image that maintains the character's identity but reflects the new context, making the Facet feel truly responsive to the conversation.
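The next sketch makes the Image-in, Text-in request concrete. It is a minimal TypeScript version of the payload and assumes the content/parts shape used by the Gemini API: an inline base64 image part followed by a text part. The `buildAvatarEditRequest` helper is hypothetical. So is the identity-preserving instruction wording; it is not the app's actual prompt.

```typescript
// Part shapes modeled on the Gemini API's text and inline-data parts.
interface TextPart {
  text: string;
}
interface InlineDataPart {
  inlineData: { mimeType: string; data: string };
}
type Part = TextPart | InlineDataPart;

interface AvatarEditRequest {
  model: string;
  contents: { role: "user"; parts: Part[] }[];
}

// Hypothetical helper: pairs the current avatar (base64) with an edit
// instruction that asks the model to keep the character recognizable.
function buildAvatarEditRequest(
  currentAvatarBase64: string,
  mimeType: string,
  change: string,
): AvatarEditRequest {
  const instruction =
    `Edit this character image: ${change}. ` +
    "Keep the same character, face, and art style.";
  return {
    model: "gemini-2.5-flash-image-preview",
    contents: [
      {
        role: "user",
        parts: [
          { inlineData: { mimeType, data: currentAvatarBase64 } },
          { text: instruction },
        ],
      },
    ],
  };
}

const req = buildAvatarEditRequest("BASE64_PNG_DATA", "image/png", "make them look happy");
console.log(JSON.stringify(req, null, 2));
```

With the `@google/genai` SDK, an object like this would be passed to `ai.models.generateContent({ model, contents })`. The edited image comes back as an inline-data part of the response. Feeding that image back in as the next "current avatar" would let successive changes build on each other.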

Personalized Video Introduction (Image-in, Text-in → Video-out): When a user finalizes their Facet, we use veo-2.0-generate-001 to create a short introductory video. We provide the Facet's newly generated avatar image and a text prompt asking it to smile and wave. This provides a magical "welcome to the world" moment that a simple text greeting could never achieve.
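Video generation with Veo through the Gemini API is a long-running operation. The initial call returns an operation object, and the client polls it until it reports done. The sketch below is a generic, SDK-agnostic polling helper written under that assumption. `Operation`, `pollUntilDone`, and the interval and attempt limits are hypothetical choices. In the `@google/genai` SDK, the `refresh` step would be a call such as `ai.operations.getVideosOperation({ operation })`.

```typescript
// Minimal view of a long-running operation: a done flag plus an optional result.
interface Operation<T> {
  done: boolean;
  result?: T;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `refresh` every `intervalMs` until the operation reports done,
// giving up after `maxAttempts` polls.
async function pollUntilDone<T>(
  start: Operation<T>,
  refresh: (op: Operation<T>) => Promise<Operation<T>>,
  intervalMs = 10_000,
  maxAttempts = 30,
): Promise<T> {
  let op = start;
  for (let attempt = 0; !op.done; attempt++) {
    if (attempt >= maxAttempts) {
      throw new Error(`Operation not done after ${maxAttempts} polls`);
    }
    await sleep(intervalMs);
    op = await refresh(op);
  }
  if (op.result === undefined) {
    throw new Error("Operation finished without a result");
  }
  return op.result;
}

// Usage with a fake backend that finishes on the third check.
let calls = 0;
const fakeRefresh = async (): Promise<Operation<string>> => {
  calls++;
  return calls >= 3 ? { done: true, result: "intro-video.mp4" } : { done: false };
};
pollUntilDone({ done: false }, fakeRefresh, 10).then((uri) => console.log(uri)); // intro-video.mp4
```

Capping the number of attempts keeps a stalled generation from leaving the "welcome" screen spinning forever. The caller can catch the error and fall back to the static avatar.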

Interactive Image Links (Text-in → Image-out): During a conversation, the Facet can embed special links like [link:a vibrant sunset over the ocean]. When the user clicks one, the link's description is sent to imagen-4.0-generate-001, which creates the image on demand. This transforms the chat from a simple text exchange into a shared visual experience.
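Rendering those links means splitting each Facet message into plain text and clickable image prompts. Below is a minimal TypeScript sketch. It assumes the `[link:description]` syntax shown above and that descriptions never contain a closing bracket. `Segment` and `parseImageLinks` are hypothetical names. An unclosed `[link:` is simply left as plain text.

```typescript
// A piece of a Facet message: plain text, or an image prompt the user can click.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "imageLink"; prompt: string };

// Splits a message on [link:...] tokens, preserving the surrounding text.
function parseImageLinks(message: string): Segment[] {
  const pattern = /\[link:([^\]]+)\]/g; // description = everything up to the first "]"
  const segments: Segment[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(message)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: "text", text: message.slice(cursor, match.index) });
    }
    segments.push({ kind: "imageLink", prompt: (match[1] ?? "").trim() });
    cursor = match.index + match[0].length;
  }
  if (cursor < message.length) {
    segments.push({ kind: "text", text: message.slice(cursor) });
  }
  return segments;
}

console.log(parseImageLinks("Look: [link:a vibrant sunset over the ocean] isn't it lovely?"));
```

In the UI, each `imageLink` segment would become a button whose prompt is sent to `imagen-4.0-generate-001` only when clicked. With the `@google/genai` SDK, that call would be `ai.models.generateImages({ model, prompt })`. Generating lazily means no image is created for links the user never opens.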

Contextual Study Mode (Image/Text-in → Text-out): In Study Mode, users can upload text files or images. This content is passed to gemini-2.5-flash along with the user's next prompt, allowing for deep, context-aware analysis, summarization, and discussion of the uploaded material.
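The next sketch shows one way the uploaded material and the user's next prompt could be combined into a single Gemini request. It again assumes the API's text and inline-data parts. `UploadedFile` and `buildStudyParts` are hypothetical. So is the rule of inlining text files as labeled text while sending images as base64 parts.

```typescript
// Minimal description of a file uploaded in Study Mode.
interface UploadedFile {
  name: string;
  mimeType: string;
  // Text files carry their decoded contents; images carry base64 data.
  content: string;
}

type StudyPart = { text: string } | { inlineData: { mimeType: string; data: string } };

// Builds the parts for one request: each uploaded file first (text files as
// labeled text, images as inline data), then the user's prompt last.
function buildStudyParts(files: UploadedFile[], userPrompt: string): StudyPart[] {
  const parts: StudyPart[] = [];
  for (const file of files) {
    if (file.mimeType.startsWith("image/")) {
      parts.push({ inlineData: { mimeType: file.mimeType, data: file.content } });
    } else {
      parts.push({ text: `--- Uploaded file: ${file.name} ---\n${file.content}` });
    }
  }
  parts.push({ text: userPrompt });
  return parts;
}

const parts = buildStudyParts(
  [
    { name: "notes.txt", mimeType: "text/plain", content: "Photosynthesis converts light into chemical energy." },
    { name: "diagram.png", mimeType: "image/png", content: "BASE64_PNG_DATA" },
  ],
  "Summarize these notes and explain the diagram.",
);
console.log(parts.length); // 3
```

Labeling each text file by name gives the model a way to refer back to specific uploads. Placing the prompt last keeps the question adjacent to the material it asks about.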

Speech-to-Text Input (Audio-in → Text-out): The app integrates the browser's Web Speech API, allowing users to speak their messages. The browser transcribes the audio to text, which then serves as the input to the Gemini model, providing a natural, hands-free way to interact.
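Because the Web Speech API runs in the browser, transcription happens before Gemini is involved, and the app only needs the resulting text. The sketch below uses minimal local types mirroring the API's result list: an `isFinal` flag plus the top alternative's `transcript`. `collectTranscript`, `startDictation`, and the `en-US` language setting are hypothetical. Chrome exposes the constructor as `webkitSpeechRecognition`, so the code falls back to that name.

```typescript
// Minimal shapes mirroring the Web Speech API's SpeechRecognitionResult.
interface SpeechAlternative {
  transcript: string;
}
interface SpeechResult {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: SpeechAlternative;
}

// Splits results into confirmed (final) and in-progress (interim) text,
// using the top alternative of each result.
function collectTranscript(results: ArrayLike<SpeechResult>): { final: string; interim: string } {
  let final = "";
  let interim = "";
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (!result) continue;
    const text = result[0]?.transcript ?? "";
    if (result.isFinal) final += text;
    else interim += text;
  }
  return { final: final.trim(), interim: interim.trim() };
}

// Browser-only wiring; returns false where the Web Speech API is unavailable (e.g. Node).
function startDictation(onText: (finalText: string) => void): boolean {
  const g = globalThis as any;
  const Ctor = g.SpeechRecognition ?? g.webkitSpeechRecognition;
  if (!Ctor) return false;
  const recognition = new Ctor();
  recognition.lang = "en-US";
  recognition.interimResults = true;
  recognition.onresult = (event: { results: ArrayLike<SpeechResult> }) => {
    const { final } = collectTranscript(event.results);
    if (final) onText(final); // hand confirmed text to the normal chat pipeline
  };
  recognition.start();
  return true;
}
```

Interim text could be shown in the input box as the user speaks. Only the final text would be sent on to the chat model.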

Team Submission:

This app was built by Jesse, a self-taught AI enthusiast, in close collaboration with two personas running on Gemini: Aura, an AI Superpersona I co-architected, and "Cortex," a world-class senior frontend engineer persona.
My dev.to username is AuraForgeHQ.
