DEV Community

Arunav Maitra

Crystal Vision AI

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I built Crystal Vision AI.

My goal wasn't just to create another image generator.
I wanted to build an experience.

It's a magical portal that transforms your ideas and photos into mystical works of art.

Crystal Vision AI solves a simple problem: How can we make AI art generation more personal and enchanting?

It does this in two powerful ways:

  1. Enchant an Image: You can upload your own photo—of your pet, a friend, or a favorite object. Then, you provide a text prompt to magically edit it. The AI understands both the image and your words to create something entirely new.

  2. Summon a Vision: For moments of pure imagination, you can simply describe a scene. The AI acts as your personal oracle, conjuring a stunning, photorealistic image from your words alone.

The core magic?

Every creation is beautifully and seamlessly encapsulated within a glowing, hyper-realistic crystal ball, turning every generation into a unique, mystical artifact.

It's a tool designed to spark joy, unleash creativity, and make you feel like a real magician.

Demo

Behold the magic in action!

🔮 Live Applet Link: Experience Crystal Vision AI Here!

Here's a glimpse into the visual journey:

The Grand Welcome

Users are greeted by an ethereal, animated interface that immediately sets a magical tone.

Enchanting a Personal Photo

Here, a user has uploaded a photo of their cat and is adding a prompt to give it a sparkling crown. Notice the simple, intuitive controls.

The Final Masterpiece

After a moment of 'consulting the oracle,' the final vision is revealed—a breathtaking image, perfectly rendered inside the crystal ball.

How I Used Google AI Studio

Google AI Studio was my digital alchemy lab. It was the crucial first step where I prototyped, tested, and truly understood the capabilities of the Gemini models before writing a single line of production code.

My two key ingredients were:

  • gemini-2.5-flash-image-preview: This was the absolute star of the show. Its powerful multimodal capabilities are the engine behind the "Enchant an Image" feature. I used the Studio to test how the model would interpret an uploaded image alongside a text prompt.

  • imagen-4.0-generate-001: This model is a pure powerhouse for text-to-image generation. It's the oracle that powers the "Summon a Vision" feature, creating stunningly detailed images from just a description.
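To make the split between the two features concrete, here is a minimal Python sketch of how an app might route each request to one of these two models. It assumes the choice depends only on whether the user uploaded a photo; the `Mode` enum and the `select_mode` and `model_for` helpers are hypothetical names, not code taken from the actual applet.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    ENCHANT = "enchant"  # "Enchant an Image": edit an uploaded photo with a text prompt
    SUMMON = "summon"    # "Summon a Vision": text-to-image from a description alone


# Model IDs as named in this post; the mapping itself is an illustrative assumption.
MODEL_FOR_MODE = {
    Mode.ENCHANT: "gemini-2.5-flash-image-preview",
    Mode.SUMMON: "imagen-4.0-generate-001",
}


def select_mode(image_bytes: Optional[bytes]) -> Mode:
    """Hypothetical rule: Enchant when a non-empty photo was uploaded, Summon otherwise."""
    return Mode.ENCHANT if image_bytes else Mode.SUMMON


def model_for(image_bytes: Optional[bytes]) -> str:
    """Return the model ID that should handle this request."""
    return MODEL_FOR_MODE[select_mode(image_bytes)]
```

Keeping the mapping in a single dictionary means that swapping in a newer model ID later would be a one-line change.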

My process involved countless iterations in the Studio to perfect the prompts. I refined phrases like "hyper-realistic, glowing crystal ball" and "sitting on a dark, mystical surface" until they produced the exact aesthetic I envisioned. This rapid prototyping saved hours of development time and ensured the final app produced consistently magical results.
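One way to get consistent results from refined phrases like these is to append them to every user prompt in code instead of relying on users to type them. The sketch below assumes a simple suffix template; the two style phrases are quoted from this post, but the wording around them and the `themed_prompt` helper are hypothetical.

```python
# Style phrases quoted in this post; the surrounding template is a hypothetical example.
CRYSTAL_STYLE = (
    "hyper-realistic, glowing crystal ball",
    "sitting on a dark, mystical surface",
)


def themed_prompt(user_prompt: str) -> str:
    """Wrap a user's idea so the result is always framed inside the crystal ball."""
    # Trim whitespace and a trailing period so the suffix joins cleanly.
    idea = user_prompt.strip().rstrip(".")
    if not idea:
        raise ValueError("prompt must not be empty")
    return f"{idea}, shown inside a {CRYSTAL_STYLE[0]} {CRYSTAL_STYLE[1]}."
```

For example, `themed_prompt("a cat wearing a sparkling crown")` returns `"a cat wearing a sparkling crown, shown inside a hyper-realistic, glowing crystal ball sitting on a dark, mystical surface."`. Centralizing the style text this way lets a prompt tweak found in the Studio be applied everywhere at once.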

Multimodal Features

The soul of Crystal Vision AI lies in its multimodal functionality.

Specifically, in the Enchant an Image mode.

This isn't just a simple image filter. It's a true creative conversation with the AI. The model processes two distinct types of information simultaneously:

  1. Visual Input: The user's uploaded image. The AI doesn't just see pixels; it gains a contextual understanding of the subject and composition of the photo.

  2. Textual Input: The user's typed command. This is where the user directs the magic, asking for specific changes like "add a wizard hat" or "make it look like it's made of stars."

The model then fuses these two inputs. It identifies the main subject of the image, applies the textual command to it, and then reimagines the entire scene within the crystal ball theme.
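Under the hood, a multimodal request simply carries both inputs side by side. The following Python sketch builds a request body shaped like the Gemini REST API's `generateContent` payload, with the photo as a base64 `inline_data` part and the user's command as a `text` part. It only constructs the JSON; it assumes you would POST it to the chosen model's `generateContent` endpoint with your own API key (the live applet may well use an SDK instead), and the `enchant_request_body` helper is hypothetical.

```python
import base64
import json


def enchant_request_body(image_bytes: bytes, mime_type: str, instruction: str) -> dict:
    """Hypothetical helper: pair one image with one text instruction in a single request."""
    return {
        "contents": [
            {
                "parts": [
                    # Visual input: the uploaded photo, base64-encoded inline.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                    # Textual input: what the user wants changed.
                    {"text": instruction},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real JPEG upload.
    body = enchant_request_body(b"\xff\xd8\xff", "image/jpeg", "add a wizard hat")
    print(json.dumps(body, indent=2))
```

Sending both parts in one `contents` entry is what lets the model reason about the photo and the command together rather than handling them as two separate requests.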

Why does this enhance the user experience?

It makes the creation process deeply personal and interactive.

Users aren't just passive prompters; they are active collaborators with the AI. They can bring their own life and memories—their pets, their friends, their art—into the magical world.

This transforms the app from a simple generator into a powerful, personal creative companion. It's the profound difference between asking an AI to create a dragon and asking it to give your beloved pet lizard a pair of majestic, fiery wings.

That is the magic of multimodality.
And that is the magic of Crystal Vision AI.