<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arunav Maitra</title>
    <description>The latest articles on DEV Community by Arunav Maitra (@arunavmaitra).</description>
    <link>https://dev.to/arunavmaitra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3499200%2Ff9267340-8573-41c1-9cf0-fce0f7052845.png</url>
      <title>DEV Community: Arunav Maitra</title>
      <link>https://dev.to/arunavmaitra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arunavmaitra"/>
    <language>en</language>
    <item>
      <title>Meet Persona-Portraits AI</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Mon, 15 Sep 2025 06:46:00 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/meet-persona-portraits-ai-1d4p</link>
      <guid>https://dev.to/arunavmaitra/meet-persona-portraits-ai-1d4p</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;&lt;em&gt;Persona-Portraits AI&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s a magical web experience that lets you become the hero of any story.&lt;/p&gt;

&lt;p&gt;Ever wondered what you'd look like as an astronaut gazing at Earth? Or a cyberpunk rebel in a neon-drenched city? Now, you don't have to wonder.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Persona-Portraits AI&lt;/em&gt; solves a fun, creative challenge:&lt;br&gt;
How can we reimagine ourselves in fantastical scenarios without complex editing software?&lt;/p&gt;

&lt;p&gt;My applet provides the answer.&lt;/p&gt;

&lt;p&gt;You simply upload your photo, pick a scene, and our AI assistant gets to work. It intelligently blends your face onto a new body, with new clothes, a new background, and a new attitude—all while keeping &lt;em&gt;you&lt;/em&gt; recognizable.&lt;/p&gt;

&lt;p&gt;It’s your personal digital costume designer and movie-set creator, all rolled into one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo: &lt;a href="https://ai.studio/apps/drive/17IaNGsm4RW3lA2ISjiCdvZWRbRwKooQs" rel="noopener noreferrer"&gt;visit here&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s a glimpse into the magic of &lt;strong&gt;Persona-Portraits AI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Imagine a sleek, animated interface with a swirling galaxy in the background.&lt;/p&gt;

&lt;p&gt;First, you're greeted by the bold, italic headline: &lt;strong&gt;&lt;em&gt;Step Into Another World&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zngdkagdrms1zrq379c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zngdkagdrms1zrq379c.png" alt="Image description" width="800" height="502"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You click the glowing upload area, select your best selfie, and watch as interactive scenario cards slide into view. Each card—from 'Executive Drive' to 'Cosmic Explorer'—shimmers with possibility.&lt;/p&gt;

&lt;p&gt;You tap on &lt;em&gt;Enchanted Forest&lt;/em&gt;. The card glows with a vibrant purple border, confirming your choice.&lt;/p&gt;

&lt;p&gt;With a deep breath, you press the big, beautiful, pulsating button:&lt;br&gt;
&lt;strong&gt;"Transform My Photo"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wfoi2da3yc94nnd3raf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wfoi2da3yc94nnd3raf.png" alt="Image description" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instantly, a mesmerizing loader appears, cycling through witty messages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Warming up the digital canvas..."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Consulting with the art muses..."&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Almost there, adding the final touches..."&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67o4t5lyc67dvzysd686.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67o4t5lyc67dvzysd686.png" alt="Image description" width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then, it happens.&lt;/p&gt;

&lt;p&gt;A breathtaking image fades in. It's you, but reimagined. You're an elf, with ethereal robes, standing in a forest lit by glowing mushrooms. The likeness is uncanny.&lt;/p&gt;

&lt;p&gt;A stylish "Download Image" button appears, and with one click, your new persona is saved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujk5dnzo3lget9xxj1wv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujk5dnzo3lget9xxj1wv.png" alt="Image description" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the seamless, powerful, and utterly fun experience of &lt;em&gt;Persona-Portraits AI&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was the creative heart of this project.&lt;/p&gt;

&lt;p&gt;The entire application is powered by the &lt;strong&gt;Gemini 2.5 Flash Image Preview&lt;/strong&gt; model (model ID &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This model is a wizard at understanding and editing images based on text commands.&lt;/p&gt;

&lt;p&gt;My process involved:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Prototyping Prompts&lt;/strong&gt;: I used Google AI Studio as a sandbox. I experimented with dozens of prompts to find the perfect phrasing. How do you ask an AI to change clothes but not a face? How do you describe a "cyberpunk" aesthetic? The studio gave me instant visual feedback.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Model Selection&lt;/strong&gt;: I specifically chose &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; for its incredible balance of speed and quality in image manipulation tasks.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;API Integration&lt;/strong&gt;: Once the prompts were perfected, I integrated the &lt;code&gt;@google/genai&lt;/code&gt; SDK into the app. The code directly calls the model with the user's image and the selected scenario's prompt, bringing the magic to life.&lt;/li&gt;
&lt;/ol&gt;
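
&lt;p&gt;The integration step above can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the app's actual code: the &lt;code&gt;SCENARIO_PROMPTS&lt;/code&gt; table, its prompt wording, and the &lt;code&gt;buildScenarioRequest&lt;/code&gt; helper are all hypothetical. It only builds the request object (model ID, one inline image part, one text part) in the shape the &lt;code&gt;@google/genai&lt;/code&gt; SDK's &lt;code&gt;generateContent&lt;/code&gt; call is assumed to accept; actually sending it is left out.&lt;/p&gt;

```typescript
// A request part is either an inline base64 image or a piece of text.
type ScenarioPart =
  | { inlineData: { mimeType: string; data: string } }
  | { text: string };

interface ScenarioRequest {
  model: string;
  contents: { role: string; parts: ScenarioPart[] }[];
}

const MODEL_ID = "gemini-2.5-flash-image-preview";

// Hypothetical prompts: the app's real wording is not published in this post.
const SCENARIO_PROMPTS: { [scenario: string]: string } = {
  "Executive Drive":
    "Place this person in a luxury car wearing a business suit. Keep the facial features identical.",
  "Enchanted Forest":
    "Reimagine this person as an elf in a forest lit by glowing mushrooms. Keep the facial features identical.",
};

// Pairs the uploaded photo with the prompt of the selected scenario card.
function buildScenarioRequest(
  scenario: string,
  imageBase64: string,
  mimeType: string
): ScenarioRequest {
  const prompt = SCENARIO_PROMPTS[scenario];
  if (prompt === undefined) {
    throw new Error(`Unknown scenario: ${scenario}`);
  }
  return {
    model: MODEL_ID,
    contents: [
      {
        role: "user",
        parts: [{ inlineData: { mimeType, data: imageBase64 } }, { text: prompt }],
      },
    ],
  };
}
```

&lt;p&gt;Keeping the prompts in a lookup table is one way to let users pick a card instead of writing prompts themselves; adding a scenario then means adding one entry.&lt;/p&gt;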

&lt;p&gt;Without the power and flexibility of the Gemini models, this applet would not have been possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Persona-Portraits AI&lt;/em&gt;&lt;/strong&gt; is multimodal at its very core. It thrives on the conversation between different types of data.&lt;/p&gt;

&lt;p&gt;Here’s the breakdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input 1 (Image):&lt;/strong&gt; The user uploads their photograph. This is the visual anchor, the &lt;em&gt;subject&lt;/em&gt; of our story.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Input 2 (Text):&lt;/strong&gt; The user selects a scenario, which corresponds to a carefully crafted prompt. This is the narrative instruction, the &lt;em&gt;plot&lt;/em&gt; of our story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Gemini model doesn't just process these inputs one after the other. It understands them &lt;em&gt;together&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It looks at your face in the photo and comprehends the instruction: &lt;em&gt;"Place this person in a luxury car... change their clothing to a business suit... keep the facial features identical."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This fusion of &lt;strong&gt;image and text understanding&lt;/strong&gt; is what creates a believable, high-quality result. It’s not a simple filter or a cut-and-paste job. It’s a contextual transformation.&lt;/p&gt;

&lt;p&gt;This multimodal approach enhances the user experience by offering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Limitless Creativity:&lt;/em&gt;&lt;/strong&gt; Any prompt can become a new reality.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Deep Personalization:&lt;/em&gt;&lt;/strong&gt; The final image is uniquely yours, not a generic template.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Simplicity:&lt;/em&gt;&lt;/strong&gt; Users don't need to be prompt engineers. They just pick a vibe, and the app handles the complex conversation with the AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining what the user &lt;em&gt;looks like&lt;/em&gt; with what they &lt;em&gt;want to be&lt;/em&gt;, we create a truly magical and personal piece of art.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Try ChromaFlip Chronicles</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Sun, 14 Sep 2025 18:37:00 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/try-chromaflip-chronicles-1af6</link>
      <guid>https://dev.to/arunavmaitra/try-chromaflip-chronicles-1af6</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;ChromaFlip Chronicles&lt;/strong&gt;, a digital experience that breathes new life into the classic photo album.&lt;/p&gt;

&lt;p&gt;Imagine a scrapbook, but one that's alive.&lt;br&gt;
One that's interactive.&lt;br&gt;
And one that's powered by your own imagination.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;That's ChromaFlip Chronicles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's a beautifully designed, hand-drawn style notebook that you can flip through, page by page. But here's the magic: it's not just a gallery. It's a creative canvas. Each page allows you to take a photo—a memory, a piece of art, a random snapshot—and completely &lt;strong&gt;remix&lt;/strong&gt; it using the power of generative AI.&lt;/p&gt;

&lt;p&gt;It solves a simple but profound problem: our digital photos often sit stagnant in folders. This applet turns passive viewing into an active, creative process, allowing anyone to become a digital artist and storyteller.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It’s your AI-powered visual diary, where memories are not just stored, but wonderfully reimagined.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here's a look at the enchanting world of &lt;strong&gt;ChromaFlip Chronicles&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://ai.studio/apps/drive/1Gr1Jj_70iQRCzoamdmThkoiIQEzOV4Ln" rel="noopener noreferrer"&gt;Try the live demo&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screenshots:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx42jkgnc2278917vtvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmx42jkgnc2278917vtvb.png" alt="Image description" width="800" height="838"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A glimpse of the main notebook interface, where users can navigate through their visual diary.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2q0dgis6eye71i0s2pm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo2q0dgis6eye71i0s2pm.png" alt="Image description" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubupmkezag4b14h63v1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fubupmkezag4b14h63v1s.png" alt="Image description" width="800" height="574"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Here, you can see the intuitive controls for remixing an image with a simple text prompt.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was the creative engine behind this project. The star of the show is the &lt;strong&gt;Gemini 2.5 Flash Image Preview&lt;/strong&gt; model (also known as &lt;em&gt;nano-banana&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;My entire application is built around its unique multimodal capabilities.&lt;/p&gt;

&lt;p&gt;Here’s the technical breakdown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Request:&lt;/strong&gt; When a user wants to "remix" an image, I send a request to the Gemini API using the &lt;code&gt;@google/genai&lt;/code&gt; library.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Multimodal Input:&lt;/strong&gt; This isn't just a text prompt. The request is &lt;em&gt;multimodal&lt;/em&gt; because it sends two distinct pieces of information together:

&lt;ul&gt;
&lt;li&gt;  The user's existing &lt;strong&gt;image&lt;/strong&gt; (as a base64 encoded string).&lt;/li&gt;
&lt;li&gt;  The user's creative &lt;strong&gt;text prompt&lt;/strong&gt; (e.g., "make this black and white photo burst with color").&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Magic:&lt;/strong&gt; The &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model understands how to interpret the text prompt as a set of instructions to &lt;em&gt;edit the provided image&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Response:&lt;/strong&gt; The model then sends back a brand new, AI-generated image, which my app seamlessly displays on the notebook page.&lt;/li&gt;
&lt;/ol&gt;
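
&lt;p&gt;The request and response handling above could look something like this. It is a hedged TypeScript sketch, not the app's actual code: it assumes the notebook keeps each page image as a base64 data URL (the post only says "base64 encoded string"), and the helper names &lt;code&gt;parseDataUrl&lt;/code&gt;, &lt;code&gt;toDataUrl&lt;/code&gt;, and &lt;code&gt;buildRemixParts&lt;/code&gt; are hypothetical. It shows how to split a data URL into the MIME type and raw base64 payload that an inline image part needs, and how to turn returned base64 back into something an image element can display.&lt;/p&gt;

```typescript
interface InlineImage {
  mimeType: string;
  data: string; // raw base64, without the "data:...;base64," prefix
}

// Splits "data:image/png;base64,AAAA" into its MIME type and base64 payload.
function parseDataUrl(dataUrl: string): InlineImage {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("Expected a base64 data URL");
  }
  return { mimeType: match[1], data: match[2] };
}

// Rebuilds a data URL so a returned image can be used as an img src.
function toDataUrl(image: InlineImage): string {
  return `data:${image.mimeType};base64,${image.data}`;
}

// Builds the two request parts described above: the image, then the prompt.
function buildRemixParts(dataUrl: string, prompt: string) {
  return [{ inlineData: parseDataUrl(dataUrl) }, { text: prompt }];
}
```

&lt;p&gt;For example, &lt;code&gt;buildRemixParts(pageImage, "make this black and white photo burst with color")&lt;/code&gt; would yield one image part followed by one text part, where &lt;code&gt;pageImage&lt;/code&gt; is a hypothetical variable holding the page's data URL.&lt;/p&gt;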

&lt;p&gt;It was surprisingly simple to implement, yet incredibly powerful in its results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The core of &lt;strong&gt;ChromaFlip Chronicles&lt;/strong&gt; &lt;em&gt;is&lt;/em&gt; its multimodal functionality. It's not just a feature; it's the entire premise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does this enhance the user experience?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;It's Personal:&lt;/em&gt;&lt;/strong&gt; Instead of generating images from scratch, users start with their &lt;em&gt;own&lt;/em&gt; photos. This makes the creative process deeply personal and grounded in their own memories. You're not just creating art; you're transforming a piece of your own life.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;It's Intuitive:&lt;/em&gt;&lt;/strong&gt; The interaction is as simple as talking. You just &lt;em&gt;tell&lt;/em&gt; the AI what you want to change about your picture. This removes the barrier of complex photo editing software and opens up creative expression to everyone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;It's A Creative Partnership:&lt;/em&gt;&lt;/strong&gt; The multimodality—combining an image (what you &lt;em&gt;have&lt;/em&gt;) with a text prompt (what you &lt;em&gt;imagine&lt;/em&gt;)—creates a beautiful partnership between the user and the AI. It feels less like using a tool and more like collaborating with a creative partner who can instantly bring your ideas to life.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This fusion of image and text input is what makes &lt;strong&gt;ChromaFlip Chronicles&lt;/strong&gt; a truly magical and engaging experience.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>I built Element Fusion</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Sun, 14 Sep 2025 16:46:00 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/i-built-element-fusion-5ed6</link>
      <guid>https://dev.to/arunavmaitra/i-built-element-fusion-5ed6</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Ever had a wild, creative idea that was hard to put into words?&lt;/p&gt;

&lt;p&gt;Maybe you imagined a &lt;em&gt;cyberpunk cat&lt;/em&gt;, wearing &lt;em&gt;your favorite sunglasses&lt;/em&gt;, majestically riding a &lt;em&gt;cosmic whale&lt;/em&gt; through a &lt;em&gt;nebula made of donuts&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Trying to generate that with text alone can be a challenge. The AI might not get the &lt;em&gt;exact&lt;/em&gt; style of sunglasses or the specific look of the cat you envisioned.&lt;/p&gt;

&lt;p&gt;That's the problem I wanted to solve.&lt;/p&gt;

&lt;p&gt;So, I built &lt;strong&gt;Element Fusion&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Element Fusion isn't just another image generator. It's a visual alchemy engine. It's a creative playground where &lt;em&gt;you&lt;/em&gt; provide the core ingredients.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s the magic formula:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;&lt;em&gt;You upload the elements:&lt;/em&gt;&lt;/strong&gt; That specific cat, those exact sunglasses, a picture of a whale. These are your non-negotiable visual assets.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;em&gt;You describe the scene:&lt;/em&gt;&lt;/strong&gt; This is where you become the director. You write the prompt that ties everything together. "Create a photorealistic image of the cat wearing the sunglasses, riding the whale through a vibrant, swirling nebula..."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;&lt;em&gt;Element Fusion creates:&lt;/em&gt;&lt;/strong&gt; The app uses the power of Gemini to intelligently understand and combine all your visual elements into one seamless, stunning, and often surprising new image.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's an applet built for artists, designers, meme-makers, and anyone who wants to bring their most complex visual daydreams to life.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Check out the live applet here:&lt;/strong&gt; &lt;a href="https://ai.studio/apps/drive/1dFx_T5fFuXzWq55pcyR-EyP8Afq2M1or" rel="noopener noreferrer"&gt;Element Fusion on Google AI Studio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a glimpse into the creative process:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: The Canvas Awaits&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Our journey begins on a sleek, futuristic interface. The stage is set for your imagination to take flight.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozeydrnde3qeldaihp5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozeydrnde3qeldaihp5o.png" alt="Image description" width="800" height="815"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz3jdbblgmih8pi0zmef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiz3jdbblgmih8pi0zmef.png" alt="Image description" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The app's hero section.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Assembling the Elements&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here, you upload your core visual components. For this masterpiece, we've chosen a noble cat, a futuristic city, and a classic car.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7kctwy1f6peprm00wb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc7kctwy1f6peprm00wb3.png" alt="Image description" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The file upload area populated with distinct images.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Directing the Vision&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With our elements in place, we write the prompt. This is our script for the AI, telling it precisely how to blend the images into a cohesive scene.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbax492cul7rpe3ld1di.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkbax492cul7rpe3ld1di.png" alt="Image description" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: The Fusion!&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We hit the "Fuse Elements" button and watch the magic happen. Gemini gets to work, weaving our separate images into a single narrative. The result? A stunning, one-of-a-kind creation that was impossible to describe with words alone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"A majestic cat driving the vintage car down the neon-lit main street of the futuristic city at night. The style should be cinematic and photorealistic."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcq861seteco2u2q6tt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkcq861seteco2u2q6tt7.png" alt="Image description" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The AI-generated result, combining the uploaded elements.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio and the Gemini API are the heart and soul of this project.&lt;/p&gt;

&lt;p&gt;My workflow was centered around the incredible capabilities of the &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model, also known as the "nano-banana" model. This model is exceptionally good at understanding and manipulating image data.&lt;/p&gt;

&lt;p&gt;Here's the technical breakdown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prototyping in AI Studio:&lt;/strong&gt; Before writing a single line of code, I used Google AI Studio to test the core concept. I manually uploaded different combinations of images and wrote various text prompts to see how the model would respond. This was &lt;em&gt;crucial&lt;/em&gt; for understanding its strengths and limitations, and for refining the prompt engineering strategy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multimodal Requests:&lt;/strong&gt; The app's core function is sending a rich, multimodal request to the Gemini API using the &lt;code&gt;@google/genai&lt;/code&gt; SDK.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Each user-uploaded image is converted to a base64 string.&lt;/li&gt;
&lt;li&gt;  These are then formatted as individual &lt;code&gt;inlineData&lt;/code&gt; parts in the request payload.&lt;/li&gt;
&lt;li&gt;  The user's written description is added as a final &lt;code&gt;text&lt;/code&gt; part.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This means a single API call might contain &lt;em&gt;multiple images and one text prompt&lt;/em&gt;—a truly multimodal instruction set.&lt;/p&gt;
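
&lt;p&gt;As a rough illustration of that payload, here is a minimal TypeScript sketch. It is not the app's actual code: the &lt;code&gt;UploadedImage&lt;/code&gt; type and the &lt;code&gt;buildFusionParts&lt;/code&gt; helper are hypothetical names, and each image is assumed to be already base64-encoded. It shows one &lt;code&gt;inlineData&lt;/code&gt; part per uploaded image, in upload order, followed by a single final &lt;code&gt;text&lt;/code&gt; part.&lt;/p&gt;

```typescript
// Hypothetical shape for an image the user has uploaded and encoded.
interface UploadedImage {
  mimeType: string;
  base64: string;
}

type FusionPart =
  | { inlineData: { mimeType: string; data: string } }
  | { text: string };

// One inlineData part per image, then the written description as the last part.
function buildFusionParts(images: UploadedImage[], description: string): FusionPart[] {
  if (images.length === 0) {
    throw new Error("At least one image is required");
  }
  const parts: FusionPart[] = images.map((img) => ({
    inlineData: { mimeType: img.mimeType, data: img.base64 },
  }));
  parts.push({ text: description });
  return parts;
}
```

&lt;p&gt;Three uploads (say a cat, a city, and a car) plus one description would therefore produce four parts in a single request.&lt;/p&gt;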

&lt;ol start="3"&gt;
&lt;li&gt; &lt;strong&gt;Parsing the Response:&lt;/strong&gt; The &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model can return both a new image and a text description. My code is set up to parse the response, extract the new base64 image data to display it, and show any accompanying text from the model.&lt;/li&gt;
&lt;/ol&gt;
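
&lt;p&gt;The parsing step might be sketched like this in TypeScript. This is an illustration under assumptions, not the app's actual code: the response is assumed to expose its output under &lt;code&gt;candidates[0].content.parts&lt;/code&gt;, the &lt;code&gt;extractResult&lt;/code&gt; helper is hypothetical, and falling back to &lt;code&gt;image/png&lt;/code&gt; when no MIME type is present is my own choice. It keeps the first image it finds as a data URL and joins any text parts.&lt;/p&gt;

```typescript
interface ResponsePart {
  text?: string;
  inlineData?: { mimeType?: string; data?: string };
}

interface ModelResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

interface FusionResult {
  imageDataUrl: string | null; // null when the model returned no image
  text: string;
}

// Walks the response parts, keeping the first image and all accompanying text.
function extractResult(response: ModelResponse): FusionResult {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  let imageDataUrl: string | null = null;
  const texts: string[] = [];
  for (const part of parts) {
    const inline = part.inlineData;
    if (inline?.data !== undefined) {
      if (imageDataUrl === null) {
        // Assumed fallback MIME type when the part does not state one.
        const mimeType = inline.mimeType ?? "image/png";
        imageDataUrl = `data:${mimeType};base64,${inline.data}`;
      }
    } else if (part.text !== undefined) {
      texts.push(part.text);
    }
  }
  return { imageDataUrl, text: texts.join("\n") };
}
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; for a missing image lets the UI show an error state instead of a broken image when the model answers with text only.&lt;/p&gt;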

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The multimodality here is deep and transformative for the creative process. This isn't just text-to-image; it's &lt;strong&gt;multi-image-and-text-to-image&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why is this a game-changer?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Ultimate Specificity:&lt;/strong&gt; It gives the user unprecedented control. Instead of vaguely describing "a cute dog," you can upload a picture of &lt;em&gt;your&lt;/em&gt; dog. The AI then works with that specific visual information, preserving the unique character, breed, and even the lighting from your original photo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Creative Cohesion:&lt;/strong&gt; The text prompt acts as the narrative glue. It tells the model &lt;em&gt;how&lt;/em&gt; to combine the provided visual elements. It sets the mood, the style, the action, and the environment. This synergy between the provided images (the &lt;em&gt;what&lt;/em&gt;) and the text prompt (the &lt;em&gt;how&lt;/em&gt;) allows for the creation of incredibly nuanced and personal images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enhanced User Experience:&lt;/strong&gt; This approach transforms the user from a passive requester into an active co-creator. You are not just asking the AI to make something &lt;em&gt;for&lt;/em&gt; you; you are collaborating &lt;em&gt;with&lt;/em&gt; the AI, providing it with the key building blocks to assemble your vision. It feels less like a command and more like a creative partnership.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, &lt;strong&gt;Element Fusion&lt;/strong&gt; leverages multimodality to create a powerful tool that respects the user's specific visual assets while using AI to weave them into something entirely new and magical.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Crystal Vision AI</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Sun, 14 Sep 2025 09:44:00 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/crystal-vision-ai-50hg</link>
      <guid>https://dev.to/arunavmaitra/crystal-vision-ai-50hg</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;&lt;em&gt;Crystal Vision AI&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My goal wasn't just to create another image generator.&lt;br&gt;
I wanted to build an &lt;em&gt;experience&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A magical portal where your ideas and photos are transformed into mystical works of art.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Crystal Vision AI&lt;/strong&gt; solves a simple problem: &lt;em&gt;How can we make AI art generation more personal and enchanting?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It does this in two powerful ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Enchant an Image:&lt;/em&gt;&lt;/strong&gt; You can upload your own photo—of your pet, a friend, or a favorite object. Then, you provide a text prompt to magically edit it. The AI understands &lt;em&gt;both&lt;/em&gt; the image and your words to create something entirely new.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Summon a Vision:&lt;/em&gt;&lt;/strong&gt; For moments of pure imagination, you can simply describe a scene. The AI acts as your personal oracle, conjuring a stunning, photorealistic image from your words alone.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core magic?&lt;/p&gt;

&lt;p&gt;Every creation is beautifully and seamlessly encapsulated within a &lt;em&gt;glowing, hyper-realistic crystal ball&lt;/em&gt;, turning every generation into a unique, mystical artifact.&lt;/p&gt;

&lt;p&gt;It's a tool designed to spark joy, unleash creativity, and make you feel like a real magician.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Behold the magic in action!&lt;/p&gt;

&lt;p&gt;🔮 &lt;strong&gt;Live Applet Link:&lt;/strong&gt; &lt;strong&gt;&lt;a href="https://ai.studio/apps/drive/1ScUu76U8IldKSDF_NUBX-iIiWIL9fCkg" rel="noopener noreferrer"&gt;Experience Crystal Vision AI Here!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a glimpse into the visual journey:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Grand Welcome
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Users are greeted by an ethereal, animated interface that immediately sets a magical tone.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanmb7ye82x27bjx5lbfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanmb7ye82x27bjx5lbfk.png" alt="Image description" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Enchanting a Personal Photo
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Here, a user has uploaded a photo of their cat and is adding a prompt to give it a sparkling crown. Notice the simple, intuitive controls.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fooacyrk8fpld90tcrhzb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fooacyrk8fpld90tcrhzb.png" alt="Image descri ption" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuvju5b5ht5asz295d6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuvju5b5ht5asz295d6h.png" alt="Image des cription" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Final Masterpiece
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;After a moment of 'consulting the oracle,' the final vision is revealed—a breathtaking image, perfectly rendered inside the crystal ball.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0rurtgoen985gctndc4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0rurtgoen985gctndc4.png" alt="Image descn ription" width="800" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyop15dh9zh6jfmhkig2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flyop15dh9zh6jfmhkig2.png" alt="Image descri ption" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was my digital alchemy lab. It was the crucial first step where I prototyped, tested, and truly understood the capabilities of the Gemini models before writing a single line of production code.&lt;/p&gt;

&lt;p&gt;My two key ingredients were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt;: This was the absolute &lt;em&gt;star of the show&lt;/em&gt;. Its powerful multimodal capabilities are the engine behind the "Enchant an Image" feature. I used the Studio to test how the model would interpret an uploaded image alongside a text prompt.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;code&gt;imagen-4.0-generate-001&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt;: This model is a pure powerhouse for text-to-image generation. It's the oracle that powers the "Summon a Vision" feature, creating stunningly detailed images from just a description.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My process involved countless iterations in the Studio to perfect the prompts. I refined phrases like &lt;em&gt;"hyper-realistic, glowing crystal ball"&lt;/em&gt; and &lt;em&gt;"sitting on a dark, mystical surface"&lt;/em&gt; to achieve the exact aesthetic I envisioned. This rapid prototyping saved hours of development time and helped the final app produce consistently magical results.&lt;/p&gt;
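
&lt;p&gt;As a rough illustration of how tuned phrases like these might be applied in code, here is a minimal TypeScript sketch. The helper name &lt;code&gt;buildCrystalBallPrompt&lt;/code&gt; and the sentence template are assumptions made for illustration, not the applet's actual code; only the two quoted style phrases come from the prompts described above.&lt;/p&gt;

```typescript
// Style phrases quoted in the article; how they are combined is an assumption.
const CRYSTAL_BALL_STYLE: string[] = [
  "hyper-realistic, glowing crystal ball",
  "sitting on a dark, mystical surface",
];

// Hypothetical helper: wraps the user's idea in the crystal-ball styling.
function buildCrystalBallPrompt(userIdea: string): string {
  const idea = userIdea.trim();
  if (idea.length === 0) {
    throw new Error("Prompt must not be empty");
  }
  return `${idea}, shown inside a ${CRYSTAL_BALL_STYLE.join(", ")}`;
}
```

&lt;p&gt;Keeping the style phrases in one place like this would make it easy to carry a prompt that worked in the Studio over to the app unchanged.&lt;/p&gt;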

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The soul of &lt;strong&gt;Crystal Vision AI&lt;/strong&gt; lies in its multimodal functionality.&lt;/p&gt;

&lt;p&gt;Specifically, in the &lt;strong&gt;&lt;em&gt;Enchant an Image&lt;/em&gt;&lt;/strong&gt; mode.&lt;/p&gt;

&lt;p&gt;This isn't just a simple image filter. It's a true creative conversation with the AI. The model processes two distinct types of information simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visual Input:&lt;/strong&gt; The user's uploaded image. The AI doesn't just see pixels; it gains a contextual understanding of the &lt;em&gt;subject&lt;/em&gt; and &lt;em&gt;composition&lt;/em&gt; of the photo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Textual Input:&lt;/strong&gt; The user's typed command. This is where the user directs the magic, asking for specific changes like &lt;em&gt;"add a wizard hat"&lt;/em&gt; or &lt;em&gt;"make it look like it's made of stars."&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model then fuses these two inputs. It intelligently identifies the main subject from the image and applies the textual command to it, before reimagining the entire scene within the crystal ball theme.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why does this enhance the user experience?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It makes the creation process deeply &lt;strong&gt;personal and interactive&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Users aren't just passive prompters; they are active &lt;strong&gt;collaborators&lt;/strong&gt; with the AI. They can bring their own life and memories—their pets, their friends, their art—into the magical world.&lt;/p&gt;

&lt;p&gt;This transforms the app from a simple generator into a powerful, personal creative companion. It's the profound difference between asking an AI to &lt;em&gt;create a dragon&lt;/em&gt;, and asking it to give &lt;em&gt;your beloved pet lizard&lt;/em&gt; a pair of majestic, fiery wings.&lt;/p&gt;

&lt;p&gt;That is the magic of multimodality.&lt;br&gt;
And that is the magic of &lt;strong&gt;Crystal Vision AI&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>ArchiBlocks 3D</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Sat, 13 Sep 2025 08:55:37 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/archiblocks-3d-3ink</link>
      <guid>https://dev.to/arunavmaitra/archiblocks-3d-3ink</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;ArchiBlocks 3D&lt;/strong&gt;, a web application that magically transforms real-world architectural photos into captivating 3D block models.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Have you ever looked at a building and imagined what it would look like as a LEGO set, a clay model, or something straight out of a low-poly video game?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the core idea behind ArchiBlocks 3D. It bridges the gap between reality and imagination.&lt;/p&gt;

&lt;p&gt;It's a simple, intuitive tool for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Artists&lt;/em&gt;&lt;/strong&gt; seeking inspiration.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Designers&lt;/em&gt;&lt;/strong&gt; looking for a new way to visualize concepts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;em&gt;Hobbyists&lt;/em&gt;&lt;/strong&gt; who just want to have fun and see the world differently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The app takes a user's uploaded image and a text prompt describing a desired style, and uses Gemini to generate a new image that re-renders the subject as a stylized 3D model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo: &lt;a href="https://ai.studio/apps/drive/1ZdCKf8bxuYT7JbH8s5xEL6HP0l47ojIC" rel="noopener noreferrer"&gt;Try it live here&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Here’s a walkthrough of the experience. Imagine uploading a photo of the iconic Eiffel Tower...&lt;/p&gt;

&lt;p&gt;First, the user is greeted by a dynamic hero section with an animated background, setting a creative tone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y0sunef37tox6i3wcwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4y0sunef37tox6i3wcwb.png" alt="Image descr iption" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8cvpv98599l70dbnbq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8cvpv98599l70dbnbq0.png" alt="Image descrip tion" width="800" height="768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, they scroll down to the generator. Here, they can drag and drop their architectural photo and type in a creative prompt. For this example, the prompt is: &lt;em&gt;“Isometric low-poly 3D model, vibrant colors.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwz40hfew91z2k6ptd6p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwz40hfew91z2k6ptd6p.png" alt="Image desc ription" width="429" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After hitting the &lt;strong&gt;✨ Generate 3D Model&lt;/strong&gt; button, the magic happens! The AI gets to work, and in a few moments, the result is displayed in a beautiful side-by-side comparison.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls620gtzuloxihsrsfal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls620gtzuloxihsrsfal.png" alt="Image descrip tion" width="441" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Please note: This applet was built using the &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model. The screenshots and descriptions here showcase its full functionality in action!&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was the &lt;em&gt;engine&lt;/em&gt; behind this entire project. I specifically leveraged the &lt;strong&gt;Gemini API&lt;/strong&gt; and its powerful multimodal capabilities.&lt;/p&gt;

&lt;p&gt;My model of choice was &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; (also known as Nano Banana), which is well suited to this kind of creative image editing task.&lt;/p&gt;

&lt;p&gt;The implementation is centered in my &lt;code&gt;services/geminiService.ts&lt;/code&gt; file. In it, I construct a &lt;code&gt;generateContent&lt;/code&gt; request that includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; An &lt;strong&gt;image part&lt;/strong&gt;, containing the user's uploaded photo as a base64-encoded string.&lt;/li&gt;
&lt;li&gt; A &lt;strong&gt;text part&lt;/strong&gt;, which combines my instructions with the user's unique style prompt.&lt;/li&gt;
&lt;li&gt; A &lt;strong&gt;config object&lt;/strong&gt; where I specify that the response should include both &lt;code&gt;Modality.IMAGE&lt;/code&gt; and &lt;code&gt;Modality.TEXT&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This setup allows Gemini to understand the visual context from the image and the stylistic direction from the text, merging them to produce a completely new piece of art.&lt;/p&gt;
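
&lt;p&gt;The three pieces listed above can be sketched as a plain TypeScript object. This is an illustrative sketch, not the contents of &lt;code&gt;services/geminiService.ts&lt;/code&gt;: the helper names, the &lt;code&gt;BASE_INSTRUCTION&lt;/code&gt; text, and the exact field names (&lt;code&gt;inlineData&lt;/code&gt;, &lt;code&gt;responseModalities&lt;/code&gt;) are assumptions based on my understanding of the Gemini JavaScript SDK, and the string values &lt;code&gt;"IMAGE"&lt;/code&gt; and &lt;code&gt;"TEXT"&lt;/code&gt; stand in for &lt;code&gt;Modality.IMAGE&lt;/code&gt; and &lt;code&gt;Modality.TEXT&lt;/code&gt; so the snippet runs without the SDK installed.&lt;/p&gt;

```typescript
// Part shapes assumed to match what the Gemini SDK accepts and returns.
interface ImagePart { inlineData: { mimeType: string; data: string } }
interface TextPart { text: string }
type Part = ImagePart | TextPart;

interface StyleRequest {
  model: string;
  contents: { parts: Part[] };
  config: { responseModalities: string[] };
}

// Hypothetical instruction text; the real instructions are not shown in the article.
const BASE_INSTRUCTION = "Recreate this building as a stylized 3D block model.";

// Builds the request: image part, text part, and a config asking for image and text back.
function buildStyleRequest(base64Image: string, mimeType: string, stylePrompt: string): StyleRequest {
  return {
    model: "gemini-2.5-flash-image-preview",
    contents: {
      parts: [
        { inlineData: { mimeType: mimeType, data: base64Image } },
        { text: `${BASE_INSTRUCTION} Style: ${stylePrompt.trim()}` },
      ],
    },
    config: { responseModalities: ["IMAGE", "TEXT"] },
  };
}

// Returns the base64 data of the first image part in a list of parts, or null if none exists.
function firstImageData(parts: Part[]): string | null {
  for (const part of parts) {
    if ("inlineData" in part) {
      return part.inlineData.data;
    }
  }
  return null;
}
```

&lt;p&gt;An object of this shape would presumably be handed to the SDK's &lt;code&gt;generateContent&lt;/code&gt; call. Because the response can mix text and image parts, a small helper like &lt;code&gt;firstImageData&lt;/code&gt; is one way to pull out the generated picture.&lt;/p&gt;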

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The core of &lt;strong&gt;ArchiBlocks 3D&lt;/strong&gt; is its &lt;strong&gt;Image-and-Text-to-Image&lt;/strong&gt; generation. This is a truly multimodal feature that enhances the user experience in a profound way.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Input 1 (Image):&lt;/strong&gt; The user provides the &lt;em&gt;visual foundation&lt;/em&gt;—a photo of a building or landscape. This sets the scene and defines the subject.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Input 2 (Text):&lt;/strong&gt; The user provides the &lt;em&gt;creative direction&lt;/em&gt;—a prompt like &lt;em&gt;"a cute claymation model"&lt;/em&gt; or &lt;em&gt;"a futuristic neon render."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Output (Image):&lt;/strong&gt; The AI synthesizes both inputs to generate a new image that respects the structure of the original photo but completely reimagines it in the user's desired style.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why is this so powerful?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because it gives the user &lt;em&gt;agency and control&lt;/em&gt;. Instead of a one-size-fits-all filter, it opens up an infinite canvas of possibilities. The user isn't just a passive observer; they are an active collaborator with the AI, co-creating a unique visual masterpiece. &lt;/p&gt;

&lt;p&gt;This direct, creative dialogue between the user, their photo, and the AI is what makes ArchiBlocks 3D so engaging and fun.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>BrickVerse AI</title>
      <dc:creator>Arunav Maitra</dc:creator>
      <pubDate>Sat, 13 Sep 2025 08:36:27 +0000</pubDate>
      <link>https://dev.to/arunavmaitra/brickverse-ai-28g</link>
      <guid>https://dev.to/arunavmaitra/brickverse-ai-28g</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;BrickVerse AI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It's a magical portal where imagination meets digital creation.&lt;/p&gt;

&lt;p&gt;This applet solves a simple, yet wonderful problem: &lt;em&gt;How do you visualize any city in the world as a vibrant, intricate LEGO masterpiece?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;BrickVerse AI creates a delightful experience by allowing anyone, regardless of artistic skill, to become a master LEGO builder.&lt;/p&gt;

&lt;p&gt;You can start with just a simple &lt;strong&gt;city name&lt;/strong&gt;. &lt;br&gt;
&lt;em&gt;Type "Paris"...&lt;/em&gt; and watch the Eiffel Tower rise, brick by brick.&lt;/p&gt;

&lt;p&gt;Or, you can upload a &lt;strong&gt;personal photo&lt;/strong&gt;. &lt;br&gt;
&lt;em&gt;A snapshot from your last vacation...&lt;/em&gt; and see it completely reimagined as a bustling LEGO world.&lt;/p&gt;

&lt;p&gt;The goal is to spark creativity and bring a sense of childlike wonder to digital art, powered by the incredible capabilities of generative AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;You can experience the magic live right here:&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://ai.studio/apps/drive/1XIywn3Z_-SwIaZrUAIjfbaXnoriFYorN" rel="noopener noreferrer"&gt;Link to Deployed Applet&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s a little sneak peek into the world of BrickVerse AI!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;1. The Sleek &amp;amp; Simple Interface:&lt;/strong&gt; &lt;em&gt;Choose your creative path - text or image.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbygx0tjaunsefb0w0ca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbygx0tjaunsefb0w0ca.png" alt="Image descripti on" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;2. Generating from Text:&lt;/strong&gt; &lt;em&gt;We typed "Tokyo" and the AI started building...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75rau7ghn8mdvc6xbwpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75rau7ghn8mdvc6xbwpa.png" alt="Image descr iption" width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;3. The Final Masterpiece:&lt;/strong&gt; &lt;em&gt;A stunning, photorealistic LEGO Tokyo, complete with cherry blossoms and iconic towers.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuinxxzjzzb56n7shpb1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuinxxzjzzb56n7shpb1j.png" alt="Image descri ption" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was my digital workshop for this project. It was the perfect environment to explore, prototype, and harness the power of Google's latest multimodal models.&lt;/p&gt;

&lt;p&gt;I primarily leveraged two phenomenal models:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;imagen-4.0-generate-001&lt;/em&gt;&lt;/strong&gt;: For the text-to-image generation. I used AI Studio to refine my prompts, experimenting with different keywords like &lt;em&gt;"photorealistic"&lt;/em&gt;, &lt;em&gt;"cinematic lighting"&lt;/em&gt;, and &lt;em&gt;"bustling with LEGO pedestrians"&lt;/em&gt; to achieve that perfect, lively LEGO aesthetic. The ability to quickly iterate in the studio was invaluable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;gemini-2.5-flash-image-preview&lt;/em&gt;&lt;/strong&gt;: This model is the heart of the image-to-image feature. All of my prompt engineering for transforming an existing photo into a LEGO world was done within AI Studio. I crafted instructions that guided the model to &lt;em&gt;recreate&lt;/em&gt;, not just overlay, the source image, ensuring every building, tree, and car was reimagined in brick form.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI Studio made the process of integrating these powerful AI capabilities seamless and intuitive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The true essence of BrickVerse AI lies in its multimodal nature. It's not just about one input; it's about offering creative flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. From &lt;em&gt;Words&lt;/em&gt; to Worlds (Text-to-Image)
&lt;/h3&gt;

&lt;p&gt;This feature allows users to conjure a world from pure imagination.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How it works:&lt;/strong&gt; A user provides a text string (e.g., "New York City"). The application then embeds this into a more detailed prompt and sends it to the &lt;code&gt;imagen-4.0-generate-001&lt;/code&gt; model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it enhances the experience:&lt;/strong&gt; It's the ultimate creative sandbox. You don't need a reference; you just need an idea. It makes the creation process incredibly accessible and limitless. You can dream of a LEGO Venice during a flood or a futuristic LEGO Dubai, and the AI will build it for you.&lt;/li&gt;
&lt;/ul&gt;
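
&lt;p&gt;The step of embedding a city name into a more detailed prompt might look roughly like the TypeScript sketch below. The function name &lt;code&gt;buildLegoCityPrompt&lt;/code&gt; and the sentence template are assumptions for illustration; the style keywords are the ones mentioned earlier in this post, and the applet's real prompt may differ.&lt;/p&gt;

```typescript
// Keywords quoted earlier in the article; the surrounding template is assumed.
const LEGO_KEYWORDS: string[] = [
  "photorealistic",
  "cinematic lighting",
  "bustling with LEGO pedestrians",
];

// Hypothetical helper: expands a bare city name into a fuller text-to-image prompt.
function buildLegoCityPrompt(city: string): string {
  const name = city.trim();
  if (name.length === 0) {
    throw new Error("City name must not be empty");
  }
  return `A detailed LEGO model of ${name}, ${LEGO_KEYWORDS.join(", ")}`;
}
```

&lt;p&gt;The resulting string is what would be sent to &lt;code&gt;imagen-4.0-generate-001&lt;/code&gt;, so the user only ever has to type the city name.&lt;/p&gt;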

&lt;h3&gt;
  
  
  2. From &lt;em&gt;Pixels&lt;/em&gt; to Plastic (Image-to-Image)
&lt;/h3&gt;

&lt;p&gt;This feature makes the creation process deeply personal.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How it works:&lt;/strong&gt; A user uploads an image. The app sends this image data along with a text prompt (e.g., "Transform this entire image into a vibrant, detailed LEGO city scene") to the &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it enhances the experience:&lt;/strong&gt; This is where the magic becomes personal. Users can upload photos of their own hometown, a favorite landmark, or a cherished vacation spot. The AI doesn't just add a filter; it &lt;em&gt;understands&lt;/em&gt; the context of the image and rebuilds it. Seeing a personal memory transformed into a work of LEGO art creates a powerful and engaging emotional connection for the user.&lt;/li&gt;
&lt;/ul&gt;
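
&lt;p&gt;To make the "image data along with a text prompt" step more concrete, here is a small TypeScript sketch. It assumes the uploaded photo is available as a base64 data URL (for example from a browser &lt;code&gt;FileReader&lt;/code&gt;) and that the model accepts an &lt;code&gt;inlineData&lt;/code&gt; part next to a text part; the helper names are hypothetical and this is not the applet's actual code.&lt;/p&gt;

```typescript
interface InlineImage { mimeType: string; data: string }

// Prompt text quoted from the article.
const LEGO_EDIT_PROMPT = "Transform this entire image into a vibrant, detailed LEGO city scene";

// Splits a data URL such as "data:image/png;base64,AAAA" into MIME type and base64 payload.
function parseDataUrl(dataUrl: string): InlineImage {
  const prefix = "data:";
  const marker = ";base64,";
  const markerIndex = dataUrl.indexOf(marker);
  if (!dataUrl.startsWith(prefix) || markerIndex === -1) {
    throw new Error("Expected a base64 data URL");
  }
  return {
    mimeType: dataUrl.slice(prefix.length, markerIndex),
    data: dataUrl.slice(markerIndex + marker.length),
  };
}

// Hypothetical helper: pairs the uploaded image with the LEGO transformation prompt.
function buildLegoEditParts(dataUrl: string): [{ inlineData: InlineImage }, { text: string }] {
  return [{ inlineData: parseDataUrl(dataUrl) }, { text: LEGO_EDIT_PROMPT }];
}
```

&lt;p&gt;Sending the image first and the instruction second is just a choice made for this sketch, not something the article specifies.&lt;/p&gt;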

&lt;p&gt;By combining both text and image inputs, BrickVerse AI caters to different creative impulses, making it a truly versatile and captivating multimodal application.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
