<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SS</title>
    <description>The latest articles on DEV Community by SS (@ssithub).</description>
    <link>https://dev.to/ssithub</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3470583%2F80ee9c8a-3aac-4601-99f8-c84580a0955b.png</url>
      <title>DEV Community: SS</title>
      <link>https://dev.to/ssithub</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ssithub"/>
    <language>en</language>
    <item>
      <title>Travigo</title>
      <dc:creator>SS</dc:creator>
      <pubDate>Mon, 16 Mar 2026 22:43:41 +0000</pubDate>
      <link>https://dev.to/ssithub/travigo-21j2</link>
      <guid>https://dev.to/ssithub/travigo-21j2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Travel as fast as you speak with Gemini! Where live agents meet immersive storytelling &amp;amp; 3D navigation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This project was created as an entry for the &lt;a href="https://geminiliveagentchallenge.devpost.com/" rel="noopener noreferrer"&gt;Gemini Live Agent Challenge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;#GeminiLiveAgentChallenge&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/_pK0ii2bbZg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;Travigo is a &lt;strong&gt;next-generation AI Agent&lt;/strong&gt; that uses multimodal inputs and outputs, moving far beyond simple text-in/text-out interactions. The project combines Google's Gen AI SDK, the Gemini Live API, Gemini 3, and Google Maps API cloud services with generative AI and spatial context to create immersive user experiences in 3D navigation and storytelling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features &amp;amp; Functionality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Interactions:&lt;/strong&gt; Communicate via voice and text while the AI processes real-time visual context from the interactive Street View and 3D map spatial data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Personas:&lt;/strong&gt; Choose between Concierge Mode (realistic local guides) and Game Mode (mystical/run-time personas), which adapt tone and narrative focus on the fly. Gemini 3 also spins up local personas on the fly, depending on the user's AR location and Street View.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Context Processing:&lt;/strong&gt; Uses a Live Agent Orchestrator to stream dialogue &amp;amp; voice directly tied to user actions and spatial events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immersive Storytelling:&lt;/strong&gt; Generates contextual narratives overlaid seamlessly onto the UI and 3D environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Models Used
&lt;/h3&gt;

&lt;p&gt;The project uses a multi-model architecture, calling different Gemini models through the Google Gen AI SDK depending on the task:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gemini-2.5-flash-native-audio-preview&lt;/strong&gt;: Used by the Live Agent Orchestrator to power real-time, multimodal conversations via voice and audio streaming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemini-2.5-flash&lt;/strong&gt;: Used for rapid "Scout" queries, specifically grounding location searches using the Google Maps tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gemini-3.1-pro-preview&lt;/strong&gt;: Used for complex deep-reasoning tasks at High Thinking levels, such as planning logistics and tours, generating fictional personas based on spatial context, and performing deep "Strategic Analysis" (e.g., visa planning, historic deep dives) grounded by Google Search.&lt;/li&gt;
&lt;/ul&gt;
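
&lt;p&gt;The task-to-model split above can be sketched as a small lookup table. This is only an illustrative sketch, not Travigo's actual code: the &lt;code&gt;Task&lt;/code&gt; enum, &lt;code&gt;MODEL_FOR_TASK&lt;/code&gt; and &lt;code&gt;pick_model&lt;/code&gt; are hypothetical names, and only the model identifiers come from the list above.&lt;/p&gt;

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories, named after the roles described above."""
    LIVE_CONVERSATION = "live_conversation"  # real-time voice and audio streaming
    SCOUT_QUERY = "scout_query"              # quick, Maps-grounded location lookups
    DEEP_REASONING = "deep_reasoning"        # planning, personas, strategic analysis


# Model identifiers are copied from the post; the mapping itself is an assumption.
MODEL_FOR_TASK = {
    Task.LIVE_CONVERSATION: "gemini-2.5-flash-native-audio-preview",
    Task.SCOUT_QUERY: "gemini-2.5-flash",
    Task.DEEP_REASONING: "gemini-3.1-pro-preview",
}


def pick_model(task):
    """Return the model name configured for the given task."""
    return MODEL_FOR_TASK[task]


if __name__ == "__main__":
    for task in Task:
        print(task.name, "uses", pick_model(task))
```

&lt;p&gt;Keeping the mapping in one place would let a preview model be swapped out without touching the call sites.&lt;/p&gt;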

</description>
      <category>geminiliveagentchallenge</category>
    </item>
    <item>
      <title>SONICS.ai 🎬 💥 🎞️ create character-consistent Comics that 'speak' your Style</title>
      <dc:creator>SS</dc:creator>
      <pubDate>Sun, 14 Sep 2025 14:58:40 +0000</pubDate>
      <link>https://dev.to/ssithub/sonicsai-create-comics-that-speak-your-style-32p8</link>
      <guid>https://dev.to/ssithub/sonicsai-create-comics-that-speak-your-style-32p8</guid>
      <description>&lt;p&gt;&lt;strong&gt;__&lt;/strong&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
 &lt;/p&gt;
&lt;h2&gt;
  
  
  💡 Inspiration
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt; &lt;br&gt;
&lt;em&gt;I always wanted to make comics that capture my chaotic imagination - but the drawing, erasing, and starting again is&lt;/em&gt; 👀 &lt;em&gt;such a drag!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://imgflip.com/i/6x0sox" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imgflip.com%2F6x0sox.jpg" title="made at imgflip.com"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Also,&lt;/em&gt; &lt;strong&gt;&lt;em&gt;AI&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;didn't help much - create - frustrate - regenerate - repeat and yet couldn't get my&lt;/em&gt; &lt;strong&gt;&lt;em&gt;vibe 🌈&lt;/em&gt;&lt;/strong&gt; ... 👀 &lt;em&gt;even more drag!&lt;/em&gt;&lt;br&gt;
 &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Well that was until ✨ &lt;strong&gt;Gemini nano banana&lt;/strong&gt; &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;I am so blown away by its &lt;strong&gt;editing capabilities&lt;/strong&gt;, especially when working with &lt;strong&gt;multi-image, multi-modal inputs&lt;/strong&gt;, &lt;em&gt;that I couldn't allow my lazy self to procrastinate anymore!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So, here's (&lt;em&gt;quick links&lt;/em&gt;)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🪄 &lt;em&gt;What I built&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🎥 &lt;strong&gt;&lt;em&gt;Demo Video&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🧩 &lt;em&gt;Multimodal app architecture&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;⚡ &lt;em&gt;How I Used Google AI Studio&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;✨ &lt;em&gt;Multimodal capabilities I implemented&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;em&gt;Specific multimodal features I built for UX&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🎉 &lt;em&gt;Acknowledgement&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt; &lt;br&gt;
&lt;strong&gt;SONICS.ai&lt;/strong&gt; 🪄 is a &lt;strong&gt;Google-AI ✨ powered creative suite 🧠 🎬 📚 🎞️&lt;/strong&gt; that transforms a user's simple &lt;strong&gt;idea&lt;/strong&gt; into a fully realized, &lt;em&gt;multi-sensory&lt;/em&gt;, &lt;strong&gt;character-consistent comic&lt;/strong&gt; book experience with &lt;strong&gt;podcast playbacks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It allows users to add their&lt;/em&gt; &lt;strong&gt;&lt;em&gt;flavours/ vibes&lt;/em&gt;&lt;/strong&gt; 🌈 &lt;em&gt;to every&lt;/em&gt; &lt;strong&gt;&lt;em&gt;aspect&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;of comic creation - from storyline to characters to scenes to dialogues to text styles&lt;/em&gt; &lt;em&gt;- all in&lt;/em&gt; &lt;strong&gt;&lt;em&gt;natural language&lt;/em&gt;&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt; &lt;br&gt;
The best part? You don't need to be good at drawing! AI solves it for you in &lt;strong&gt;⚡ minutes&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You can bring your creativity to life without losing your patience with&lt;/em&gt; &lt;strong&gt;&lt;em&gt;back-n-forth regeneration&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;to get that perfect shot!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;br&gt;
You can use SONICS for a variety of use cases - from &lt;strong&gt;bedtime stories&lt;/strong&gt; &lt;code&gt;podcast&lt;/code&gt; to full &lt;strong&gt;production-ready&lt;/strong&gt; &lt;code&gt;comics&lt;/code&gt; with &lt;code&gt;playback&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Bring your stories to life - your style!&lt;/em&gt;&lt;/strong&gt; &lt;br&gt;
&lt;em&gt;Let your imagination go wild !&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;My project in action 🎥&lt;/p&gt;
&lt;/blockquote&gt;


&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/g7aeLYSg7sE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;0:00 Intro&lt;br&gt;
0:10 🧠 Story Conception&lt;br&gt;
0:20 🎬 Character/ Cast Design&lt;br&gt;
0:53 🎞️ Comic Panel Creation&lt;br&gt;
1:24 📚 Comic preview&lt;br&gt;
1:34 🎧 Audio preview&lt;br&gt;
1:47 🎥 Play the Comic that speaks your Style&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;▶️ &lt;a href="https://www.youtube.com/watch?v=g7aeLYSg7sE" rel="noopener noreferrer"&gt;Play on YouTube&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Due to billing constraints, I couldn't deploy my app, so this video demo 👆 shows my project in action.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt; &lt;br&gt;
This app was &lt;strong&gt;entirely built on Google AI Studio ⚡ vibe-coded&lt;/strong&gt; from scratch &lt;br&gt;
&lt;em&gt;👀 as you could have guessed by now from my lazy vibes!&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt; &lt;br&gt;
I started with a simple idea prompt and kept adding features by &lt;strong&gt;guiding the AI through the pain points&lt;/strong&gt; I had faced when vibe-creating comics with my flavour.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;br&gt;
The Multimodal capabilities I implemented ... &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;Multimodal Capabilities&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;em&gt;Input&lt;/em&gt;    &lt;br&gt;&lt;br&gt;
&lt;/th&gt;
&lt;th&gt;
&lt;em&gt;Output&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/th&gt;
&lt;th&gt;
&lt;em&gt;Models ✨&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/th&gt;
&lt;th&gt;
&lt;em&gt;Features 🚀&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text  &lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;Image &lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; &lt;br&gt; &lt;br&gt;&lt;code&gt;imagen&lt;/code&gt; &lt;br&gt; &lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;For quality Character, Scene Background generation&lt;br&gt;&lt;br&gt; Text editor based updates&lt;/em&gt; &lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image + Text  &lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;Text &lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemini-2.5-flash&lt;/code&gt; &lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Automatic character description updates for natural language based character edits&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image (mask) + Image + Text &lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;Image &lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;For precise edits in characters/ scenes, dialogue corrections, text stylings, positional edits,  detail improvement&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple Images + Text &lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;A composite image with rendered text&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;For comic scene panel generation, ensuring character consistency across scenes, dialogue accuracy, and scene quality&lt;/em&gt; &lt;br&gt;&lt;br&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;/blockquote&gt;
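
&lt;p&gt;The table above can also be read as a routing rule: the combination of input modalities and the desired output decides which model is called. A minimal sketch of that idea follows. It is not the app's actual code: &lt;code&gt;Capability&lt;/code&gt;, &lt;code&gt;CAPABILITIES&lt;/code&gt; and &lt;code&gt;models_for&lt;/code&gt; are hypothetical names, and the modality labels (&lt;code&gt;"mask"&lt;/code&gt;, &lt;code&gt;"images"&lt;/code&gt;) are shorthand chosen here for the table rows.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """One row of the capability table (field names are illustrative)."""
    inputs: frozenset   # input modalities, e.g. {"image", "text"}
    output: str         # "image" or "text"
    models: tuple       # model names listed for this row
    feature: str        # what the row is used for


# Rows mirror the table above; the labels "mask" and "images" are assumptions.
CAPABILITIES = (
    Capability(frozenset({"text"}), "image",
               ("gemini-2.5-flash-image-preview", "imagen"),
               "character and scene background generation"),
    Capability(frozenset({"image", "text"}), "text",
               ("gemini-2.5-flash",),
               "automatic character description updates"),
    Capability(frozenset({"mask", "image", "text"}), "image",
               ("gemini-2.5-flash-image-preview",),
               "precise masked edits"),
    Capability(frozenset({"images", "text"}), "image",
               ("gemini-2.5-flash-image-preview",),
               "composite comic panel generation"),
)


def models_for(inputs, output):
    """Return the models listed for an exact input/output combination."""
    wanted = frozenset(inputs)
    for cap in CAPABILITIES:
        if cap.inputs == wanted and cap.output == output:
            return cap.models
    raise LookupError(f"no capability for {sorted(wanted)} -> {output}")


if __name__ == "__main__":
    print(models_for({"mask", "image", "text"}, "image"))
```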




&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The specific multimodal functionalities 🚀 I built and why they enhance the user experience 👤 (UX)...&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Composite scene panels 🎞️
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt; &lt;br&gt;
✨   &lt;code&gt;imagen&lt;/code&gt;   &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;   &lt;code&gt;gemini-2.5-flash&lt;/code&gt;&lt;br&gt;
 &lt;br&gt;
🚀   The comic panels are created through an &lt;strong&gt;intelligent composition logic&lt;/strong&gt; that combines the multimodal capabilities of the models to create final panel images from the inputs: scene backgrounds, character images, and scripts that were themselves generated using these models.&lt;br&gt;
  &lt;br&gt;
👤   &lt;em&gt;This ensures&lt;/em&gt; &lt;strong&gt;&lt;em&gt;character consistency, dialogue accuracy&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;as well as&lt;/em&gt; &lt;strong&gt;&lt;em&gt;scene quality&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;across comic scenes&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3&gt;
  
  
  Flavour edits  🌈
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt; &lt;br&gt;
✨   &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt;  &lt;code&gt;gemini-2.5-flash&lt;/code&gt;&lt;br&gt;
 &lt;br&gt;
🚀   This enables &lt;strong&gt;precise surgical edits&lt;/strong&gt; of &lt;strong&gt;scenes, characters, dialogues, styles&lt;/strong&gt; by leveraging &lt;strong&gt;masking&lt;/strong&gt;.&lt;br&gt;
Users can simply explain their edits in &lt;strong&gt;natural language&lt;/strong&gt; for feature changes (with / without masking).&lt;br&gt;
It also handles &lt;strong&gt;auto-updating&lt;/strong&gt;: when a user's image edit must be reflected in the &lt;strong&gt;respective strategic texts&lt;/strong&gt;, such as the character description, those texts are updated to &lt;strong&gt;ensure further consistency.&lt;/strong&gt;&lt;br&gt;
 &lt;br&gt;
👤   &lt;em&gt;This helps users&lt;/em&gt; &lt;strong&gt;&lt;em&gt;avoid back-and-forth regeneration&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;of images from scratch, which is really&lt;/em&gt; &lt;strong&gt;&lt;em&gt;frustrating&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;when only a small style/ error correction is needed. Users can also add their&lt;/em&gt; &lt;strong&gt;&lt;em&gt;vibes/ flavours/ styles&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;to the scene in natural language without worrying about any inconsistency.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
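
&lt;p&gt;The edit flow described above, an image edit followed by an automatic text update, can be sketched as a two-step plan. This is a hedged illustration rather than the app's real logic: &lt;code&gt;EditRequest&lt;/code&gt; and &lt;code&gt;plan_edit&lt;/code&gt; are hypothetical names, and it assumes that unmasked edits go to the same image model and that only character edits trigger a description update.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class EditRequest:
    """A natural-language edit request (hypothetical structure)."""
    instruction: str             # e.g. "make the cape red"
    target: str                  # "character", "scene", "dialogue" or "style"
    mask_provided: bool = False  # True when the user painted a mask


def plan_edit(request):
    """Return the ordered model calls assumed for one edit request."""
    image_inputs = ["image", "text"]
    if request.mask_provided:
        # A mask narrows the edit to the painted region.
        image_inputs.insert(0, "mask")

    steps = [{
        "model": "gemini-2.5-flash-image-preview",
        "inputs": image_inputs,
        "purpose": f"edit {request.target} image",
    }]

    # Assumption: only character edits need their stored description refreshed,
    # so that later panels stay consistent with the edited image.
    if request.target == "character":
        steps.append({
            "model": "gemini-2.5-flash",
            "inputs": ["image", "text"],
            "purpose": "update character description",
        })
    return steps


if __name__ == "__main__":
    request = EditRequest("make the cape red", "character", mask_provided=True)
    for step in plan_edit(request):
        print(step["model"], "-", step["purpose"])
```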




&lt;h2&gt;
  
  
  🎉 Acknowledgement
&lt;/h2&gt;

&lt;p&gt; &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Google AI Studio ⚡&lt;/strong&gt; is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hrs. &lt;br&gt;
&lt;em&gt;But as you could have guessed 👀 Parkinson's law took most of the time!&lt;/em&gt;&lt;br&gt;
 &lt;br&gt;
&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; ✨ (Gemini nano-banana) is the star of my whole idea. Thanks to nano banana, I was able to create a character-consistent comic experience and solve the back-and-forth regeneration &amp;amp; vibe-check problem for vibe-comic enthusiasts.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;imagen&lt;/code&gt; ✨ helped me create beautiful backgrounds for the comic scenes which were then fully realised using composite logic. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;gemini-2.5-flash&lt;/code&gt; ✨ has been used for prompt engineering for inputs to other models, for auto-updating descriptions and also for optimising the deliverables.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Thank you! &lt;br&gt;
It was a fun and great experience!&lt;/p&gt;

&lt;h2&gt;
  
  
  👀
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;&lt;del&gt;What&lt;/del&gt; Definitely Not a drag!&lt;/em&gt; &lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
