<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rahul Gurunule</title>
    <description>The latest articles on DEV Community by Rahul Gurunule (@rahul_gurunule).</description>
    <link>https://dev.to/rahul_gurunule</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825296%2F41d751d1-b055-45e2-8751-60fe961801d9.jpg</url>
      <title>DEV Community: Rahul Gurunule</title>
      <link>https://dev.to/rahul_gurunule</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rahul_gurunule"/>
    <language>en</language>
    <item>
      <title>Guardian AI: Building a Real-Time Personal Safety App with Google Gemini Live API</title>
      <dc:creator>Rahul Gurunule</dc:creator>
      <pubDate>Sun, 15 Mar 2026 12:35:19 +0000</pubDate>
      <link>https://dev.to/rahul_gurunule/guardian-ai-building-a-real-time-personal-safety-app-with-google-gemini-live-api-gd0</link>
      <guid>https://dev.to/rahul_gurunule/guardian-ai-building-a-real-time-personal-safety-app-with-google-gemini-live-api-gd0</guid>
      <description>&lt;p&gt;&lt;strong&gt;Disclosure:&lt;/strong&gt; This article and the Guardian AI project described herein were created for the purposes of entering the &lt;strong&gt;Google The Gemini Live Agent Challenge&lt;/strong&gt; hackathon.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem I Wanted to Solve
&lt;/h2&gt;

&lt;p&gt;I was walking home late one evening when I realized something: my phone — capable of recording video, capturing audio, and connecting to the internet — couldn't help me in real time. It could only call for help &lt;em&gt;after&lt;/em&gt; something happened.&lt;/p&gt;

&lt;p&gt;That's when I thought: &lt;strong&gt;What if my phone could see what I see, hear what I hear, and warn me before danger strikes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Personal safety apps exist, but they're all reactive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Panic buttons&lt;/strong&gt; require you to recognize danger AND remember to press a button&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Location sharing&lt;/strong&gt; only helps after an incident&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check-in apps&lt;/strong&gt; require you to stay engaged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of them are truly &lt;em&gt;proactive&lt;/em&gt;. They don't watch. They don't listen. They don't speak.&lt;/p&gt;

&lt;p&gt;I wanted to build something different: an AI companion that's always paying attention, understands context, and can speak to you in a calm, natural voice the moment something feels wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introducing Guardian AI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Guardian AI&lt;/strong&gt; is a real-time personal safety companion that uses your phone's camera and microphone to continuously monitor your surroundings and alert you — or your emergency contacts — the moment danger is detected.&lt;/p&gt;

&lt;p&gt;The app uses &lt;strong&gt;Google's Gemini 2.5 Flash Native Audio model&lt;/strong&gt; via the &lt;strong&gt;Live Bidirectional API&lt;/strong&gt; to process live video frames and audio simultaneously, assess environmental risk in real time, and respond with natural spoken guidance.&lt;/p&gt;

&lt;p&gt;No typing. No tapping. Just a calm, intelligent voice keeping you safe.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;Safety concerns are growing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Women report feeling unsafe in public spaces&lt;/li&gt;
&lt;li&gt;Travelers face unfamiliar environments&lt;/li&gt;
&lt;li&gt;Vulnerable populations need extra protection&lt;/li&gt;
&lt;li&gt;Traditional panic buttons are outdated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But AI has evolved. We now have models that can see, hear, and speak in real time. We have the technology to build something genuinely protective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guardian AI addresses three real problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Delayed response&lt;/strong&gt; — Traditional panic buttons require the user to act. Guardian AI acts for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Situational blindness&lt;/strong&gt; — You can't always see what's behind you or around a corner. The AI can.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Isolation in emergencies&lt;/strong&gt; — When you're scared, you may not be able to call for help. Guardian AI calls for you.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Key Technical Accomplishments
&lt;/h2&gt;

&lt;h3&gt;
  
  
  First-of-Its-Kind Real-Time Multimodal Safety App
&lt;/h3&gt;

&lt;p&gt;Guardian AI is among the first consumer safety applications to leverage &lt;strong&gt;Gemini Live API's bidirectional audio/video streaming&lt;/strong&gt;. We achieved true real-time processing: camera frames, microphone input, and AI analysis happen simultaneously, with spoken responses delivered as events unfold.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solved Complex Mobile Audio Engineering
&lt;/h3&gt;

&lt;p&gt;Building real-time audio streaming on mobile browsers presented three major challenges:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AudioContext Suspension&lt;/strong&gt; — Mobile browsers suspend AudioContext unless created inside a user gesture. Solution: Create both input and output contexts inside the WebSocket connection handler.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PCM16 Encoding&lt;/strong&gt; — Gemini's Live API expects raw PCM16 audio at 16kHz, but Web Audio API provides Float32 samples. We implemented efficient conversion and base64 encoding in 8KB chunks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conversation Flow&lt;/strong&gt; — Gemini's BidiGenerateContent doesn't auto-respond to silence. We built a smart conversation pulse: check every 10 seconds, send an automated prompt only after 30 seconds of silence.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
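
&lt;p&gt;A minimal sketch of the Float32-to-PCM16 conversion from challenge 2 (the function name and the use of Node's &lt;code&gt;Buffer&lt;/code&gt; for base64 are illustrative; a browser build would encode with &lt;code&gt;btoa&lt;/code&gt; and split the result into 8KB chunks):&lt;/p&gt;

```typescript
// Illustrative sketch: convert Web Audio Float32 samples into the raw
// little-endian 16-bit PCM that the Live API expects, then base64-encode.
function floatToPcm16Base64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i !== samples.length; i++) {
    // Clamp each sample to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s >= 0 ? s * 0x7fff : s * 0x8000;
  }
  return Buffer.from(pcm.buffer).toString("base64");
}
```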
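
&lt;p&gt;The conversation pulse from challenge 3 can be sketched roughly like this (all names and the callback shape here are hypothetical, not the app's actual code):&lt;/p&gt;

```typescript
// Illustrative sketch of the conversation pulse: poll every 10 seconds,
// nudge the model only once 30 seconds of silence have elapsed.
const PULSE_MS = 10_000;
const SILENCE_LIMIT_MS = 30_000;

// Pure helper so the silence decision is easy to unit-test.
function shouldNudge(lastActivityMs: number, nowMs: number): boolean {
  return nowMs - lastActivityMs >= SILENCE_LIMIT_MS;
}

function startPulse(lastActivity: () => number, nudge: () => void) {
  return setInterval(() => {
    if (shouldNudge(lastActivity(), Date.now())) {
      nudge(); // e.g. send an automated "describe the current scene" prompt
    }
  }, PULSE_MS);
}
```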

&lt;h3&gt;
  
  
  Structured Data from Conversational AI
&lt;/h3&gt;

&lt;p&gt;We engineered Gemini to output both natural speech AND structured metadata tags simultaneously. The system embeds tags like &lt;code&gt;[Lighting:Well-lit][Crowds:Empty][Behavior:Normal][Risk:20]&lt;/code&gt; in every response. The frontend parses these tags to drive color-coded UI indicators, then strips them before display.&lt;/p&gt;
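
&lt;p&gt;The parse-then-strip step can be sketched like this (the regex and names are assumptions based on the tag format shown above):&lt;/p&gt;

```typescript
// Illustrative sketch: pull [Key:Value] metadata tags out of a model
// response, then strip them so only natural speech reaches the UI.
type TagMap = { [key: string]: string };

const TAG_RE = /\[(\w+):([^\]]+)\]/g;

function parseTags(text: string): { tags: TagMap; speech: string } {
  const tags: TagMap = {};
  for (const m of text.matchAll(TAG_RE)) {
    tags[m[1]] = m[2];
  }
  // Remove the tags and tidy whitespace before display or speech.
  const speech = text.replace(TAG_RE, "").replace(/\s+/g, " ").trim();
  return { tags, speech };
}
```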

&lt;h3&gt;
  
  
  Production-Grade Full-Stack Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: React 19 + TypeScript + Vite (optimized for mobile)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: Node.js relay server on Google Cloud Run (stateless, auto-scaling)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI&lt;/strong&gt;: Gemini 2.5 Flash Native Audio via Live API (bidirectional WebSocket)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notifications&lt;/strong&gt;: Twilio SMS + SendGrid email with GPS coordinates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Fully containerized with Docker, deployed via GitHub Actions + Workload Identity Federation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Keyless Secure Deployment
&lt;/h3&gt;

&lt;p&gt;Implemented &lt;strong&gt;Workload Identity Federation&lt;/strong&gt; for GitHub Actions → GCP authentication with zero service account keys. The CI/CD pipeline automatically builds Docker images, pushes to Artifact Registry, and deploys to Cloud Run — all without storing credentials in GitHub.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;React 19, TypeScript, Vite, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js, Express 5, TypeScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash Native Audio (Live API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;WebSocket, Web Audio API, WebRTC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notifications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Twilio SMS, Twilio Email (SendGrid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend Hosting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend Hosting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Firebase Hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Container Registry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Artifact Registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GitHub Actions with Workload Identity Federation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Why Google Cloud?
&lt;/h2&gt;

&lt;p&gt;Guardian AI &lt;strong&gt;requires&lt;/strong&gt; Google Cloud because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Live API is exclusive&lt;/strong&gt; — Only available on Google Cloud Platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native audio bidirectional streaming&lt;/strong&gt; — Unique to Gemini, not available on AWS/Azure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Run WebSocket support&lt;/strong&gt; — Essential for real-time relay architecture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless integration&lt;/strong&gt; — Gemini API, Cloud Run, Artifact Registry, Firebase all work together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency&lt;/strong&gt; — Cloud Run scales to zero, Firebase Hosting is free tier friendly&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What Makes Guardian AI Different
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Traditional Safety Apps&lt;/th&gt;
&lt;th&gt;Guardian AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Activation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual button press&lt;/td&gt;
&lt;td&gt;Always-on, automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Awareness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Real-time video + audio analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sends location&lt;/td&gt;
&lt;td&gt;Speaks guidance, sends location + context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash Native Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tap-based&lt;/td&gt;
&lt;td&gt;Fully voice-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Environmental data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Lighting, crowds, behavior indicators&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Emergency alerts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Location only&lt;/td&gt;
&lt;td&gt;Location + risk score + environmental factors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Native audio models change everything&lt;/strong&gt;&lt;br&gt;
Gemini's native audio output is dramatically more natural than text-to-speech. In a safety context, a calm, natural voice is reassuring. A robotic TTS voice is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Mobile audio is hard&lt;/strong&gt;&lt;br&gt;
The AudioContext suspended state issue cost me two days of debugging. The fix (create inside user gesture) is simple once you know it, but the debugging path is not obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Structured output from conversational models&lt;/strong&gt;&lt;br&gt;
Getting a conversational model to reliably output structured tags alongside natural speech required careful prompt engineering. The key was providing exact format examples, not just descriptions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Graceful degradation matters&lt;/strong&gt;&lt;br&gt;
Building the notification system to silently skip when Twilio isn't configured meant the app works perfectly in demo mode without any setup. This made testing and sharing much easier.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Background mode&lt;/strong&gt; — keep monitoring when the screen is off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wearable integration&lt;/strong&gt; — Apple Watch / WearOS for discreet alerts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trusted contacts&lt;/strong&gt; — share live location with family during monitoring sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incident history&lt;/strong&gt; — review past sessions with AI-generated summaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline fallback&lt;/strong&gt; — local risk assessment when connectivity drops&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/rahulgurunule/Guardian_AI" rel="noopener noreferrer"&gt;https://github.com/rahulgurunule/Guardian_AI&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time multimodal AI is now possible&lt;/strong&gt; — Gemini Live API makes it accessible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile audio engineering is solvable&lt;/strong&gt; — with the right approach and patience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety tech can be proactive, not reactive&lt;/strong&gt; — AI can watch and warn before danger strikes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud is the right choice for Gemini&lt;/strong&gt; — seamless integration, cost-efficient, secure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-stack TypeScript is powerful&lt;/strong&gt; — same language across frontend, backend, and infrastructure&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building Guardian AI taught me that the best technology solutions start with a real problem. I didn't start by asking "how can I use Gemini Live API?" — I started by asking "how can I make people safer?"&lt;/p&gt;

&lt;p&gt;The technology was the answer, not the starting point.&lt;/p&gt;

&lt;p&gt;If you're interested in AI, real-time systems, or personal safety tech, I'd love to hear your thoughts. Feel free to reach out or check out the GitHub repo.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This project was created as a submission to Google's Gemini Live Agent Challenge hackathon.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  👥 Creators
&lt;/h2&gt;

&lt;p&gt;This project was built by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rahul Gurunule&lt;/strong&gt; — &lt;a href="https://www.linkedin.com/in/rahul-g-4373b220/" rel="noopener noreferrer"&gt;LinkedIn Profile&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sushma Gurunule&lt;/strong&gt; — &lt;a href="https://www.linkedin.com/in/sushma-g-6069b560/" rel="noopener noreferrer"&gt;LinkedIn Profile&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;#GoogleCloud #Gemini #AI #Hackathon #WebDevelopment #PersonalSafety #TechInnovation #FullStack #TypeScript&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>hackathon</category>
      <category>personalsafety</category>
      <category>geminiliveagentchallenge</category>
    </item>
  </channel>
</rss>
