Rahul Gurunule


Guardian AI: Building a Real-Time Personal Safety App with Google Gemini Live API

Disclosure: This article and the Guardian AI project described herein were created as an entry to the Gemini Live Agent Challenge hackathon hosted by Google.


The Problem I Wanted to Solve

I was walking home late one evening when I realized something: my phone — capable of recording video, capturing audio, and connecting to the internet — couldn't help me in real time. It could only call for help after something happened.

That's when I thought: What if my phone could see what I see, hear what I hear, and warn me before danger strikes?

Personal safety apps exist, but they're all reactive:

  • Panic buttons require you to recognize danger AND remember to press a button
  • Location sharing only helps after an incident
  • Check-in apps require you to stay engaged

None of them are truly proactive. They don't watch. They don't listen. They don't speak.

I wanted to build something different: an AI companion that's always paying attention, understands context, and can speak to you in a calm, natural voice the moment something feels wrong.


Introducing Guardian AI

Guardian AI is a real-time personal safety companion that uses your phone's camera and microphone to continuously monitor your surroundings and alert you — or your emergency contacts — the moment danger is detected.

The app uses Google's Gemini 2.5 Flash Native Audio model through the Live API's bidirectional streaming to process live video frames and audio simultaneously, assess environmental risk in real time, and respond with natural spoken guidance.

No typing. No tapping. Just a calm, intelligent voice keeping you safe.


Why This Matters Now

Safety concerns are growing:

  • Women report feeling unsafe in public spaces
  • Travelers face unfamiliar environments
  • Vulnerable populations need extra protection
  • Traditional panic buttons are outdated

But AI has evolved. We now have models that can see, hear, and speak in real time. We have the technology to build something genuinely protective.

Guardian AI addresses three real problems:

  1. Delayed response — Traditional panic buttons require the user to act. Guardian AI acts for you.
  2. Situational blindness — You can't always see what's behind you or around a corner. The AI can.
  3. Isolation in emergencies — When you're scared, you may not be able to call for help. Guardian AI calls for you.

Key Technical Accomplishments

First-of-Its-Kind Real-Time Multimodal Safety App

To our knowledge, Guardian AI is the first consumer safety application to leverage the Gemini Live API's bidirectional audio/video streaming. Camera frames, microphone input, and AI analysis run concurrently, with spoken responses delivered in real time.
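As a rough illustration of the video side (the streamFrames helper, the message shape, and the roughly one-frame-per-second rate are assumptions for this sketch, not values confirmed by the project), frames can be sampled from the camera preview onto a canvas, JPEG-encoded, and relayed over the same WebSocket as the audio:

```typescript
// Assumed sketch: sample the camera preview about once per second, encode each
// frame as a base64 JPEG, and relay it over the same WebSocket as the audio.
export function streamFrames(
  video: HTMLVideoElement,
  ws: WebSocket,
  intervalMs = 1000
): () => void {
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d")!;

  const timer = setInterval(() => {
    if (ws.readyState !== WebSocket.OPEN || video.videoWidth === 0) return;
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);
    // Strip the "data:image/jpeg;base64," prefix before relaying the frame.
    const base64 = canvas.toDataURL("image/jpeg", 0.7).split(",")[1];
    ws.send(JSON.stringify({ type: "frame", mimeType: "image/jpeg", data: base64 }));
  }, intervalMs);

  return () => clearInterval(timer); // caller stops streaming when monitoring ends
}
```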

Solved Complex Mobile Audio Engineering

Building real-time audio streaming on mobile browsers presented three major challenges (a combined code sketch follows this list):

  1. AudioContext Suspension — Mobile browsers suspend AudioContext unless created inside a user gesture. Solution: Create both input and output contexts inside the WebSocket connection handler.

  2. PCM16 Encoding — Gemini's Live API expects raw PCM16 audio at 16kHz, but Web Audio API provides Float32 samples. We implemented efficient conversion and base64 encoding in 8KB chunks.

  3. Conversation Flow — Gemini's BidiGenerateContent doesn't auto-respond to silence. We built a smart conversation pulse: check every 10 seconds, send an automated prompt only after 30 seconds of silence.
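Here is the shape of those three fixes as a minimal TypeScript sketch; the function names, relay URL, message shapes, and sample rates are my assumptions for illustration, not the project's actual code.

```typescript
// Minimal sketch (assumed names, URL, and message shapes): both AudioContexts
// are created inside the user's "Start" tap, mic audio is converted
// Float32 -> PCM16 and relayed in ~8 KB base64 chunks, and a conversation
// pulse nudges the model after prolonged silence.

const RELAY_URL = "wss://example-relay.run.app/live"; // placeholder relay endpoint
const CHUNK_BYTES = 8 * 1024;

let lastModelActivity = Date.now(); // updated in ws.onmessage whenever the model speaks (not shown)

export async function startMonitoring(): Promise<void> {
  // 1. AudioContext suspension: create BOTH contexts inside the user gesture
  //    (this function is assumed to be called from the "Start" button tap).
  const inputCtx = new AudioContext({ sampleRate: 16_000 });  // mic -> Gemini
  const outputCtx = new AudioContext({ sampleRate: 24_000 }); // Gemini -> speaker (playback not shown)
  await Promise.all([inputCtx.resume(), outputCtx.resume()]);

  const ws = new WebSocket(RELAY_URL);
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = inputCtx.createMediaStreamSource(stream);
  const processor = inputCtx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    if (ws.readyState !== WebSocket.OPEN) return;
    // 2. PCM16 encoding: Web Audio gives Float32 samples in [-1, 1];
    //    the Live API expects 16-bit PCM at 16 kHz.
    const float32 = event.inputBuffer.getChannelData(0);
    const pcm16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      const s = Math.max(-1, Math.min(1, float32[i]));
      pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    sendInChunks(ws, new Uint8Array(pcm16.buffer));
  };
  source.connect(processor);
  processor.connect(inputCtx.destination);

  // 3. Conversation pulse: check every 10 s, prompt only after 30 s of silence.
  setInterval(() => {
    if (ws.readyState !== WebSocket.OPEN) return;
    if (Date.now() - lastModelActivity > 30_000) {
      ws.send(JSON.stringify({ type: "text", text: "Give me a brief safety status update." }));
      lastModelActivity = Date.now(); // avoid re-prompting until the next reply
    }
  }, 10_000);
}

// Split audio into ~8 KB chunks, base64-encode each, and relay to the server.
function sendInChunks(ws: WebSocket, bytes: Uint8Array): void {
  for (let offset = 0; offset < bytes.length; offset += CHUNK_BYTES) {
    const chunk = bytes.subarray(offset, offset + CHUNK_BYTES);
    let binary = "";
    chunk.forEach((b) => (binary += String.fromCharCode(b)));
    ws.send(JSON.stringify({ type: "audio", data: btoa(binary) }));
  }
}
```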

Structured Data from Conversational AI

We engineered Gemini to output both natural speech AND structured metadata tags simultaneously. The system embeds tags like [Lighting:Well-lit][Crowds:Empty][Behavior:Normal][Risk:20] in every response. The frontend parses these tags to drive color-coded UI indicators, then strips them before display.
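To illustrate (the helper name and return shape below are assumptions, not the project's exact code), pulling the tags out is a small regex pass that also strips them from the spoken text:

```typescript
// Assumed helper: extract [Key:Value] metadata tags from a model response,
// return structured fields for the UI, and strip the tags from the display text.

interface SafetyMetadata {
  lighting?: string;   // e.g. "Well-lit"
  crowds?: string;     // e.g. "Empty"
  behavior?: string;   // e.g. "Normal"
  risk?: number;       // 0-100 risk score
}

const TAG_PATTERN = /\[(Lighting|Crowds|Behavior|Risk):([^\]]+)\]/g;

export function parseSafetyTags(response: string): { text: string; meta: SafetyMetadata } {
  const meta: SafetyMetadata = {};
  for (const [, key, value] of response.matchAll(TAG_PATTERN)) {
    if (key === "Risk") meta.risk = Number(value);
    else meta[key.toLowerCase() as "lighting" | "crowds" | "behavior"] = value.trim();
  }
  // Remove the tags so only the natural speech text reaches the user.
  return { text: response.replace(TAG_PATTERN, "").trim(), meta };
}

// Example:
// parseSafetyTags("You're on a quiet, well-lit street. [Lighting:Well-lit][Crowds:Empty][Risk:20]")
// -> { text: "You're on a quiet, well-lit street.", meta: { lighting: "Well-lit", crowds: "Empty", risk: 20 } }
```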

Production-Grade Full-Stack Architecture

  • Frontend: React 19 + TypeScript + Vite (optimized for mobile)
  • Backend: Node.js relay server on Google Cloud Run (stateless, auto-scaling)
  • AI: Gemini 2.5 Flash Native Audio via Live API (bidirectional WebSocket)
  • Notifications: Twilio SMS + SendGrid email with GPS coordinates
  • Infrastructure: Fully containerized with Docker, deployed via GitHub Actions + Workload Identity Federation
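To make the relay piece concrete, here is a heavily simplified Node.js sketch of a stateless WebSocket relay. The upstream URL, auth query parameter, and message framing are placeholders; the real Live API protocol has its own setup and content message shapes.

```typescript
// Minimal relay sketch: browser <-> Cloud Run <-> Gemini Live API.
// GEMINI_WS_URL and the message framing are placeholders/assumptions.
import { WebSocketServer, WebSocket } from "ws";

const PORT = Number(process.env.PORT ?? 8080);          // Cloud Run injects PORT
const GEMINI_WS_URL = process.env.GEMINI_WS_URL ?? "";  // placeholder Live API endpoint
const API_KEY = process.env.GEMINI_API_KEY ?? "";

const server = new WebSocketServer({ port: PORT });

server.on("connection", (client: WebSocket) => {
  // One upstream Live API connection per client session (stateless relay).
  const upstream = new WebSocket(`${GEMINI_WS_URL}?key=${API_KEY}`);

  // Browser -> Gemini: buffer until the upstream socket is open, then forward.
  const pending: Buffer[] = [];
  client.on("message", (data) => {
    if (upstream.readyState === WebSocket.OPEN) upstream.send(data);
    else pending.push(data as Buffer);
  });
  upstream.on("open", () => pending.forEach((msg) => upstream.send(msg)));

  // Gemini -> Browser: forward model audio/text back to the phone.
  upstream.on("message", (data) => client.send(data));

  // Tear down both sides together so no session state is left behind.
  const closeBoth = () => { client.close(); upstream.close(); };
  client.on("close", closeBoth);
  upstream.on("close", closeBoth);
});

console.log(`Relay listening on :${PORT}`);
```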

Keyless Secure Deployment

Implemented Workload Identity Federation for GitHub Actions → GCP authentication with zero service account keys. The CI/CD pipeline automatically builds Docker images, pushes to Artifact Registry, and deploys to Cloud Run — all without storing credentials in GitHub.


The Technical Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 19, TypeScript, Vite, Tailwind CSS |
| Backend | Node.js, Express 5, TypeScript |
| AI Model | Gemini 2.5 Flash Native Audio (Live API) |
| Real-time | WebSocket, Web Audio API, WebRTC |
| Notifications | Twilio SMS, Twilio Email (SendGrid) |
| Backend Hosting | Google Cloud Run |
| Frontend Hosting | Firebase Hosting |
| Container Registry | Google Artifact Registry |
| CI/CD | GitHub Actions with Workload Identity Federation |

Why Google Cloud?

Guardian AI requires Google Cloud because:

  1. Gemini Live API is exclusive — Only available on Google Cloud Platform
  2. Native audio bidirectional streaming — Unique to Gemini, not available on AWS/Azure
  3. Cloud Run WebSocket support — Essential for real-time relay architecture
  4. Seamless integration — Gemini API, Cloud Run, Artifact Registry, Firebase all work together
  5. Cost efficiency — Cloud Run scales to zero, Firebase Hosting is free tier friendly

What Makes Guardian AI Different

| Feature | Traditional Safety Apps | Guardian AI |
| --- | --- | --- |
| Activation | Manual button press | Always-on, automatic |
| Awareness | None | Real-time video + audio analysis |
| Response | Sends location | Speaks guidance, sends location + context |
| AI | None | Gemini 2.5 Flash Native Audio |
| Interaction | Tap-based | Fully voice-driven |
| Environmental data | None | Lighting, crowds, behavior indicators |
| Emergency alerts | Location only | Location + risk score + environmental factors |

Lessons Learned

1. Native audio models change everything
Gemini's native audio output is dramatically more natural than text-to-speech. In a safety context, a calm, natural voice is reassuring. A robotic TTS voice is not.

2. Mobile audio is hard
The AudioContext suspended state issue cost me two days of debugging. The fix (create inside user gesture) is simple once you know it, but the debugging path is not obvious.

3. Structured output from conversational models
Getting a conversational model to reliably output structured tags alongside natural speech required careful prompt engineering. The key was providing exact format examples, not just descriptions.
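For instance, an illustrative excerpt (not the project's actual system prompt) that pairs the rule with a literal example of the expected output:

```typescript
// Illustrative system-prompt excerpt: showing the model a literal example of
// the tag format proved more reliable than describing the format in words.
const SAFETY_PROMPT_EXCERPT = `
You are Guardian AI, a calm personal safety companion.
After every spoken response, append metadata tags in EXACTLY this format:
[Lighting:<Well-lit|Dim|Dark>][Crowds:<Empty|Sparse|Busy>][Behavior:<Normal|Suspicious|Threatening>][Risk:<0-100>]

Example response:
"You're on a quiet, well-lit street. Everything looks normal."
[Lighting:Well-lit][Crowds:Empty][Behavior:Normal][Risk:20]
`;
```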

4. Graceful degradation matters
Building the notification system to silently skip when Twilio isn't configured meant the app works perfectly in demo mode without any setup. This made testing and sharing much easier.
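A sketch of that pattern (the helper and environment variable names are assumptions): the Twilio client is only constructed when credentials exist, and the alert path quietly no-ops otherwise.

```typescript
// Graceful degradation sketch (assumed helper/env names): skip SMS silently
// when Twilio isn't configured, so demo mode needs no credentials at all.
import twilio from "twilio";

const sid = process.env.TWILIO_ACCOUNT_SID;
const token = process.env.TWILIO_AUTH_TOKEN;
const fromNumber = process.env.TWILIO_FROM_NUMBER;

// Only construct the client when all credentials are present.
const client = sid && token ? twilio(sid, token) : null;

export async function sendEmergencySms(to: string, message: string): Promise<void> {
  if (!client || !fromNumber) {
    console.log("[notifications] Twilio not configured, skipping SMS (demo mode)");
    return;
  }
  await client.messages.create({ body: message, from: fromNumber, to });
}
```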


What's Next

  • Background mode — keep monitoring when the screen is off
  • Wearable integration — Apple Watch / WearOS for discreet alerts
  • Trusted contacts — share live location with family during monitoring sessions
  • Incident history — review past sessions with AI-generated summaries
  • Offline fallback — local risk assessment when connectivity drops

Try It Out

GitHub Repository: https://github.com/rahulgurunule/Guardian_AI


Key Takeaways

  • Real-time multimodal AI is now possible — Gemini Live API makes it accessible
  • Mobile audio engineering is solvable — with the right approach and patience
  • Safety tech can be proactive, not reactive — AI can watch and warn before danger strikes
  • Google Cloud is the right choice for Gemini — seamless integration, cost-efficient, secure
  • Full-stack TypeScript is powerful — same language across frontend, backend, and infrastructure


Final Thoughts

Building Guardian AI taught me that the best technology solutions start with a real problem. I didn't start by asking "how can I use Gemini Live API?" — I started by asking "how can I make people safer?"

The technology was the answer, not the starting point.

If you're interested in AI, real-time systems, or personal safety tech, I'd love to hear your thoughts. Feel free to reach out or check out the GitHub repo.


This project was created as a submission to the Gemini Live Agent Challenge hackathon hosted by Google.


#GoogleCloud #Gemini #AI #Hackathon #WebDevelopment #PersonalSafety #TechInnovation #FullStack #TypeScript
