Disclosure: This article and the Guardian AI project described herein were created as an entry to the Google Gemini Live Agent Challenge hackathon.
The Problem I Wanted to Solve
I was walking home late one evening when I realized something: my phone — capable of recording video, capturing audio, and connecting to the internet — couldn't help me in real time. It could only call for help after something happened.
That's when I thought: What if my phone could see what I see, hear what I hear, and warn me before danger strikes?
Personal safety apps exist, but they're all reactive:
- Panic buttons require you to recognize danger AND remember to press a button
- Location sharing only helps after an incident
- Check-in apps require you to stay engaged
None of them are truly proactive. They don't watch. They don't listen. They don't speak.
I wanted to build something different: an AI companion that's always paying attention, understands context, and can speak to you in a calm, natural voice the moment something feels wrong.
Introducing Guardian AI
Guardian AI is a real-time personal safety companion that uses your phone's camera and microphone to continuously monitor your surroundings and alert you — or your emergency contacts — the moment danger is detected.
The app uses Google's Gemini 2.5 Flash Native Audio model via the bidirectional Live API to process live video frames and audio simultaneously, assess environmental risk in real time, and respond with natural spoken guidance.
No typing. No tapping. Just a calm, intelligent voice keeping you safe.
Why This Matters Now
Safety concerns are growing:
- Women report feeling unsafe in public spaces
- Travelers face unfamiliar environments
- Vulnerable populations need extra protection
- Traditional panic buttons are outdated
But AI has evolved. We now have models that can see, hear, and speak in real time. We have the technology to build something genuinely protective.
Guardian AI addresses three real problems:
- Delayed response — Traditional panic buttons require the user to act. Guardian AI acts for you.
- Situational blindness — You can't always see what's behind you or around a corner. The AI can.
- Isolation in emergencies — When you're scared, you may not be able to call for help. Guardian AI calls for you.
Key Technical Accomplishments
First-of-Its-Kind Real-Time Multimodal Safety App
To our knowledge, Guardian AI is the first consumer safety application to leverage the Gemini Live API's bidirectional audio/video streaming. Camera frames, microphone input, and AI analysis all run concurrently, with spoken responses streamed back as they are generated.
Solved Complex Mobile Audio Engineering
Building real-time audio streaming on mobile browsers presented three major challenges:
AudioContext Suspension — Mobile browsers keep an AudioContext suspended unless it is created (or resumed) inside a user gesture. Solution: create both the input and output contexts inside the WebSocket connection handler, which itself runs in response to the user's tap to start monitoring.
PCM16 Encoding — Gemini's Live API expects raw PCM16 audio at 16kHz, but Web Audio API provides Float32 samples. We implemented efficient conversion and base64 encoding in 8KB chunks.
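A minimal sketch of that conversion path, assuming a little-endian platform (function names here are illustrative, not taken from the Guardian AI source):

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(float32: Float32Array): Int16Array {
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm16;
}

// Base64-encode the raw PCM bytes, working in 8KB chunks so we never
// spread a huge array into a single String.fromCharCode call.
function pcm16ToBase64(pcm16: Int16Array): string {
  const bytes = new Uint8Array(pcm16.buffer, pcm16.byteOffset, pcm16.byteLength);
  const CHUNK = 8 * 1024;
  let binary = "";
  for (let offset = 0; offset < bytes.length; offset += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(offset, offset + CHUNK));
  }
  return btoa(binary);
}
```

Note that typed arrays use the platform's byte order, which is little-endian on virtually all phones — the order raw PCM16 consumers conventionally expect.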
Conversation Flow — Gemini's BidiGenerateContent doesn't auto-respond to silence. We built a smart conversation pulse: check every 10 seconds, send an automated prompt only after 30 seconds of silence.
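The pulse logic can be sketched like this (the interval constants match the article; the names and the pure `shouldNudge` helper are ours):

```typescript
const PULSE_MS = 10_000;   // how often the pulse checks
const SILENCE_MS = 30_000; // silence threshold before nudging the model

let lastActivity = Date.now();

// Call whenever audio flows in either direction.
function markActivity(now: number = Date.now()): void {
  lastActivity = now;
}

// Pure decision helper, kept separate so it is trivial to unit-test.
function shouldNudge(now: number, last: number, silenceMs: number = SILENCE_MS): boolean {
  return now - last >= silenceMs;
}

function startPulse(sendPrompt: () => void): ReturnType<typeof setInterval> {
  return setInterval(() => {
    if (shouldNudge(Date.now(), lastActivity)) {
      sendPrompt();   // e.g. an automated "give a brief status update" turn
      markActivity(); // reset so we don't nudge again on the very next tick
    }
  }, PULSE_MS);
}
```

Resetting the timestamp after a nudge matters: without it, the pulse would fire on every 10-second tick once the silence threshold is crossed.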
Structured Data from Conversational AI
We engineered Gemini to output both natural speech AND structured metadata tags simultaneously. The system embeds tags like [Lighting:Well-lit][Crowds:Empty][Behavior:Normal][Risk:20] in every response. The frontend parses these tags to drive color-coded UI indicators, then strips them before display.
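A sketch of the parse-then-strip step, assuming the `[Key:Value]` format shown above (the function itself is illustrative, not the production code):

```typescript
// Matches tags like [Lighting:Well-lit] or [Risk:20].
const TAG_RE = /\[(\w+):([^\]]+)\]/g;

interface ParsedResponse {
  speech: string;               // what the user actually sees/hears
  meta: Record<string, string>; // e.g. { Lighting: "Well-lit", Risk: "20" }
}

function parseResponse(raw: string): ParsedResponse {
  const meta: Record<string, string> = {};
  for (const [, key, value] of raw.matchAll(TAG_RE)) {
    meta[key] = value;
  }
  // Strip the tags before display and tidy any leftover whitespace.
  const speech = raw.replace(TAG_RE, "").replace(/\s+/g, " ").trim();
  return { speech, meta };
}
```

The `meta` record can then drive the color-coded indicators (e.g. mapping the numeric `Risk` value to green/amber/red) while `speech` goes to the transcript.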
Production-Grade Full-Stack Architecture
- Frontend: React 19 + TypeScript + Vite (optimized for mobile)
- Backend: Node.js relay server on Google Cloud Run (stateless, auto-scaling)
- AI: Gemini 2.5 Flash Native Audio via Live API (bidirectional WebSocket)
- Notifications: Twilio SMS + SendGrid email with GPS coordinates
- Infrastructure: Fully containerized with Docker, deployed via GitHub Actions + Workload Identity Federation
Keyless Secure Deployment
We implemented Workload Identity Federation for GitHub Actions → GCP authentication with zero service account keys. The CI/CD pipeline automatically builds Docker images, pushes to Artifact Registry, and deploys to Cloud Run — all without storing credentials in GitHub.
The Technical Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript, Vite, Tailwind CSS |
| Backend | Node.js, Express 5, TypeScript |
| AI Model | Gemini 2.5 Flash Native Audio (Live API) |
| Real-time | WebSocket, Web Audio API, WebRTC |
| Notifications | Twilio SMS, Twilio Email (SendGrid) |
| Backend Hosting | Google Cloud Run |
| Frontend Hosting | Firebase Hosting |
| Container Registry | Google Artifact Registry |
| CI/CD | GitHub Actions with Workload Identity Federation |
Why Google Cloud?
Guardian AI requires Google Cloud because:
- Gemini Live API is exclusive — Only available on Google Cloud Platform
- Native audio bidirectional streaming — at the time of writing, we found no comparable offering on AWS or Azure
- Cloud Run WebSocket support — Essential for real-time relay architecture
- Seamless integration — Gemini API, Cloud Run, Artifact Registry, Firebase all work together
- Cost efficiency — Cloud Run scales to zero, Firebase Hosting is free tier friendly
What Makes Guardian AI Different
| Feature | Traditional Safety Apps | Guardian AI |
|---|---|---|
| Activation | Manual button press | Always-on, automatic |
| Awareness | None | Real-time video + audio analysis |
| Response | Sends location | Speaks guidance, sends location + context |
| AI | None | Gemini 2.5 Flash Native Audio |
| Interaction | Tap-based | Fully voice-driven |
| Environmental data | None | Lighting, crowds, behavior indicators |
| Emergency alerts | Location only | Location + risk score + environmental factors |
Lessons Learned
1. Native audio models change everything
Gemini's native audio output is dramatically more natural than text-to-speech. In a safety context, a calm, natural voice is reassuring. A robotic TTS voice is not.
2. Mobile audio is hard
The AudioContext suspended state issue cost me two days of debugging. The fix (create inside user gesture) is simple once you know it, but the debugging path is not obvious.
3. Structured output from conversational models
Getting a conversational model to reliably output structured tags alongside natural speech required careful prompt engineering. The key was providing exact format examples, not just descriptions.
4. Graceful degradation matters
Building the notification system to silently skip when Twilio isn't configured meant the app works perfectly in demo mode without any setup. This made testing and sharing much easier.
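A sketch of that pattern, with assumed environment variable names (`TWILIO_ACCOUNT_SID`, etc.) and a stubbed send path rather than the real Twilio call:

```typescript
interface SmsSender {
  send(to: string, body: string): Promise<void>;
}

// Returns null instead of throwing when credentials are absent,
// so demo mode works with zero setup.
function makeSmsSender(env: Record<string, string | undefined>): SmsSender | null {
  if (!env.TWILIO_ACCOUNT_SID || !env.TWILIO_AUTH_TOKEN || !env.TWILIO_FROM) {
    return null;
  }
  return {
    async send(to: string, body: string): Promise<void> {
      // A real implementation would call the Twilio REST API here.
    },
  };
}

// Callers treat "skipped" as a normal outcome, not an error.
async function notifyContacts(sender: SmsSender | null, to: string, body: string): Promise<boolean> {
  if (!sender) return false; // silently skipped: demo mode
  await sender.send(to, body);
  return true;
}
```

Returning `null` (and a `boolean` from `notifyContacts`) keeps the unconfigured path on the normal control flow instead of the exception path, which is what makes the degradation feel silent.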
What's Next
- Background mode — keep monitoring when the screen is off
- Wearable integration — Apple Watch / WearOS for discreet alerts
- Trusted contacts — share live location with family during monitoring sessions
- Incident history — review past sessions with AI-generated summaries
- Offline fallback — local risk assessment when connectivity drops
Try It Out
GitHub Repository: https://github.com/rahulgurunule/Guardian_AI
Key Takeaways
- Real-time multimodal AI is now possible — Gemini Live API makes it accessible
- Mobile audio engineering is solvable — with the right approach and patience
- Safety tech can be proactive, not reactive — AI can watch and warn before danger strikes
- Google Cloud is the right choice for Gemini — seamless integration, cost-efficient, secure
- Full-stack TypeScript is powerful — same language across frontend, backend, and infrastructure
Final Thoughts
Building Guardian AI taught me that the best technology solutions start with a real problem. I didn't start by asking "how can I use Gemini Live API?" — I started by asking "how can I make people safer?"
The technology was the answer, not the starting point.
If you're interested in AI, real-time systems, or personal safety tech, I'd love to hear your thoughts. Feel free to reach out or check out the GitHub repo.
This project was created as a submission to the Google Gemini Live Agent Challenge hackathon.
👥 Creators
This project was built by:
- Rahul Gurunule — LinkedIn Profile
- Sushma Gurunule — LinkedIn Profile