Disclosure: This article and the Guardian AI project described herein were created as an entry to the Google Gemini Live Agent Challenge hackathon.
The Problem I Wanted to Solve
I was walking home late one evening when I realized something: my phone — capable of recording video, capturing audio, and connecting to the internet — couldn't help me in real time. It could only call for help after something happened.
That's when I thought: What if my phone could see what I see, hear what I hear, and warn me before danger strikes?
Personal safety apps exist, but they're all reactive:
- Panic buttons require you to recognize danger AND remember to press a button
- Location sharing only helps after an incident
- Check-in apps require you to stay engaged
None of them are truly proactive. They don't watch. They don't listen. They don't speak.
I wanted to build something different: an AI companion that's always paying attention, understands context, and can speak to you in a calm, natural voice the moment something feels wrong.
Introducing Guardian AI
Guardian AI is a real-time personal safety companion that uses your phone's camera and microphone to continuously monitor your surroundings and alert you — or your emergency contacts — the moment danger is detected.
The app uses Google's Gemini 2.5 Flash Native Audio model via the bidirectional Live API to process live video frames and audio simultaneously, assess environmental risk in real time, and respond with natural spoken guidance.
No typing. No tapping. Just a calm, intelligent voice keeping you safe.
Why This Matters Now
Safety concerns are growing:
- Women report feeling unsafe in public spaces
- Travelers face unfamiliar environments
- Vulnerable populations need extra protection
- Traditional panic buttons are outdated
But AI has evolved. We now have models that can see, hear, and speak in real time. We have the technology to build something genuinely protective.
Guardian AI addresses three real problems:
- Delayed response — Traditional panic buttons require the user to act. Guardian AI acts for you.
- Situational blindness — You can't always see what's behind you or around a corner. The AI can.
- Isolation in emergencies — When you're scared, you may not be able to call for help. Guardian AI calls for you.
Key Technical Accomplishments
First-of-Its-Kind Real-Time Multimodal Safety App
To our knowledge, Guardian AI is the first consumer safety application to leverage the Gemini Live API's bidirectional audio/video streaming. Camera frames, microphone input, and AI analysis all run concurrently, with spoken responses streamed back as they are generated.
Solved Complex Mobile Audio Engineering
Building real-time audio streaming on mobile browsers presented three major challenges:
AudioContext Suspension — Mobile browsers keep an AudioContext suspended unless it is created (or resumed) inside a user gesture. Solution: create both the input and output contexts inside the WebSocket connection handler, which itself runs in response to the user's tap to start monitoring.
PCM16 Encoding — Gemini's Live API expects raw PCM16 audio at 16kHz, but Web Audio API provides Float32 samples. We implemented efficient conversion and base64 encoding in 8KB chunks.
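A minimal sketch of that conversion path, assuming a little-endian platform (function names here are illustrative, not taken from the Guardian AI source):

```typescript
// Convert Web Audio Float32 samples ([-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(float32: Float32Array): Int16Array {
  const pcm16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm16;
}

// Base64-encode the raw PCM bytes, working in 8KB chunks so we never
// spread a huge array into a single String.fromCharCode call.
function pcm16ToBase64(pcm16: Int16Array): string {
  const bytes = new Uint8Array(pcm16.buffer, pcm16.byteOffset, pcm16.byteLength);
  const CHUNK = 8 * 1024;
  let binary = "";
  for (let offset = 0; offset < bytes.length; offset += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(offset, offset + CHUNK));
  }
  return btoa(binary);
}
```

Note that typed arrays use the platform's byte order, which is little-endian on virtually all phones — the order raw PCM16 consumers conventionally expect.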
Conversation Flow — Gemini's BidiGenerateContent doesn't auto-respond to silence. We built a smart conversation pulse: check every 10 seconds, send an automated prompt only after 30 seconds of silence.
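The pulse logic can be sketched like this (the interval constants match the article; the names and the pure `shouldNudge` helper are ours):

```typescript
const PULSE_MS = 10_000;   // how often the pulse checks
const SILENCE_MS = 30_000; // silence threshold before nudging the model

let lastActivity = Date.now();

// Call whenever audio flows in either direction.
function markActivity(now: number = Date.now()): void {
  lastActivity = now;
}

// Pure decision helper, kept separate so it is trivial to unit-test.
function shouldNudge(now: number, last: number, silenceMs: number = SILENCE_MS): boolean {
  return now - last >= silenceMs;
}

function startPulse(sendPrompt: () => void): ReturnType<typeof setInterval> {
  return setInterval(() => {
    if (shouldNudge(Date.now(), lastActivity)) {
      sendPrompt();   // e.g. an automated "give a brief status update" turn
      markActivity(); // reset so we don't nudge again on the very next tick
    }
  }, PULSE_MS);
}
```

Resetting the timestamp after a nudge matters: without it, the pulse would fire on every 10-second tick once the silence threshold is crossed.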
Structured Data from Conversational AI
We engineered Gemini to output both natural speech AND structured metadata tags simultaneously. The system embeds tags like [Lighting:Well-lit][Crowds:Empty][Behavior:Normal][Risk:20] in every response. The frontend parses these tags to drive color-coded UI indicators, then strips them before display.
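A sketch of the parse-then-strip step, assuming the `[Key:Value]` format shown above (the function itself is illustrative, not the production code):

```typescript
// Matches tags like [Lighting:Well-lit] or [Risk:20].
const TAG_RE = /\[(\w+):([^\]]+)\]/g;

interface ParsedResponse {
  speech: string;               // what the user actually sees/hears
  meta: Record<string, string>; // e.g. { Lighting: "Well-lit", Risk: "20" }
}

function parseResponse(raw: string): ParsedResponse {
  const meta: Record<string, string> = {};
  for (const [, key, value] of raw.matchAll(TAG_RE)) {
    meta[key] = value;
  }
  // Strip the tags before display and tidy any leftover whitespace.
  const speech = raw.replace(TAG_RE, "").replace(/\s+/g, " ").trim();
  return { speech, meta };
}
```

The `meta` record can then drive the color-coded indicators (e.g. mapping the numeric `Risk` value to green/amber/red) while `speech` goes to the transcript.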
Production-Grade Full-Stack Architecture
- Frontend: React 19 + TypeScript + Vite (optimized for mobile)
- Backend: Node.js relay server on Google Cloud Run (stateless, auto-scaling)
- AI: Gemini 2.5 Flash Native Audio via Live API (bidirectional WebSocket)
- Notifications: Twilio SMS + SendGrid email with GPS coordinates
- Infrastructure: Fully containerized with Docker, deployed via GitHub Actions + Workload Identity Federation
Keyless Secure Deployment
We implemented Workload Identity Federation for GitHub Actions → GCP authentication with zero service account keys. The CI/CD pipeline automatically builds Docker images, pushes to Artifact Registry, and deploys to Cloud Run — all without storing credentials in GitHub.
The Technical Stack
| Layer | Technology |
|---|---|
| Frontend | React 19, TypeScript, Vite, Tailwind CSS |
| Backend | Node.js, Express 5, TypeScript |
| AI Model | Gemini 2.5 Flash Native Audio (Live API) |
| Real-time | WebSocket, Web Audio API, WebRTC |
| Notifications | Twilio SMS, Twilio Email (SendGrid) |
| Backend Hosting | Google Cloud Run |
| Frontend Hosting | Firebase Hosting |
| Container Registry | Google Artifact Registry |
| CI/CD | GitHub Actions with Workload Identity Federation |
Why Google Cloud?
Guardian AI requires Google Cloud because:
- Gemini Live API is exclusive — Only available on Google Cloud Platform
- Native audio bidirectional streaming — at the time of writing, we found no comparable offering on AWS or Azure
- Cloud Run WebSocket support — Essential for real-time relay architecture
- Seamless integration — Gemini API, Cloud Run, Artifact Registry, Firebase all work together
- Cost efficiency — Cloud Run scales to zero, Firebase Hosting is free tier friendly
What Makes Guardian AI Different
| Feature | Traditional Safety Apps | Guardian AI |
|---|---|---|
| Activation | Manual button press | Always-on, automatic |
| Awareness | None | Real-time video + audio analysis |
| Response | Sends location | Speaks guidance, sends location + context |
| AI | None | Gemini 2.5 Flash Native Audio |
| Interaction | Tap-based | Fully voice-driven |
| Environmental data | None | Lighting, crowds, behavior indicators |
| Emergency alerts | Location only | Location + risk score + environmental factors |
Lessons Learned
1. Native audio models change everything
Gemini's native audio output is dramatically more natural than text-to-speech. In a safety context, a calm, natural voice is reassuring. A robotic TTS voice is not.
2. Mobile audio is hard
The AudioContext suspended state issue cost me two days of debugging. The fix (create inside user gesture) is simple once you know it, but the debugging path is not obvious.
3. Structured output from conversational models
Getting a conversational model to reliably output structured tags alongside natural speech required careful prompt engineering. The key was providing exact format examples, not just descriptions.
4. Graceful degradation matters
Building the notification system to silently skip when Twilio isn't configured meant the app works perfectly in demo mode without any setup. This made testing and sharing much easier.
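A sketch of that pattern, with assumed environment variable names (`TWILIO_ACCOUNT_SID`, etc.) and a stubbed send path rather than the real Twilio call:

```typescript
interface SmsSender {
  send(to: string, body: string): Promise<void>;
}

// Returns null instead of throwing when credentials are absent,
// so demo mode works with zero setup.
function makeSmsSender(env: Record<string, string | undefined>): SmsSender | null {
  if (!env.TWILIO_ACCOUNT_SID || !env.TWILIO_AUTH_TOKEN || !env.TWILIO_FROM) {
    return null;
  }
  return {
    async send(to: string, body: string): Promise<void> {
      // A real implementation would call the Twilio REST API here.
    },
  };
}

// Callers treat "skipped" as a normal outcome, not an error.
async function notifyContacts(sender: SmsSender | null, to: string, body: string): Promise<boolean> {
  if (!sender) return false; // silently skipped: demo mode
  await sender.send(to, body);
  return true;
}
```

Returning `null` (and a `boolean` from `notifyContacts`) keeps the unconfigured path on the normal control flow instead of the exception path, which is what makes the degradation feel silent.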
What's Next
- Background mode — keep monitoring when the screen is off
- Wearable integration — Apple Watch / WearOS for discreet alerts
- Trusted contacts — share live location with family during monitoring sessions
- Incident history — review past sessions with AI-generated summaries
- Offline fallback — local risk assessment when connectivity drops
Try It Out
GitHub Repository: https://github.com/rahulgurunule/Guardian_AI
Key Takeaways
- Real-time multimodal AI is now possible — Gemini Live API makes it accessible
- Mobile audio engineering is solvable — with the right approach and patience
- Safety tech can be proactive, not reactive — AI can watch and warn before danger strikes
- Google Cloud is the right choice for Gemini — seamless integration, cost-efficient, secure
- Full-stack TypeScript is powerful — same language across frontend, backend, and infrastructure
Final Thoughts
Building Guardian AI taught me that the best technology solutions start with a real problem. I didn't start by asking "how can I use Gemini Live API?" — I started by asking "how can I make people safer?"
The technology was the answer, not the starting point.
If you're interested in AI, real-time systems, or personal safety tech, I'd love to hear your thoughts. Feel free to reach out or check out the GitHub repo.
This project was created as a submission to the Google Gemini Live Agent Challenge hackathon.
👥 Creators
This project was built by:
- Rahul Gurunule — LinkedIn Profile
- Sushma Gurunule — LinkedIn Profile