Your browser already works. You don't need a new app to experience AI.
That was the idea behind G-Axis — a Chrome extension I built for the Gemini Live Agent Challenge that turns your
existing browser into an AI-powered workspace.
No new tabs. No new logins. Just intelligence, right where you already work.
## The Problem
Ever had 6 tabs open just to do one thing? ChatGPT here, Calendar there, Google Search somewhere else. Every AI tool
lives in its own silo. And they can all talk — but none of them can actually do anything in your browser.
## What G-Axis Does
Two things:
- Talk to it: Click the mic, pick one of 8 AI personas, and have a real conversation. Not text-to-speech. Real bidirectional audio via Gemini's Live API. Ask it anything, and it searches the web in real time via Google Search grounding.
- Delegate to it: Type "Plan a 5-day Japan itinerary" and watch it research, navigate, and generate a full document. Type "Schedule a meeting tomorrow at 10am" and it opens Calendar, fills the form, and saves.
## The Gemini Stack
Here's what powers it under the hood:
### Gemini Live API — The Voice Engine
This was the breakthrough. Gemini's native audio model (gemini-2.5-flash-native-audio) handles real-time voice
natively, with no separate STT/TTS pipeline. The extension's service worker connects directly via WebSocket, so
there is no relay server in the audio path and no extra round trip added to each turn.
I built 8 personas on top of it, each with a different Gemini voice and personality:
| Persona | Voice | Vibe |
|---------|-------|------|
| Friendly Buddy | Puck | Your go-to friend |
| Wise Mentor | Charon | Guidance, not lectures |
| Creative Partner | Aoede | Ideas machine |
| Job Interviewer | Kore | Practice makes perfect |
| Chill Companion | Fenrir | Just vibes |
| Professional Coach | Kore | Sharpen your edge |
| Friendly Debater | Charon | Challenge your thinking |
| Storyteller | Aoede | Bring ideas to life |
Switch mid-conversation. The voice changes. The personality changes. The previous session saves automatically.
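
To make the persona idea concrete, here is a minimal sketch of how the table above could map onto the first message sent over the Live API WebSocket. It is an illustration, not the code from the repo: `PERSONAS`, `buildSetupMessage`, and the system-instruction wording are hypothetical names I made up for this example, and the field layout follows the Live API setup message as I understand it. Because the voice is part of that setup message, I assume a persona switch means closing the current session and opening a new one with a fresh setup.

```typescript
// Hypothetical persona registry mirroring the table above.
interface Persona {
  voice: string; // Gemini prebuilt voice name
  vibe: string;  // short personality hint, reused in the system instruction
}

const PERSONAS: Record<string, Persona> = {
  "Friendly Buddy": { voice: "Puck", vibe: "Your go-to friend" },
  "Wise Mentor": { voice: "Charon", vibe: "Guidance, not lectures" },
  "Creative Partner": { voice: "Aoede", vibe: "Ideas machine" },
  "Job Interviewer": { voice: "Kore", vibe: "Practice makes perfect" },
  "Chill Companion": { voice: "Fenrir", vibe: "Just vibes" },
  "Professional Coach": { voice: "Kore", vibe: "Sharpen your edge" },
  "Friendly Debater": { voice: "Charon", vibe: "Challenge your thinking" },
  "Storyteller": { voice: "Aoede", vibe: "Bring ideas to life" },
};

// Builds the setup payload for one persona. The model id is passed in
// rather than hard-coded, since the exact id may change over time.
function buildSetupMessage(personaName: string, model: string) {
  const persona = PERSONAS[personaName];
  if (!persona) throw new Error(`Unknown persona: ${personaName}`);
  return {
    setup: {
      model: `models/${model}`,
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: persona.voice } },
        },
      },
      // Illustrative wording only; the real prompts are not shown in this post.
      systemInstruction: {
        parts: [{ text: `You are "${personaName}". ${persona.vibe}.` }],
      },
    },
  };
}
```

Calling `buildSetupMessage("Wise Mentor", "gemini-2.5-flash-native-audio")` and sending the result through `JSON.stringify` would give the payload for a new session that speaks with the Charon voice.
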
### Google Search — Real-Time Grounding
Ask "What's the latest AI news?" and Gemini doesn't guess from training data — it searches the web live and answers
with current information. This is the google_search tool baked into the Live API config.
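
As a rough sketch of what "baked into the config" can look like, the helper below adds the search tool to a setup object. `LiveSetup` and `withSearchGrounding` are hypothetical names, and I am assuming the camelCase `googleSearch` spelling of the `google_search` tool that the JSON form of the API accepts.

```typescript
// Hypothetical, trimmed-down shape of a Live API setup object.
interface LiveSetup {
  model: string;
  tools?: Array<Record<string, unknown>>;
}

// Returns a copy of the setup with Google Search grounding enabled.
// If the tool is already present, the setup is returned unchanged.
function withSearchGrounding(setup: LiveSetup): LiveSetup {
  const tools = setup.tools ?? [];
  const alreadyEnabled = tools.some((tool) => "googleSearch" in tool);
  if (alreadyEnabled) return setup;
  return { ...setup, tools: [...tools, { googleSearch: {} }] };
}
```
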
### Gemini 2.5 Flash — The Brain
Task planning. Function calling. Session analysis. Every voice conversation gets analyzed for 5 communication skills:
- Confidence
- Clarity
- Engagement
- Listening
- Pacing
Users earn XP, level up, and track progress on a dashboard.
### Gemini Vision — The Eyes
For browser automation, screenshots are sent to Gemini as image input. The model interprets what's on screen
(buttons, forms, navigation) and decides where to click, type, and scroll.
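
One way to keep this loop safe is to have the model answer with a small JSON action and validate it before touching the page. The sketch below is an assumption about how that could be structured, not the format G-Axis actually uses: the `BrowserAction` shape and `parseAction` are hypothetical.

```typescript
// Hypothetical action vocabulary the model is asked to reply with.
type BrowserAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; deltaY: number };

// Parses and validates a model reply. Returns null for anything that is
// not a well-formed action, so the caller can retry instead of acting on it.
function parseAction(raw: string): BrowserAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const kind = fields["type"];
  const x = fields["x"];
  const y = fields["y"];
  const text = fields["text"];
  const deltaY = fields["deltaY"];
  if (kind === "click" && typeof x === "number" && typeof y === "number") {
    return { type: "click", x, y };
  }
  if (kind === "type" && typeof text === "string") {
    return { type: "type", text };
  }
  if (kind === "scroll" && typeof deltaY === "number") {
    return { type: "scroll", deltaY };
  }
  return null;
}
```
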
## The Hard Parts
Mic permissions in Chrome extensions — Sidepanels can't access getUserMedia. I tried 4 approaches before landing
on a minimal popup window with an AudioWorklet processor streaming PCM audio via Chrome ports.
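
The conversion step in that pipeline is small enough to show. An AudioWorklet hands you `Float32Array` samples in the range -1 to 1, while the Live API, as far as I know, expects 16-bit PCM input. A minimal sketch of the conversion follows; `floatTo16BitPCM` is a name chosen for this example, and the port messaging around it is left out.

```typescript
// Converts Web Audio float samples (-1..1) to signed 16-bit PCM.
// Out-of-range samples are clamped so they cannot wrap around.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  });
  return pcm;
}
```

In the worklet, the resulting `Int16Array` (or its underlying buffer) would be what gets posted over the port to the service worker.
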
Audio playback — My first approach used onended callbacks to chain audio buffers. This caused 5-20ms gaps
between every chunk, so speech sounded choppy. The fix: schedule each AudioBufferSourceNode to start at the exact
timestamp the previous one ends. Gapless.
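
Here is a minimal sketch of that scheduling idea. The `ContextLike`, `SourceLike`, and `BufferLike` interfaces are stand-ins I introduced so the logic reads in isolation; they cover only the parts of `AudioContext`, `AudioBufferSourceNode`, and `AudioBuffer` that the technique needs. `createGaplessPlayer` is a hypothetical name, not the function from the repo.

```typescript
// Minimal shapes of the Web Audio objects this technique relies on.
interface BufferLike { duration: number }
interface SourceLike {
  buffer: BufferLike | null;
  connect(destination: unknown): void;
  start(when?: number): void;
}
interface ContextLike {
  currentTime: number;
  destination: unknown;
  createBufferSource(): SourceLike;
}

function createGaplessPlayer(ctx: ContextLike) {
  // Audio-clock time at which the previously queued chunk will finish.
  let nextStart = 0;
  return {
    // Queues one chunk and returns the time it was scheduled for.
    enqueue(buffer: BufferLike): number {
      const source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(ctx.destination);
      // If the queue ran dry, nextStart is in the past: restart from now.
      const when = Math.max(nextStart, ctx.currentTime);
      source.start(when);
      nextStart = when + buffer.duration;
      return when;
    },
  };
}
```

The difference from the `onended` approach is that the start time comes from the audio clock rather than from a callback on the main thread, so a late callback can no longer open a gap between chunks.
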
Session timeouts — Gemini Live sessions die after ~10 minutes. I built transparent auto-reconnection (up to 20x)
so conversations can last over an hour without the user noticing.
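
The reconnect logic boils down to a capped retry on unexpected closes. The sketch below shows only that control flow, under assumptions: `createReconnector` and the `OpenSession` callback are hypothetical, and restoring conversation context after a reconnect is not shown.

```typescript
type CloseHandler = (byUser: boolean) => void;
// Hypothetical: opens a Live session and calls onClose when it ends.
type OpenSession = (onClose: CloseHandler) => void;

function createReconnector(openSession: OpenSession, maxReconnects = 20) {
  let reconnects = 0;
  const handleClose: CloseHandler = (byUser) => {
    // Stop if the user hung up or the reconnect budget is spent.
    if (byUser || reconnects >= maxReconnects) return;
    reconnects += 1;
    openSession(handleClose);
  };
  return {
    start(): void { openSession(handleClose); },
    reconnectCount(): number { return reconnects; },
  };
}
```

With sessions ending after roughly 10 minutes, a budget of 20 reconnects is what stretches a single conversation past the one-hour mark.
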
Security — The API key accidentally got committed to the public repo. I scrubbed it from git history with
filter-branch, rotated the key, and moved to OAuth2 short-lived tokens. The key now lives in Cloud Secret Manager
and never touches client code.
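
On the client side, short-lived tokens mean the extension has to notice when a token is about to expire and ask the backend for a new one. A minimal sketch of that caching logic follows. Everything here is an assumption for illustration: `ShortLivedToken`, `isExpiring`, `createTokenCache`, and the injected `fetchToken` callback (which would call whatever token endpoint the backend exposes) are hypothetical names.

```typescript
interface ShortLivedToken {
  value: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// True once the token is within skewMs of expiring (default: 60 seconds).
function isExpiring(token: ShortLivedToken, nowMs: number, skewMs = 60000): boolean {
  return nowMs >= token.expiresAtMs - skewMs;
}

// Wraps a token fetcher with a single-entry cache.
// Note: concurrent callers may each trigger a fetch; this sketch does not dedupe them.
function createTokenCache(
  fetchToken: () => Promise<ShortLivedToken>,
  now: () => number = Date.now,
) {
  let cached: ShortLivedToken | null = null;
  return async function getToken(): Promise<string> {
    let token = cached;
    if (token === null || isExpiring(token, now())) {
      token = await fetchToken();
      cached = token;
    }
    return token.value;
  };
}
```

The refresh margin keeps a token from expiring in the middle of a WebSocket handshake; the API key itself stays on the server the whole time.
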
## Google Cloud Setup
- Cloud Run → Backend hosting (FastAPI, 2 vCPU, 2 GB, autoscale)
- Secret Manager → API key → OAuth2 tokens (60-min expiry)
- Cloud Build → Docker image CI/CD
- Terraform → Full IaC (one file, all resources)
One command deploys everything:
```bash
./deploy.sh gaxis-488323
```
## Architecture
![G-Axis architecture diagram](https://raw.githubusercontent.com/preethamtjit20-spec/gaxis/main/architecture-v3.png)
## Try It
The backend is live:
```bash
curl https://gaxis-132388856648.us-central1.run.app/health
# {"status":"ok","agent":true}
```
Full source + setup instructions: https://github.com/preethamtjit20-spec/gaxis
---
Built for the [Gemini Live Agent Challenge](https://geminiliveagentchallenge.devpost.com/). Your browser already works; G-Axis makes it intelligent.