Preetham

Building G-Axis: A Voice AI Companion + Browser Agent with Gemini Live API

Your browser already works. You don't need a new app to experience AI.

That was the idea behind G-Axis — a Chrome extension I built for the Gemini Live Agent Challenge that turns your
existing browser into an AI-powered workspace.

No new tabs. No new logins. Just intelligence, right where you already work.

The Problem

Ever had 6 tabs open just to do one thing? ChatGPT here, Calendar there, Google Search somewhere else. Every AI tool
lives in its own silo. And they can all talk — but none of them can actually do anything in your browser.

What G-Axis Does

Two things:

Talk to it — Click the mic, pick one of 8 AI personas, and have a real conversation. Not text-to-speech. Real
bidirectional audio via Gemini's Live API. Ask it anything — it searches the web in real-time via Google Search
grounding.

Delegate to it — Type "Plan a 5-day Japan itinerary" and watch it research, navigate, and generate a full
document. Type "Schedule a meeting tomorrow at 10am" and it opens Calendar, fills the form, and saves.

The Gemini Stack

Here's what powers it under the hood:

Gemini Live API — The Voice Engine

This was the breakthrough. Gemini's native audio model (gemini-2.5-flash-native-audio) handles real-time voice
natively, with no separate STT/TTS pipeline. The extension's service worker connects directly via WebSocket, so
there is no relay server in between and round-trip latency stays low.

I built 8 personas on top of it, each with a different Gemini voice and personality:

| Persona | Voice | Vibe |
|---------|-------|------|
| Friendly Buddy | Puck | Your go-to friend |
| Wise Mentor | Charon | Guidance, not lectures |
| Creative Partner | Aoede | Ideas machine |
| Job Interviewer | Kore | Practice makes perfect |
| Chill Companion | Fenrir | Just vibes |
| Professional Coach | Kore | Sharpen your edge |
| Friendly Debater | Charon | Challenge your thinking |
| Storyteller | Aoede | Bring ideas to life |

Switch mid-conversation. The voice changes. The personality changes. The previous session saves automatically.
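
The table above maps naturally onto a small lookup structure. Here is a minimal sketch of what that could look like. The names `Persona`, `PERSONAS`, `VoiceSession`, and `switchPersona` are hypothetical, and the in-memory archive stands in for whatever storage the extension really uses:

```typescript
// Hypothetical shape for a persona; name, voice, and vibe come from the table above.
interface Persona {
  name: string;
  voice: string; // Gemini prebuilt voice name
  vibe: string;
}

const PERSONAS: Persona[] = [
  { name: "Friendly Buddy", voice: "Puck", vibe: "Your go-to friend" },
  { name: "Wise Mentor", voice: "Charon", vibe: "Guidance, not lectures" },
  { name: "Creative Partner", voice: "Aoede", vibe: "Ideas machine" },
  { name: "Job Interviewer", voice: "Kore", vibe: "Practice makes perfect" },
  { name: "Chill Companion", voice: "Fenrir", vibe: "Just vibes" },
  { name: "Professional Coach", voice: "Kore", vibe: "Sharpen your edge" },
  { name: "Friendly Debater", voice: "Charon", vibe: "Challenge your thinking" },
  { name: "Storyteller", voice: "Aoede", vibe: "Bring ideas to life" },
];

// Hypothetical session record: which persona was active and what was said.
interface VoiceSession {
  persona: Persona;
  transcript: string[];
}

// Switching archives the current session and starts a fresh one with the new persona.
function switchPersona(
  current: VoiceSession,
  nextName: string,
  archive: VoiceSession[],
): VoiceSession {
  const next = PERSONAS.find((p) => p.name === nextName);
  if (!next) throw new Error(`Unknown persona: ${nextName}`);
  archive.push(current); // the previous session is saved before the switch
  return { persona: next, transcript: [] };
}
```

Because voices repeat across personas (Kore, Charon, and Aoede each appear twice), the personality has to come from the system prompt rather than the voice alone.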

Google Search — Real-Time Grounding

Ask "What's the latest AI news?" and Gemini doesn't guess from training data — it searches the web live and answers
with current information. This is the google_search tool baked into the Live API config.
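
For illustration, a Live API WebSocket session starts with a setup message that names the model, the voice, and any tools. The sketch below builds such a payload. `buildSetupMessage` is a hypothetical helper, and the camelCase field names (`generationConfig`, `speechConfig`, `googleSearch`) are an assumption based on the Live API's JSON setup message, so check them against the current API reference before relying on them:

```typescript
// Hypothetical helper: builds the first ("setup") message for a Live API WebSocket session.
// Field names are assumptions; verify them against the official reference.
function buildSetupMessage(model: string, voiceName: string, systemPrompt: string) {
  return {
    setup: {
      model: `models/${model}`,
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName } },
        },
      },
      systemInstruction: { parts: [{ text: systemPrompt }] },
      // Turns on Google Search grounding for the session.
      tools: [{ googleSearch: {} }],
    },
  };
}

const setupMessage = buildSetupMessage(
  "gemini-2.5-flash-native-audio",
  "Puck",
  "You are a friendly buddy.",
);
// This string would be sent as the first frame after the socket opens.
const setupPayload = JSON.stringify(setupMessage);
```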

Gemini 2.5 Flash — The Brain

Task planning. Function calling. Session analysis. Every voice conversation gets analyzed for 5 communication skills:

  • Confidence
  • Clarity
  • Engagement
  • Listening
  • Pacing

Users earn XP, level up, and track progress on a dashboard.

Gemini Vision — The Eyes

For browser automation, screenshots are sent to Gemini Vision. It understands what's on screen — buttons, forms,
navigation — and decides where to click, type, and scroll.
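
One way to make a vision model's decisions executable is to have it reply with a small JSON action and validate that reply before touching the page. The sketch below is purely illustrative: the `BrowserAction` schema and `parseAction` are hypothetical, and the post does not say what format G-Axis actually uses:

```typescript
// Hypothetical action schema covering the three operations named above.
type BrowserAction =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; deltaY: number };

// Validate untrusted model output before letting it drive the browser.
function parseAction(raw: string): BrowserAction {
  const a = JSON.parse(raw);
  if (a.type === "click" && typeof a.x === "number" && typeof a.y === "number") {
    return { type: "click", x: a.x, y: a.y };
  }
  if (a.type === "type" && typeof a.text === "string") {
    return { type: "type", text: a.text };
  }
  if (a.type === "scroll" && typeof a.deltaY === "number") {
    return { type: "scroll", deltaY: a.deltaY };
  }
  throw new Error(`Unrecognized action: ${raw}`);
}
```

Rejecting anything outside the schema keeps a malformed or unexpected model reply from turning into an arbitrary page interaction.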

The Hard Parts

Mic permissions in Chrome extensions — Sidepanels can't access getUserMedia. I tried 4 approaches before landing
on a minimal popup window with an AudioWorklet processor streaming PCM audio via Chrome ports.
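
The PCM step is worth spelling out. Web Audio delivers 32-bit float samples in the range -1 to 1, while streaming speech endpoints generally expect 16-bit signed integers. Here is a minimal sketch of that conversion. `floatTo16BitPCM` is a hypothetical helper name, and the post does not show the actual worklet code:

```typescript
// Hypothetical helper: convert Web Audio float samples (-1..1) to 16-bit signed PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, input[i]));
    // The negative range is one step larger than the positive one in two's complement.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

An AudioWorklet processor would call something like this on each input block and post the resulting buffer over the port.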

Audio playback — My first approach used onended callbacks to chain audio buffers. This caused 5-20ms gaps
between every chunk — speech sounded choppy. The fix: schedule each AudioBufferSource to start at the exact
timestamp the previous one ends. Gapless.
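
The scheduling fix comes down to tracking one running timestamp. Here is a minimal sketch. `GaplessScheduler` is a hypothetical class, written against two small interfaces that mirror the parts of Web Audio it needs (`AudioContext.currentTime` and `AudioBufferSourceNode.start(when)`), so the logic can be read and tested outside a browser:

```typescript
// Minimal stand-ins for the Web Audio pieces used here (assumptions for illustration).
interface Clock {
  currentTime: number; // like AudioContext.currentTime, in seconds
}
interface Startable {
  start(when: number): void; // like AudioBufferSourceNode.start(when)
}

class GaplessScheduler {
  private nextStart = 0;
  private clock: Clock;

  constructor(clock: Clock) {
    this.clock = clock;
  }

  // Schedule a chunk lasting `duration` seconds to begin exactly when the previous one ends.
  schedule(source: Startable, duration: number): number {
    // If playback has fallen behind (for example after a network stall), restart from "now".
    const when = Math.max(this.nextStart, this.clock.currentTime);
    source.start(when);
    this.nextStart = when + duration;
    return when;
  }
}
```

In the browser, `clock` would be the `AudioContext` itself and `duration` would be `buffer.duration` for each decoded chunk.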

Session timeouts — Gemini Live sessions die after ~10 minutes. I built transparent auto-reconnection (up to 20x)
so conversations can last over an hour without the user noticing.
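
The reconnection idea can be sketched as a loop around a session function. This is an illustration, not the extension's code. `runWithReconnect`, `runSession`, and the `SessionEnd` values are hypothetical names, and only the cap of 20 reconnects comes from the text above:

```typescript
// Hypothetical: why a session ended.
type SessionEnd = "timeout" | "closed";

// Runs sessions back to back, reconnecting whenever one ends on a server-side timeout.
// Resolves with the number of reconnects that were used.
async function runWithReconnect(
  runSession: (attempt: number) => Promise<SessionEnd>,
  maxReconnects = 20,
): Promise<number> {
  let reconnects = 0;
  while (true) {
    const end = await runSession(reconnects);
    if (end === "closed") return reconnects; // the user ended the conversation
    if (reconnects >= maxReconnects) return reconnects; // give up after the cap
    reconnects++;
  }
}
```

With sessions of roughly 10 minutes each, a cap of 20 reconnects leaves plenty of headroom for an hour-long conversation.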

Security — The API key accidentally got committed to the public repo. I scrubbed it from git history with
filter-branch, rotated the key, and moved to OAuth2 short-lived tokens. The key now lives in Cloud Secret Manager
and never touches client code.

Google Cloud Setup

Cloud Run → Backend hosting (FastAPI, 2 vCPU, 2GB, autoscale)
Secret Manager → API key → OAuth2 tokens (60-min expiry)
Cloud Build → Docker image CI/CD
Terraform → Full IaC (one file, all resources)

One command deploys everything:

```bash
./deploy.sh gaxis-488323
```

Architecture

https://raw.githubusercontent.com/preethamtjit20-spec/gaxis/main/architecture-v3.png

Try It

The backend is live:

```bash
curl https://gaxis-132388856648.us-central1.run.app/health
# {"status":"ok","agent":true}
```

Full source + setup instructions: https://github.com/preethamtjit20-spec/gaxis

---
Built for the Gemini Live Agent Challenge (https://geminiliveagentchallenge.devpost.com/). Your browser already works; G-Axis makes it intelligent.

