<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Preetham</title>
    <description>The latest articles on DEV Community by Preetham (@preetham_25a78ec384dac787).</description>
    <link>https://dev.to/preetham_25a78ec384dac787</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3828262%2Fe0ea9c10-7e96-410e-88e3-26229545c0e0.png</url>
      <title>DEV Community: Preetham</title>
      <link>https://dev.to/preetham_25a78ec384dac787</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/preetham_25a78ec384dac787"/>
    <language>en</language>
    <item>
      <title>Building G-Axis: A Voice AI Companion + Browser Agent with Gemini Live API</title>
      <dc:creator>Preetham</dc:creator>
      <pubDate>Mon, 16 Mar 2026 23:57:26 +0000</pubDate>
      <link>https://dev.to/preetham_25a78ec384dac787/building-g-axis-a-voice-ai-companion-browser-agent-with-gemini-live-api-l4k</link>
      <guid>https://dev.to/preetham_25a78ec384dac787/building-g-axis-a-voice-ai-companion-browser-agent-with-gemini-live-api-l4k</guid>
      <description>&lt;p&gt;Your browser already works. You don't need a new app to experience AI.&lt;/p&gt;

&lt;p&gt;That was the idea behind &lt;strong&gt;G-Axis&lt;/strong&gt;, a Chrome extension I built for the Gemini Live Agent Challenge that turns your existing browser into an AI-powered workspace.&lt;/p&gt;

&lt;p&gt;No new tabs. No new logins. Just intelligence, right where you already work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Ever had 6 tabs open just to do one thing? ChatGPT here, Calendar there, Google Search somewhere else. Every AI tool lives in its own silo. They can all &lt;em&gt;talk&lt;/em&gt;, but none of them can actually &lt;em&gt;do&lt;/em&gt; anything in your browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  What G-Axis Does
&lt;/h2&gt;

&lt;p&gt;Two things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Talk to it&lt;/strong&gt;: Click the mic, pick one of 8 AI personas, and have a real conversation. This is not text-to-speech layered on a chatbot. It is real bidirectional audio via Gemini's Live API. Ask it anything and it searches the web in real time via Google Search grounding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delegate to it&lt;/strong&gt;: Type "Plan a 5-day Japan itinerary" and watch it research, navigate, and generate a full document. Type "Schedule a meeting tomorrow at 10am" and it opens Calendar, fills the form, and saves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Gemini Stack
&lt;/h2&gt;

&lt;p&gt;Here's what powers it under the hood:&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini Live API — The Voice Engine
&lt;/h3&gt;

&lt;p&gt;This was the breakthrough. Gemini's native audio model (&lt;code&gt;gemini-2.5-flash-native-audio&lt;/code&gt;) handles real-time voice natively, with no separate STT/TTS pipeline. The extension's service worker connects directly via WebSocket, so there is no intermediate hop adding latency.&lt;/p&gt;

&lt;p&gt;I built 8 personas on top of it, each with a different Gemini voice and personality:&lt;/p&gt;

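&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Persona&lt;/th&gt;&lt;th&gt;Voice&lt;/th&gt;&lt;th&gt;Vibe&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Friendly Buddy&lt;/td&gt;&lt;td&gt;Puck&lt;/td&gt;&lt;td&gt;Your go-to friend&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Wise Mentor&lt;/td&gt;&lt;td&gt;Charon&lt;/td&gt;&lt;td&gt;Guidance, not lectures&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Creative Partner&lt;/td&gt;&lt;td&gt;Aoede&lt;/td&gt;&lt;td&gt;Ideas machine&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Job Interviewer&lt;/td&gt;&lt;td&gt;Kore&lt;/td&gt;&lt;td&gt;Practice makes perfect&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Chill Companion&lt;/td&gt;&lt;td&gt;Fenrir&lt;/td&gt;&lt;td&gt;Just vibes&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Professional Coach&lt;/td&gt;&lt;td&gt;Kore&lt;/td&gt;&lt;td&gt;Sharpen your edge&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Friendly Debater&lt;/td&gt;&lt;td&gt;Charon&lt;/td&gt;&lt;td&gt;Challenge your thinking&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Storyteller&lt;/td&gt;&lt;td&gt;Aoede&lt;/td&gt;&lt;td&gt;Bring ideas to life&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;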

&lt;p&gt;Switch mid-conversation. The voice changes. The personality changes. The previous session saves automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Search — Real-Time Grounding
&lt;/h3&gt;

&lt;p&gt;Ask "What's the latest AI news?" and Gemini doesn't guess from training data. It searches the web live and answers with current information. This comes from enabling the &lt;code&gt;google_search&lt;/code&gt; tool in the Live API config.&lt;/p&gt;
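
&lt;p&gt;To make the persona and grounding setup concrete, here is a minimal sketch in Python of how a per-persona Live config could be assembled. The field names follow the google-genai Python SDK's Live connect config as I understand it, so treat them as assumptions and check the current API reference. The &lt;code&gt;build_live_config&lt;/code&gt; helper and the system instruction wording are hypothetical, and the extension itself would send the equivalent JSON over its WebSocket rather than run Python.&lt;/p&gt;

```python
# Persona table from the post: name -> (Gemini voice, vibe).
PERSONAS = {
    "Friendly Buddy": ("Puck", "Your go-to friend"),
    "Wise Mentor": ("Charon", "Guidance, not lectures"),
    "Creative Partner": ("Aoede", "Ideas machine"),
    "Job Interviewer": ("Kore", "Practice makes perfect"),
    "Chill Companion": ("Fenrir", "Just vibes"),
    "Professional Coach": ("Kore", "Sharpen your edge"),
    "Friendly Debater": ("Charon", "Challenge your thinking"),
    "Storyteller": ("Aoede", "Bring ideas to life"),
}

MODEL = "gemini-2.5-flash-native-audio"  # model name as written in the post


def build_live_config(persona):
    """Return a Live config dict for one persona (assumed field names)."""
    voice, vibe = PERSONAS[persona]
    return {
        "response_modalities": ["AUDIO"],
        # Hypothetical prompt wording, for illustration only.
        "system_instruction": f"You are the {persona} persona. Vibe: {vibe}.",
        "speech_config": {
            "voice_config": {"prebuilt_voice_config": {"voice_name": voice}}
        },
        # Turns on Google Search grounding for the session.
        "tools": [{"google_search": {}}],
    }


config = build_live_config("Wise Mentor")
print(config["speech_config"]["voice_config"]["prebuilt_voice_config"])
```

&lt;p&gt;Switching personas then amounts to building a new config and opening a fresh session with it.&lt;/p&gt;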

&lt;h3&gt;
  
  
  Gemini 2.5 Flash — The Brain
&lt;/h3&gt;

&lt;p&gt;Task planning. Function calling. Session analysis. Every voice conversation gets analyzed for 5 communication skills:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence&lt;/li&gt;
&lt;li&gt;Clarity&lt;/li&gt;
&lt;li&gt;Engagement&lt;/li&gt;
&lt;li&gt;Listening&lt;/li&gt;
&lt;li&gt;Pacing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users earn XP, level up, and track progress on a dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini Vision — The Eyes
&lt;/h3&gt;

&lt;p&gt;For browser automation, screenshots are sent to Gemini Vision. It understands what's on screen (buttons, forms, navigation) and decides where to click, type, and scroll.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hard Parts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mic permissions in Chrome extensions&lt;/strong&gt;: Side panels can't access &lt;code&gt;getUserMedia&lt;/code&gt;. I tried 4 approaches before landing on a minimal popup window with an AudioWorklet processor that streams PCM audio via Chrome ports.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audio playback&lt;/strong&gt;: My first approach used &lt;code&gt;onended&lt;/code&gt; callbacks to chain audio buffers. This caused 5-20 ms gaps between every chunk, so speech sounded choppy. The fix: schedule each &lt;code&gt;AudioBufferSourceNode&lt;/code&gt; to start at the exact timestamp the previous one ends. Gapless.&lt;/p&gt;
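
&lt;p&gt;The scheduling idea can be sketched independently of the browser. The Python below only does the timestamp bookkeeping. In the extension, &lt;code&gt;now&lt;/code&gt; would come from &lt;code&gt;AudioContext.currentTime&lt;/code&gt; and the returned value would be passed to &lt;code&gt;AudioBufferSourceNode.start()&lt;/code&gt;. The &lt;code&gt;GaplessScheduler&lt;/code&gt; class name is hypothetical, and this is an illustration of the technique rather than the code in the repo.&lt;/p&gt;

```python
class GaplessScheduler:
    """Tracks the timestamp at which the next audio chunk should start."""

    def __init__(self):
        self.next_start = 0.0

    def schedule(self, now, duration):
        """Return the start time for a chunk lasting `duration` seconds."""
        # Never schedule in the past: if playback ran dry, restart from `now`.
        start = max(now, self.next_start)
        # The following chunk begins exactly where this one ends.
        self.next_start = start + duration
        return start


scheduler = GaplessScheduler()
# Chunks arrive at slightly irregular times but play back to back.
print(scheduler.schedule(now=1.0, duration=0.5))
print(scheduler.schedule(now=1.1, duration=0.25))
print(scheduler.schedule(now=1.2, duration=0.25))
```

&lt;p&gt;Because each start time is derived from the previous chunk's end instead of a callback firing, callback jitter no longer shows up as silence between chunks.&lt;/p&gt;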

&lt;p&gt;&lt;strong&gt;Session timeouts&lt;/strong&gt;: Gemini Live sessions die after ~10 minutes. I built transparent auto-reconnection (up to 20 reconnects per conversation) so conversations can last over an hour without the user noticing.&lt;/p&gt;
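
&lt;p&gt;The reconnect loop is simple in outline. This is a minimal Python sketch, assuming a hypothetical &lt;code&gt;open_session&lt;/code&gt; callable that runs one Live session and reports whether the user ended it or the session dropped. Carrying conversation context across sessions is left out here.&lt;/p&gt;

```python
def run_with_reconnect(open_session, max_reconnects=20):
    """Keep one conversation alive across session timeouts.

    `open_session` is a hypothetical callable: it runs a single Live
    session and returns True if the user ended the conversation, or
    False if the session timed out or dropped.
    Returns the number of reconnects that were needed.
    """
    reconnects = 0
    while True:
        user_ended = open_session()
        if user_ended:
            return reconnects
        if reconnects == max_reconnects:
            raise RuntimeError("reconnect budget exhausted")
        reconnects += 1


# Simulated conversation: three timeouts, then the user hangs up.
outcomes = iter([False, False, False, True])
print(run_with_reconnect(lambda: next(outcomes)))
```

&lt;p&gt;Capping the number of reconnects keeps a broken connection from looping forever while still allowing a conversation far longer than a single session.&lt;/p&gt;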

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;: The API key accidentally got committed to the public repo. I scrubbed it from git history with &lt;code&gt;git filter-branch&lt;/code&gt;, rotated the key, and moved to short-lived OAuth2 tokens. The key now lives in Cloud Secret Manager and never touches client code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Cloud Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Run: backend hosting (FastAPI, 2 vCPU, 2 GB, autoscale)&lt;/li&gt;
&lt;li&gt;Secret Manager: API key → OAuth2 tokens (60-min expiry)&lt;/li&gt;
&lt;li&gt;Cloud Build: Docker image CI/CD&lt;/li&gt;
&lt;li&gt;Terraform: full IaC (one file, all resources)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One command deploys everything:&lt;/p&gt;



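&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./deploy.sh gaxis-488323
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://raw.githubusercontent.com/preethamtjit20-spec/gaxis/main/architecture-v3.png"&gt;https://raw.githubusercontent.com/preethamtjit20-spec/gaxis/main/architecture-v3.png&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The backend is live:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl https://gaxis-132388856648.us-central1.run.app/health
# {"status":"ok","agent":true}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Full source + setup instructions: &lt;a href="https://github.com/preethamtjit20-spec/gaxis"&gt;https://github.com/preethamtjit20-spec/gaxis&lt;/a&gt;&lt;/p&gt;

&lt;hr&gt;

&lt;p&gt;Built for the &lt;a href="https://geminiliveagentchallenge.devpost.com/"&gt;Gemini Live Agent Challenge&lt;/a&gt;. Your browser already works. G-Axis makes it intelligent.&lt;/p&gt;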

</description>
      <category>geminiliveagentchallenge</category>
      <category>gemini</category>
      <category>chromeextension</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
