Rakibul Islam

How I built a real-time emotional spending intervention agent with Gemini Live

I built this project for the Gemini Live Agent Challenge 2026.

The idea started from a simple observation: most of my friends and I
regularly buy things we regret immediately. It's not that we don't know
our financial situation. The bad decision happens at 11 PM when we're
stressed, and no tool is watching at that exact moment.

So I built one.

Sentience Finance is a Chrome extension that detects checkout pages
automatically, reads your emotional state through your camera, and opens
a Gemini Live voice conversation with your own spending history loaded
in — before you click buy.

The core architecture

The extension (content.js) watches for financial page patterns:
checkout URLs, payment selectors, and keywords like "bKash" and "Cash on
Delivery" for platforms like Daraz. When a pattern matches, it injects a
HUD token and sends the page context to the sidepanel.
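
To make the detection step concrete, here is a minimal sketch of the URL and keyword checks. The URL pattern, the extra keywords, and the function name detectCheckout are my assumptions for illustration. Only "bKash" and "Cash on Delivery" come from the description above. The selector check is left out so the snippet runs outside a browser.

```javascript
// Hypothetical URL pattern; the real content.js may match different paths.
const CHECKOUT_URL_PATTERN = /\/(checkout|cart|payment|order)(\/|\?|$)/i;

// "bkash" and "cash on delivery" are from the post; the rest are assumed.
const PAYMENT_KEYWORDS = ["bkash", "cash on delivery", "place order", "pay now"];

// Takes the page URL and its visible text, returns what matched.
function detectCheckout(url, pageText) {
  const urlMatch = CHECKOUT_URL_PATTERN.test(url);
  const text = pageText.toLowerCase();
  const keywordsFound = PAYMENT_KEYWORDS.filter((k) => text.includes(k));
  return {
    isCheckout: urlMatch || keywordsFound.length > 0,
    urlMatch,
    keywordsFound,
  };
}
```

In a content script this would likely be called with location.href and document.body.innerText. A single keyword hit is a loose trigger and can fire on product pages, so a stricter rule (for example, requiring a URL match plus a keyword) may work better in practice.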

The sidepanel opens a WebSocket to a FastAPI backend, which connects to
Gemini Live using the gemini-2.5-flash-native-audio-preview-12-2025
model via the v1alpha API version. Audio goes in as 512-frame PCM chunks
(32 ms of audio each at 16 kHz). Audio comes back at 24 kHz and plays
through a persistent AudioContext with chunks scheduled sequentially, so
there are no gaps or pops.
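
One way to schedule chunks sequentially is to keep a running playhead and start each chunk exactly where the previous one ends. The sketch below shows only that timing logic. The state object and the function name scheduleChunk are hypothetical, and the actual extension may do this differently.

```javascript
const OUTPUT_SAMPLE_RATE = 24000; // from the post: audio comes back at 24 kHz

// state.nextStart is a hypothetical running playhead, in seconds.
// now is the audio clock (audioCtx.currentTime in a browser).
// Returns the start time for this chunk and advances the playhead.
function scheduleChunk(state, now, sampleCount) {
  const duration = sampleCount / OUTPUT_SAMPLE_RATE;
  // If playback fell behind (an underrun), restart from the current time
  // instead of scheduling in the past.
  const startAt = Math.max(state.nextStart, now);
  state.nextStart = startAt + duration;
  return startAt;
}
```

In the browser, the returned value would be passed to AudioBufferSourceNode.start(when) on a source created from the same long-lived AudioContext. Because each start time equals the previous chunk's end time, consecutive chunks play back to back.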

The hardest part: making it actually feel live

Early builds required clicking a button for every sentence. That's a
chatbot. Making it feel like a real conversation required three fixes:

  1. Buffer size. At 16 kHz, a 2048-frame buffer means 128 ms pass
    before Gemini hears your first word. Dropping to 512 frames cut that
    to 32 ms.

  2. OS audio processing. echoCancellation: true adds 40-60 ms on
    older hardware. I disabled it and replaced it with a JavaScript echo
    gate that reads the RMS amplitude of each chunk. It blocks mic audio
    while the AI is speaking but lets loud barge-in through.

  3. End-of-turn signaling. Gemini's cloud VAD waits ~600-800ms after
    silence before responding. A client-side VAD detects 600ms of silence
    and sends an explicit flush signal, cutting that wait significantly.
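
The echo gate and the client-side VAD from the list above can be sketched as two small functions over the same RMS measurement. This is an illustration, not the extension's code: the two RMS thresholds and all function names are assumptions. The 512-frame chunk size and the 600 ms silence window come from the post, and the 16 kHz rate is implied by 512 frames taking 32 ms.

```javascript
const INPUT_SAMPLE_RATE = 16000; // implied by 512 frames = 32 ms
const CHUNK_FRAMES = 512;
const CHUNK_MS = (CHUNK_FRAMES * 1000) / INPUT_SAMPLE_RATE; // 32
const BARGE_IN_RMS = 0.1; // assumed threshold for "loud" speech
const SILENCE_RMS = 0.01; // assumed threshold for silence
const SILENCE_MS = 600; // from the post

// Root-mean-square amplitude of one chunk of float samples in [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Echo gate: while the AI is speaking, only loud input (barge-in) passes.
function shouldSendChunk(samples, aiIsSpeaking) {
  if (!aiIsSpeaking) return true;
  return rms(samples) >= BARGE_IN_RMS;
}

// Client-side VAD: returns a function that is fed one chunk at a time and
// returns true once, after speech has been followed by 600 ms of silence.
function createSilenceDetector() {
  let silentMs = 0;
  let heardSpeech = false;
  return function onChunk(samples) {
    if (rms(samples) >= SILENCE_RMS) {
      heardSpeech = true;
      silentMs = 0;
      return false;
    }
    if (!heardSpeech) return false; // ignore silence before any speech
    silentMs += CHUNK_MS;
    if (silentMs >= SILENCE_MS) {
      heardSpeech = false;
      silentMs = 0;
      return true; // the caller would send the flush signal here
    }
    return false;
  };
}
```

With 32 ms chunks, the detector fires on the 19th consecutive silent chunk (608 ms), since 18 chunks cover only 576 ms. What the flush signal looks like on the wire is not shown here because the post does not specify it.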

The Vulnerability Score

$$V_{score} = (0.4 \times F_{freq}) + (0.3 \times A_{spend}) + (0.3 \times R_{regret})$$

Three signals: purchase frequency in the current emotional state (40%),
average spend vs. baseline (30%), and self-rated regret on past purchases
(30%). In testing, interventions at scores ≥ 7.5 produced 67% cart
abandonment vs. 12% at ≤ 4.0.
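
As a worked example of the formula, here is a small sketch that computes the score. It assumes each of the three inputs is already normalized to a 0-10 scale, which is my reading of the 7.5 and 4.0 thresholds rather than something stated above. The function and field names are hypothetical.

```javascript
// Weights from the formula: 0.4, 0.3, 0.3.
const WEIGHTS = { freq: 0.4, spend: 0.3, regret: 0.3 };

// Each input is assumed to be on a 0-10 scale (an assumption, see above).
function vulnerabilityScore({ freq, spend, regret }) {
  return (
    WEIGHTS.freq * freq + WEIGHTS.spend * spend + WEIGHTS.regret * regret
  );
}

// Example: frequent stressed purchases, above-baseline spend, high regret.
const score = vulnerabilityScore({ freq: 9, spend: 7, regret: 8 });
console.log(score.toFixed(1)); // 0.4*9 + 0.3*7 + 0.3*8 = 3.6 + 2.1 + 2.4 = 8.1
```

Because the weights sum to 1, the score stays on the same 0-10 scale as its inputs, so this example lands in the ≥ 7.5 band mentioned above.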

What I learned

Financial behavior is psychological before it's mathematical. The
system prompt matters as much as the API calls. And the moment of
intervention matters more than the quality of analysis — a perfect
insight a week later changes nothing.

GitHub: https://github.com/rakib120207/Sentience
