I created this project as my entry to the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge
## The Problem
During a university investment challenge, my team spent hours debating market news (Fed decisions, jobs reports, earnings surprises), trying to agree on what each headline meant for our portfolio weights. The debate was valuable, but the process was painfully slow: read the news, form a view, open a spreadsheet, run the math, decide. By the time we had a number, the market had already moved.
That experience planted a question I couldn't shake: What if an AI agent could do that entire workflow in seconds? Speak the news. Hear the recommendation. Act.
This is Litterman.ai, a voice-powered portfolio optimization agent for asset managers, built for the Gemini Live Agent Challenge 2026.
## What It Does
The workflow is simple:
- The manager speaks a market headline: "The Fed raised rates 50 basis points today."
- The agent immediately acknowledges: "Understood. Stand by for analysis."
- Gemini 2.5 Flash classifies the news and — grounded with Google Search — extracts Black-Litterman views
- The Black-Litterman engine optimizes the portfolio
- The agent responds verbally with new allocations and the Sharpe ratio
- The live dashboard on Cloud Run updates in real time via Firestore
The manager can interrupt mid-response with follow-up questions: the agent stops, listens, and answers.
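The steps above can be sketched as a single handler. This is only an illustrative outline; the function names are placeholder stand-ins, not the project's actual API, and each real step is described in its own section of this post:

```python
def extract_views(headline: str) -> dict:
    # Placeholder for the Gemini + Search Grounding call that returns P and Q.
    return {"P": [[1, -1, 0]], "Q": [0.02], "headline": headline}

def optimize_portfolio(views: dict) -> dict:
    # Placeholder for the Black-Litterman engine.
    return {"weights": {"AAPL": 0.40, "TLT": 0.35, "GLD": 0.25}, "sharpe": 1.1}

def push_bl_result(result: dict) -> None:
    # Placeholder for the Firestore write that drives the live dashboard.
    pass

def handle_headline(headline: str) -> dict:
    """Spoken headline in, new allocations out."""
    views = extract_views(headline)
    result = optimize_portfolio(views)
    push_bl_result(result)
    return result
```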
## The Black-Litterman Model
The name comes from Fischer Black and Robert Litterman, who developed this model at Goldman Sachs in 1990. It's the standard framework for institutional portfolio construction. It lets portfolio managers blend market equilibrium with their own views mathematically.
Think of it as a weather forecast that blends two sources: the historical climate data (what the market expects) and the meteorologist's own judgment (what you believe based on today's news). Black-Litterman does the same for portfolios: it starts from what the market implies and adjusts it mathematically with your views.
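In symbols, the posterior expected returns blend the equilibrium prior with the views:

$$
\mu_{BL} = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\pi + P^{\top}\Omega^{-1}Q\right]
$$

where $\pi$ is the equilibrium prior, $\Sigma$ the asset covariance matrix, $\tau$ a scaling constant, $P$ and $Q$ the view matrix and view returns, and $\Omega$ the uncertainty of those views.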
*Black-Litterman pipeline: from news to optimal weights*
In code, the equilibrium prior is derived via reverse optimization:
```python
import numpy as np

# Implied equilibrium returns (reverse optimization)
pi = risk_aversion * cov_matrix @ market_weights

# Posterior returns (Black-Litterman formula)
tau = 0.05
omega = np.diag(np.diag(tau * P @ cov_matrix @ P.T))
M1 = np.linalg.inv(tau * cov_matrix) + P.T @ np.linalg.inv(omega) @ P
M2 = np.linalg.inv(tau * cov_matrix) @ pi + P.T @ np.linalg.inv(omega) @ Q
mu_bl = np.linalg.inv(M1) @ M2
```
Gemini extracts the P matrix (which assets each view is about) and the Q vector (the expected return for each view) directly from the news transcript, as structured JSON that feeds straight into the optimizer.
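As a concrete sketch with invented numbers (a toy three-asset universe, not the project's real inputs), a single view like "asset 0 will outperform asset 1 by 2%" becomes one row of P and one entry of Q, and the formulas above produce tilted posterior returns:

```python
import numpy as np

# Toy inputs, invented purely to illustrate the shapes involved
risk_aversion = 2.5
cov_matrix = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.03, 0.01],
                       [0.00, 0.01, 0.02]])
market_weights = np.array([0.5, 0.3, 0.2])

# One view: asset 0 outperforms asset 1 by 2%
P = np.array([[1.0, -1.0, 0.0]])
Q = np.array([0.02])

# Same pipeline as above: prior, then posterior
pi = risk_aversion * cov_matrix @ market_weights
tau = 0.05
omega = np.diag(np.diag(tau * P @ cov_matrix @ P.T))
M1 = np.linalg.inv(tau * cov_matrix) + P.T @ np.linalg.inv(omega) @ P
M2 = np.linalg.inv(tau * cov_matrix) @ pi + P.T @ np.linalg.inv(omega) @ Q
mu_bl = np.linalg.inv(M1) @ M2

# Unconstrained mean-variance weights from the posterior returns
w = np.linalg.inv(risk_aversion * cov_matrix) @ mu_bl
```

The posterior spread between assets 0 and 1 lands between the prior spread and the stated 2% view, which is exactly the blending behavior the model is designed for.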
## Architecture
Google Cloud services used:
- Gemini 2.5 Flash Native Audio (Live API) — voice I/O and barge-in
- Gemini 2.5 Flash + Google Search Grounding — classification and view extraction
- Cloud Firestore — real-time shared state
- Cloud Run — hosts the Flask dashboard publicly
- Cloud Build — automated container builds
## The Hardest Part: Barge-In Without Lockouts
The most technically challenging feature was true barge-in: the ability to interrupt the agent mid-response and have it stop immediately and listen.
My first attempt used a single flag _bl_running to track whether the pipeline was active. This caused a nasty bug: if the user spoke while the flag was set, the conversation locked up entirely — the agent would stop responding even after the pipeline finished.
The fix was a two-flag architecture:
```python
self._bl_running = False   # blocks duplicate pipeline execution
self._bl_cooldown = False  # prevents re-trigger, but NEVER blocks conversation
```
`_bl_running` is set for the duration of the BL pipeline and prevents duplicate execution. `_bl_cooldown` is set after the response and expires after 20 seconds, but critically, it never prevents the agent from listening or responding conversationally.
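A minimal sketch of how the two flags could gate the pipeline, simplified from the actual session handler (the class name and method shapes here are illustrative, not the project's real code):

```python
import asyncio

class BLGate:
    """Two flags: _bl_running blocks duplicate pipeline runs;
    _bl_cooldown blocks re-triggering but never blocks conversation."""

    def __init__(self, cooldown_s: float = 20.0):
        self._bl_running = False
        self._bl_cooldown = False
        self._cooldown_s = cooldown_s

    def should_run_pipeline(self, looks_like_market_news: bool) -> bool:
        # Ordinary conversation is never gated -- only the BL pipeline is.
        return (looks_like_market_news
                and not self._bl_running
                and not self._bl_cooldown)

    async def run_pipeline(self, pipeline):
        self._bl_running = True
        try:
            return await pipeline()
        finally:
            # Release the run flag no matter what, then start the cooldown
            # timer in the background so conversation continues immediately.
            self._bl_running = False
            self._bl_cooldown = True
            asyncio.create_task(self._expire_cooldown())

    async def _expire_cooldown(self):
        await asyncio.sleep(self._cooldown_s)
        self._bl_cooldown = False
```

The point of the design is visible in `should_run_pipeline`: both flags only ever gate pipeline re-entry, so there is no state in which the agent stops responding to speech.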
Another key learning: results must be injected back into the session using `send_realtime_input(text=prompt)`, not `send_client_content`. Only the former maintains session context continuity and produces audio output.
## Google Search Grounding in Practice
One of the most impactful features was integrating Google Search Grounding into the view extractor. Instead of relying solely on Gemini's training data, the model queries live web sources — current yields, inflation figures, central bank positioning — before extracting views.
In testing, a single analysis call consistently retrieved 7–17 live sources. The difference in view quality is noticeable: grounded views reference actual current yield levels, not approximate ones from training data.
The key to reliably triggering grounding is temporal language in the news input:
> "The Federal Reserve *just* raised rates today..."
The word "today" signals to the model that this is present-day information it cannot know from training, which reliably triggers a live search.
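For reference, wiring grounding into a request looks roughly like this with the `google-genai` SDK (a sketch, assuming that SDK and valid credentials; the model name and prompt are taken from this post, not verified against the repo):

```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents='Extract Black-Litterman views from: '
             '"The Federal Reserve just raised rates today..."',
    config=types.GenerateContentConfig(
        # Attach Google Search as a tool so the model can pull live sources
        # (current yields, inflation prints) before extracting views.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
```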
## Cloud Run + Firestore: Stateless by Design
Cloud Run is stateless, with no persistent filesystem between requests. Early in development, I had the dashboard reading from a local state.json file, which worked locally but broke completely on Cloud Run. This was my first time working with both Cloud Run and Firestore, so mistakes like this were part of the learning curve, and honestly, the ones I'll remember longest.
The fix was moving all shared state to Cloud Firestore, with the dashboard polling every 2 seconds:
```python
from google.cloud import firestore

def push_bl_result(result: dict):
    db = firestore.Client(project=PROJECT_ID)
    doc_ref = db.collection("litterman").document("state")
    doc_ref.set({"portfolio": result}, merge=True)
```
One critical detail: the Firestore database must be named `(default)`. Custom names (I tried `littermandb`) cause 404 errors with the Python SDK that fail frustratingly silently. It's a simple fix once you know, but it cost me more time than I'd like to admit.
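The read side is symmetric. A minimal sketch of the endpoint the dashboard could poll (the route name and the injected `fetch_state` callable are illustrative; in production it would wrap the Firestore `get()` shown above):

```python
from flask import Flask, jsonify

def create_app(fetch_state):
    # fetch_state is injected so the Firestore read can be swapped for a
    # stub in tests; in production it wraps doc_ref.get().to_dict().
    app = Flask(__name__)

    @app.route("/api/state")  # illustrative route name
    def get_state():
        # The dashboard's JS polls this endpoint every 2 seconds.
        return jsonify(fetch_state() or {})

    return app
```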
## What I Learned
- The Gemini Live API is genuinely capable of real-time barge-in with the right architecture, and interruptions feel natural
- Google Search Grounding adds real value for time-sensitive data, not just for factual lookups, but for anchoring quantitative views in current market conditions
- Cloud Run + Firestore is a clean pattern for stateless agents — but the `(default)` database naming and `merge=True` updates are details that matter
- `thinking_budget=0` is essential for voice: without it, Gemini outputs markdown thinking headers as audio, producing garbled responses
- CRLF on Windows corrupts Dockerfiles: always set `git config core.autocrlf false` and use `.gitattributes` to enforce LF
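For the last point, a `.gitattributes` along these lines pins line endings regardless of each contributor's local git config (a sketch; the exact patterns depend on the repo layout):

```
* text=auto
Dockerfile text eol=lf
*.sh text eol=lf
*.py text eol=lf
```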
## Try It
- 🔗 Live dashboard: https://litterman-dashboard-1084835415345.us-central1.run.app
- 💻 Repo: https://github.com/gilbertoitalo/litterman-ai
- 🌐 Landing page: https://gilbertoitalo.github.io/litterman-ai/
The repo includes full spin-up instructions for running the voice agent locally. If you try it, let me know what you build with it.
Built for the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge

