I created this project as my entry to the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge
## The Problem
During a university investment challenge, my team spent hours debating market news (Fed decisions, jobs reports, earnings surprises), trying to agree on what each headline meant for our portfolio weights. The debate was valuable, but the process was painfully slow: read the news, form a view, open a spreadsheet, run the math, decide. By the time we had a number, the market had already moved.
That experience planted a question I couldn't shake: What if an AI agent could do that entire workflow in seconds? Speak the news. Hear the recommendation. Act.
This is Litterman.ai, a voice-powered portfolio optimization agent for asset managers, built for the Gemini Live Agent Challenge 2026.
## What It Does
The workflow is simple:
- The manager speaks a market headline: "The Fed raised rates 50 basis points today."
- The agent immediately acknowledges: "Understood. Stand by for analysis."
- Gemini 2.5 Flash classifies the news and — grounded with Google Search — extracts Black-Litterman views
- The Black-Litterman engine optimizes the portfolio
- The agent responds verbally with new allocations and the Sharpe ratio
- The live dashboard on Cloud Run updates in real time via Firestore
The manager can interrupt mid-response with follow-up questions: the agent stops, listens, and answers.
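The steps above can be sketched as a single handler. This is only an illustrative outline; the function names are placeholder stand-ins, not the project's actual API, and each real step is described in its own section of this post:

```python
def extract_views(headline: str) -> dict:
    # Placeholder for the Gemini + Search Grounding call that returns P and Q.
    return {"P": [[1, -1, 0]], "Q": [0.02], "headline": headline}

def optimize_portfolio(views: dict) -> dict:
    # Placeholder for the Black-Litterman engine.
    return {"weights": {"AAPL": 0.40, "TLT": 0.35, "GLD": 0.25}, "sharpe": 1.1}

def push_bl_result(result: dict) -> None:
    # Placeholder for the Firestore write that drives the live dashboard.
    pass

def handle_headline(headline: str) -> dict:
    """Spoken headline in, new allocations out."""
    views = extract_views(headline)
    result = optimize_portfolio(views)
    push_bl_result(result)
    return result
```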
## The Black-Litterman Model
The name comes from Fischer Black and Robert Litterman, who developed this model at Goldman Sachs in 1990. It's the standard framework for institutional portfolio construction. It lets portfolio managers blend market equilibrium with their own views mathematically.
Think of it as a weather forecast that blends two sources: the historical climate data (what the market expects) and the meteorologist's own judgment (what you believe based on today's news). Black-Litterman does the same for portfolios: it starts from what the market implies and adjusts it mathematically with your views.
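In symbols, the posterior expected returns blend the equilibrium prior with the views:

$$
\mu_{BL} = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\pi + P^{\top}\Omega^{-1}Q\right]
$$

where $\pi$ is the equilibrium prior, $\Sigma$ the asset covariance matrix, $\tau$ a scaling constant, $P$ and $Q$ the view matrix and view returns, and $\Omega$ the uncertainty of those views.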
*Black-Litterman pipeline: from news to optimal weights*
In code, the equilibrium prior is derived via reverse optimization:
```python
import numpy as np

# Implied equilibrium returns (reverse optimization)
pi = risk_aversion * cov_matrix @ market_weights

# Posterior returns (Black-Litterman formula)
tau = 0.05
omega = np.diag(np.diag(tau * P @ cov_matrix @ P.T))
M1 = np.linalg.inv(tau * cov_matrix) + P.T @ np.linalg.inv(omega) @ P
M2 = np.linalg.inv(tau * cov_matrix) @ pi + P.T @ np.linalg.inv(omega) @ Q
mu_bl = np.linalg.inv(M1) @ M2
```
Gemini extracts the P matrix (which assets each view is about) and the Q vector (the expected return for each view) directly from the news transcript, as structured JSON that feeds straight into the optimizer.
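As a concrete sketch with invented numbers (a toy three-asset universe, not the project's real inputs), a single view like "asset 0 will outperform asset 1 by 2%" becomes one row of P and one entry of Q, and the formulas above produce tilted posterior returns:

```python
import numpy as np

# Toy inputs, invented purely to illustrate the shapes involved
risk_aversion = 2.5
cov_matrix = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.03, 0.01],
                       [0.00, 0.01, 0.02]])
market_weights = np.array([0.5, 0.3, 0.2])

# One view: asset 0 outperforms asset 1 by 2%
P = np.array([[1.0, -1.0, 0.0]])
Q = np.array([0.02])

# Same pipeline as above: prior, then posterior
pi = risk_aversion * cov_matrix @ market_weights
tau = 0.05
omega = np.diag(np.diag(tau * P @ cov_matrix @ P.T))
M1 = np.linalg.inv(tau * cov_matrix) + P.T @ np.linalg.inv(omega) @ P
M2 = np.linalg.inv(tau * cov_matrix) @ pi + P.T @ np.linalg.inv(omega) @ Q
mu_bl = np.linalg.inv(M1) @ M2

# Unconstrained mean-variance weights from the posterior returns
w = np.linalg.inv(risk_aversion * cov_matrix) @ mu_bl
```

The posterior spread between assets 0 and 1 lands between the prior spread and the stated 2% view, which is exactly the blending behavior the model is designed for.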
## Architecture
Google Cloud services used:
- Gemini 2.5 Flash Native Audio (Live API) — voice I/O and barge-in
- Gemini 2.5 Flash + Google Search Grounding — classification and view extraction
- Cloud Firestore — real-time shared state
- Cloud Run — hosts the Flask dashboard publicly
- Cloud Build — automated container builds
## The Hardest Part: Barge-In Without Lockouts
The most technically challenging feature was true barge-in: the ability to interrupt the agent mid-response and have it stop immediately and listen.
My first attempt used a single flag _bl_running to track whether the pipeline was active. This caused a nasty bug: if the user spoke while the flag was set, the conversation locked up entirely — the agent would stop responding even after the pipeline finished.
The fix was a two-flag architecture:
```python
self._bl_running = False   # blocks duplicate pipeline execution
self._bl_cooldown = False  # prevents re-trigger, but NEVER blocks conversation
```
`_bl_running` is set for the duration of the BL pipeline and prevents duplicate execution. `_bl_cooldown` is set after the response and expires after 20 seconds, but critically, it never prevents the agent from listening or responding conversationally.
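A minimal sketch of how the two flags could gate the pipeline, simplified from the actual session handler (the class name and method shapes here are illustrative, not the project's real code):

```python
import asyncio

class BLGate:
    """Two flags: _bl_running blocks duplicate pipeline runs;
    _bl_cooldown blocks re-triggering but never blocks conversation."""

    def __init__(self, cooldown_s: float = 20.0):
        self._bl_running = False
        self._bl_cooldown = False
        self._cooldown_s = cooldown_s

    def should_run_pipeline(self, looks_like_market_news: bool) -> bool:
        # Ordinary conversation is never gated -- only the BL pipeline is.
        return (looks_like_market_news
                and not self._bl_running
                and not self._bl_cooldown)

    async def run_pipeline(self, pipeline):
        self._bl_running = True
        try:
            return await pipeline()
        finally:
            # Release the run flag no matter what, then start the cooldown
            # timer in the background so conversation continues immediately.
            self._bl_running = False
            self._bl_cooldown = True
            asyncio.create_task(self._expire_cooldown())

    async def _expire_cooldown(self):
        await asyncio.sleep(self._cooldown_s)
        self._bl_cooldown = False
```

The point of the design is visible in `should_run_pipeline`: both flags only ever gate pipeline re-entry, so there is no state in which the agent stops responding to speech.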
Another key learning: results must be injected back into the session using `send_realtime_input(text=prompt)`, not `send_client_content`. Only the former maintains session context continuity and produces audio output.
## Google Search Grounding in Practice
One of the most impactful features was integrating Google Search Grounding into the view extractor. Instead of relying solely on Gemini's training data, the model queries live web sources — current yields, inflation figures, central bank positioning — before extracting views.
In testing, a single analysis call consistently retrieved 7–17 live sources. The difference in view quality is noticeable: grounded views reference actual current yield levels, not approximate ones from training data.
The key to reliably triggering grounding is temporal language in the news input:
> "The Federal Reserve *just* raised rates today..."
The word "today" signals to the model that this is present-day information it cannot know from training, which reliably triggers a live search.
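For reference, wiring grounding into a request looks roughly like this with the `google-genai` SDK (a sketch, assuming that SDK and valid credentials; the model name and prompt are taken from this post, not verified against the repo):

```python
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents='Extract Black-Litterman views from: '
             '"The Federal Reserve just raised rates today..."',
    config=types.GenerateContentConfig(
        # Attach Google Search as a tool so the model can pull live sources
        # (current yields, inflation prints) before extracting views.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
```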
## Cloud Run + Firestore: Stateless by Design
Cloud Run is stateless, with no persistent filesystem between requests. Early in development, I had the dashboard reading from a local state.json file, which worked locally but broke completely on Cloud Run. This was my first time working with both Cloud Run and Firestore, so mistakes like this were part of the learning curve, and honestly, the ones I'll remember longest.
The fix was moving all shared state to Cloud Firestore, with the dashboard polling every 2 seconds:
```python
from google.cloud import firestore

def push_bl_result(result: dict):
    db = firestore.Client(project=PROJECT_ID)
    doc_ref = db.collection("litterman").document("state")
    doc_ref.set({"portfolio": result}, merge=True)
```
One critical detail: the Firestore database must be named `(default)`. Custom names (I tried `littermandb`) cause 404 errors with the Python SDK that fail frustratingly silently. It's a simple fix once you know, but it cost me more time than I'd like to admit.
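The read side is symmetric. A minimal sketch of the endpoint the dashboard could poll (the route name and the injected `fetch_state` callable are illustrative; in production it would wrap the Firestore `get()` shown above):

```python
from flask import Flask, jsonify

def create_app(fetch_state):
    # fetch_state is injected so the Firestore read can be swapped for a
    # stub in tests; in production it wraps doc_ref.get().to_dict().
    app = Flask(__name__)

    @app.route("/api/state")  # illustrative route name
    def get_state():
        # The dashboard's JS polls this endpoint every 2 seconds.
        return jsonify(fetch_state() or {})

    return app
```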
## What I Learned
- The Gemini Live API is genuinely capable of real-time barge-in with the right architecture, and interruptions feel natural
- Google Search Grounding adds real value for time-sensitive data, not just for factual lookups, but for anchoring quantitative views in current market conditions
- Cloud Run + Firestore is a clean pattern for stateless agents — but the `(default)` database naming and `merge=True` updates are details that matter
- `thinking_budget=0` is essential for voice: without it, Gemini outputs markdown thinking headers as audio, producing garbled responses
- CRLF on Windows corrupts Dockerfiles: always set `git config core.autocrlf false` and use `.gitattributes` to enforce LF
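For the last point, a `.gitattributes` along these lines pins line endings regardless of each contributor's local git config (a sketch; the exact patterns depend on the repo layout):

```
* text=auto
Dockerfile text eol=lf
*.sh text eol=lf
*.py text eol=lf
```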
## Try It
- 🔗 Live dashboard: https://litterman-dashboard-1084835415345.us-central1.run.app
- 💻 Repo: https://github.com/gilbertoitalo/litterman-ai
- 🌐 Landing page: https://gilbertoitalo.github.io/litterman-ai/
The repo includes full spin-up instructions for running the voice agent locally. If you try it, let me know what you build with it.
Built for the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge

