Vertex AI Grounding Cost Gap: Diagnosing the Missing ₩1.3M on My Solo VM

#vertexai #llmcosts #gcp #costoptimization

Running a full AI product solo on a single small VM means every dollar counts. Recently, I noticed a jarring discrepancy in my Google Cloud Platform (GCP) billing for Vertex AI. My admin dashboard showed around ₩400,000 for the month, but the actual GCP bill was closer to ₩1,740,000. That's a gap of roughly ₩1,300,000, a significant chunk of change I couldn't account for. I needed to figure out where this money was disappearing.

My first instinct was to check the usual suspect: token usage. My application logs and the admin dashboard's token usage metrics seemed reasonable. I also confirmed there were no significant image generation costs that month, and my experimental lab runs were all in dry-run mode. The numbers still didn't add up, so I worked through the remaining possibilities by elimination to pinpoint the missing cost driver.

The breakthrough came when I realized my core chat functionality was using the google_search tool. This is a powerful feature that allows the AI to ground its responses in real-time web information. However, I had configured it to be always on, meaning it would trigger for a significant portion of user queries. The problem was how this grounding cost was being reported (or, rather, not reported) in my internal metrics.

Vertex AI charges for grounding separately from token usage. The cost is roughly $35 per 1,000 grounded requests, or about $0.035 each. While my internal usage_logs service diligently tracked token consumption, it completely missed these grounding requests. Every request that triggered a search, even for a seemingly simple question, incurred this separate fee. With approximately 27 search queries across 79 chat sessions, search was firing often enough that, at about $0.035 per grounded request, the math started to align alarmingly well with the missing ₩1,300,000.
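
As a back-of-the-envelope check, the per-request price can be turned into the request volume that a given gap would imply. The sketch below assumes an exchange rate of ₩1,400 per dollar, which is an illustrative value and not a figure from the bill; the function name is hypothetical, and the dashboard and bill totals are the rounded amounts quoted earlier.

```python
USD_PER_GROUNDED_REQUEST = 0.035   # per-request price described above
KRW_PER_USD = 1400.0               # assumption: illustrative exchange rate


def implied_grounded_requests(gap_krw: float,
                              usd_per_request: float = USD_PER_GROUNDED_REQUEST,
                              krw_per_usd: float = KRW_PER_USD) -> float:
    """Number of grounded requests that would explain a cost gap given in KRW."""
    return gap_krw / (usd_per_request * krw_per_usd)


# Billed total minus dashboard total, using the rounded figures from the post.
gap_krw = 1_740_000 - 400_000

print(f"gap: {gap_krw:,} KRW")
print(f"implied grounded requests: {implied_grounded_requests(gap_krw):,.0f}")
```

Under these assumed inputs the gap corresponds to a request count in the tens of thousands for the month, which is the scale to compare against whatever search counts the logs actually hold. A different exchange rate shifts the result proportionally.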

The Root Cause: Incomplete Cost Telemetry

The cost itself was legitimate; the real problem was that my application's internal telemetry was incomplete. It captured token usage but not the separate charges for tools like Google Search grounding. This created a blind spot that made it impossible to track the true operational expenses of my AI product.

The Fix: Visibility and Smart Triggers

To address this, I implemented two key changes:

Explicit Grounding Cost Logging: I modified my gemini_llm_service.py to explicitly record grounding costs. When the google_search tool is used (indicated by ctx.search_used), I now call UsageRepository.record_grounding(0.035, source='grounding'), where 0.035 is the per-request cost in USD. This ensures that grounding expenses are logged and reflected in my admin dashboard, giving a cost picture that matches the GCP bill.
Smart Search Triggering: To prevent unnecessary costs, I introduced a more intelligent trigger for the search tool. The _needs_search(user_text) function now checks user input for specific signals that a web search is genuinely required. Keywords like 'latest', 'weather', 'stock price', 'release', and 'search', as well as URLs and specific years, prompt a search. Casual conversation and general queries no longer trigger it by default. This significantly reduces unnecessary grounding calls while keeping the feature available when it is truly needed. I also dropped the GEMINI_SEARCH_ALWAYS=1 setting in favor of this smarter approach.

The Lesson: Look Beyond Token Counts

This experience was a stark reminder that LLM costs are multifaceted. Relying solely on token counts for cost monitoring is insufficient. Tool usage, grounding, image generation, and other auxiliary services often come with separate SKUs that can significantly inflate your bill. It's crucial to implement telemetry that captures these costs explicitly, broken down by service or SKU, to maintain accurate financial visibility and control.
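
To make the per-SKU idea concrete, here is a small sketch of a cost ledger that sums recorded costs by SKU and reports how much of a billed amount remains unexplained. The names CostRecord, totals_by_sku, and untracked_cost are hypothetical and are not the UsageRepository API mentioned earlier; the sample numbers are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRecord:
    sku: str          # e.g. "tokens", "grounding", "image"
    cost_usd: float


def totals_by_sku(records: list[CostRecord]) -> dict[str, float]:
    """Sum recorded cost per SKU so no category hides inside a single total."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.sku] += record.cost_usd
    return dict(totals)


def untracked_cost(billed_usd: float, records: list[CostRecord]) -> float:
    """Billed amount not explained by any recorded SKU."""
    return billed_usd - sum(record.cost_usd for record in records)


# Made-up sample data: one token charge and two grounded requests.
records = [
    CostRecord("tokens", 1.20),
    CostRecord("grounding", 0.035),
    CostRecord("grounding", 0.035),
]

print(totals_by_sku(records))
print(f"untracked: {untracked_cost(2.00, records):.2f} USD")
```

When the untracked figure stays near zero month over month, the telemetry is covering every SKU that appears on the bill; a growing remainder is the signal to look for a cost category that is not being logged.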

The ability to see these costs clearly in my admin dashboard, now categorized under 'grounding', gives me the confidence that my spending aligns with actual usage. This diagnostic journey, while initially alarming, ultimately led to a more robust and cost-aware AI product.

...building aicoreutility.com in the open...

💬 This is part of Riel, a full AI product I'm building solo, in public (failures and all).