If you’ve ever tried to get a quick answer to “Why did our cloud spend spike yesterday?” and found yourself tangled in slow dashboards, expensive queries, or pricey SaaS licenses, welcome to the club. FinOps is hard, and ironically, analyzing cloud costs often feels more expensive and cumbersome than the costs themselves.
In this article, I want to share a fresh architectural approach that flips the script entirely: a “Local-First” AI FinOps Agent that lives right on your laptop, powered by Google’s Gemini CLI and the Model Context Protocol (MCP). The result? Instant, natural-language answers about your cloud billing data, zero cloud query charges, and absolutely no dashboard lag. Here’s how.
The Problem: Paying to Understand What We Pay For
When monitoring Google Cloud costs, we face a strange paradox: the classic “cost of cost analysis”. Let’s break down the pain points I see every day:
1. The BigQuery Scan Tax 💸
Your billing data lives in BigQuery, which bills on-demand queries by the amount of data scanned, at about $6.25 per terabyte.
That means one careless query by someone who just wants to “see all logs from last week” can cost you tens of dollars, and that’s before the productivity cost of waiting for results. Ouch.
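To make the scan tax concrete, here is the back-of-the-envelope arithmetic as a tiny sketch (the 5 TB figure is an illustrative assumption, not a measured number):

```python
PRICE_PER_TB = 6.25  # BigQuery on-demand scan price, USD per terabyte


def scan_cost(tb_scanned: float) -> float:
    """Cost in USD of a single on-demand query that scans `tb_scanned` TB."""
    return tb_scanned * PRICE_PER_TB


# A careless "show me everything" query over 5 TB of raw billing logs:
print(scan_cost(5.0))  # 31.25 USD for one query
```

Run that a few times a day across a team and the analysis bill starts to rival the bill you were trying to analyze.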
2. The Licensing Barrier 🚪
Natural language querying tools like Gemini for Google Cloud can automatically turn a casual question into an SQL query. Sounds perfect, right? Except they come at a per-seat price (~$19/user/month), which quickly balloons when you try to roll them out organization-wide (some started out free and later moved to a paid model).
3. Dashboard Latency and Rigidity 🕰️
BI dashboards hide complexity behind clicks and charts, but often feel clunky for deep-dive or ad-hoc questions. They force you to navigate predefined views, not exactly conversational or fast when you’re racing a fire drill.
Our Solution: The Local-First Architecture 🏠✨
What if the “heavy lifting” didn’t happen in the cloud? What if every engineer had instant access to their project’s billing data and could answer natural-language questions locally, without costing a dime more?
Here’s the core idea:
Shift the query compute from the cloud to your laptop.
We leverage three ingredients to pull this off:
- Gemini CLI: Our natural language interface that transforms plain English into SQL queries.
- Model Context Protocol (MCP): A lightweight local server running on the user's machine that orchestrates and executes SQL queries against local databases.
- Decentralized data sync: Syncing compact, optimized billing datasets to engineers’ devices using existing corporate storage tools (OneDrive / SharePoint).
Core Design Principles:
- Query Once, Distribute Many: Execute one optimized aggregation query per day in the cloud.
- Zero-Cost Local Queries: All ad-hoc analysis happens on the user's laptop using local storage.
- Natural Language Interface: Use Generative AI (Gemini) to translate user intent into database queries locally.
How It Works: A Day in the Life of Your Local-First FinOps Agent
Step 1: Ingest & Optimize
Instead of running thousands of raw billing queries, a single optimized aggregation runs once a day in the cloud. This job:
- Summarizes raw billing logs into partitioned, compressed SQLite databases for efficient local querying.
- Partitions data by dimensions like `Project`, `Service`, and `Date` for quick filtering.
- Scans only a fraction of the data compared to raw logs. This yields 1 query/day instead of 1,000 queries/day, dramatically cutting costs.
Step 2: Sync
The aggregated SQLite database files are synced to every engineer’s laptop through OneDrive or SharePoint sync clients — no new infrastructure, no added cloud storage cost.
- Sync happens incrementally.
- Files remain small (a few hundred MB, optimized by partitioning and compression).
- Data privacy is controlled by existing SharePoint permissions.
Step 3: Query Locally with MCP & Gemini
Here’s where the magic happens:
- A local MCP agent runs as a lightweight server on your machine.
- Gemini CLI takes your natural language query and passes an SQL prompt to MCP.
- MCP uses a SQLite engine locally to run queries within milliseconds.
- Results are returned to Gemini, which uses large language model reasoning to synthesize a human-readable answer from the locally computed data.
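The query-execution step can be boiled down to one function. This is a sketch of what an MCP server's query tool might do internally, not a real MCP implementation: the function name and the SELECT-only guard are my assumptions. Opening the file with SQLite's `mode=ro` URI flag keeps the synced database strictly read-only.

```python
import sqlite3


def run_readonly_query(db_path, sql):
    """Execute a SELECT against the synced SQLite file, read-only.

    A real MCP server would expose this as a tool the model can call;
    here it is a plain function so the idea stays self-contained.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # mode=ro: the local billing file can never be modified by a query.
    con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = con.execute(sql)
        columns = [d[0] for d in cur.description]
        # Return rows as dicts so the LLM gets named fields to reason over.
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        con.close()
```

Because the database is a few hundred megabytes at most and fully indexed, these queries return in milliseconds, which is what makes the conversational loop feel instant.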
Example User Interaction
Open your Gemini CLI terminal and enter:
why is the checkout service 20% over budget this month?
Under the hood:
Gemini translates this to something like:
```sql
SELECT service, project, SUM(cost) AS total_cost
FROM billing_summary
WHERE service = 'Checkout Service'
  AND usage_date BETWEEN '2024-12-01' AND '2024-12-31'
GROUP BY service, project;
```
- The MCP agent runs this query locally on the SQLite file synced to the laptop.
- Raw costs for this service are fetched instantly.
- Gemini’s natural language model synthesizes the insight: “The increase is driven by a new Spanner instance `checkout-db-prod` provisioned on the 15th.”
No cloud queries. No expensive SaaS fees. Instant answers.
Security & Governance: Keeping Data Safe & Relevant
- Data Residency: All billing data resides only on the local machines of authorized users. No outbound data is sent to third-party AI API endpoints, preserving confidentiality.
- Role-Based Access: The local MCP agent can implement filters based on user role or project membership, ensuring users only query relevant data.
- Auditability: Query logs remain local, avoiding centralized data exposure while enabling traceability on the user’s machine.
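One way the role-based filtering could work is to wrap whatever SQL the model produces in a scoped subselect before execution. This is purely illustrative: `scope_query` is a hypothetical helper, and it assumes the inner query exposes a `project` column (as the `billing_summary` examples in this article do).

```python
def scope_query(sql, allowed_projects):
    """Restrict a generated query to the projects this user may see.

    Illustrative sketch: rather than parsing SQL, it wraps the original
    query in a filtered subselect. Assumes the inner query selects a
    `project` column.
    """
    placeholders = ", ".join("?" for _ in allowed_projects)
    wrapped = (
        f"SELECT * FROM ({sql}) "
        f"WHERE project IN ({placeholders})"
    )
    # Parameters are bound separately, so project names are never
    # interpolated into the SQL string.
    return wrapped, list(allowed_projects)
```

The allowed-project list itself would come from existing group membership (the same SharePoint permissions that gate the file sync), so no new access-control system is needed.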
Comparative Analysis
| Feature | Direct BigQuery | BI Dashboards | Proposed Local Agent |
|----------------|---------------------------|----------------------------|-----------------------------------|
| Cost Per Query | High (~$6.25 / TB)        | Med (Hidden Refresh Costs) | Zero ($0.00)                      |
| Speed | Variable (Queue times) | Slow (Load times) | Instant |
| Flexibility | High (Full SQL) | Low (Fixed Views) | High (Natural Language) |
| Accessibility | Low (Requires SQL skills) | Med (Requires Access) | High (Chat Interface) |
| Data Freshness | Real-time | Delayed | Daily Sync (Sufficient for FinOps) |
Final Thoughts
Combining Gemini CLI, MCP, and a smart decentralized sync strategy unlocks a new kind of FinOps, one where cost visibility is effortless, inexpensive, and immediate.
The cloud should never charge you for asking about your bills. By shifting the compute closer to users and blending in natural language AI, we finally solve the paradox of cloud cost analysis.
