Sudharsana Viswanathan
From $0 to $35,000 in 6 Hours: How an API Leak and GCP Billing Lag Broke Our Startup

1.5 Million Requests, 1 Leaked Key: How We Burned $35,000 on Gemini in 6 Hours

The "experimental phase" of a project is supposed to be the fun part. For us, as a dedicated AWS-native shop, we recently decided to branch out and test the Gemini 3.1 Pro Image model on Google Cloud Platform (GCP).

We did what every fast-moving team does: linked a business card, grabbed an API key, and started building. 20 days later, we had a $35,000 bill, a panicked CEO, and a very expensive lesson in how GCP’s default quotas and billing latency work.

If you are "just experimenting" with AI APIs, read this before you wake up to a five-figure surprise.


The "Perfect Storm" Timeline


The attack wasn't sophisticated, but it was relentless. Because we were experimenting, we hadn't yet applied our standard enterprise-grade security protocols to this new environment.

  • 03:00 AM EST: An unrestricted API key is leaked (likely via a compromised development environment). An automated botnet begins hammering our Gemini 3.1 Pro Image endpoint.
  • 08:00 AM EST: Our CEO receives an automated billing alert: $11,000.
  • 08:15 AM EST: The team scrambles. We rotate the keys and disable project billing immediately. We honestly thought we had stopped the bleeding at $11k.
  • 11:00 AM EST: The "Billing Ghost" appears. Because GCP billing data lags by 3–5 hours, the dashboard continues to climb as the morning's requests are finally processed.

Final Damage: 1.5 million requests. $35,000 USD burned.

Why It Happened: The Default Quota Trap


Coming from the AWS ecosystem, we were shocked by how generous the default quotas are in GCP. When you enable the Generative Language API, the default "Requests Per Minute" (RPM) limit is often set high enough for a botnet to drain a startup's bank account before the first cup of coffee is even brewed.

Combined with a high-limit business card, the system did exactly what it was told to do: Scale.


The "Never Again" Stack: Our 4-Step Mitigation Plan


Google Support has indicated they may consider a refund since this was a first-time incident, but they require a rigorous Remediation Plan. Here is the "Fort Knox" setup we’ve implemented to ensure this stays a one-time mistake.

1. Real-Time Observability (Datadog Integration)


Standard billing alerts are reactive: they tell you what you already spent. We needed to know what we were spending right now.


  • We integrated Datadog to monitor RPS (Requests Per Second) and TPS (Transactions Per Second) directly from our GCP logs.
  • The Kill-Switch Alert: If our Gemini request volume spikes 200% above the 10-minute moving average, Datadog triggers a PagerDuty alert immediately. We no longer wait for a billing email.
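The threshold rule itself is simple enough to sketch. The real check runs as a Datadog monitor, but the hypothetical Python version below shows the logic: compare the current minute's request count against a trailing 10-minute moving average, and flag anything more than 200% above it (`SpikeDetector` and its parameters are illustrative, not our actual code):

```python
from collections import deque

class SpikeDetector:
    """Flags request-volume spikes: fires when the current per-minute
    count exceeds 200% of the trailing 10-minute moving average."""

    def __init__(self, window_minutes=10, spike_ratio=2.0):
        self.window = deque(maxlen=window_minutes)
        self.spike_ratio = spike_ratio

    def observe(self, requests_this_minute):
        """Record one minute of traffic; return True if it looks like a spike."""
        spiking = False
        # Only alert once we have a full baseline window, so a cold
        # start (empty history) can never fire the pager.
        if len(self.window) == self.window.maxlen:
            baseline = sum(self.window) / len(self.window)
            spiking = baseline > 0 and requests_this_minute > self.spike_ratio * baseline
        self.window.append(requests_this_minute)
        return spiking
```

In the real pipeline, a `True` here is what pages on-call via PagerDuty instead of waiting for a billing email.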

2. Moving to Service Accounts (IAM > API Keys)


"Naked" API keys are a massive liability. We are migrating all workloads to GCP Service Accounts.


  • Instead of a static string that can be leaked, we use short-lived tokens and IAM roles.
  • Local Dev: Developers must now use gcloud auth application-default login rather than generating a permanent, vulnerable key.
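As a rough sketch, the migration looks like the commands below. The project ID (`my-project`), service-account name (`gemini-worker`), and role are placeholders; the exact role depends on which API your workload calls:

```shell
# Local development: short-lived Application Default Credentials,
# no permanent key string sitting on a laptop.
gcloud auth application-default login

# Production: a dedicated service account instead of a static API key.
gcloud iam service-accounts create gemini-worker \
    --project=my-project \
    --display-name="Gemini workload identity"

# Grant it a narrowly scoped role rather than project-wide access.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:gemini-worker@my-project.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"
```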

3. Hard Quotas & AI Studio Spend Limits


We realized "Unrestricted" is a dangerous default. We've tightened the screws on every point of entry:


  • Hard Quotas: We manually lowered our project-level quotas in the GCP Console to the bare minimum needed for production. If we hit the limit, the app returns a 429, but the bank account stays safe.
  • AI Studio Limits: For experimental keys, we now use Google AI Studio's spend limits, which offer a much more granular "kill switch" compared to project-wide billing.
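On the application side, a hard quota just means handling the 429 gracefully instead of treating it as an outage. Here is a minimal sketch, assuming the client surfaces quota errors as an exception; `QuotaExceeded` and `call_with_backoff` are hypothetical names, not part of any Google SDK:

```python
import time

class QuotaExceeded(Exception):
    """Raised when the API responds with HTTP 429 (quota exhausted)."""

def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry fn with exponential backoff on quota errors.

    Gives up (re-raises) after max_retries attempts; the `sleep`
    parameter is injectable so the logic is testable without waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_retries:
                raise
            # 1s, 2s, 4s, ... between attempts.
            sleep(base_delay * (2 ** attempt))
```

The point is that when the quota trips, users see a brief slowdown or an error page, and the bank account sees nothing.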

4. The Proxy Layer


Every request now flows through an Internal API Gateway. This gateway acts as our final line of defense by validating user sessions and applying strict rate-limiting (e.g., 5 requests per minute per user) before it ever touches Google’s billable endpoints.
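A per-user limiter of this kind fits in a few lines. The version below is an illustrative sliding-window implementation of the "5 requests per minute per user" rule, not our gateway's actual code:

```python
import time
from collections import defaultdict, deque

class PerUserRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window`
    seconds per user, checked before forwarding to the billable API."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable for testing
        self.hits = defaultdict(deque)  # user_id -> request timestamps

    def allow(self, user_id):
        """Return True if this request may proceed, False to reject (429)."""
        now = self.clock()
        q = self.hits[user_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False  # gateway returns 429 without touching Gemini
```

A rejected request costs us a few microseconds of CPU; a forwarded one costs real money, which is why the limiter sits in front of everything.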


Final Thoughts


We are currently in the 3–5 day "waiting window" to see if Google will waive the $35,000. While the stress has been immense, the experience forced us to build a production-grade security layer for our AI experiments.

The takeaway? If you’re an AWS shop trying out GCP, don’t treat it like a sandbox. Restrict your keys, lower your quotas, and for the love of your runway, monitor your RPS.
