<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sudharsana Viswanathan</title>
    <description>The latest articles on DEV Community by Sudharsana Viswanathan (@sudharsana_viswanathan_46).</description>
    <link>https://dev.to/sudharsana_viswanathan_46</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3811606%2F20116722-5eff-4c1a-b5ae-da368bd8c42d.png</url>
      <title>DEV Community: Sudharsana Viswanathan</title>
      <link>https://dev.to/sudharsana_viswanathan_46</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sudharsana_viswanathan_46"/>
    <language>en</language>
    <item>
      <title>From $0 to $35,000 in 6 Hours: How an API Leak and GCP Billing Lag Broke Our Startup</title>
      <dc:creator>Sudharsana Viswanathan</dc:creator>
      <pubDate>Sat, 21 Mar 2026 01:09:48 +0000</pubDate>
      <link>https://dev.to/sudharsana_viswanathan_46/from-0-to-35000-in-6-hours-how-an-api-leak-and-billing-lag-broke-our-startup-g6e</link>
      <guid>https://dev.to/sudharsana_viswanathan_46/from-0-to-35000-in-6-hours-how-an-api-leak-and-billing-lag-broke-our-startup-g6e</guid>
      <description>&lt;h1&gt;1.5 Million Requests, 1 Leaked Key: How We Burned $35,000 on Gemini in 6 Hours&lt;/h1&gt;


&lt;p&gt;The "experimental phase" of a project is supposed to be the fun part. For us, as a dedicated &lt;strong&gt;AWS-native shop&lt;/strong&gt;, we recently decided to branch out and test the &lt;strong&gt;Gemini 3.1 Pro Image model&lt;/strong&gt; on Google Cloud Platform (GCP).&lt;/p&gt;


&lt;p&gt;We did what every fast-moving team does: linked a business card, grabbed an API key, and started building. 20 days later, we had a &lt;strong&gt;$35,000 bill&lt;/strong&gt;, a panicked CEO, and a very expensive lesson in how GCP’s default quotas and billing latency work.&lt;/p&gt;


&lt;p&gt;If you are "just experimenting" with AI APIs, read this before you wake up to a five-figure surprise.&lt;/p&gt;





&lt;h2&gt;The "Perfect Storm" Timeline&lt;/h2&gt;
&lt;br&gt;
  &lt;p&gt;The attack wasn't sophisticated, but it was relentless. Because we were experimenting, we hadn't yet applied our standard enterprise-grade security protocols to this new environment.&lt;/p&gt;


&lt;ul&gt;

    &lt;li&gt;

&lt;strong&gt;03:00 AM EST:&lt;/strong&gt; An unrestricted API key is leaked (likely via a compromised development environment). An automated botnet begins hammering our Gemini 3.1 Pro Image endpoint.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;08:00 AM EST:&lt;/strong&gt; Our CEO receives an automated billing alert: &lt;strong&gt;$11,000&lt;/strong&gt;.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;08:15 AM EST:&lt;/strong&gt; The team scrambles. We rotate the keys and disable project billing immediately. We honestly thought we had stopped the bleeding at $11k.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;11:00 AM EST:&lt;/strong&gt; &lt;strong&gt;The "Billing Ghost" appears.&lt;/strong&gt; Because GCP billing data lags by 3–5 hours, the dashboard continues to climb as the morning's requests are finally processed.&lt;/li&gt;

  &lt;/ul&gt;


&lt;blockquote&gt;
&lt;br&gt;
    &lt;strong&gt;Final Damage:&lt;/strong&gt; 1.5 million requests. &lt;strong&gt;$35,000 USD burned.&lt;/strong&gt;&lt;br&gt;
  &lt;/blockquote&gt;





&lt;h2&gt;Why It Happened: The Default Quota Trap&lt;/h2&gt;
&lt;br&gt;
  &lt;p&gt;Coming from the AWS ecosystem, we were shocked by how generous the default quotas in GCP are. When you enable the &lt;strong&gt;Generative Language API&lt;/strong&gt;, the default "Requests Per Minute" (RPM) quota is often high enough to let a botnet drain a startup's bank account before the first cup of coffee is even brewed.&lt;/p&gt;


&lt;p&gt;Combined with a high-limit business card, the system did exactly what it was told to do: &lt;strong&gt;Scale.&lt;/strong&gt;&lt;/p&gt;





&lt;h2&gt;The "Never Again" Stack: Our 4-Step Mitigation Plan&lt;/h2&gt;
&lt;br&gt;
  &lt;p&gt;Google Support has indicated they may consider a refund since this was a first-time incident, but they require a rigorous &lt;strong&gt;Remediation Plan&lt;/strong&gt;. Here is the "Fort Knox" setup we’ve implemented to ensure this stays a one-time mistake.&lt;/p&gt;


&lt;h3&gt;1. Real-Time Observability (Datadog Integration)&lt;/h3&gt;
&lt;br&gt;
  &lt;p&gt;Standard billing alerts are &lt;strong&gt;reactive&lt;/strong&gt;: they tell you what you &lt;em&gt;already&lt;/em&gt; spent. We needed to know what we were spending &lt;em&gt;right now&lt;/em&gt;.&lt;/p&gt;
&lt;br&gt;
  &lt;ul&gt;

    &lt;li&gt;We integrated &lt;strong&gt;Datadog&lt;/strong&gt; to monitor &lt;strong&gt;RPS (Requests Per Second)&lt;/strong&gt; and &lt;strong&gt;TPS (Transactions Per Second)&lt;/strong&gt; directly from our GCP logs.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;The Kill-Switch Alert:&lt;/strong&gt; If our Gemini request volume spikes 200% above the 10-minute moving average, Datadog triggers a PagerDuty alert immediately. We no longer wait for a billing email.&lt;/li&gt;

  &lt;/ul&gt;
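&lt;p&gt;As a concrete sketch of that kill-switch, here is roughly what it looks like as a Datadog anomaly-monitor definition. Treat it as an illustration only: the metric name &lt;code&gt;gcp.gemini.requests&lt;/code&gt;, the &lt;code&gt;env:prod&lt;/code&gt; tag, and the PagerDuty handle are placeholders for whatever log-based metric and notification channel you actually wire up.&lt;/p&gt;

```json
{
  "name": "Gemini request spike kill-switch",
  "type": "query alert",
  "query": "avg(last_10m):anomalies(sum:gcp.gemini.requests{env:prod}.as_rate(), 'basic', 3) >= 1",
  "message": "Gemini request rate is spiking far above its 10-minute baseline. If this is unexpected, rotate keys now. @pagerduty-oncall",
  "options": {
    "thresholds": { "critical": 1 },
    "notify_no_data": false
  }
}
```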


&lt;h3&gt;2. Moving to Service Accounts (IAM &amp;gt; API Keys)&lt;/h3&gt;
&lt;br&gt;
  &lt;p&gt;"Naked" API keys are a massive liability. We are migrating all workloads to &lt;strong&gt;GCP Service Accounts&lt;/strong&gt;.&lt;/p&gt;
&lt;br&gt;
  &lt;ul&gt;

    &lt;li&gt;Instead of a static string that can be leaked, we use short-lived tokens and IAM roles.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;Local Dev:&lt;/strong&gt; Developers must now use &lt;code&gt;gcloud auth application-default login&lt;/code&gt; rather than generating a permanent, vulnerable key.&lt;/li&gt;

  &lt;/ul&gt;
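&lt;p&gt;On the infrastructure side, the migration can be sketched in Terraform. This is a hedged example, not our exact setup: the resource names are illustrative, and &lt;code&gt;roles/aiplatform.user&lt;/code&gt; should be swapped for whatever narrowly scoped role your workload actually needs.&lt;/p&gt;

```hcl
# A dedicated service account for Gemini calls, granted one narrow role
# instead of an unrestricted static API key.
resource "google_service_account" "gemini_caller" {
  account_id   = "gemini-caller"
  display_name = "Gemini API caller"
}

resource "google_project_iam_member" "gemini_caller_role" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.gemini_caller.email}"
}
```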


&lt;h3&gt;3. Hard Quotas &amp;amp; AI Studio Spend Limits&lt;/h3&gt;
&lt;br&gt;
  &lt;p&gt;We realized "Unrestricted" is a dangerous default. We've tightened the screws on every point of entry:&lt;/p&gt;
&lt;br&gt;
  &lt;ul&gt;

    &lt;li&gt;

&lt;strong&gt;Hard Quotas:&lt;/strong&gt; We manually lowered our project-level quotas in the GCP Console to the bare minimum needed for production. If we hit the limit, the app returns a 429, but the bank account stays safe.&lt;/li&gt;

    &lt;li&gt;

&lt;strong&gt;AI Studio Limits:&lt;/strong&gt; For experimental keys, we now use &lt;strong&gt;Google AI Studio's spend limits&lt;/strong&gt;, which offer a much more granular "kill switch" compared to project-wide billing.&lt;/li&gt;

  &lt;/ul&gt;


&lt;h3&gt;4. The Proxy Layer&lt;/h3&gt;
&lt;br&gt;
  &lt;p&gt;Every request now flows through an &lt;strong&gt;Internal API Gateway&lt;/strong&gt;. This gateway acts as our final line of defense by validating user sessions and applying strict rate-limiting (e.g., 5 requests per minute per user) before it ever touches Google’s billable endpoints.&lt;/p&gt;
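&lt;p&gt;The per-user limit in that gateway boils down to a sliding-window counter. Here is a minimal, self-contained Python sketch of the idea; the class and method names are ours for illustration, and the real gateway also validates sessions before this check runs:&lt;/p&gt;

```python
import time
from collections import defaultdict, deque

class PerUserRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window_s` seconds per user."""

    def __init__(self, limit=5, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # user_id -> timestamps of allowed requests

    def allow(self, user_id, now=None):
        """Return True if the request may proceed; False means respond with a 429."""
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Evict timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

&lt;p&gt;The gateway calls &lt;code&gt;allow(user_id)&lt;/code&gt; after session validation and short-circuits with a 429 before the request ever reaches a billable Google endpoint.&lt;/p&gt;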





&lt;h2&gt;Final Thoughts&lt;/h2&gt;
&lt;br&gt;
  &lt;p&gt;We are currently in the 3–5 day "waiting window" to see if Google will waive the $35,000. While the stress has been immense, the experience forced us to build a production-grade security layer for our AI experiments.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;The takeaway?&lt;/strong&gt; If you’re an AWS shop trying out GCP, don’t treat it like a sandbox. Restrict your keys, lower your quotas, and for the love of your runway, &lt;strong&gt;monitor your RPS.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>gcp</category>
      <category>hacked</category>
      <category>aws</category>
    </item>
    <item>
      <title>Production AI Broke Because of a Model Deprecation — So I Built llm-model-deprecation</title>
      <dc:creator>Sudharsana Viswanathan</dc:creator>
      <pubDate>Sat, 07 Mar 2026 16:08:24 +0000</pubDate>
      <link>https://dev.to/sudharsana_viswanathan_46/production-ai-broke-because-of-a-model-deprecation-so-i-built-llm-model-deprecation-4925</link>
      <guid>https://dev.to/sudharsana_viswanathan_46/production-ai-broke-because-of-a-model-deprecation-so-i-built-llm-model-deprecation-4925</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Have you ever deployed an AI app, only to find it suddenly broken because OpenAI or Gemini deprecated a model you were using? 😱&lt;/p&gt;

&lt;p&gt;I did, and it cost me hours of debugging, late-night panic, and a ton of lost productivity. Upgrading libraries while prod is down is no fun!&lt;/p&gt;

&lt;p&gt;If you’re building apps on LLMs like OpenAI, Anthropic, or Gemini, model deprecations aren’t just annoying; they’re dangerous.&lt;/p&gt;

&lt;p&gt;That’s why I created &lt;strong&gt;llm-model-deprecation&lt;/strong&gt;, a lightweight Python library that alerts you before an LLM model disappears.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;LLM APIs evolve quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;OpenAI retires older GPT-3.5 models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gemini might tweak endpoint parameters without notice.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Anthropic occasionally removes older Claude versions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your production app depends on hardcoded model names, one day your API calls will start failing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common consequences:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Broken chatbots&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failed recommendation engines&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nightmarish debugging sessions&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Solved It
&lt;/h2&gt;

&lt;p&gt;Instead of checking docs manually or waiting for an unexpected failure, I automated the process:&lt;br&gt;
✅ Track model deprecation status for OpenAI, Anthropic, Gemini&lt;/p&gt;

&lt;p&gt;✅ Receive early warnings before a model is deprecated&lt;/p&gt;

&lt;p&gt;✅ Integrate into CI/CD pipelines so your production app is always safe&lt;/p&gt;
&lt;h2&gt;
  
  
  GitHub Action
&lt;/h2&gt;

&lt;p&gt;Run the same check in GitHub Actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Check LLM deprecations
  uses: techdevsynergy/llm-model-deprecation@v1.1.0
  with:
    fail-on-deprecated: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Options: &lt;code&gt;path&lt;/code&gt; (project root to scan), &lt;code&gt;fail-on-deprecated&lt;/code&gt;, &lt;code&gt;version&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  CLI
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install llm-model-deprecation
llm-deprecation scan
llm-deprecation scan /path/to/project
llm-deprecation scan --fail-on-deprecated   # exit 1 if any found (for CI)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Library usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from llm_deprecation import DeprecationChecker, DeprecationStatus

checker = DeprecationChecker()

# Check by model id (searches all providers)
checker.is_deprecated("gpt-3.5-turbo-0301")   # True
checker.is_retired("gpt-3.5-turbo-0301")      # True
checker.status("gpt-4")                       # DeprecationStatus.ACTIVE

# With provider for exact match
checker.get("claude-2.0", provider="anthropic")
# -&amp;gt; ModelInfo(provider='anthropic', model_id='claude-2.0', status=..., replacement='...', ...)

# List deprecated models
for m in checker.list_deprecated(provider="openai"):
    print(m.model_id, m.status.value, m.replacement)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Refresh (Weekly)
&lt;/h2&gt;

&lt;p&gt;I wrote web crawlers that run every week to update and add model details. The registry is loaded from this &lt;a href="https://raw.githubusercontent.com/techdevsynergy/llm-deprecation-data/refs/heads/main/llm_deprecation_data.json" rel="noopener noreferrer"&gt;URL&lt;/a&gt;; if it’s unreachable (e.g. offline), the built-in registry bundled with the library is used. My company &lt;a href="https://reps.ai/" rel="noopener noreferrer"&gt;Reps.ai&lt;/a&gt; covers the cost of keeping this stable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;p&gt;Try it today and never get caught by a model deprecation again:&lt;/p&gt;

&lt;p&gt;🔗 Check out &lt;a href="https://github.com/techdevsynergy/llm-model-deprecation" rel="noopener noreferrer"&gt;llm-model-deprecation&lt;/a&gt; on GitHub&lt;/p&gt;

&lt;p&gt;If this helps you, star the repo ⭐ — it motivates me to keep updating the library with new LLMs as they launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Author
&lt;/h2&gt;

&lt;p&gt;Sudharsana Viswanathan, Engineering Lead at &lt;a href="https://reps.ai/" rel="noopener noreferrer"&gt;Reps.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>python</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Never Get Caught by an LLM Deprecation Again: A Guide to llm-model-deprecation</title>
      <dc:creator>Sudharsana Viswanathan</dc:creator>
      <pubDate>Sat, 07 Mar 2026 13:22:17 +0000</pubDate>
      <link>https://dev.to/sudharsana_viswanathan_46/never-get-caught-by-an-llm-deprecation-again-a-guide-to-llm-model-deprecation-2opn</link>
      <guid>https://dev.to/sudharsana_viswanathan_46/never-get-caught-by-an-llm-deprecation-again-a-guide-to-llm-model-deprecation-2opn</guid>
      <description>&lt;p&gt;`&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to keep your apps on supported models with one Python package, a CLI, and a GitHub Action.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;If you’ve ever had an integration break because OpenAI or Anthropic retired a model, you know the pain: sudden 404s, vague errors, and a scramble to find a replacement. Provider deprecation pages help, but they’re easy to miss when you’re shipping features. What you need is something that &lt;strong&gt;checks your code and config&lt;/strong&gt; for deprecated or retired models and tells you exactly what to change.&lt;/p&gt;

&lt;p&gt;That’s what &lt;strong&gt;llm-model-deprecation&lt;/strong&gt; does. It gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Python library&lt;/strong&gt; to query deprecation status and get replacement suggestions&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;CLI&lt;/strong&gt; to scan a project for deprecated model references (great for CI and cron)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;GitHub Action&lt;/strong&gt; so every push or PR can fail if you’re still using retired models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The registry (OpenAI, Anthropic, Gemini, and more) is &lt;strong&gt;refreshed weekly&lt;/strong&gt;, so you’re not relying on stale data. Here’s how to use it in detail.&lt;/p&gt;




&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;LLM providers regularly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deprecate&lt;/strong&gt; older models and set sunset dates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retire&lt;/strong&gt; models so they stop working entirely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommend&lt;/strong&gt; newer, better, or cheaper replacements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your app is hardcoded to &lt;code&gt;gpt-3.5-turbo-0301&lt;/code&gt; or &lt;code&gt;claude-2.0&lt;/code&gt;, one day those APIs will stop working. Catching that in CI or in a weekly scan is much cheaper than finding out in production. This package centralizes that check so you can automate it.&lt;/p&gt;




&lt;h2&gt;What the package does&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;llm-model-deprecation&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loads a &lt;strong&gt;registry&lt;/strong&gt; of model deprecation data (from a community-maintained source, with a built-in fallback).&lt;/li&gt;
&lt;li&gt;Lets you &lt;strong&gt;query&lt;/strong&gt; any model by ID (and optionally provider) to see if it’s active, legacy, deprecated, or retired.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scans&lt;/strong&gt; your project files (Python, JSON, YAML, env files, TypeScript, etc.) for known deprecated model IDs and reports what it finds.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The data is &lt;strong&gt;updated weekly&lt;/strong&gt;, so the library and CLI stay in sync with provider announcements. You don’t have to maintain the list yourself.&lt;/p&gt;




&lt;h2&gt;Installation&lt;/h2&gt;

&lt;p&gt;From PyPI:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install llm-model-deprecation
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you want the library to load the registry from a URL using &lt;code&gt;requests&lt;/code&gt; (slightly more robust than the default stdlib), use the optional extra:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install "llm-model-deprecation[fetch]"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That’s it. No config files or API keys required for the checker or the CLI.&lt;/p&gt;




&lt;h2&gt;Using the library&lt;/h2&gt;

&lt;p&gt;After installing, you get a &lt;strong&gt;DeprecationChecker&lt;/strong&gt; that’s ready to use. It loads the registry automatically (online first, then built-in fallback if offline).&lt;/p&gt;

&lt;h3&gt;Basic checks&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;from llm_deprecation import DeprecationChecker, DeprecationStatus

checker = DeprecationChecker()

# Is this model deprecated or retired?
checker.is_deprecated("gpt-3.5-turbo-0301")   # True
checker.is_retired("gpt-3.5-turbo-0301")      # True

# What’s the status?
checker.status("gpt-4")   # DeprecationStatus.ACTIVE
checker.status("gpt-3.5-turbo-0613")   # DeprecationStatus.RETIRED
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The checker searches &lt;strong&gt;all providers&lt;/strong&gt; when you pass only a model ID. So you don’t need to remember whether a name is OpenAI or Anthropic.&lt;/p&gt;

&lt;h3&gt;Getting full details&lt;/h3&gt;

&lt;p&gt;When you need the full record (replacement suggestion, sunset date, notes):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;info = checker.get("claude-2.0", provider="anthropic")
if info:
    print(info.model_id)      # claude-2.0
    print(info.status)        # e.g. RETIRED
    print(info.replacement)   # e.g. claude-opus-4-6
    print(info.sunset_date)   # when it was retired
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Use &lt;code&gt;provider="openai"&lt;/code&gt;, &lt;code&gt;provider="anthropic"&lt;/code&gt;, or &lt;code&gt;provider="gemini"&lt;/code&gt; when you want to pin the provider; otherwise the first matching model ID across providers is returned.&lt;/p&gt;

&lt;h3&gt;Listing all deprecated models&lt;/h3&gt;

&lt;p&gt;To iterate over everything that’s deprecated or retired (optionally per provider):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;for m in checker.list_deprecated(provider="openai"):
    print(f"{m.model_id} → {m.status.value}, replace with: {m.replacement}")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Handy for dashboards, docs, or internal tooling.&lt;/p&gt;

&lt;h3&gt;Understanding status values&lt;/h3&gt;

&lt;p&gt;The registry uses four statuses:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1hvrd8zr9zj1nl8e3t0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1hvrd8zr9zj1nl8e3t0.png" alt=" " width="800" height="219"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your code should treat &lt;strong&gt;deprecated&lt;/strong&gt; and &lt;strong&gt;retired&lt;/strong&gt; as “must fix”; &lt;strong&gt;legacy&lt;/strong&gt; as “plan to migrate.”&lt;/p&gt;
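&lt;p&gt;That policy is easy to encode as a tiny gate in your own tooling. A minimal self-contained sketch follows; the enum is re-declared inline so the snippet runs on its own, and its four value names mirror the registry statuses above (assumed to match the package’s &lt;code&gt;DeprecationStatus&lt;/code&gt;):&lt;/p&gt;

```python
from enum import Enum

class DeprecationStatus(Enum):
    # Inline stand-in for the package's status enum, per the four
    # registry statuses: active, legacy, deprecated, retired.
    ACTIVE = "active"
    LEGACY = "legacy"
    DEPRECATED = "deprecated"
    RETIRED = "retired"

def release_action(status):
    """Map a model's status to a CI decision: 'fail', 'warn', or 'pass'."""
    if status in (DeprecationStatus.DEPRECATED, DeprecationStatus.RETIRED):
        return "fail"   # must fix before shipping
    if status is DeprecationStatus.LEGACY:
        return "warn"   # plan a migration
    return "pass"
```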

&lt;h3&gt;Adding or overriding models&lt;/h3&gt;

&lt;p&gt;If you have internal or beta models, or want to override the registry, you can register them:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from datetime import date
from llm_deprecation import DeprecationChecker
from llm_deprecation.models import ModelInfo, DeprecationStatus

checker = DeprecationChecker()
checker.register(ModelInfo(
    provider="openai",
    model_id="gpt-4-old",
    status=DeprecationStatus.DEPRECATED,
    sunset_date=date(2026, 1, 1),
    replacement="gpt-4o",
    notes="Internal alias; migrate by Q1 2026.",
))

# Now your code and the CLI will see this model
assert checker.is_deprecated("gpt-4-old")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The scanner (CLI and library) will pick up these models when it searches the project.&lt;/p&gt;




&lt;h2&gt;Using the CLI&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;code&gt;llm-deprecation&lt;/code&gt;&lt;/strong&gt; command scans a directory for references to deprecated or retired models. It’s the same logic as the library, but you can run it from the shell, in CI, or from a cron job.&lt;/p&gt;

&lt;h3&gt;Basic scan&lt;/h3&gt;

&lt;p&gt;From the project root:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;llm-deprecation scan
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or target a specific path:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;llm-deprecation scan /path/to/your/app
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Example output:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Scanning project...
⚠ openai:gpt-3.5-turbo-0301 → retired
⚠ anthropic:claude-2.0 → retired
⚠ openai:text-embedding-ada-002 → deprecated soon
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each line is a &lt;strong&gt;provider:model_id&lt;/strong&gt; plus whether it’s “deprecated soon” or “retired.” No noise—just what you need to fix.&lt;/p&gt;

&lt;h3&gt;Options&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;--fail-on-deprecated&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;Exit with code 1 if any deprecated or retired models are found. Use this in CI so the pipeline fails until the codebase is updated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;-q&lt;/code&gt; / &lt;code&gt;--quiet&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;Skip the “Scanning project...” line. Useful when you’re parsing output or piping.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# CI-style: fail the job if anything is deprecated or retired
llm-deprecation scan --fail-on-deprecated

# Quiet run (e.g. in a script)
llm-deprecation scan -q
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;What gets scanned&lt;/h3&gt;

&lt;p&gt;The CLI looks at common code and config file types, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;code&gt;.py&lt;/code&gt;, &lt;code&gt;.js&lt;/code&gt;, &lt;code&gt;.ts&lt;/code&gt;, &lt;code&gt;.tsx&lt;/code&gt;, &lt;code&gt;.jsx&lt;/code&gt;, &lt;code&gt;.mjs&lt;/code&gt;, &lt;code&gt;.cjs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Config:&lt;/strong&gt; &lt;code&gt;.json&lt;/code&gt;, &lt;code&gt;.yaml&lt;/code&gt;, &lt;code&gt;.yml&lt;/code&gt;, &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.toml&lt;/code&gt;, &lt;code&gt;.ini&lt;/code&gt;, &lt;code&gt;.cfg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;code&gt;.md&lt;/code&gt;, &lt;code&gt;.rst&lt;/code&gt;, &lt;code&gt;.txt&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It skips directories like &lt;code&gt;.git&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;.venv&lt;/code&gt;, &lt;code&gt;venv&lt;/code&gt;, &lt;code&gt;__pycache__&lt;/code&gt;, and &lt;code&gt;dist&lt;/code&gt;, so you don’t get false positives from dependencies or build artifacts.&lt;/p&gt;

&lt;p&gt;Because the &lt;strong&gt;model data is refreshed weekly&lt;/strong&gt;, running the CLI regularly (e.g. in CI or cron) means you see new deprecations soon after they’re added to the registry.&lt;/p&gt;




&lt;h2&gt;Using the GitHub Action&lt;/h2&gt;

&lt;p&gt;You can run the same check on every push or pull request with the official action.&lt;/p&gt;

&lt;h3&gt;Minimal workflow&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;name: CI
on: [push, pull_request]

jobs:
  llm-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check LLM deprecations
        uses: techdevsynergy/llm-model-deprecation@v1.1.0
        with:
          fail-on-deprecated: true
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the scan finds any deprecated or retired models, the step fails and the run is red. Fix the reported models, push again, and the check passes.&lt;/p&gt;

&lt;h3&gt;Action inputs&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjauxp54e1c9numdbrhs6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjauxp54e1c9numdbrhs6.png" alt=" " width="800" height="223"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example with options:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;- name: Check LLM deprecations
  uses: techdevsynergy/llm-model-deprecation@v1.1.0
  with:
    path: "."
    fail-on-deprecated: "true"
    version: "1.1.0"
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;Where to add it&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single repo:&lt;/strong&gt; Add the step to your main CI workflow (e.g. on &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pull_request&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo:&lt;/strong&gt; Use &lt;code&gt;path&lt;/code&gt; to scan only the app that uses LLMs (e.g. &lt;code&gt;path: "services/ai"&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Release gates:&lt;/strong&gt; Run the same job on the &lt;code&gt;main&lt;/code&gt; branch or on release tags so you never ship with deprecated models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The action installs &lt;code&gt;llm-model-deprecation&lt;/code&gt; from PyPI and runs &lt;code&gt;llm-deprecation scan&lt;/code&gt; under the hood. Because the registry is &lt;strong&gt;updated weekly&lt;/strong&gt;, your CI automatically benefits from new deprecation data without changing the workflow.&lt;/p&gt;




&lt;h2&gt;Data source and weekly updates&lt;/h2&gt;

&lt;p&gt;The package does &lt;strong&gt;not&lt;/strong&gt; ship with a static, frozen list. It uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;default URL&lt;/strong&gt; that serves a community-maintained registry (e.g. &lt;a href="https://github.com/techdevsynergy/llm-deprecation-data" rel="noopener noreferrer"&gt;techdevsynergy/llm-deprecation-data&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;built-in fallback&lt;/strong&gt; in the library for when the network is unavailable (e.g. air-gapped or CI with no outbound access).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That registry is &lt;strong&gt;refreshed weekly&lt;/strong&gt; with the latest deprecations and retirements from provider docs (OpenAI, Anthropic, Google Gemini, etc.). So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Library:&lt;/strong&gt; Each time you create a &lt;code&gt;DeprecationChecker()&lt;/code&gt; or run the CLI, it can load the latest data (or the fallback).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI / Action:&lt;/strong&gt; Each run can see the current week’s data, so new deprecations show up in your pipeline shortly after they’re published.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t need to configure anything for this; it’s the default behavior.&lt;/p&gt;




&lt;h2&gt;Putting it together: recommended workflow&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install&lt;/strong&gt;&lt;br&gt;&lt;code&gt;pip install llm-model-deprecation&lt;/code&gt; (or with &lt;code&gt;[fetch]&lt;/code&gt; if you prefer).&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;In code&lt;/strong&gt;&lt;br&gt;When choosing or validating a model ID (e.g. from config or user input), call &lt;code&gt;checker.is_deprecated(model_id)&lt;/code&gt; or &lt;code&gt;checker.get(model_id)&lt;/code&gt; and warn or block if deprecated/retired.&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;In CI&lt;/strong&gt;&lt;br&gt;Add the GitHub Action with &lt;code&gt;fail-on-deprecated: true&lt;/code&gt; so no PR or push can merge with deprecated/retired model references.&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Optionally in cron&lt;/strong&gt;&lt;br&gt;Run &lt;code&gt;llm-deprecation scan /path/to/app --fail-on-deprecated&lt;/code&gt; on a schedule and send the output to Slack or email if something appears.&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Stay updated&lt;/strong&gt;&lt;br&gt;Upgrade the package occasionally so you get the latest CLI and library behavior; the registry itself is refreshed weekly on the server side.&lt;/p&gt;


&lt;/li&gt;

&lt;/ol&gt;




&lt;h2&gt;Summary&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;llm-model-deprecation&lt;/strong&gt; gives you a Python API, a CLI, and a GitHub Action to keep your app off deprecated and retired LLM models.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;registry is refreshed weekly&lt;/strong&gt;, so you’re not relying on outdated data.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;library&lt;/strong&gt; to check model IDs in code; use the &lt;strong&gt;CLI&lt;/strong&gt; for ad-hoc or scripted scans; use the &lt;strong&gt;Action&lt;/strong&gt; to enforce the check on every push or PR.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can install it with &lt;code&gt;pip install llm-model-deprecation&lt;/code&gt;, add one step to your workflow, and stop worrying about surprise model retirements.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;



&lt;ul&gt;


&lt;li&gt;&lt;a href="https://github.com/marketplace/actions/llm-model-deprecation-check" rel="noopener noreferrer"&gt;Github Actions&lt;/a&gt;&lt;/li&gt;


&lt;li&gt;&lt;a href="https://github.com/techdevsynergy/llm-model-deprecation" rel="noopener noreferrer"&gt;GitHub: techdevsynergy/llm-model-deprecation&lt;/a&gt;&lt;/li&gt;


&lt;li&gt;&lt;a href="https://pypi.org/project/llm-model-deprecation/" rel="noopener noreferrer"&gt;PyPI: llm-model-deprecation&lt;/a&gt;&lt;/li&gt;


&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/deprecations" rel="noopener noreferrer"&gt;OpenAI API deprecations&lt;/a&gt;&lt;/li&gt;


&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/resources/model-deprecations" rel="noopener noreferrer"&gt;Anthropic model deprecations&lt;/a&gt;&lt;/li&gt;


&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>devops</category>
      <category>githubactions</category>
      <category>openai</category>
    </item>
  </channel>
</rss>
