Akash Melavanki

How I built a real-time LLM "Kill-Switch" for Vercel Edge using Atomic Redis

Last week, the Axios supply chain attack hit a package with over 100 million weekly downloads. A week before that, it was LiteLLM.

In both cases, the goal was simple: Exfiltrate API keys. As developers, we are taught to rotate our keys immediately. But there’s a massive gap in that advice. If an attacker gets your OpenAI key at 2 AM, they don't wait for you to wake up. They loop your endpoints, drain your credits, and leave you with a $1,000+ bill by sunrise.

This is what OWASP calls LLM10:2025 – Unbounded Consumption (or "Denial of Wallet"). I spent the last two weeks building a way to stop it at the Edge.

The Problem: Why Rate Limiting Fails LLMs
Standard rate-limiting (e.g., 10 requests per minute) is useless for LLMs.

Request 1: "Hi" (10 tokens) — Cost: $0.0001

Request 2: "Summarize this 50-page PDF" (30,000 tokens) — Cost: $0.45

An attacker doesn't need a high volume of requests to ruin you; they just need expensive requests. We need Budget Limiting, not Rate Limiting.
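Budget limiting means putting a dollar figure on every request before it runs. A minimal sketch of that estimate (the per-token prices below are illustrative placeholders, not current OpenAI rates; check your provider's pricing page):

```typescript
// Rough per-request cost estimate from token counts.
// Prices are illustrative assumptions — verify against your
// provider's actual pricing before relying on them.
const PRICE_PER_1K_TOKENS: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 0.005, output: 0.015 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  maxOutputTokens: number
): number {
  const p = PRICE_PER_1K_TOKENS[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens / 1000) * p.input + (maxOutputTokens / 1000) * p.output;
}
```

A 10-token "Hi" and a 30,000-token PDF summary now produce wildly different numbers to charge against the budget, which is exactly the signal a request counter throws away.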

The Technical Challenge: The Stateless Race Condition
I’m building this for Next.js on Vercel Edge.

Vercel Edge functions are stateless. If you try to track a user's spend in a local variable, it vanishes. If you use a standard database, the latency kills your UX.

But the real "final boss" is the Race Condition.

Imagine a user fires 10 concurrent requests.

Instance A checks the budget: "Remaining: $0.05. Proceed."

Instance B checks the budget: "Remaining: $0.05. Proceed."

Both fire $1.00 requests.

Result: You are now -$1.95 in the hole.
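The failure mode above can be reproduced in a few lines. This is a deliberately naive sketch of "check then update", with the stale read made explicit:

```typescript
// Naive "check then update": both instances read the balance
// before either writes, so both see the same stale value.
function checkBudget(spent: number, limit: number, cost: number): boolean {
  return spent + cost <= limit;
}

const limit = 1.0;
let spent = 0;

const staleRead = spent; // both edge instances read $0.00 spent
const instanceA = checkBudget(staleRead, limit, 1.0); // "Proceed."
const instanceB = checkBudget(staleRead, limit, 1.0); // "Proceed."

if (instanceA) spent = staleRead + 1.0;
if (instanceB) spent = staleRead + 1.0; // stale write clobbers A's update

// Both $1.00 requests were allowed against a $1.00 budget:
// the provider bills $2.00, and the counter even claims only $1.00 was spent.
```

The bug is not in either instance's logic; it is in the gap between the read and the write. Any fix that keeps those as two separate network calls keeps the gap.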

The Solution: Atomic Lua Scripts on Redis
To solve this, I moved the logic into an atomic Lua script on Upstash Redis. Instead of "Check then Update" (two steps), the whole decision happens in one single, uninterruptible step inside Redis itself.

-- The "Kill-Switch" Logic
local key = KEYS[1] -- user_budget_key
local limit = tonumber(ARGV[1]) -- e.g., 1.00 USD
local cost = tonumber(ARGV[2]) -- estimated cost
local current = tonumber(redis.call('GET', key) or "0")

if current + cost > limit then
  return 0 -- BLOCK
end

redis.call('INCRBYFLOAT', key, cost)
return 1 -- ALLOW

This runs in ~10ms. If Instance A and B hit the script at the exact same millisecond, Redis executes them one at a time: the first updates the balance, and the second sees the new total and gets blocked. No race condition. No $1,000 surprises.
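For unit-testing the policy itself, it helps to keep a plain-TypeScript mirror of the script's decision logic. To be clear, this local version is only a sketch: the real enforcement must stay inside Redis, because a JS-side check reintroduces the exact race described above.

```typescript
// Local mirror of the Lua kill-switch: same decision, no atomicity.
// In production this must run inside Redis (via EVAL / a stored
// script) — running it in the Edge function brings the race back.
interface Budget { spentUSD: number }

function tryReserve(budget: Budget, limitUSD: number, costUSD: number): boolean {
  if (budget.spentUSD + costUSD > limitUSD) {
    return false; // BLOCK — mirrors `return 0` in the Lua script
  }
  budget.spentUSD += costUSD; // mirrors INCRBYFLOAT
  return true; // ALLOW
}
```

One refinement worth adding on the Lua side: set an `EXPIRE` on the key when it is first created, so the budget window rolls over instead of accumulating forever.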

The Benchmark: A Controlled Stress Test
To quantify the risk, I ran a simulated Denial-of-Wallet (DoW) attack against a standard Next.js API route.

The Setup:

Attacker: A simple recursive script firing concurrent requests with high-token payloads (800+ tokens/request).

Target: A GPT-4o endpoint.

The Result (Unprotected): The script ran for 47 seconds. Total simulated cost reached $847.00 before manual intervention.

The Result (Thskyshield): Using the same script, the governance layer triggered a 429 (Too Many Requests) at the 3rd call. Total spend: $0.08.

Watch the Live Simulation →

The "Two-Phase" Protocol
The hardest part was handling the fact that you don't know the exact cost of an LLM call until it's finished. I settled on a two-phase approach:

Phase 1 (Pre-flight): Check the budget based on the max possible tokens. "Lock" that amount.

Phase 2 (Post-flight): Once the LLM returns, reconcile the actual usage and "Refund" the difference to the user's budget.
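The two phases above can be sketched as a reserve/reconcile pair. All names here are illustrative, not a published API; the refund is just an increment with a negative delta against the same budget key:

```typescript
// Two-phase budgeting sketch: reserve the worst case up front,
// refund the unused portion after the LLM responds.
// Illustrative names — not the actual SDK surface.
interface Budget { spentUSD: number }

function reserve(budget: Budget, limitUSD: number, maxCostUSD: number): boolean {
  if (budget.spentUSD + maxCostUSD > limitUSD) return false; // pre-flight BLOCK
  budget.spentUSD += maxCostUSD; // lock the worst-case amount
  return true;
}

function reconcile(budget: Budget, maxCostUSD: number, actualCostUSD: number): void {
  // Post-flight: return the difference (INCRBYFLOAT with a negative delta).
  budget.spentUSD -= maxCostUSD - actualCostUSD;
}
```

For example: reserve $0.45 for a worst-case call; if the model only consumed $0.12, reconcile returns the remaining $0.33 to the budget, so a run of short completions doesn't burn through locked-but-unused headroom.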

Conclusion
Supply chain attacks like the Axios one are the "new normal." We can't stop every key from being stolen, but we can stop a stolen key from being a business-ending event.

I’ve open-sourced the SDK for this under Thskyshield. If you're building with Next.js and want to stop worrying about your OpenAI bill, it's free for founders.

SDK: @thsky-21/thskyshield

Website: thskyshield.com

Would love to hear how others are handling "Denial of Wallet" risks. Are you just relying on OpenAI's hard limits, or are you building your own governance layer?
