<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akash Melavanki</title>
    <description>The latest articles on DEV Community by Akash Melavanki (@thsky21).</description>
    <link>https://dev.to/thsky21</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866017%2Fbe85f26b-1253-41a7-a3f8-a912d5d47e89.png</url>
      <title>DEV Community: Akash Melavanki</title>
      <link>https://dev.to/thsky21</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thsky21"/>
    <language>en</language>
    <item>
      <title>Your AI Agent Just Spent $200 on a $2 Task. Here's Why Nobody Warned You</title>
      <dc:creator>Akash Melavanki</dc:creator>
      <pubDate>Tue, 02 Jun 2026 14:39:01 +0000</pubDate>
      <link>https://dev.to/thsky21/your-ai-agent-just-spent-200-on-a-2-task-heres-why-nobody-warned-you-543k</link>
      <guid>https://dev.to/thsky21/your-ai-agent-just-spent-200-on-a-2-task-heres-why-nobody-warned-you-543k</guid>
      <description>&lt;p&gt;&lt;em&gt;And why the tools we have right now aren't built for this.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I want to talk about something that's quietly becoming a real problem as more people ship autonomous agents into production — and nobody's really naming it clearly.&lt;/p&gt;

&lt;p&gt;We're not talking about prompt injection or model hallucinations here. We're talking about something more boring and more expensive: &lt;strong&gt;your agent running off the rails and you not knowing until the bill lands.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem Nobody Warned You About
&lt;/h2&gt;

&lt;p&gt;Here's how it usually plays out.&lt;/p&gt;

&lt;p&gt;You build an agent. It works perfectly in testing. You ship it. A few days later you open your OpenAI billing dashboard and something is... off. One run cost $47. A $2 research task. You dig in, and somewhere in the logs you find it — the agent hit a bad tool call, got a weird response, and started retrying. 80 times. Nobody stopped it.&lt;/p&gt;

&lt;p&gt;That's not a bug in your LLM. That's not a hallucination. That's just &lt;strong&gt;what autonomous agents do when there's no guardrail.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The problem is structural. When you write &lt;code&gt;while (!done)&lt;/code&gt; and hand control to an LLM, you're trusting a non-deterministic system to know when to stop. Sometimes it doesn't. And unlike a crashed server or a 500 error — a runaway agent &lt;em&gt;keeps working&lt;/em&gt;. It looks healthy. It's just spending.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Just Set max_iterations=10" Doesn't Cut It
&lt;/h2&gt;

&lt;p&gt;I've seen this answer come up a lot. It makes sense at first glance — cap the loops, problem solved.&lt;/p&gt;

&lt;p&gt;But here's what it doesn't cover:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cost isn't uniform across iterations.&lt;/strong&gt;&lt;br&gt;
One step might cost $0.003. Another might cost $3.00 depending on the model, the prompt size, and what tools were called. Iteration count tells you nothing about money.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Loops don't look like loops.&lt;/strong&gt;&lt;br&gt;
A stuck agent doesn't always repeat the exact same call. It might slightly permute the prompt each time — same semantic intent, different tokens. &lt;code&gt;max_iterations&lt;/code&gt; catches the obvious case. Loop &lt;em&gt;signature detection&lt;/em&gt; catches the real ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Multiple services, multiple agents.&lt;/strong&gt;&lt;br&gt;
The moment you have more than one agent running — across services, across team members, across environments — where does your &lt;code&gt;max_iterations&lt;/code&gt; config live? In a .env file in each repo? Good luck keeping those consistent when you push a hotfix at 2am.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Nobody can see what's happening.&lt;/strong&gt;&lt;br&gt;
Your &lt;code&gt;console.log&lt;/code&gt; statements don't survive a process restart. Your finance lead can't query them. Your teammate who shipped a different agent can't see if there's a pattern emerging across runs.&lt;/p&gt;

&lt;p&gt;The iteration cap is duct tape. It works for one developer protecting one script in a weekend project. It doesn't work for a team shipping agents to real users.&lt;/p&gt;
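
&lt;p&gt;To make the loop-signature idea from point 2 concrete, here is a minimal TypeScript sketch. It assumes a stuck agent mostly varies casing, spacing, and counters between attempts, so it normalizes those away and counts repeats. The names &lt;code&gt;signatureOf&lt;/code&gt; and &lt;code&gt;createLoopDetector&lt;/code&gt; are hypothetical and not part of any SDK. Catching true paraphrases would likely need embeddings rather than string normalization.&lt;/p&gt;

```typescript
// Hypothetical sketch: names and normalization rules are assumptions for illustration.
type ToolCall = { tool: string; args: { [key: string]: string } };

// Reduce a call to a signature that ignores case, spacing, digits and key order.
function signatureOf(call: ToolCall): string {
  const parts = Object.entries(call.args)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([key, value]) => {
      const normalized = value
        .toLowerCase()
        .replace(/[0-9]+/g, "#") // retry counters and timestamps collapse to one symbol
        .replace(/\s+/g, " ")
        .trim();
      return key + "=" + normalized;
    });
  return call.tool + "(" + parts.join(",") + ")";
}

function createLoopDetector(threshold: number) {
  const counts: { [signature: string]: number } = {};
  return {
    // Returns true once the same signature has been seen `threshold` times.
    record(call: ToolCall): boolean {
      const sig = signatureOf(call);
      const seen = (counts[sig] ?? 0) + 1;
      counts[sig] = seen;
      return seen >= threshold;
    },
  };
}

const detector = createLoopDetector(3);
const attempts: ToolCall[] = [
  { tool: "search", args: { q: "latest pricing page, retry 1" } },
  { tool: "search", args: { q: "Latest  pricing page, retry 2" } },
  { tool: "search", args: { q: "latest pricing page, retry 3" } },
];
const flags = attempts.map((call) => detector.record(call));
console.log(flags); // [ false, false, true ]
```

&lt;p&gt;The three calls differ token by token, yet they share one signature, so the third one trips the threshold. A plain equality check on the raw arguments would have missed all of them.&lt;/p&gt;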


&lt;h2&gt;
  
  
  The Deeper Issue: Agents Are Actors, Not Functions
&lt;/h2&gt;

&lt;p&gt;This is the thing that took me a while to fully internalize.&lt;/p&gt;

&lt;p&gt;When you call a normal function, it runs, it returns, it's done. You have deterministic cost. Predictable behavior. Easy to reason about.&lt;/p&gt;

&lt;p&gt;An autonomous agent is different. It's not a function call — it's a &lt;em&gt;process&lt;/em&gt; that makes decisions. It decides what tools to call, in what order, how many times. It decides when it thinks the task is done. You set the objective. The agent figures out the path.&lt;/p&gt;

&lt;p&gt;That's the whole point of agentic AI. That's why it's powerful.&lt;/p&gt;

&lt;p&gt;But it also means the cost model is completely different. You're not paying per-call anymore. You're paying for a sequence of decisions you didn't make. And if one of those decisions is wrong — a bad retry strategy, a hallucinated tool call, a reasoning loop — you're paying for that too.&lt;/p&gt;

&lt;p&gt;We built observability and governance for the old world — where you call an API, you know the cost upfront, and it's done. We haven't fully built it for this new world yet.&lt;/p&gt;


&lt;h2&gt;
  
  
  What Actually Needs to Exist
&lt;/h2&gt;

&lt;p&gt;When I started thinking about this seriously, I realized there's a specific set of things that need to be true for agents to be safe in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hard budget ceilings per run.&lt;/strong&gt; Not soft warnings. Not "we'll alert you". The agent &lt;em&gt;stops&lt;/em&gt; when it hits the ceiling. With a partial result. Before more money is spent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Loop detection at the semantic level.&lt;/strong&gt; Not just counting iterations — actually detecting when the agent is spinning on the same reasoning pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A kill switch that lives outside your code.&lt;/strong&gt; If your agent is running in service-A, you shouldn't have to redeploy service-A to stop a runaway run. The kill switch should be callable from a dashboard, an API, a webhook — anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent audit logs.&lt;/strong&gt; Every step, every tool call, every decision — logged somewhere that doesn't disappear when the process exits. Not for debugging. For &lt;em&gt;governance&lt;/em&gt;. So you can ask "what did this agent actually do, and why?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy that's centralized.&lt;/strong&gt; Not per-service config files. One place where you define the rules, and every agent your team ships inherits them.&lt;/p&gt;

&lt;p&gt;That last one is the key insight. The difference between "add a budget library to your project" and "have actual governance" is whether the policy &lt;em&gt;lives above the code&lt;/em&gt; or inside it.&lt;/p&gt;
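
&lt;p&gt;As a rough illustration of the first requirement, the sketch below enforces a hard ceiling inside a single run and hands back whatever was produced so far. It is an in-process toy, so it shows the stop-with-partial-result behavior only, not the centralized policy argued for above. &lt;code&gt;runWithCeiling&lt;/code&gt;, the &lt;code&gt;Step&lt;/code&gt; shape, and the cent amounts are all assumptions made up for this example.&lt;/p&gt;

```typescript
// Hypothetical sketch: the function, types and prices are illustrative assumptions.
type Step = { output: string; costCents: number };
type RunResult = { outputs: string[]; spentCents: number; stoppedBy: "done" | "budget" };

// Runs steps until the agent reports it is done or the next step would cross the ceiling.
function runWithCeiling(
  budgetCents: number,
  estimateNextCents: (index: number) => number,
  executeStep: (index: number) => Step | null, // null means the agent considers the task done
): RunResult {
  const outputs: string[] = [];
  let spentCents = 0;
  for (let i = 0; ; i++) {
    // Check before spending, so the run stops before the money is gone.
    if (spentCents + estimateNextCents(i) > budgetCents) {
      return { outputs, spentCents, stoppedBy: "budget" };
    }
    const step = executeStep(i);
    if (step === null) return { outputs, spentCents, stoppedBy: "done" };
    outputs.push(step.output);
    spentCents += step.costCents;
  }
}

// A stuck agent that never reports done and spends 45 cents per retry.
const result = runWithCeiling(
  200,
  () => 45,
  (i) => ({ output: "retry " + i, costCents: 45 }),
);
console.log(result.stoppedBy, result.spentCents, result.outputs.length); // budget 180 4
```

&lt;p&gt;Amounts are kept in integer cents to sidestep floating-point drift. The fifth retry is refused because 180 + 45 would exceed the 200-cent ceiling, and the four completed outputs are still returned.&lt;/p&gt;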


&lt;h2&gt;
  
  
  The Analogy That Made It Click for Me
&lt;/h2&gt;

&lt;p&gt;Think about how Datadog relates to your application metrics.&lt;/p&gt;

&lt;p&gt;You could instrument your app with Prometheus locally and write metrics to stdout. That works for one app. But the moment you have multiple services, you need a layer &lt;em&gt;above&lt;/em&gt; your code — a place where metrics from all of them converge, where you set alerts, where your whole team can see what's happening.&lt;/p&gt;

&lt;p&gt;That's the shape of what agent governance needs to look like. Not a library you add to each project. A control plane that sits above all of them.&lt;/p&gt;

&lt;p&gt;Library runs in your process. Stops when your process stops. Lives in one repo.&lt;/p&gt;

&lt;p&gt;Control plane runs above your code. Persists across deploys. Spans every agent your team ships.&lt;/p&gt;


&lt;h2&gt;
  
  
  I Built Something in This Space — Looking for Design Partners
&lt;/h2&gt;

&lt;p&gt;I've been working on exactly this problem. It's called &lt;strong&gt;Thskyshield&lt;/strong&gt; — a runtime governance layer for autonomous agents.&lt;/p&gt;

&lt;p&gt;The idea is simple: you wrap your agent loop with three SDK calls (&lt;code&gt;beginRun&lt;/code&gt;, &lt;code&gt;beforeStep&lt;/code&gt;, &lt;code&gt;afterStep&lt;/code&gt;) and the control plane handles the rest: hard budget ceilings, loop detection, a kill switch, and a step-by-step audit trail. Enforcement takes under 10ms per step, so it adds little latency on top of your actual LLM calls. It works with LangGraph, CrewAI, the OpenAI Agents SDK, or whatever you're using.&lt;/p&gt;

&lt;p&gt;The policy lives in your dashboard, not in your code. Change a limit without touching a deploy. See what every agent your team shipped actually did.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;beginRun&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;research-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;budgetLimitUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;iterationLimit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;loopThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;beforeStep&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;estimatedTokens&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;
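    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toolResult&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callLLMAndTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;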
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;afterStep&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toolResult&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;ShieldKilledError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Stopped: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Spent: $&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spent&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
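  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt; &lt;span class="c1"&gt;// rethrow anything that is not a governance stop&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;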
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three calls. Drop them into any agent loop.&lt;/p&gt;

&lt;p&gt;MVP is live. I'm actively working with a small group of teams to shape the agent SDK — specifically want to talk to people who are shipping real agents and hitting real problems. &lt;strong&gt;First five design partners get it free forever.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building in this space and this problem sounds familiar — I'd genuinely love to talk. Drop a comment or reach out at &lt;a href="https://thskyshield.com/contact" rel="noopener noreferrer"&gt;thskyshield.com/contact&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;The agent era is real. The tooling to govern it safely is still catching up. That gap is what I'm working on.&lt;/p&gt;

&lt;p&gt;If you're building agents in production and haven't thought about this yet — now's the time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#ai #agents #llm #devops #opensource #typescript #buildinpublic #webdev&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>langchain</category>
      <category>crewai</category>
      <category>ai</category>
    </item>
    <item>
      <title>Vercel got hacked. Your API keys rotated. You're still not safe.</title>
      <dc:creator>Akash Melavanki</dc:creator>
      <pubDate>Tue, 21 Apr 2026 16:44:56 +0000</pubDate>
      <link>https://dev.to/thsky21/vercel-got-hacked-your-api-keys-rotated-youre-still-not-safe-361c</link>
      <guid>https://dev.to/thsky21/vercel-got-hacked-your-api-keys-rotated-youre-still-not-safe-361c</guid>
      <description>&lt;p&gt;I host Thskyshield on Vercel.&lt;/p&gt;

&lt;p&gt;So when I woke up to the news that Vercel had been breached — internal systems compromised, customer environment variables exposed, data allegedly being sold on BreachForums for $2 million — I didn't panic. I rotated my keys immediately, like everyone else.&lt;/p&gt;

&lt;p&gt;And then I sat with a question that I don't think enough developers are asking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens in the window between a key being stolen and you rotating it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That question is literally why I built what I built.&lt;/p&gt;




&lt;h2&gt;
  
  
  What actually happened at Vercel
&lt;/h2&gt;

&lt;p&gt;Let me explain the attack chain, because it's more interesting than "Vercel got hacked."&lt;/p&gt;

&lt;p&gt;A Context.ai employee got hit with Lumma Stealer malware in February 2026. Lumma is an infostealer — it quietly harvests credentials, OAuth tokens, session cookies. The attacker sat on those credentials for weeks.&lt;/p&gt;

&lt;p&gt;Then they used the stolen OAuth token to access Vercel's Google Workspace. That gave them access to Vercel's internal environments. And in those environments sat thousands of customer environment variables — API keys, database credentials, signing tokens — that weren't marked as "sensitive."&lt;/p&gt;

&lt;p&gt;One employee. One third-party tool. One overly permissive OAuth grant. And suddenly an attacker has the keys to a significant chunk of the web's developer infrastructure.&lt;/p&gt;

&lt;p&gt;This is what a supply chain attack looks like in 2026. It's not brute force. It's patient, precise, and automated.&lt;/p&gt;




&lt;h2&gt;
  
  
  The standard advice is incomplete
&lt;/h2&gt;

&lt;p&gt;"Rotate your keys immediately."&lt;/p&gt;

&lt;p&gt;Yes. Obviously. Do that.&lt;/p&gt;

&lt;p&gt;But here's what nobody's talking about: key rotation is reactive. You rotate after you know you're compromised. The attacker who stole your key at 2 AM on a Sunday doesn't wait for Monday morning. They act the moment they have it.&lt;/p&gt;

&lt;p&gt;With AI-powered automation, that window between theft and damage is now measured in seconds, not hours. Google's Threat Intelligence team found that the time between initial access and full breach has collapsed from 8 hours in 2022 to 22 seconds in 2025. The attacker doesn't need you to be asleep. They're done before you finish reading the breach notification email.&lt;/p&gt;

&lt;p&gt;So yes, rotate your keys. But also ask: if this key gets stolen tonight, what is the maximum damage the attacker can do with it?&lt;/p&gt;




&lt;h2&gt;
  
  
  What a stolen LLM API key actually enables
&lt;/h2&gt;

&lt;p&gt;Here's where I want to be honest about scope, because I think most people only think about one dimension of this.&lt;/p&gt;

&lt;p&gt;A stolen OpenAI, Anthropic, or Gemini API key gives an attacker several options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Denial of Wallet&lt;/strong&gt; — loop your chatbot endpoint with high-token payloads. Max out your billing. Leave you with a $5,000 invoice by sunrise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data exfiltration&lt;/strong&gt; — if your LLM calls include user data, system prompts, or sensitive context, the attacker can extract that by replaying your own endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Content generation at your cost&lt;/strong&gt; — use your key as a free compute resource. Generate content, run agents, build products — all billed to you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Prompt injection into your users&lt;/strong&gt; — if the attacker can make calls to your endpoint, they may be able to manipulate the responses your actual users see.&lt;/p&gt;

&lt;p&gt;I want to be clear: I'm only solving one of these. Thskyshield is a financial kill-switch. It stops the billing damage. It does not stop data exfiltration. It doesn't block malicious prompt injection. There are other tools for those layers.&lt;/p&gt;

&lt;p&gt;What I believe is: every attack that costs money gets stopped at a known ceiling. If your OpenAI key is stolen and the attacker tries a Denial of Wallet attack, they can only drain up to whatever daily limit you set per user. Not $5,000. Not $500. Whatever you decided.&lt;/p&gt;

&lt;p&gt;That ceiling exists even if everything else failed. Even if Vercel leaked your key. Even if the attacker has full access. The financial blast radius is bounded.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI makes this more urgent, not less
&lt;/h2&gt;

&lt;p&gt;I've had this instinct for a while, and the data is starting to confirm it.&lt;/p&gt;

&lt;p&gt;Attacks are getting faster because the automation is getting smarter. The Lumma Stealer that hit the Context.ai employee — that's not a human manually harvesting credentials. That's a piece of software running autonomously, finding credentials, exfiltrating them, and handing them off to an operator in real time.&lt;/p&gt;

&lt;p&gt;The same automation that makes AI products useful makes AI-powered attacks cheap to run at scale. A Denial of Wallet attack used to require someone sitting at a keyboard, writing a loop script, running it manually. Today it's a five-line agentic task: "Loop this endpoint until the budget hits $X or you get a 429."&lt;/p&gt;

&lt;p&gt;The attack surface for every AI product is a financial endpoint. Every chatbot, every AI feature, every LLM-powered tool has a cost function. And attackers are starting to understand that better than most developers do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The thing I keep coming back to
&lt;/h2&gt;

&lt;p&gt;The Vercel breach is not an outlier. LiteLLM in March. Axios in March. Context.ai in February. Vercel in April.&lt;/p&gt;

&lt;p&gt;Every one of these is a supply chain attack. Every one of them exposed developer credentials. Every one of them happened through trusted infrastructure — things developers rely on every day without thinking twice.&lt;/p&gt;

&lt;p&gt;You can't stop the breaches. You can't guarantee your keys won't be stolen. What you can control is what happens after.&lt;/p&gt;

&lt;p&gt;Rotate your keys — yes. But also put a ceiling on what a stolen key can do to your business.&lt;/p&gt;

&lt;p&gt;That's the layer most developers don't have yet.&lt;/p&gt;




&lt;p&gt;If you're building with LLM APIs and you want a hard limit on what a stolen key can drain: thskyshield.com&lt;/p&gt;

&lt;p&gt;Or if you want to watch a simulated Denial of Wallet attack fire in real time: thskyshield.com/simulator&lt;/p&gt;

&lt;p&gt;Curious what others are doing about this. Are you relying on provider-side limits? Rolling your own governance? Or just hoping it doesn't happen to you?&lt;/p&gt;

</description>
      <category>security</category>
      <category>saas</category>
      <category>devops</category>
      <category>ai</category>
    </item>
    <item>
      <title>Rate limiting your LLM API is useless. Here's what actually protects you.</title>
      <dc:creator>Akash Melavanki</dc:creator>
      <pubDate>Tue, 14 Apr 2026 13:32:34 +0000</pubDate>
      <link>https://dev.to/thsky21/rate-limiting-your-llm-api-is-useless-heres-what-actually-protects-you-mmc</link>
      <guid>https://dev.to/thsky21/rate-limiting-your-llm-api-is-useless-heres-what-actually-protects-you-mmc</guid>
      <description>&lt;p&gt;Last month, the LiteLLM supply chain attack exposed API keys across thousands of developer projects.&lt;/p&gt;

&lt;p&gt;The standard advice: rotate your keys immediately.&lt;/p&gt;

&lt;p&gt;Here's what nobody tells you after that: a rotated key doesn't protect you from the next attack. Rate limiting doesn't either. I'll show you why — and what actually works.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with rate limiting LLMs
&lt;/h2&gt;

&lt;p&gt;Rate limiting assumes all requests cost roughly the same. For traditional APIs, that's true.&lt;/p&gt;

&lt;p&gt;For LLMs, it's completely wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request 1: "Hi"
→ ~10 tokens → cost: $0.0001

Request 2: "Summarize this 50-page PDF"
→ ~30,000 tokens → cost: $0.45
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An attacker doesn't need high volume. They just need expensive requests. 10 requests per minute means nothing when each request costs $0.45.&lt;/p&gt;

&lt;p&gt;What you actually need is &lt;strong&gt;budget limiting&lt;/strong&gt; — enforcing a maximum dollar spend per user, per day, in real time.&lt;/p&gt;
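
&lt;p&gt;The arithmetic behind that claim is short enough to write down. The sketch below assumes a flat price of $15 per million tokens, picked only because it reproduces the $0.45 figure above; real pricing differs by model and by input versus output tokens. The function names are hypothetical.&lt;/p&gt;

```typescript
// Assumed flat price, picked so 30,000 tokens costs $0.45 as in the example above.
const USD_PER_MILLION_TOKENS = 15;

function requestCostUsd(tokens: number): number {
  return (tokens / 1000000) * USD_PER_MILLION_TOKENS;
}

// Worst-case hourly spend that a requests-per-minute limit still permits.
function hourlySpendUnderRateLimit(requestsPerMinute: number, tokensPerRequest: number): number {
  return requestsPerMinute * 60 * requestCostUsd(tokensPerRequest);
}

// How many requests of a given size fit into a dollar budget.
function requestsWithinBudget(budgetUsd: number, tokensPerRequest: number): number {
  return Math.floor(budgetUsd / requestCostUsd(tokensPerRequest));
}

const perRequest = requestCostUsd(30000).toFixed(2);
const perHour = hourlySpendUnderRateLimit(10, 30000).toFixed(2);
const fitsInOneDollar = requestsWithinBudget(1, 30000);
console.log(perRequest, perHour, fitsInOneDollar); // 0.45 270.00 2
```

&lt;p&gt;Under these assumptions, a limit of 10 requests per minute still permits about $270 an hour, while a $1.00 daily budget cuts the same attacker off after two requests.&lt;/p&gt;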




&lt;h2&gt;
  
  
  The race condition nobody talks about
&lt;/h2&gt;

&lt;p&gt;OK, so you decide to track spend in Redis. Simple, right?&lt;/p&gt;

&lt;p&gt;Wrong. Here's what happens at scale.&lt;/p&gt;

&lt;p&gt;Your app receives 10 concurrent requests from the same user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Instance A reads budget: $1.00 remaining. Proceeds.
Instance B reads budget: $1.00 remaining. Proceeds.
Instance C reads budget: $1.00 remaining. Proceeds.
...all 10 instances read $1.00 and proceed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All 10 fire $1.00 LLM requests. Your user's budget was $1.00. You just spent $10.00.&lt;/p&gt;

&lt;p&gt;This is the race condition. A Redis GET followed by a separate SET cannot solve it, because the two commands are not atomic: there's always a gap between reading and writing where another instance sneaks through.&lt;/p&gt;
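
&lt;p&gt;The race is easy to reproduce without Redis at all. The sketch below swaps Redis for a plain in-memory object and uses a 1 ms timer to stand in for the network round trip between the read and the write. Everything here (&lt;code&gt;simulateNaive&lt;/code&gt;, the cent amounts, the delay) is an assumption made up for the demonstration.&lt;/p&gt;

```typescript
// In-memory stand-in for Redis. All names and the 1 ms delay are illustrative assumptions.
const LIMIT_CENTS = 100;
const COST_CENTS = 100;

// Simulates the network round trip between the read and the write.
const roundTrip = () => new Promise((resolve) => setTimeout(resolve, 1));

async function simulateNaive(concurrent: number) {
  const store = { spentCents: 0 };

  const spend = async () => {
    const current = store.spentCents; // GET
    await roundTrip(); // every other instance runs its GET in this gap
    if (current + COST_CENTS > LIMIT_CENTS) return false;
    store.spentCents = current + COST_CENTS; // SET, computed from a stale read
    return true;
  };

  const results = await Promise.all(Array.from({ length: concurrent }, () => spend()));
  return { allowed: results.filter((ok) => ok).length, recordedCents: store.spentCents };
}

simulateNaive(10).then((r) => console.log(r.allowed, r.recordedCents)); // 10 100
```

&lt;p&gt;All ten requests are allowed, and the ledger still records only 100 cents, because each write overwrote the previous one. The budget was overspent tenfold and the books do not even show it.&lt;/p&gt;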




&lt;h2&gt;
  
  
  The fix: atomic Lua scripts
&lt;/h2&gt;

&lt;p&gt;The solution is to move the entire check-and-update logic into a single atomic operation inside Redis. Lua scripts on Redis run as one uninterruptible step — no interleaving, no race condition possible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- runs atomically inside Redis — no race condition possible&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="c1"&gt;-- BLOCK&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'INCRBYFLOAT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;-- ALLOW&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



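&lt;p&gt;This runs in ~10ms on edge infrastructure. Instance A and Instance B hitting this at the exact same millisecond? Redis runs the two scripts one after the other, so the second sees the first one's increment. If the budget only covers one request, one passes and one fails. Budget enforced, with no window for a double spend.&lt;/p&gt;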




&lt;h2&gt;
  
  
  The two-phase protocol
&lt;/h2&gt;

&lt;p&gt;There's one more problem: you don't know the exact cost of an LLM call until it finishes.&lt;/p&gt;

&lt;p&gt;The solution is a two-phase reserve-and-reconcile flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — pre-flight (before the LLM call)&lt;/strong&gt;&lt;br&gt;
Estimate the cost based on max possible tokens. Reserve that amount atomically. If budget exceeded, return 429 immediately — the LLM never even gets called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — reconciliation (after the LLM call)&lt;/strong&gt;&lt;br&gt;
OpenAI returns actual token usage. Reconcile: release the estimate, apply the real cost. If the estimate was too high, refund the difference back to the user's budget.&lt;/p&gt;

&lt;p&gt;This means your budget enforcement is tight even under worst-case conditions.&lt;/p&gt;
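
&lt;p&gt;Here is a minimal in-memory sketch of the two phases, to show how the bookkeeping fits together. It is not the SDK's implementation: the &lt;code&gt;Ledger&lt;/code&gt; shape and the &lt;code&gt;reserve&lt;/code&gt; and &lt;code&gt;reconcile&lt;/code&gt; functions are hypothetical, and in production each phase would have to run atomically in the shared store, as in the Lua script above.&lt;/p&gt;

```typescript
// In-memory sketch: the ledger shape and function names are assumptions for illustration.
type Ledger = {
  limitCents: number;
  committedCents: number;
  reserved: { [requestId: string]: number };
};

// Total currently held by in-flight requests.
function heldCents(ledger: Ledger): number {
  return Object.values(ledger.reserved).reduce((sum, cents) => sum + cents, 0);
}

// Phase 1: reserve the worst-case cost, or refuse before the LLM is called.
function reserve(ledger: Ledger, requestId: string, maxCostCents: number): boolean {
  if (ledger.committedCents + heldCents(ledger) + maxCostCents > ledger.limitCents) return false;
  ledger.reserved[requestId] = maxCostCents;
  return true;
}

// Phase 2: drop the reservation and charge what the call actually cost.
function reconcile(ledger: Ledger, requestId: string, actualCostCents: number): void {
  delete ledger.reserved[requestId];
  ledger.committedCents += actualCostCents;
}

const ledger: Ledger = { limitCents: 100, committedCents: 0, reserved: {} };
const first = reserve(ledger, "req-1", 60); // 60 of 100 cents held
const second = reserve(ledger, "req-2", 60); // refused: 60 + 60 would exceed 100
reconcile(ledger, "req-1", 25); // real cost was lower, so 35 cents return to the budget
const third = reserve(ledger, "req-3", 60); // 25 committed + 60 held fits
console.log(first, second, third, ledger.committedCents); // true false true 25
```

&lt;p&gt;Reserving the worst case is what keeps concurrent requests from jointly overshooting. Reconciling afterwards is what keeps a pessimistic estimate from locking users out of budget they never spent.&lt;/p&gt;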


&lt;h2&gt;
  
  
  Benchmark: controlled Denial of Wallet attack
&lt;/h2&gt;

&lt;p&gt;I ran a simulated DoW attack against a standard GPT-4o endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; Recursive script, concurrent requests, 800+ token payloads per request.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Unprotected&lt;/th&gt;
&lt;th&gt;With atomic governance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;47 seconds&lt;/td&gt;
&lt;td&gt;Stopped at request 3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total spend&lt;/td&gt;
&lt;td&gt;$847.00&lt;/td&gt;
&lt;td&gt;$0.08&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intervention&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The governance layer fired a 429 at the third request. The attacker's loop never got traction.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why this matters after supply chain attacks
&lt;/h2&gt;

&lt;p&gt;Here's the thing about the LiteLLM and Axios breaches: rotating your API key is the right move, but it's reactive. The damage happens in the window between the breach and you waking up.&lt;/p&gt;

&lt;p&gt;Budget governance is your last line of defense. Even with a stolen key, the attacker can only drain up to the limit you set. No $1,000 surprise at sunrise.&lt;/p&gt;


&lt;h2&gt;
  
  
  The implementation
&lt;/h2&gt;

&lt;p&gt;I built this into an open-source SDK called Thskyshield. Two calls wrap any LLM call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;requestId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;externalUserId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;estimatedTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Budget exceeded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// ...your LLM call here; `usage` below is the usage object from its response...&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;shield&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;externalUserId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK handles the atomic reservation, the two-phase reconciliation, and the 429 response automatically. Works on Vercel Edge, Cloudflare Workers, or any Node.js environment.&lt;/p&gt;

&lt;p&gt;→ SDK: &lt;code&gt;npm install @thsky-21/thskyshield&lt;/code&gt;&lt;br&gt;
→ Live attack simulation: thskyshield.com/simulator&lt;/p&gt;




&lt;p&gt;What are you actually using to cap LLM spend right now? Are you relying on OpenAI's hard limits, or have you built something custom? Would genuinely like to know what's working.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>security</category>
    </item>
    <item>
      <title>How I built a real-time LLM "Kill-Switch" for Vercel Edge using Atomic Redis</title>
      <dc:creator>Akash Melavanki</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:39:17 +0000</pubDate>
      <link>https://dev.to/thsky21/how-i-built-a-real-time-llm-kill-switch-for-vercel-edge-using-atomic-redis-3njm</link>
      <guid>https://dev.to/thsky21/how-i-built-a-real-time-llm-kill-switch-for-vercel-edge-using-atomic-redis-3njm</guid>
      <description>&lt;p&gt;Last week, a supply chain attack compromised Axios, a package with over 100 million weekly downloads. A week before that, it was LiteLLM.&lt;/p&gt;

&lt;p&gt;In both cases, the goal was simple: Exfiltrate API keys. As developers, we are taught to rotate our keys immediately. But there’s a massive gap in that advice. If an attacker gets your OpenAI key at 2 AM, they don't wait for you to wake up. They loop your endpoints, drain your credits, and leave you with a $1,000+ bill by sunrise.&lt;/p&gt;

&lt;p&gt;This is what OWASP calls LLM10:2025 – Unbounded Consumption (or "Denial of Wallet"). I spent the last two weeks building a way to stop it at the Edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Why Rate Limiting Fails LLMs&lt;/strong&gt;&lt;br&gt;
Standard rate limiting (e.g., 10 requests per minute) counts requests, not cost, so it does little to cap LLM spend.&lt;/p&gt;

&lt;p&gt;Request 1: "Hi" (10 tokens) — Cost: $0.0001&lt;/p&gt;

&lt;p&gt;Request 2: "Summarize this 50-page PDF" (30,000 tokens) — Cost: $0.45&lt;/p&gt;

&lt;p&gt;An attacker doesn't need a high volume of requests to ruin you; they just need expensive requests. We need Budget Limiting, not Rate Limiting.&lt;/p&gt;
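
&lt;p&gt;To make the gap concrete, here is a minimal sketch of how the same ten requests look to a request counter versus a budget. The flat rate of 15 micro-dollars per token is an assumption, chosen only so that the 30,000-token request above comes out to $0.45; it is not a real price list, and &lt;code&gt;estimateCostMicros&lt;/code&gt; is a hypothetical helper, not part of any SDK.&lt;/p&gt;

```typescript
// Assumed flat price: 15 micro-dollars per token ($15 per million tokens).
// Integer micro-dollars avoid floating-point drift when summing costs.
const MICROS_PER_TOKEN = 15;

// Hypothetical helper: estimated cost of one request, in micro-dollars.
function estimateCostMicros(tokens: number): number {
  return tokens * MICROS_PER_TOKEN;
}

// Ten requests in a minute pass a 10-requests-per-minute limit either way,
// but what they cost depends entirely on their size.
const tenGreetings = Array.from({ length: 10 }, () => estimateCostMicros(10));
const tenPdfSummaries = Array.from({ length: 10 }, () => estimateCostMicros(30000));

const sum = (costs: number[]): number => costs.reduce((total, c) => total + c, 0);

const greetingsTotalMicros = sum(tenGreetings);    // 1500 micro-dollars
const summariesTotalMicros = sum(tenPdfSummaries); // 4500000 micro-dollars

console.log(`10 small requests: $${greetingsTotalMicros / 1000000}`);
console.log(`10 large requests: $${summariesTotalMicros / 1000000}`);
```

&lt;p&gt;Both batches look identical to a request counter, yet under this assumed rate the second one costs 3,000 times more. That total is the number a budget limiter watches.&lt;/p&gt;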

&lt;p&gt;&lt;strong&gt;The Technical Challenge: The Stateless Race Condition&lt;/strong&gt;&lt;br&gt;
I’m building this for Next.js on Vercel Edge.&lt;/p&gt;

&lt;p&gt;Vercel Edge functions are stateless. If you try to track a user's spend in a local variable, it vanishes. If you use a standard database, the latency kills your UX.&lt;/p&gt;

&lt;p&gt;But the real "final boss" is the Race Condition.&lt;/p&gt;

&lt;p&gt;Imagine a user fires 10 concurrent requests.&lt;/p&gt;

&lt;p&gt;Instance A checks the budget: "Remaining: $0.05. Proceed."&lt;/p&gt;

&lt;p&gt;Instance B checks the budget: "Remaining: $0.05. Proceed."&lt;/p&gt;

&lt;p&gt;Both fire $1.00 requests.&lt;/p&gt;

&lt;p&gt;Result: You are now $1.95 in the hole.&lt;/p&gt;
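
&lt;p&gt;The interleaving above can be replayed deterministically. This is a sketch, not the Edge runtime: the two "instances" are simply two reads followed by two writes against a shared in-memory balance, with amounts in integer cents. All names are illustrative.&lt;/p&gt;

```typescript
// Shared balance in cents; stands in for whatever store both instances read.
const account = { remainingCents: 5 }; // $0.05 left
const requestCostCents = 100;          // each request costs $1.00

// Step 1: both instances run their check before either one writes.
const seenByA = account.remainingCents;
const seenByB = account.remainingCents;

// Naive check: "is there any budget left?" Both see 5 cents and proceed.
const aProceeds = seenByA > 0;
const bProceeds = seenByB > 0;

// Step 2: each instance charges its request after the fact.
if (aProceeds) account.remainingCents -= requestCostCents;
if (bProceeds) account.remainingCents -= requestCostCents;

console.log(account.remainingCents); // -195, i.e. -$1.95
```

&lt;p&gt;Neither instance did anything wrong on its own. The overspend comes from the gap between the read and the write.&lt;/p&gt;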

&lt;p&gt;&lt;strong&gt;The Solution: Atomic Lua Scripts on Redis&lt;/strong&gt;&lt;br&gt;
To solve this, I moved the logic into an Atomic Lua Script on Upstash Redis. Instead of "Check then Update" (two steps), the logic happens in one single, uninterruptible step inside the database memory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- The "Kill-Switch" Logic&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;-- user_budget_key&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;-- e.g., 1.00 USD&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;-- estimated cost&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="c1"&gt;-- BLOCK&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'INCRBYFLOAT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;-- ALLOW&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



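&lt;p&gt;This runs in ~10ms. Redis executes scripts one at a time, so if Instances A and B hit the script in the same millisecond, they are queued: the first reserves its cost, and the second is blocked if it would push the total past the limit. No race condition. No $1,000 surprises.&lt;/p&gt;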
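
&lt;p&gt;For readers who want to step through the same check-and-reserve logic without Redis, here is a rough TypeScript model of it. It assumes that one synchronous function call can stand in for Redis running one script at a time. &lt;code&gt;reserveCents&lt;/code&gt; and the in-memory &lt;code&gt;spentCents&lt;/code&gt; object are hypothetical and are not the SDK's implementation.&lt;/p&gt;

```typescript
// In-memory stand-in for the Redis key space: user id -> cents spent so far.
const spentCents: { [userId: string]: number | undefined } = {};

// Mirrors the Lua script: block if current + cost would exceed the limit,
// otherwise record the cost. Because the function is synchronous, nothing
// can interleave between the read and the write.
function reserveCents(userId: string, limitCents: number, costCents: number): boolean {
  const current = spentCents[userId] ?? 0;
  if (current + costCents > limitCents) {
    return false; // BLOCK
  }
  spentCents[userId] = current + costCents;
  return true; // ALLOW
}

// Two requests of 60 cents each against a 100-cent limit, run back to back
// the way queued scripts would be.
const firstAllowed = reserveCents("user-1", 100, 60);
const secondAllowed = reserveCents("user-1", 100, 60);

console.log(firstAllowed, secondAllowed, spentCents["user-1"]); // true false 60
```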

&lt;p&gt;&lt;strong&gt;The Benchmark: A Controlled Stress Test&lt;/strong&gt;&lt;br&gt;
To quantify the risk, I ran a simulated Denial of Wallet (DoW) attack against a standard Next.js API route.&lt;/p&gt;

&lt;p&gt;The Setup:&lt;/p&gt;

&lt;p&gt;Attacker: A simple recursive script firing concurrent requests with high-token payloads (800+ tokens/request).&lt;/p&gt;

&lt;p&gt;Target: A GPT-4o endpoint.&lt;/p&gt;

&lt;p&gt;The Result (Unprotected): The script ran for 47 seconds. Total simulated cost reached $847.00 before manual intervention.&lt;/p&gt;

&lt;p&gt;The Result (Thskyshield): Using the same script, the governance layer triggered a 429 (Too Many Requests) at the 3rd call. Total spend: $0.08.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.thskyshield.com/simulator" rel="noopener noreferrer"&gt;Watch the Live Simulation →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Two-Phase" Protocol&lt;/strong&gt;&lt;br&gt;
The hardest part was handling the fact that you don't know the exact cost of an LLM call until it's finished. I settled on a two-phase approach:&lt;/p&gt;

&lt;p&gt;Phase 1 (Pre-flight): Check the budget based on the max possible tokens. "Lock" that amount.&lt;/p&gt;

&lt;p&gt;Phase 2 (Post-flight): Once the LLM returns, reconcile the actual usage and "Refund" the difference to the user's budget.&lt;/p&gt;
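
&lt;p&gt;The two phases can be sketched as a reserve step followed by a reconcile step. Everything here is illustrative: the in-memory &lt;code&gt;ledger&lt;/code&gt;, the function names, and the cent amounts are assumptions, and the SDK's internals may differ.&lt;/p&gt;

```typescript
// Cents currently counted against each user's budget (reserved or spent).
const ledger: { [userId: string]: number | undefined } = {};

// Phase 1 (pre-flight): lock the worst-case cost, or refuse.
function reserve(userId: string, limitCents: number, maxCostCents: number): boolean {
  const used = ledger[userId] ?? 0;
  if (used + maxCostCents > limitCents) return false;
  ledger[userId] = used + maxCostCents;
  return true;
}

// Phase 2 (post-flight): swap the locked amount for the actual cost,
// which refunds the unused part of the reservation.
function reconcile(userId: string, maxCostCents: number, actualCostCents: number): void {
  const used = ledger[userId] ?? 0;
  ledger[userId] = used - maxCostCents + actualCostCents;
}

const limitCents = 100;
const locked = reserve("user-1", limitCents, 40); // lock 40 cents up front
// ...the LLM call would run here and report its real usage...
reconcile("user-1", 40, 12);                      // it actually cost 12 cents

console.log(locked, ledger["user-1"]); // true 12
```

&lt;p&gt;While the call is in flight, the full 40 cents is unavailable to other requests, so concurrent calls cannot jointly overshoot the limit. The 28-cent refund only lands once the real usage is known.&lt;/p&gt;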

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Supply chain attacks like the Axios one are the "new normal." We can't stop every key from being stolen, but we can stop a stolen key from being a business-ending event.&lt;/p&gt;

&lt;p&gt;I’ve open-sourced the SDK for this under Thskyshield. If you're building with Next.js and want to stop worrying about your OpenAI bill, it's free for founders.&lt;/p&gt;

&lt;p&gt;SDK: &lt;a href="https://www.npmjs.com/package/@thsky-21/thskyshield" rel="noopener noreferrer"&gt;@thsky-21/thskyshield&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.thskyshield.com/" rel="noopener noreferrer"&gt;thskyshield.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love to hear how others are handling "Denial of Wallet" risks. Are you just relying on OpenAI's hard limits, or are you building your own governance layer?&lt;/p&gt;

</description>
      <category>nextjs</category>
      <category>ai</category>
      <category>security</category>
      <category>api</category>
    </item>
  </channel>
</rss>
