<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Advik</title>
    <description>The latest articles on DEV Community by Advik (@advik_9a1a8f80accc0f7364f).</description>
    <link>https://dev.to/advik_9a1a8f80accc0f7364f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3982447%2F78d0d689-820e-4c3a-8e15-3c36889df7ae.png</url>
      <title>DEV Community: Advik</title>
      <link>https://dev.to/advik_9a1a8f80accc0f7364f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/advik_9a1a8f80accc0f7364f"/>
    <language>en</language>
    <item>
      <title>I almost burned ₹4,000 on Claude API overnight — so I built llm-cost-guard</title>
      <dc:creator>Advik</dc:creator>
      <pubDate>Sat, 13 Jun 2026 09:26:29 +0000</pubDate>
      <link>https://dev.to/advik_9a1a8f80accc0f7364f/i-almost-burned-4000-on-claude-api-overnight-so-i-built-llm-cost-guard-4ch1</link>
      <guid>https://dev.to/advik_9a1a8f80accc0f7364f/i-almost-burned-4000-on-claude-api-overnight-so-i-built-llm-cost-guard-4ch1</guid>
      <description>&lt;p&gt;Last month I wrote what I thought was a harmless script.&lt;/p&gt;

&lt;p&gt;Batch-process 847 product descriptions through Claude. Summarize each one. Save to a CSV. Ship it and go to bed.&lt;/p&gt;

&lt;p&gt;The loop looked fine. Error handling was there. Retries were capped. I felt responsible.&lt;/p&gt;

&lt;p&gt;I woke up to a Slack ping from my own logging bot — not because anything crashed, but because something succeeded way too much.&lt;/p&gt;

&lt;p&gt;₹4,000 gone. Overnight. On a side project.&lt;/p&gt;

&lt;p&gt;The loop hadn't infinite-looped in the traditional sense. It had expensive-looped. A retry bug on malformed responses meant some items got hit 3–4 times. A few prompts were longer than I estimated. And I had zero visibility into running spend while it was happening.&lt;/p&gt;

&lt;p&gt;I stared at the Anthropic dashboard like it was a crime scene.&lt;/p&gt;

&lt;p&gt;Why Anthropic billing alerts don't cut it&lt;/p&gt;

&lt;p&gt;Anthropic does have billing alerts. They're useful, for finance, eventually.&lt;/p&gt;

&lt;p&gt;But they're not a runtime guardrail:&lt;/p&gt;

&lt;p&gt;Delayed: you find out after the damage, not mid-request&lt;br&gt;
Account-level: one rogue script can drain your whole API budget&lt;br&gt;
Non-blocking: an email doesn't stop a loop that's already running&lt;/p&gt;

&lt;p&gt;What I actually needed was something that sits inside my code and says: "Stop. You've hit your limit. Right now."&lt;/p&gt;

&lt;p&gt;Not tomorrow. Not at invoice time. Before request #400 burns another ₹500.&lt;/p&gt;

&lt;p&gt;What llm-cost-guard does&lt;/p&gt;

&lt;p&gt;It's a drop-in wrapper for your existing LLM client. One line. No SDK rewrite.&lt;/p&gt;

&lt;p&gt;import Anthropic from "@anthropic-ai/sdk";&lt;br&gt;
import { guard } from "@advik1228/llm-cost-guard";&lt;br&gt;
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });&lt;br&gt;
// dailyLimit is in USD; with onLimit "throw", the next call fails once the limit is crossed&lt;br&gt;
const client = guard(anthropic, { dailyLimit: 5, onLimit: "throw" });&lt;br&gt;
// Use client exactly like before: same API, same methods&lt;br&gt;
const response = await client.messages.create({&lt;br&gt;
  model: "claude-sonnet-4-6",&lt;br&gt;
  max_tokens: 1024,&lt;br&gt;
  messages: [{ role: "user", content: "Summarize this..." }],&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;That's it. If today's spend crosses $5, the next call throws. The loop dies. Your wallet survives.&lt;/p&gt;
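
&lt;p&gt;Here is a rough sketch of how a batch loop might react when the guarded client throws. Everything in it is a stand-in: makeFakeGuardedClient and runBatch are hypothetical names, the fake client simply throws after a fixed number of calls, and since this post doesn't show the exact error llm-cost-guard throws, the loop treats any thrown error as a signal to stop.&lt;/p&gt;

```typescript
// Hypothetical stand-in for a guarded client: it throws once a call budget is
// used up. This is NOT the real llm-cost-guard client or its real error type.
function makeFakeGuardedClient(maxCalls: number) {
  let calls = 0;
  return {
    async summarize(text: string) {
      if (calls >= maxCalls) {
        throw new Error("daily limit reached");
      }
      calls += 1;
      return `summary of: ${text}`;
    },
  };
}

const exampleClient = makeFakeGuardedClient(2);
type GuardedClient = typeof exampleClient;

// Process items in order. On the first thrown error, stop and return what was
// finished so far instead of retrying into a limit that will not reset mid-run.
async function runBatch(client: GuardedClient, items: string[]) {
  const done: string[] = [];
  let stoppedBy = "";
  for (const item of items) {
    try {
      done.push(await client.summarize(item));
    } catch (err) {
      stoppedBy = err instanceof Error ? err.message : String(err);
      break;
    }
  }
  return { done, stoppedBy };
}
```

&lt;p&gt;Returning the partial results matters for overnight jobs: whatever finished before the limit can still be written to the CSV, and the remaining items can be picked up the next day.&lt;/p&gt;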

&lt;p&gt;It also supports monthly caps, per-request limits, per-user budgets, webhook alerts, and streaming, but the core idea is dead simple: wrap, limit, block.&lt;/p&gt;

&lt;p&gt;How it works under the hood&lt;/p&gt;

&lt;p&gt;No monkey-patching. No forked SDK.&lt;/p&gt;

&lt;p&gt;llm-cost-guard uses a JavaScript Proxy to intercept calls to messages.create (Anthropic), chat.completions.create (OpenAI), and Gemini's generateContent.&lt;/p&gt;
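
&lt;p&gt;To make the interception idea concrete, here is a minimal sketch of the general Proxy pattern. It is not llm-cost-guard's actual source: fakeClient, wrapWithUsageHook, and onUsage are hypothetical names, and the fake client only imitates the nested messages.create shape.&lt;/p&gt;

```typescript
// Shape of the usage block this sketch looks for on a response.
type Usage = { input_tokens: number; output_tokens: number };

// Hypothetical stand-in for an SDK client with a nested messages.create().
const fakeClient = {
  messages: {
    async create(params: { model: string; prompt: string }) {
      return {
        text: `echo: ${params.prompt}`,
        usage: { input_tokens: 12, output_tokens: 34 },
      };
    },
  },
};

// Wrap `messages` in a Proxy whose `get` trap swaps `create` for a function
// that calls the original and reports usage. Other properties pass through.
function wrapWithUsageHook(client: typeof fakeClient, onUsage: (u: Usage) => void) {
  const messages = new Proxy(client.messages, {
    get(target, prop, receiver) {
      if (prop !== "create") {
        return Reflect.get(target, prop, receiver);
      }
      return async (params: { model: string; prompt: string }) => {
        const response = await target.create(params);
        onUsage(response.usage); // observe only; the response is returned as-is
        return response;
      };
    },
  });
  // A second Proxy on the client so that `client.messages` returns the wrapper.
  return new Proxy(client, {
    get(target, prop, receiver) {
      return prop === "messages" ? messages : Reflect.get(target, prop, receiver);
    },
  });
}
```

&lt;p&gt;Calling code keeps using wrapped.messages.create(...) exactly as before, which is why no call sites need to change.&lt;/p&gt;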

&lt;p&gt;When a call completes, it reads the real token counts from the API response — usage.input_tokens, usage.output_tokens, etc. Not tiktoken guesses. The provider already counted them; we just listen.&lt;/p&gt;

&lt;p&gt;Then it:&lt;/p&gt;

&lt;p&gt;Calculates cost in USD from a built-in pricing table&lt;br&gt;
Increments daily/monthly/user spend in memory or Redis&lt;br&gt;
Checks your limits&lt;br&gt;
Throws, warns, or stays silent (your call)&lt;/p&gt;

&lt;p&gt;For streaming, it runs a pre-flight estimate before the stream starts, passes every chunk through unchanged, and records spend when the stream finishes.&lt;/p&gt;
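
&lt;p&gt;The calculate, increment, check, and react steps can be sketched with a small in-memory tracker. This is an illustration under assumptions, not the library's code: the prices in PRICING are made-up placeholders, DailySpendTracker and costUsd are hypothetical names, and treating "limit reached" as spend greater than or equal to the limit is my own choice for the sketch.&lt;/p&gt;

```typescript
type Usage = { input_tokens: number; output_tokens: number };
type OnLimit = "throw" | "warn" | "silent";

// Placeholder USD prices per million tokens. These numbers are made up for
// illustration; real prices differ by model and change over time.
const PRICING: { [model: string]: { input: number; output: number } } = {
  "example-model": { input: 3, output: 15 },
};

// Step 1: turn the token counts reported by the provider into a USD cost.
function costUsd(model: string, usage: Usage): number {
  const price = PRICING[model];
  if (price === undefined) throw new Error(`no pricing entry for ${model}`);
  return (usage.input_tokens * price.input + usage.output_tokens * price.output) / 1e6;
}

// Steps 2 to 4: keep a per-day total in memory, compare it with the limit,
// and react according to onLimit.
class DailySpendTracker {
  private spentByDay: { [day: string]: number } = {};
  private dailyLimit: number;
  private onLimit: OnLimit;

  constructor(dailyLimit: number, onLimit: OnLimit) {
    this.dailyLimit = dailyLimit;
    this.onLimit = onLimit;
  }

  // UTC calendar date such as "2026-06-13", used as the bucket key.
  private dayKey(now: Date): string {
    return now.toISOString().slice(0, 10);
  }

  spentToday(now: Date = new Date()): number {
    return this.spentByDay[this.dayKey(now)] ?? 0;
  }

  // Call before a request: react if today's total has reached the limit.
  check(now: Date = new Date()): void {
    if (this.spentToday(now) >= this.dailyLimit) {
      const message = `daily limit of $${this.dailyLimit} reached`;
      if (this.onLimit === "throw") throw new Error(message);
      if (this.onLimit === "warn") console.warn(message);
    }
  }

  // Call after a response: add that request's cost to today's total.
  record(cost: number, now: Date = new Date()): void {
    const key = this.dayKey(now);
    this.spentByDay[key] = (this.spentByDay[key] ?? 0) + cost;
  }
}

const now = new Date("2026-06-13T09:00:00Z");
const tracker = new DailySpendTracker(5, "throw");
tracker.check(now); // passes: nothing spent yet
tracker.record(costUsd("example-model", { input_tokens: 2000, output_tokens: 500 }), now);
console.log(tracker.spentToday(now).toFixed(4)); // prints "0.0135"
```

&lt;p&gt;Because spend is recorded after a response arrives, a tracker like this can overshoot the limit by at most one request before the next check blocks.&lt;/p&gt;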

&lt;p&gt;The Proxy pattern means your existing code doesn't change. Your types mostly don't change. You just wrap once at startup.&lt;/p&gt;
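
&lt;p&gt;The streaming behaviour described earlier (check first, pass chunks through untouched, record at the end) could look roughly like this async generator. It is a sketch under assumptions: fakeStream, guardedStream, preflight, and onFinish are hypothetical names, and real provider streams report usage in their own event shapes rather than on a final chunk like this.&lt;/p&gt;

```typescript
type Usage = { input_tokens: number; output_tokens: number };
type Chunk = { text: string; usage?: Usage };

// Hypothetical stand-in for a provider stream: yields one chunk per word,
// then a final empty chunk that carries the token usage.
async function* fakeStream(prompt: string) {
  for (const word of prompt.split(" ")) {
    const chunk: Chunk = { text: word };
    yield chunk;
  }
  const last: Chunk = { text: "", usage: { input_tokens: 10, output_tokens: 3 } };
  yield last;
}

// Runs the pre-flight check when the consumer pulls the first chunk, yields
// every chunk unchanged, and reports usage once the source stream has been
// fully consumed. If the consumer stops early, onFinish is not called.
async function* guardedStream(
  start: typeof fakeStream,
  prompt: string,
  preflight: () => void, // expected to throw when the budget is already spent
  onFinish: (usage: Usage) => void,
) {
  preflight();
  let usage: Usage | undefined;
  for await (const chunk of start(prompt)) {
    if (chunk.usage !== undefined) usage = chunk.usage;
    yield chunk;
  }
  if (usage !== undefined) onFinish(usage);
}
```

&lt;p&gt;A consumer iterates guardedStream with for await exactly as it would iterate the original stream, so the chunk handling code stays the same.&lt;/p&gt;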

&lt;p&gt;Install + quick start&lt;/p&gt;

&lt;p&gt;npm install @advik1228/llm-cost-guard&lt;/p&gt;

&lt;p&gt;Anthropic:&lt;/p&gt;

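&lt;p&gt;import Anthropic from "@anthropic-ai/sdk";&lt;br&gt;
import { guard } from "@advik1228/llm-cost-guard";&lt;br&gt;
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });&lt;br&gt;
const client = guard(anthropic, {&lt;br&gt;
  dailyLimit: 5.0,&lt;br&gt;
  warnAt: 4.0,&lt;br&gt;
  onLimit: "throw",&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;OpenAI:&lt;/p&gt;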

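&lt;p&gt;import OpenAI from "openai";&lt;br&gt;
import { guard } from "@advik1228/llm-cost-guard";&lt;br&gt;
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });&lt;br&gt;
const client = guard(openai, {&lt;br&gt;
  dailyLimit: 10.0,&lt;br&gt;
  perRequestLimit: 0.50,&lt;br&gt;
  onLimit: "throw",&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;For multi-tenant or production setups, plug in Redis so limits are shared across instances:&lt;/p&gt;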

&lt;p&gt;import { guard, RedisAdapter } from "@advik1228/llm-cost-guard";&lt;br&gt;
// Assumes `anthropic` and `redis` are already-created clients and `req` is the current request&lt;br&gt;
const client = guard(anthropic, {&lt;br&gt;
  dailyLimit: 100,&lt;br&gt;
  storage: new RedisAdapter(redis),&lt;br&gt;
  userId: req.user.id,&lt;br&gt;
  userDailyLimit: 2.0,&lt;br&gt;
});&lt;/p&gt;

&lt;p&gt;Try it before your next overnight job&lt;/p&gt;

&lt;p&gt;I built this because I needed it to exist. Not as a SaaS pitch. Not as an observability platform. Just a small guard that sits between my code and an API that charges by the token.&lt;/p&gt;

&lt;p&gt;If you've ever run a batch job and thought "this should be fine" — it probably is, until it isn't.&lt;/p&gt;

&lt;p&gt;Star the repo if this saves you once. Install it before your next loop. Set a daily limit low enough to hurt your ego but not your bank account.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/advikhingmire12-oss/llm-cost-guard" rel="noopener noreferrer"&gt;https://github.com/advikhingmire12-oss/llm-cost-guard&lt;/a&gt;&lt;br&gt;
npm: &lt;a href="https://www.npmjs.com/package/@advik1228/llm-cost-guard" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@advik1228/llm-cost-guard&lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>typescript</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
