I almost burned ₹4,000 on Claude API overnight — so I built llm-cost-guard


Last month I wrote what I thought was a harmless script.

Batch-process 847 product descriptions through Claude. Summarize each one. Save to a CSV. Ship it and go to bed.

The loop looked fine. Error handling was there. Retries were capped. I felt responsible.

I woke up to a Slack ping from my own logging bot — not because anything crashed, but because something succeeded way too much.

₹4,000 gone. Overnight. On a side project.

The loop hadn't infinite-looped in the traditional sense. It had expensive-looped. A retry bug on malformed responses meant some items got hit 3–4 times. A few prompts were longer than I estimated. And I had zero visibility into running spend while it was happening.

I stared at the Anthropic dashboard like it was a crime scene.

Why Anthropic billing alerts don't cut it
Anthropic does have billing alerts. They're useful — for finance, eventually.

But they're not a runtime guardrail:

- Delayed: you find out after the damage, not mid-request
- Account-level: one rogue script takes down your whole API budget
- Non-blocking: an email doesn't stop a loop that's already running

What I actually needed was something that sits inside my code and says: "Stop. You've hit your limit. Right now."

Not tomorrow. Not at invoice time. Before request #400 burns another ₹500.

What llm-cost-guard does
It's a drop-in wrapper for your existing LLM client. One line. No SDK rewrite.

    import Anthropic from "@anthropic-ai/sdk";
    import { guard } from "@advik1228/llm-cost-guard";

    const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
    const client = guard(anthropic, { dailyLimit: 5, onLimit: "throw" });

    // Use client exactly like before: same API, same methods
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Summarize this..." }],
    });

That's it. If today's spend crosses $5, the next call throws. The loop dies. Your wallet survives.
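
To make "the loop dies" concrete, here is a minimal sketch of a batch loop that stops cleanly when the guard throws. It uses a stand-in client so it runs on its own; `BudgetExceededError`, `makeFakeGuardedClient`, and `runBatch` are hypothetical names for illustration, not llm-cost-guard exports, so check the package for the error it actually throws.

```typescript
// Hypothetical error type standing in for whatever the guard throws.
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
  }
}

// Stand-in for a guarded client: refuses calls once recorded spend
// has reached the daily limit.
function makeFakeGuardedClient(dailyLimitUsd: number, costPerCallUsd: number) {
  let spentUsd = 0;
  return {
    summarize(text: string): string {
      if (spentUsd >= dailyLimitUsd) {
        throw new BudgetExceededError(`daily limit of $${dailyLimitUsd} reached`);
      }
      spentUsd += costPerCallUsd;
      return text.slice(0, 20);
    },
  };
}

function isBudgetError(err: unknown): boolean {
  return err instanceof Error && err.name === "BudgetExceededError";
}

// Processes items in order and stops the whole batch on a budget error.
// A budget error is never retried; any other error is rethrown.
function runBatch(items: string[], client: { summarize(text: string): string }) {
  const done: string[] = [];
  for (const item of items) {
    try {
      done.push(client.summarize(item));
    } catch (err) {
      if (isBudgetError(err)) break;
      throw err;
    }
  }
  return { done, remaining: items.length - done.length };
}
```

The important design choice is that a budget error sits outside the retry path. If the retry logic treats it like any other failure, the guard only slows the loop down instead of ending it.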

It also supports monthly caps, per-request limits, per-user budgets, webhook alerts, and streaming, but the core idea is dead simple: wrap, limit, block.

How it works under the hood
No monkey-patching. No forked SDK.

llm-cost-guard uses a JavaScript Proxy to intercept calls to messages.create (Anthropic), chat.completions.create (OpenAI), and Gemini's generateContent.
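
As a rough illustration of the Proxy technique (not the package's actual source), the sketch below wraps an object so that one nested method, addressed by a path such as `["messages", "create"]`, reports its resolved result to a callback. The `intercept` function and its parameters are hypothetical names.

```typescript
type AnyFn = (...args: unknown[]) => unknown;

// Wraps `target` so that calling the method at `path` notifies `onResult`
// with the resolved value. Every other property passes through untouched.
function intercept<T extends object>(
  target: T,
  path: string[],
  onResult: (result: unknown) => void,
): T {
  return new Proxy(target, {
    get(obj, prop) {
      const value = Reflect.get(obj, prop);
      if (prop !== path[0]) return value;

      // Last path segment: return a wrapper around the real method.
      if (path.length === 1 && typeof value === "function") {
        return async (...args: unknown[]) => {
          const result = await (value as AnyFn).apply(obj, args);
          onResult(result);
          return result;
        };
      }

      // Intermediate segment: proxy the nested object and keep walking.
      if (path.length > 1 && typeof value === "object" && value !== null) {
        return intercept(value as object, path.slice(1), onResult);
      }
      return value;
    },
  });
}
```

Because the wrapper returns the same type `T`, calling code keeps its existing types. Real SDK methods may return richer promise objects than a plain `Promise`, which this sketch would not preserve.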

When a call completes, it reads the real token counts from the API response — usage.input_tokens, usage.output_tokens, etc. Not tiktoken guesses. The provider already counted them; we just listen.
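
A hedged sketch of what "just listening" can look like: one function that normalizes the usage fields of the three providers. The field names follow each provider's response shape as I understand it (`usage.input_tokens` for Anthropic, `usage.prompt_tokens` for OpenAI chat completions, `usageMetadata.promptTokenCount` for Gemini); `extractUsage` and `TokenUsage` are hypothetical names, not the package's API.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Returns normalized token counts, or null if the response has no usage data.
function extractUsage(response: unknown): TokenUsage | null {
  const r = response as Record<string, any> | null;

  const u = r?.usage;
  if (u && typeof u.input_tokens === "number") {
    // Anthropic Messages API
    return { inputTokens: u.input_tokens, outputTokens: u.output_tokens ?? 0 };
  }
  if (u && typeof u.prompt_tokens === "number") {
    // OpenAI Chat Completions
    return { inputTokens: u.prompt_tokens, outputTokens: u.completion_tokens ?? 0 };
  }

  // Gemini: the SDK nests the payload under `response`, the REST API does not.
  const m = r?.usageMetadata ?? r?.response?.usageMetadata;
  if (m && typeof m.promptTokenCount === "number") {
    return { inputTokens: m.promptTokenCount, outputTokens: m.candidatesTokenCount ?? 0 };
  }
  return null;
}
```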

Then it:

1. Calculates cost in USD from a built-in pricing table
2. Increments daily/monthly/user spend in memory or Redis
3. Checks your limits
4. Throws, warns, or stays silent: your call
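
The steps above can be sketched in a few lines. This is a simplified in-memory version with daily spend only, and it is not the package's code: the model name, the prices, and the names `PRICES`, `costUsd`, and `createTracker` are all placeholders. With the example prices of $3 and $15 per million input and output tokens, a call using 1,000 input and 500 output tokens costs (1,000 × 3 + 500 × 15) / 1,000,000 = $0.0105.

```typescript
// Placeholder prices in USD per million tokens, not a real pricing table.
const PRICES: Record<string, { inputPerMTok: number; outputPerMTok: number }> = {
  "example-model": { inputPerMTok: 3, outputPerMTok: 15 },
};

type Action = "ok" | "warn" | "block";

// Step 1: turn token counts into a USD cost.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`no price configured for model ${model}`);
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

// Steps 2 to 4: add to the running total, compare against the limits,
// and tell the caller what to do next.
function createTracker(limits: { dailyLimitUsd: number; warnAtUsd: number }) {
  let spentTodayUsd = 0;
  return {
    record(cost: number): Action {
      spentTodayUsd += cost;
      if (spentTodayUsd >= limits.dailyLimitUsd) return "block";
      if (spentTodayUsd >= limits.warnAtUsd) return "warn";
      return "ok";
    },
    spent(): number {
      return spentTodayUsd;
    },
  };
}
```

Since cost is only known after a response arrives, the request that crosses the limit still completes; it is the following one that gets blocked.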
For streaming, it runs a pre-flight estimate before the stream starts, passes every chunk through unchanged, and records spend when the stream finishes.
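
Here is a minimal sketch of the pass-through and record-at-the-end part, leaving out the pre-flight estimate. The event shapes are simplified from Anthropic's streaming format, where `message_start` carries the input token count and `message_delta` carries the running output token count; `meterStream` and its callback are hypothetical names.

```typescript
// Simplified subset of Anthropic-style stream events.
type StreamEvent =
  | { type: "message_start"; message: { usage: { input_tokens: number } } }
  | { type: "message_delta"; usage: { output_tokens: number } }
  | { type: "content_block_delta"; delta: { text: string } };

interface StreamUsage {
  inputTokens: number;
  outputTokens: number;
}

// Yields every event unchanged while noting usage, then reports the totals.
async function* meterStream(
  stream: AsyncIterable<StreamEvent>,
  onDone: (usage: StreamUsage) => void,
): AsyncGenerator<StreamEvent> {
  let inputTokens = 0;
  let outputTokens = 0;
  try {
    for await (const event of stream) {
      if (event.type === "message_start") inputTokens = event.message.usage.input_tokens;
      if (event.type === "message_delta") outputTokens = event.usage.output_tokens;
      yield event; // the consumer sees exactly what the provider sent
    }
  } finally {
    // Runs on normal completion and also if the consumer stops reading early.
    onDone({ inputTokens, outputTokens });
  }
}
```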

The Proxy pattern means your existing code doesn't change. Your types mostly don't change. You just wrap once at startup.

Install + quick start
    npm install @advik1228/llm-cost-guard

Anthropic:

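    import Anthropic from "@anthropic-ai/sdk";
    import { guard } from "@advik1228/llm-cost-guard";

    const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
    const client = guard(anthropic, {
      dailyLimit: 5.0, // USD
      warnAt: 4.0,     // USD
      onLimit: "throw",
    });

OpenAI: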

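    import OpenAI from "openai";
    import { guard } from "@advik1228/llm-cost-guard";

    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const client = guard(openai, {
      dailyLimit: 10.0,
      perRequestLimit: 0.50,
      onLimit: "throw",
    });

For multi-tenant or production setups, plug in Redis so limits are shared across instances: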

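    import { guard, RedisAdapter } from "@advik1228/llm-cost-guard";

    // `redis` is your already-connected Redis client, and `req` is the
    // incoming request, so this runs inside a request handler.
    const client = guard(anthropic, {
      dailyLimit: 100,      // cap across all users and instances (USD)
      storage: new RedisAdapter(redis),
      userId: req.user.id,
      userDailyLimit: 2.0,  // cap for this user (USD)
    });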
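
To show why shared storage matters, here is a sketch of how per-day, per-user counters could be keyed so that every instance increments the same totals. It is an assumption about the general technique, not the package's storage layout: `SpendStore`, `MemoryStore`, `dailyKey`, and `recordSpend` are hypothetical names, and the in-memory store stands in for Redis, where the same operation would map onto an atomic increment such as `INCRBYFLOAT` plus an expiry on the key.

```typescript
interface SpendStore {
  // Adds amountUsd to the counter at `key` and returns the new total.
  incrBy(key: string, amountUsd: number): Promise<number>;
}

// In-memory stand-in; a Redis-backed store would implement the same interface.
class MemoryStore implements SpendStore {
  private totals = new Map<string, number>();

  async incrBy(key: string, amountUsd: number): Promise<number> {
    const next = (this.totals.get(key) ?? 0) + amountUsd;
    this.totals.set(key, next);
    return next;
  }
}

// One counter per UTC day, so "daily" spend resets when the date changes.
function dailyKey(scope: string, now: Date): string {
  return `spend:${scope}:${now.toISOString().slice(0, 10)}`;
}

// Records one request's cost against both the global and the per-user counter.
async function recordSpend(
  store: SpendStore,
  userId: string,
  costUsd: number,
  limits: { dailyLimitUsd: number; userDailyLimitUsd: number },
  now: Date = new Date(),
): Promise<{ globalExceeded: boolean; userExceeded: boolean }> {
  const [globalTotal, userTotal] = await Promise.all([
    store.incrBy(dailyKey("global", now), costUsd),
    store.incrBy(dailyKey(`user:${userId}`, now), costUsd),
  ]);
  return {
    globalExceeded: globalTotal >= limits.dailyLimitUsd,
    userExceeded: userTotal >= limits.userDailyLimitUsd,
  };
}
```

Keeping the counters in one shared place is what makes a per-user cap hold even when that user's requests land on different instances.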
Try it — before your next overnight job
I built this because I needed it to exist. Not as a SaaS pitch. Not as an observability platform. Just a small guard that sits between my code and an API that charges by the token.

If you've ever run a batch job and thought "this should be fine" — it probably is, until it isn't.

Star the repo if this saves you once. Install it before your next loop. Set a daily limit low enough to hurt your ego but not your bank account.

GitHub: https://github.com/advikhingmire12-oss/llm-cost-guard
npm: https://www.npmjs.com/package/@advik1228/llm-cost-guard