DEV Community

Ali Raza
Ali Raza

Posted on

How to Stop Your OpenAI API Bill from Spiraling Out of Control

How to Stop Your OpenAI API Bill from Spiraling Out of Control

If you're building with LLM APIs (OpenAI, Anthropic Claude, Google Gemini), you've probably had that moment — you check your dashboard and realize a runaway loop or an uncapped user session just burned through your entire monthly budget in minutes.

There's no built-in way to set spending limits across these SDKs. OpenAI's usage limits are post-hoc (money already spent), Anthropic has no budget controls, and Gemini has rate limits but not token budgets.

I built llm-spend-guard to fix this.

What It Does

It wraps your existing LLM SDK calls and enforces token budgets before any request is sent. If a request would exceed your budget, it gets blocked instantly — the API is never called, no money wasted.

Your Code --> llm-spend-guard --> LLM API
                  |
                  ├── Estimates tokens BEFORE the request
                  ├── Checks budget (global, per-user, per-session)
                  ├── Over budget? BLOCKS the request
                  └── Under budget? Sends request, tracks usage
Enter fullscreen mode Exit fullscreen mode

Quick Start (3 Lines)

import { LLMGuard } from 'llm-spend-guard';
import OpenAI from 'openai';

const guard = new LLMGuard({ dailyBudgetTokens: 100_000 });

const openai = new OpenAI();
guard.wrapOpenAI(openai);

// Use guard.openai instead of openai
const response = await guard.openai.chat({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  max_tokens: 500,
});
Enter fullscreen mode Exit fullscreen mode

If this request would exceed 100K tokens for the day, it throws BudgetExceededError and OpenAI is never called.

Works with All Major Providers

guard.wrapOpenAI(openai);       // GPT-4o, GPT-4, etc.
guard.wrapAnthropic(anthropic); // Claude
guard.wrapGemini(gemini);       // Gemini 1.5 Pro
Enter fullscreen mode Exit fullscreen mode

All three share the same budget pool or can have separate limits.

Per-User Budgets (For SaaS Apps)

This is where it gets powerful. If you're building a multi-tenant app:

const guard = new LLMGuard({
  dailyBudgetTokens: 1_000_000,  // 1M total across all users
  userBudgetTokens: 10_000,      // 10K per user per day
  storage: new RedisStorage(new Redis()),
});

await guard.openai.chat(
  { model: 'gpt-4o', messages, max_tokens: 500 },
  { userId: req.user.id }  // per-user tracking
);
Enter fullscreen mode Exit fullscreen mode

Express.js Middleware

import { expressMiddleware, budgetErrorHandler } from 'llm-spend-guard';

app.use(expressMiddleware(guard));  // auto-extracts userId, sessionId
app.use(budgetErrorHandler);       // returns 429 when budget exceeded
Enter fullscreen mode Exit fullscreen mode

Key Features

  • Pre-request blocking — No money spent on over-budget requests
  • Multi-provider — OpenAI + Anthropic Claude + Google Gemini
  • Multi-scope — Global, per-user, per-session, per-route budgets
  • Auto-truncation — Intelligently trims prompts to fit remaining budget
  • Redis support — Production-ready with shared state across instances
  • Express/Next.js — Built-in middleware
  • TypeScript-first — Full type safety
  • 18.6KB — Lightweight, single dependency (tiktoken)

Links


If you've been bitten by surprise LLM bills, give it a try and let me know what you think!

Top comments (0)