OpenAI's dashboard shows you costs after they happen. By then, it's too late. Learn how to enforce hard budget limits that block requests before they overspend.
The $72,000 Lesson
Last month, a developer shared their nightmare: a misconfigured retry loop burned $72,000 in OpenAI credits overnight. The dashboard showed the damage hours later. The bill? Non-negotiable.
This isn't rare. Search "OpenAI unexpected bill" and you'll find dozens of similar stories. The pattern is always the same:
- A bug causes excessive API calls
- Rate limits prevent immediate detection
- Usage dashboards update hours later
- The damage is already done
OpenAI's built-in limits? They're monthly caps that email you after overspending. That's like a smoke detector that texts you after your house burns down.
Why Traditional Solutions Fail
Most teams try one of three approaches:
1. OpenAI's Usage Limits
OpenAI offers monthly spending limits, but they have critical flaws:
- Delayed enforcement: Limits check against cached usage data
- All-or-nothing: Hit the limit? Your entire account stops
- No granularity: Can't set limits per team, project, or user
- Soft enforcement: "Hard limits" can still overshoot by 10-20%
2. Monitoring Dashboards
Tools like Datadog or custom dashboards show beautiful graphs of your spending. They're great for post-mortems, useless for prevention:
```yaml
# This alert fires AFTER you've already spent $1000
alert: openai_daily_spend_high
expr: sum(openai_spend_24h) > 1000
annotations:
  summary: "OpenAI spend exceeded $1000 in 24h"
```
3. Client-Side Rate Limiting
Some teams implement token counting in their application code:
```python
import tiktoken

class BudgetExceeded(Exception):
    pass

class OpenAIBudgetWrapper:
    def __init__(self, daily_limit=100):
        self.daily_limit = daily_limit
        self.spent_today = 0

    def estimate_cost(self, prompt):
        # Rough guess from input tokens; the response's cost is unknowable here
        tokens = len(tiktoken.get_encoding("cl100k_base").encode(prompt))
        return tokens / 1000 * 0.03

    def complete(self, prompt):
        # Problem 1: Estimates are often wrong
        estimated_cost = self.estimate_cost(prompt)
        # Problem 2: No coordination between instances
        if self.spent_today + estimated_cost > self.daily_limit:
            raise BudgetExceeded()
        # Problem 3: Actual cost is known only after the response
        response = openai.complete(prompt)
        actual_cost = response.usage.total_cost  # the API reports tokens, not dollars
        self.spent_today += actual_cost
        return response
```
This fails because:
- Cost estimates are inaccurate (especially with JSON mode, tool calls)
- Multiple app instances don't share state
- Actual costs are known only after the request completes
- No protection against retry storms or runaway loops
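The shared-state problem in particular is easy to demonstrate. In this sketch (an illustration, not code from any real system), two app instances each enforce the same $100 daily limit locally, yet their combined spend reaches double the limit:

```javascript
// Each instance tracks spend in its own memory -- no coordination.
class LocalBudget {
  constructor(dailyLimit) {
    this.dailyLimit = dailyLimit;
    this.spent = 0;
  }
  // Approve the request if it fits this instance's local view of spend.
  tryCharge(cost) {
    if (this.spent + cost > this.dailyLimit) return false;
    this.spent += cost;
    return true;
  }
}

const instanceA = new LocalBudget(100);
const instanceB = new LocalBudget(100);

let totalSpend = 0;
for (let i = 0; i < 20; i++) {
  // Both instances independently approve $10 requests.
  if (instanceA.tryCharge(10)) totalSpend += 10;
  if (instanceB.tryCharge(10)) totalSpend += 10;
}
console.log(totalSpend); // each instance stops at $100, but combined spend is $200
```

Add a third instance and you're at 3x the limit. Only a single shared checkpoint in front of the provider closes this gap.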
The Solution: Request-Level Budget Enforcement
Real budget protection requires three things OpenAI doesn't provide:
- Pre-request validation: Check budgets before forwarding to OpenAI
- Real-time accounting: Track actual spend, not estimates
- Granular controls: Different limits for different use cases
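Stripped of product specifics, the core gateway loop looks something like this sketch (my own simplification, not SatGate's internals): reserve a conservative estimate before forwarding, then settle against the provider-reported actual cost when the response arrives.

```javascript
// Minimal reserve-then-settle budget, held in one shared place.
class GatewayBudget {
  constructor(dailyLimit) {
    this.dailyLimit = dailyLimit;
    this.settled = 0;   // actual spend, from provider usage data
    this.reserved = 0;  // in-flight estimates
  }
  // Pre-request validation: reject BEFORE forwarding upstream.
  reserve(estimate) {
    if (this.settled + this.reserved + estimate > this.dailyLimit) {
      const err = new Error('budget_exceeded');
      err.type = 'budget_exceeded';
      throw err;
    }
    this.reserved += estimate;
  }
  // Real-time accounting: replace the estimate with the actual cost.
  settle(estimate, actualCost) {
    this.reserved -= estimate;
    this.settled += actualCost;
  }
}

const budget = new GatewayBudget(10);
budget.reserve(4);      // request forwarded
budget.settle(4, 6.5);  // actual cost came in higher than the estimate
let blocked = false;
try {
  budget.reserve(4);    // $6.50 settled + $4 estimate > $10 -- rejected up front
} catch (e) {
  blocked = e.type === 'budget_exceeded';
}
console.log(blocked);
```

The key property: because settlement uses actual usage, estimate drift self-corrects on the very next request instead of compounding.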
Here's how to implement it properly with SatGate:
Step 1: Install the Gateway
```shell
# Install SatGate
npm install -g @satgate/gateway

# Start with the OpenAI proxy
satgate start --proxy openai
```
Step 2: Create Budget-Limited Tokens
Instead of using your OpenAI API key directly, create derivative tokens with spending limits:
```shell
# Development token: $10/day for testing
satgate token create \
  --name "dev-token" \
  --daily-limit 10 \
  --upstream openai

# Production token: $100/day with alerts at 80%
satgate token create \
  --name "prod-token" \
  --daily-limit 100 \
  --alert-threshold 0.8 \
  --upstream openai

# High-priority token: $500/day for critical paths
satgate token create \
  --name "priority-token" \
  --daily-limit 500 \
  --hourly-limit 50 \
  --upstream openai
```
Step 3: Update Your Code
The beautiful part? Your application code barely changes:
```javascript
import OpenAI from 'openai';

// Before: Direct OpenAI connection
// const openai = new OpenAI({
//   apiKey: process.env.OPENAI_API_KEY
// });

// After: Route through SatGate
const openai = new OpenAI({
  apiKey: process.env.SATGATE_TOKEN, // Your budget-limited token
  baseURL: 'http://localhost:8000/v1' // SatGate proxy
});

// Everything else stays the same
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }]
});
```
Step 4: Configure Team Budgets
For larger teams, create hierarchical budgets:
```shell
# Create team buckets
satgate budget create --name "engineering" --monthly 5000
satgate budget create --name "marketing" --monthly 2000
satgate budget create --name "support" --monthly 1000

# Create tokens within team budgets
satgate token create \
  --name "eng-dev" \
  --budget "engineering" \
  --daily-limit 50

satgate token create \
  --name "marketing-automation" \
  --budget "marketing" \
  --daily-limit 100 \
  --model "gpt-3.5-turbo"  # Restrict to cheaper models
```
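The hierarchy means every request faces two checks: the token's own limit and its parent budget's limit. A hedged sketch of that logic (the field names here are illustrative, not SatGate's schema):

```javascript
// A request must fit BOTH the token's daily limit and the team's monthly budget.
function allow(request, token, team) {
  return (
    token.spentToday + request.cost <= token.dailyLimit &&
    team.spentThisMonth + request.cost <= team.monthlyLimit
  );
}

const engineering = { spentThisMonth: 4980, monthlyLimit: 5000 };
const engDev = { spentToday: 5, dailyLimit: 50 };

// Fits the token's $50/day, but would push the team past its $5000/month.
console.log(allow({ cost: 30 }, engDev, engineering)); // false
console.log(allow({ cost: 15 }, engDev, engineering)); // true
```

This is why team buckets matter: no single developer token can quietly drain the whole department's budget.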
Real-World Example: Preventing Retry Storms
Here's how SatGate prevents the $72,000 nightmare scenario:
```javascript
// Buggy code with an infinite retry loop
async function processDocument(doc) {
  while (true) {
    try {
      const response = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [
          { role: "system", content: "Extract entities from document" },
          { role: "user", content: doc.content } // Bug: 100MB document
        ]
      });
      return response;
    } catch (error) {
      console.log("Retrying..."); // Infinite loop on large docs
      await sleep(1000);
    }
  }
}
```
Without protection: This burns thousands of dollars as it repeatedly sends a huge document to GPT-4.
With SatGate: The token's hourly limit triggers after ~$50, blocking further requests:
```
# Request 1: $12.50 (huge input) - Allowed (total: $12.50)
# Request 2: $12.50 retry       - Allowed (total: $25.00)
# Request 3: $12.50 retry       - Allowed (total: $37.50)
# Request 4: $12.50 retry       - Allowed (total: $50.00)
# Request 5: BLOCKED - Hourly limit exceeded
```

```json
{
  "error": {
    "type": "budget_exceeded",
    "message": "Hourly budget limit exceeded",
    "limit": 50,
    "spent": 50,
    "resets_at": "2024-04-07T19:00:00Z"
  }
}
```
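You can replay that accounting in a few lines. This simulation mirrors the trace above: each retry costs $12.50 against a $50 hourly limit, so the fifth attempt (and everything after it) is rejected at the gateway:

```javascript
// Simulate a retry storm hitting a per-hour spending cap.
function simulateRetryStorm(costPerRequest, hourlyLimit, attempts) {
  let spent = 0;
  const results = [];
  for (let i = 0; i < attempts; i++) {
    if (spent + costPerRequest > hourlyLimit) {
      results.push('blocked');   // rejected before reaching OpenAI
    } else {
      spent += costPerRequest;
      results.push('allowed');
    }
  }
  return { spent, results };
}

const { spent, results } = simulateRetryStorm(12.5, 50, 6);
console.log(results); // 4 allowed, then every further attempt blocked
console.log(spent);   // worst case is capped at $50, not thousands
```

The buggy loop keeps retrying, but every retry is now a cheap local rejection instead of a $12.50 GPT-4 call.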
Advanced: Per-User Budgets for AI Apps
Building a ChatGPT wrapper? Give each user their own budget:
```javascript
// Middleware to inject user-specific tokens
app.use(async (req, res, next) => {
  const userId = req.user.id;

  // Get or create a token for this user
  let token = await cache.get(`token:${userId}`);
  if (!token) {
    token = await satgate.tokens.create({
      name: `user-${userId}`,
      daily_limit: 10, // $10/day per user
      upstream: 'openai'
    });
    await cache.set(`token:${userId}`, token, 86400);
  }

  // Inject the token for the OpenAI client
  req.openaiToken = token;
  next();
});

// Route handler uses the user-specific token
app.post('/chat', async (req, res) => {
  const openai = new OpenAI({
    apiKey: req.openaiToken,
    baseURL: 'http://localhost:8000/v1'
  });

  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: req.body.messages
    });
    res.json(response);
  } catch (error) {
    if (error.type === 'budget_exceeded') {
      res.status(429).json({
        error: "Daily limit reached. Upgrade for more credits."
      });
    } else {
      // Don't swallow other failures -- the request would hang otherwise
      res.status(500).json({ error: "Upstream request failed" });
    }
  }
});
```
Monitoring and Alerts
Unlike OpenAI's "email after overspend" approach, SatGate alerts you before problems:
```shell
# Configure alerts
satgate alerts add \
  --type webhook \
  --url https://your-app.com/webhooks/budget-alerts \
  --events "budget.80_percent,budget.exceeded,anomaly.detected"
```
Alert payload when 80% of a budget is spent:

```json
{
  "event": "budget.80_percent",
  "token": "prod-token",
  "spent": 80.00,
  "limit": 100.00,
  "period": "daily",
  "top_consumers": [
    { "endpoint": "/api/chat", "spent": 45.00 },
    { "endpoint": "/api/summarize", "spent": 35.00 }
  ]
}
```
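On the receiving end, a webhook handler can route those events to your paging or chat tooling. A minimal sketch, assuming the event names and payload fields shown above (they're from the example, not a documented schema):

```javascript
// Dispatch budget alert events to a notifier (Slack, PagerDuty, etc.).
function handleBudgetAlert(payload, notify) {
  switch (payload.event) {
    case 'budget.80_percent':
      return notify(`Warning: ${payload.token} at ${payload.spent}/${payload.limit} (${payload.period})`);
    case 'budget.exceeded':
      return notify(`Blocked: ${payload.token} hit its ${payload.period} limit`);
    case 'anomaly.detected':
      return notify(`Anomaly on ${payload.token}: check top consumers`);
    default:
      return null; // ignore unknown event types
  }
}

const messages = [];
handleBudgetAlert(
  { event: 'budget.80_percent', token: 'prod-token', spent: 80, limit: 100, period: 'daily' },
  m => messages.push(m)
);
console.log(messages[0]);
```

The 80% warning is the one that matters: it arrives while you can still investigate, not after the limit has already cut you off.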
The Results
Teams using request-level budget enforcement report:
- 100% prevention of runaway spend incidents
- 73% reduction in overall OpenAI costs (better visibility)
- Zero production outages from hitting OpenAI account limits
- Granular insights into cost per feature/team/user
Common Questions
Does this add latency?
SatGate adds <1ms to check budgets. Compare that to the 2-3 seconds for a typical GPT-4 call. The overhead is negligible.
What happens when limits are hit?
Requests are immediately rejected with a 429 status and a clear error message. Your app can handle this gracefully: offer an upgrade, queue the work for later, or fall back to cached responses.
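The cached-response fallback looks something like this sketch (a pattern suggestion, with a stubbed model call standing in for the real client):

```javascript
// On a budget_exceeded rejection, serve a cached answer instead of failing.
async function completeWithFallback(callModel, cache, prompt) {
  try {
    return await callModel(prompt);
  } catch (error) {
    if (error.type === 'budget_exceeded' && cache.has(prompt)) {
      return { cached: true, content: cache.get(prompt) };
    }
    throw error; // unrelated errors still surface
  }
}

// Usage with a stub that simulates an over-budget gateway rejection:
const cache = new Map([['Hello', 'Hi there!']]);
const overBudget = async () => {
  const err = new Error('Hourly budget limit exceeded');
  err.type = 'budget_exceeded';
  throw err;
};
completeWithFallback(overBudget, cache, 'Hello')
  .then(r => console.log(r.content)); // serves the cached reply
```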
Can I override limits in emergencies?
Yes. Create emergency tokens with higher limits or use temporary overrides:
```shell
# Temporary override for incident response
satgate token update incident-token --daily-limit 1000 --expires 1h
```
Start Small, Scale Safely
You don't need to migrate everything at once. Start with:
1. Install SatGate alongside your existing setup
2. Route development traffic through budget-limited tokens
3. Monitor savings and prevented overages
4. Gradually migrate production workloads
The best time to add budget protection? Before you need it. The second best time? Right now.
Ready to protect your OpenAI spending? SatGate is open source and takes 5 minutes to set up. Get started on GitHub or read the docs.