How to build a Claude AI rate limiter that saves you from surprise bills
If you're running a Claude-powered app and paying per token, you already know the anxiety: one unexpected traffic spike and your billing dashboard looks like a ransom note.
This tutorial shows you how to build a simple rate limiter in Node.js that caps your Claude API spend — regardless of how many users hit your app.
The problem
Per-token pricing means your costs scale with usage. Great for Anthropic. Potentially terrifying for you.
User sends 1,000 messages → you pay for 1,000 messages
User sends 100,000 messages → you pay for 100,000 messages
No cap. No ceiling. Just an invoice.
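To make that concrete, here's a back-of-the-envelope cost model. The per-million-token prices below are illustrative placeholders, not current Anthropic rates — check the official pricing page for real numbers.

```javascript
// Rough cost model: cost = (tokens / 1M) * price-per-million-tokens.
// Prices are ILLUSTRATIVE placeholders, not real Anthropic rates.
const PRICE_PER_MTOK = { input: 15, output: 75 }; // USD per million tokens (assumed)

function estimateCost(messages, avgInputTokens, avgOutputTokens) {
  const inputCost = (messages * avgInputTokens / 1e6) * PRICE_PER_MTOK.input;
  const outputCost = (messages * avgOutputTokens / 1e6) * PRICE_PER_MTOK.output;
  return inputCost + outputCost;
}

console.log(estimateCost(1000, 500, 500).toFixed(2));   // 1,000 messages/day
console.log(estimateCost(100000, 500, 500).toFixed(2)); // 100,000 messages: 100x the bill
```

Linear in usage, with no upper bound: that's the entire problem the limiter below exists to solve.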
The solution: a fixed-window rate limiter
A fixed-window limiter gives each user a fixed allowance of requests per time window. When they hit the limit, they wait until the window resets — you don't pay. (This pattern is often loosely called a token bucket, but a true token bucket refills continuously; the fixed window below is simpler and works fine for cost capping.)
// rate-limiter.js
const Anthropic = require('@anthropic-ai/sdk');
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
// In-memory store (use Redis in production)
const buckets = new Map();
const RATE_LIMIT = {
maxRequests: 10, // requests per window
windowMs: 60 * 1000, // 1 minute
maxTokensPerReq: 1000 // approx. input-token cap, enforced by truncation below
};
function checkRateLimit(userId) {
const now = Date.now();
const bucket = buckets.get(userId) || { count: 0, resetAt: now + RATE_LIMIT.windowMs };
// Reset window if expired
if (now > bucket.resetAt) {
bucket.count = 0;
bucket.resetAt = now + RATE_LIMIT.windowMs;
}
if (bucket.count >= RATE_LIMIT.maxRequests) {
const waitSeconds = Math.ceil((bucket.resetAt - now) / 1000);
return { allowed: false, waitSeconds };
}
bucket.count++;
buckets.set(userId, bucket);
return { allowed: true, remaining: RATE_LIMIT.maxRequests - bucket.count };
}
async function askClaude(userId, userMessage) {
const limit = checkRateLimit(userId);
if (!limit.allowed) {
return {
error: true,
message: `Rate limit hit. Try again in ${limit.waitSeconds}s.`
};
}
// Truncate long messages to cap token spend
const truncated = userMessage.slice(0, RATE_LIMIT.maxTokensPerReq * 4); // ~4 chars/token
const response = await client.messages.create({
model: 'claude-opus-4-5',
max_tokens: 500, // hard cap on output
messages: [{ role: 'user', content: truncated }]
});
return {
error: false,
text: response.content[0].text,
remaining: limit.remaining
};
}
// Export for server.js
module.exports = { checkRateLimit, askClaude };
// Example usage (guarded so it doesn't run when server.js requires this file)
if (require.main === module) {
(async () => {
const userId = 'user_123';
for (let i = 0; i < 12; i++) {
const result = await askClaude(userId, `Question number ${i + 1}: what is 2+2?`);
if (result.error) {
console.log(`Request ${i + 1}: BLOCKED — ${result.message}`);
} else {
console.log(`Request ${i + 1}: OK — ${result.remaining} remaining`);
}
}
})();
}
Run it:
npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=your_key node rate-limiter.js
Output:
Request 1: OK — 9 remaining
Request 2: OK — 8 remaining
...
Request 10: OK — 0 remaining
Request 11: BLOCKED — Rate limit hit. Try again in 47s.
Request 12: BLOCKED — Rate limit hit. Try again in 46s.
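The windowing logic is easy to unit-test if you factor the clock out. Here's a hypothetical standalone version of checkRateLimit (not from the listing above) that takes now as a parameter instead of calling Date.now(), so you can simulate the window reset without waiting a real minute:

```javascript
// Pure fixed-window check: same logic as checkRateLimit, but the clock is
// injected so the reset behavior can be tested deterministically.
function checkWindow(buckets, userId, now, { maxRequests = 10, windowMs = 60000 } = {}) {
  const bucket = buckets.get(userId) || { count: 0, resetAt: now + windowMs };
  if (now > bucket.resetAt) {
    bucket.count = 0;
    bucket.resetAt = now + windowMs;
  }
  if (bucket.count >= maxRequests) {
    return { allowed: false, waitSeconds: Math.ceil((bucket.resetAt - now) / 1000) };
  }
  bucket.count++;
  buckets.set(userId, bucket);
  return { allowed: true, remaining: maxRequests - bucket.count };
}

// Simulate: 10 requests pass, the 11th is blocked, and a request after the
// window expires goes through again.
const buckets = new Map();
for (let i = 0; i < 10; i++) checkWindow(buckets, 'u1', 0);
console.log(checkWindow(buckets, 'u1', 0).allowed);     // false: limit hit
console.log(checkWindow(buckets, 'u1', 61000).allowed); // true: new window
```

Injecting the clock also makes it trivial to swap in a fake timer in CI, where a 60-second sleep per test case would be painful.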
Add it to an Express server
// server.js
const express = require('express');
const { askClaude } = require('./rate-limiter');
const app = express();
app.use(express.json());
app.post('/chat', async (req, res) => {
const { userId, message } = req.body;
if (!userId || !message) {
return res.status(400).json({ error: 'userId and message required' });
}
const result = await askClaude(userId, message);
if (result.error) {
return res.status(429).json(result);
}
res.json(result);
});
app.listen(3000, () => console.log('Server on port 3000'));
Production upgrades
For real apps, swap the in-memory Map for Redis:
// redis-rate-limiter.js
const { createClient } = require('redis');
const redis = createClient();
await redis.connect(); // top-level await requires ESM ("type": "module"); otherwise connect inside an async init
async function checkRateLimitRedis(userId) {
const key = `rate:${userId}`;
const count = await redis.incr(key);
if (count === 1) {
// First request in window — set expiry
// (INCR + EXPIRE isn't atomic; use a Lua script if you need strict guarantees)
await redis.expire(key, 60);
}
if (count > 10) {
const ttl = await redis.ttl(key);
return { allowed: false, waitSeconds: ttl > 0 ? ttl : 60 }; // ttl is -1 if the key lost its expiry
}
return { allowed: true, remaining: 10 - count };
}
This survives server restarts and works across multiple instances.
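Request counting caps how often users call you, but not how much each call actually costs. For a hard spend ceiling you can also track real token usage, which the Messages API reports in response.usage (input_tokens and output_tokens), against a per-user daily budget. A minimal in-memory sketch, with an arbitrary budget number:

```javascript
// Daily token budget per user. Deduct the actual usage the API reports
// (response.usage.input_tokens + response.usage.output_tokens) after each call.
const DAILY_BUDGET = 50000; // tokens per user per day (arbitrary example)
const usage = new Map();    // userId -> { spent, day }

function spendTokens(userId, tokens, day = new Date().toISOString().slice(0, 10)) {
  const entry = usage.get(userId) || { spent: 0, day };
  if (entry.day !== day) { entry.spent = 0; entry.day = day; } // new day, reset budget
  if (entry.spent + tokens > DAILY_BUDGET) {
    return { allowed: false, remaining: DAILY_BUDGET - entry.spent };
  }
  entry.spent += tokens;
  usage.set(userId, entry);
  return { allowed: true, remaining: DAILY_BUDGET - entry.spent };
}
```

After each successful call you'd run spendTokens(userId, response.usage.input_tokens + response.usage.output_tokens), then reject the next request once it returns allowed: false. In production this map belongs in Redis too, for the same restart and multi-instance reasons.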
The alternative: just use flat-rate pricing
Everything above exists because of per-token anxiety.
If your use case is personal productivity, side projects, or low-traffic apps, there's a simpler option: pay a flat monthly rate and stop thinking about it.
SimplyLouie gives you full Claude API access for $2/month — no token counting, no surprise bills, no rate limiting you need to build yourself. You get a clean HTTP API:
curl -X POST https://simplylouie.com/api/chat \
-H 'Authorization: Bearer YOUR_KEY' \
-H 'Content-Type: application/json' \
-d '{"message": "hello world"}'
For comparison: ChatGPT Plus is $20/month, and direct API access from the major providers still bills per token. SimplyLouie is $2/month, flat.
If you're building something high-traffic, implement the rate limiter above — you'll need it. If you're building something for yourself, the flat-rate option saves you the engineering overhead.
Building something with Claude? What's your biggest pain point — token costs, rate limits, or latency? Let me know in the comments.