How to build a Claude AI rate limiter that saves you from surprise bills
If you're running a Claude-powered app and paying per token, you already know the anxiety: one unexpected traffic spike and your billing dashboard looks like a ransom note.
This tutorial shows you how to build a simple rate limiter in Node.js that caps your Claude API spend — regardless of how many users hit your app.
The problem
Per-token pricing means your costs scale with usage. Great for Anthropic. Potentially terrifying for you.
User sends 1,000 messages → you pay for 1,000 messages
User sends 100,000 messages → you pay for 100,000 messages
No cap. No ceiling. Just an invoice.
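To make that concrete, here's a back-of-the-envelope cost model. The per-million-token prices below are illustrative placeholders, not current Anthropic rates — check the official pricing page for real numbers.

```javascript
// Rough cost model: cost = (tokens / 1M) * price-per-million-tokens.
// Prices are ILLUSTRATIVE placeholders, not real Anthropic rates.
const PRICE_PER_MTOK = { input: 15, output: 75 }; // USD per million tokens (assumed)

function estimateCost(messages, avgInputTokens, avgOutputTokens) {
  const inputCost = (messages * avgInputTokens / 1e6) * PRICE_PER_MTOK.input;
  const outputCost = (messages * avgOutputTokens / 1e6) * PRICE_PER_MTOK.output;
  return inputCost + outputCost;
}

console.log(estimateCost(1000, 500, 500).toFixed(2));   // 1,000 messages/day
console.log(estimateCost(100000, 500, 500).toFixed(2)); // 100,000 messages: 100x the bill
```

Linear in usage, with no upper bound: that's the entire problem the limiter below exists to solve.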
The solution: a fixed-window rate limiter
A fixed-window limiter gives each user a fixed allowance of requests per time window. When they hit the limit, they wait until the window resets — you don't pay. (This pattern is often loosely called a token bucket, but a true token bucket refills continuously; the fixed window below is simpler and works fine for cost capping.)
// rate-limiter.js
const Anthropic = require('@anthropic-ai/sdk');
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
// In-memory store (use Redis in production)
const buckets = new Map();
const RATE_LIMIT = {
maxRequests: 10, // requests per window
windowMs: 60 * 1000, // 1 minute
maxTokensPerReq: 1000 // approx. input-token cap, enforced by truncation below
};
function checkRateLimit(userId) {
const now = Date.now();
const bucket = buckets.get(userId) || { count: 0, resetAt: now + RATE_LIMIT.windowMs };
// Reset window if expired
if (now > bucket.resetAt) {
bucket.count = 0;
bucket.resetAt = now + RATE_LIMIT.windowMs;
}
if (bucket.count >= RATE_LIMIT.maxRequests) {
const waitSeconds = Math.ceil((bucket.resetAt - now) / 1000);
return { allowed: false, waitSeconds };
}
bucket.count++;
buckets.set(userId, bucket);
return { allowed: true, remaining: RATE_LIMIT.maxRequests - bucket.count };
}
async function askClaude(userId, userMessage) {
const limit = checkRateLimit(userId);
if (!limit.allowed) {
return {
error: true,
message: `Rate limit hit. Try again in ${limit.waitSeconds}s.`
};
}
// Truncate long messages to cap token spend
const truncated = userMessage.slice(0, RATE_LIMIT.maxTokensPerReq * 4); // ~4 chars/token
const response = await client.messages.create({
model: 'claude-opus-4-5',
max_tokens: 500, // hard cap on output
messages: [{ role: 'user', content: truncated }]
});
return {
error: false,
text: response.content[0].text,
remaining: limit.remaining
};
}
// Export for server.js
module.exports = { checkRateLimit, askClaude };
// Example usage (guarded so it doesn't run when server.js requires this file)
if (require.main === module) {
(async () => {
const userId = 'user_123';
for (let i = 0; i < 12; i++) {
const result = await askClaude(userId, `Question number ${i + 1}: what is 2+2?`);
if (result.error) {
console.log(`Request ${i + 1}: BLOCKED — ${result.message}`);
} else {
console.log(`Request ${i + 1}: OK — ${result.remaining} remaining`);
}
}
})();
}
Run it:
npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=your_key node rate-limiter.js
Output:
Request 1: OK — 9 remaining
Request 2: OK — 8 remaining
...
Request 10: OK — 0 remaining
Request 11: BLOCKED — Rate limit hit. Try again in 47s.
Request 12: BLOCKED — Rate limit hit. Try again in 46s.
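The windowing logic is easy to unit-test if you factor the clock out. Here's a hypothetical standalone version of checkRateLimit (not from the listing above) that takes now as a parameter instead of calling Date.now(), so you can simulate the window reset without waiting a real minute:

```javascript
// Pure fixed-window check: same logic as checkRateLimit, but the clock is
// injected so the reset behavior can be tested deterministically.
function checkWindow(buckets, userId, now, { maxRequests = 10, windowMs = 60000 } = {}) {
  const bucket = buckets.get(userId) || { count: 0, resetAt: now + windowMs };
  if (now > bucket.resetAt) {
    bucket.count = 0;
    bucket.resetAt = now + windowMs;
  }
  if (bucket.count >= maxRequests) {
    return { allowed: false, waitSeconds: Math.ceil((bucket.resetAt - now) / 1000) };
  }
  bucket.count++;
  buckets.set(userId, bucket);
  return { allowed: true, remaining: maxRequests - bucket.count };
}

// Simulate: 10 requests pass, the 11th is blocked, and a request after the
// window expires goes through again.
const buckets = new Map();
for (let i = 0; i < 10; i++) checkWindow(buckets, 'u1', 0);
console.log(checkWindow(buckets, 'u1', 0).allowed);     // false: limit hit
console.log(checkWindow(buckets, 'u1', 61000).allowed); // true: new window
```

Injecting the clock also makes it trivial to swap in a fake timer in CI, where a 60-second sleep per test case would be painful.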
Add it to an Express server
// server.js
const express = require('express');
const { askClaude } = require('./rate-limiter');
const app = express();
app.use(express.json());
app.post('/chat', async (req, res) => {
const { userId, message } = req.body;
if (!userId || !message) {
return res.status(400).json({ error: 'userId and message required' });
}
const result = await askClaude(userId, message);
if (result.error) {
return res.status(429).json(result);
}
res.json(result);
});
app.listen(3000, () => console.log('Server on port 3000'));
Production upgrades
For real apps, swap the in-memory Map for Redis:
// redis-rate-limiter.js
const { createClient } = require('redis');
const redis = createClient();
await redis.connect(); // top-level await requires ESM ("type": "module"); otherwise connect inside an async init
async function checkRateLimitRedis(userId) {
const key = `rate:${userId}`;
const count = await redis.incr(key);
if (count === 1) {
// First request in window — set expiry
// (INCR + EXPIRE isn't atomic; use a Lua script if you need strict guarantees)
await redis.expire(key, 60);
}
if (count > 10) {
const ttl = await redis.ttl(key);
return { allowed: false, waitSeconds: ttl > 0 ? ttl : 60 }; // ttl is -1 if the key lost its expiry
}
return { allowed: true, remaining: 10 - count };
}
This survives server restarts and works across multiple instances.
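Request counting caps how often users call you, but not how much each call actually costs. For a hard spend ceiling you can also track real token usage, which the Messages API reports in response.usage (input_tokens and output_tokens), against a per-user daily budget. A minimal in-memory sketch, with an arbitrary budget number:

```javascript
// Daily token budget per user. Deduct the actual usage the API reports
// (response.usage.input_tokens + response.usage.output_tokens) after each call.
const DAILY_BUDGET = 50000; // tokens per user per day (arbitrary example)
const usage = new Map();    // userId -> { spent, day }

function spendTokens(userId, tokens, day = new Date().toISOString().slice(0, 10)) {
  const entry = usage.get(userId) || { spent: 0, day };
  if (entry.day !== day) { entry.spent = 0; entry.day = day; } // new day, reset budget
  if (entry.spent + tokens > DAILY_BUDGET) {
    return { allowed: false, remaining: DAILY_BUDGET - entry.spent };
  }
  entry.spent += tokens;
  usage.set(userId, entry);
  return { allowed: true, remaining: DAILY_BUDGET - entry.spent };
}
```

After each successful call you'd run spendTokens(userId, response.usage.input_tokens + response.usage.output_tokens), then reject the next request once it returns allowed: false. In production this map belongs in Redis too, for the same restart and multi-instance reasons.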
The alternative: just use flat-rate pricing
Everything above exists because of per-token anxiety.
If your use case is personal productivity, side projects, or low-traffic apps, there's a simpler option: pay a flat monthly rate and stop thinking about it.
SimplyLouie gives you full Claude API access for $2/month — no token counting, no surprise bills, no rate limiting you need to build yourself. You get a clean HTTP API:
curl -X POST https://simplylouie.com/api/chat \
-H 'Authorization: Bearer YOUR_KEY' \
-H 'Content-Type: application/json' \
-d '{"message": "hello world"}'
For comparison: ChatGPT Plus is $20/month, and direct API access from the major providers still bills per token. SimplyLouie is $2/month, flat.
If you're building something high-traffic, implement the rate limiter above — you'll need it. If you're building something for yourself, the flat-rate option saves you the engineering overhead.
Building something with Claude? What's your biggest pain point — token costs, rate limits, or latency? Let me know in the comments.