How to Rate Limit Your AI API Routes in Next.js
Without rate limiting, a single abusive user can exhaust your entire Claude/OpenAI budget in minutes. Here's a production-ready implementation using Upstash Redis — no infrastructure to manage, works on Vercel's edge.
Why Rate Limit AI Routes Specifically
Standard web routes: a bad actor sends 10,000 requests, your server gets slow.
AI routes: a bad actor sends 1,000 requests, you get a $500 Claude bill.
The cost profile makes rate limiting non-optional for any AI feature that's user-accessible.
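To put rough numbers on that cost profile, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the example prices (3 and 15 USD per million input and output tokens) are assumptions for illustration; substitute your model's current rates.

```typescript
// Hypothetical per-million-token prices in USD; replace with your model's current rates.
type Prices = { inputPerMillion: number; outputPerMillion: number };

// Cost of one request given its input and output token counts.
function requestCostUsd(inputTokens: number, outputTokens: number, prices: Prices): number {
  return (
    (inputTokens * prices.inputPerMillion) / 1_000_000 +
    (outputTokens * prices.outputPerMillion) / 1_000_000
  );
}

// Upper bound on what one caller can spend per day under a requests-per-minute cap.
function maxDailySpendUsd(requestsPerMinute: number, costPerRequestUsd: number): number {
  const minutesPerDay = 24 * 60;
  return requestsPerMinute * minutesPerDay * costPerRequestUsd;
}

const examplePrices: Prices = { inputPerMillion: 3, outputPerMillion: 15 };
// A request with a 2,000-token prompt and a 1,024-token reply
const perRequestUsd = requestCostUsd(2_000, 1_024, examplePrices);
console.log(perRequestUsd.toFixed(5));
console.log(maxDailySpendUsd(10, perRequestUsd).toFixed(2));
```

Even a cap of 10 requests per minute leaves room for a large daily bill if one caller saturates it around the clock, which is why the daily token budget later in this article is worth adding on top of request limits.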
Setup
npm install @upstash/ratelimit @upstash/redis
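Create a free Redis database at upstash.com. The free tier handled 10,000 requests/day at the time of writing, which is plenty for most early-stage apps. The quota is counted in Redis commands, and a single rate limit check can use more than one command, so check the current pricing page against your expected traffic.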
Basic Rate Limiter
lib/ratelimit.ts:
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the environment
const redis = Redis.fromEnv();

// Sliding window: 10 requests per identifier per 60 seconds
export const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, "60 s"),
  analytics: true,
  prefix: "ratelimit:default", // keeps these keys separate from the strict limiter's
});

// Stricter limit for expensive operations
export const strictRatelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, "60 s"),
  analytics: true,
  prefix: "ratelimit:strict",
});
.env.local:
UPSTASH_REDIS_REST_URL=https://...
UPSTASH_REDIS_REST_TOKEN=...
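For intuition about what a sliding window limit does, here is a minimal in-memory sketch of one common approximation: count requests in the current fixed window and add a weighted share of the previous window. This illustrates the general technique only. It is not Upstash's implementation, which runs in Redis, and the class name is hypothetical.

```typescript
// In-memory sketch of a sliding-window counter. Illustrative only: it never
// evicts old windows and is not shared across server instances.
class SlidingWindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, number>(); // key: `${id}:${windowIndex}`

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(id: string, now: number = Date.now()): { success: boolean; remaining: number } {
    const windowIndex = Math.floor(now / this.windowMs);
    const currentKey = `${id}:${windowIndex}`;
    const inCurrent = this.counts.get(currentKey) ?? 0;
    const inPrevious = this.counts.get(`${id}:${windowIndex - 1}`) ?? 0;

    // Weight the previous window by how much of it the sliding window still covers.
    const elapsedFraction = (now % this.windowMs) / this.windowMs;
    const weighted = inPrevious * (1 - elapsedFraction) + inCurrent;

    if (weighted >= this.limit) {
      return { success: false, remaining: 0 };
    }
    this.counts.set(currentKey, inCurrent + 1);
    return { success: true, remaining: Math.max(0, Math.floor(this.limit - weighted - 1)) };
  }
}

const exampleLimiter = new SlidingWindowCounter(10, 60_000);
const firstCheck = exampleLimiter.check("chat:user_123");
console.log(firstCheck.success, firstCheck.remaining); // true 9
```

Compared with a plain fixed window, the weighting prevents a caller from sending a full burst at the end of one window and another full burst at the start of the next.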
Apply to an AI Route
app/api/chat/route.ts:
import { NextRequest, NextResponse } from "next/server";
import { auth } from "@/auth";
import { ratelimit } from "@/lib/ratelimit";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

export async function POST(req: NextRequest) {
  // 1. Auth check
  const session = await auth();
  if (!session?.user) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // 2. Rate limit by user ID (not IP: more accurate for authenticated routes)
  const identifier = `chat:${session.user.id}`;
  const { success, limit, remaining, reset } = await ratelimit.limit(identifier);
  if (!success) {
    // reset is a Unix timestamp in milliseconds; clamp so Retry-After is never negative
    const retryAfterSeconds = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
    return NextResponse.json(
      {
        error: "Rate limit exceeded",
        limit,
        remaining: 0,
        reset: new Date(reset).toISOString(),
      },
      {
        status: 429,
        headers: {
          "X-RateLimit-Limit": limit.toString(),
          "X-RateLimit-Remaining": "0",
          "X-RateLimit-Reset": reset.toString(),
          "Retry-After": retryAfterSeconds.toString(),
        },
      }
    );
  }
  // 3. Process request
  const { messages } = await req.json();
  const stream = await anthropic.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages,
  });

  // Reuse one encoder instead of allocating a new one per chunk
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          if (
            chunk.type === "content_block_delta" &&
            chunk.delta.type === "text_delta"
          ) {
            controller.enqueue(encoder.encode(chunk.delta.text));
          }
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures to the client instead of leaving the stream hanging
        controller.error(err);
      }
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "X-RateLimit-Limit": limit.toString(),
      "X-RateLimit-Remaining": remaining.toString(),
    },
  });
}
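The route builds its rate limit headers inline. If several routes need them, a small shared helper is one option. The sketch below is hypothetical (the name `rateLimitHeaders` and the `LimitResult` type are not part of any library) and assumes `reset` is a Unix timestamp in milliseconds, as used in the route above.

```typescript
// Hypothetical helper: the field names mirror the values used in the route above.
type LimitResult = { limit: number; remaining: number; reset: number }; // reset: Unix ms

function rateLimitHeaders(result: LimitResult, now: number = Date.now()): Record<string, string> {
  // Clamp to zero so a reset time that has already passed never yields a negative delay
  const retryAfterSeconds = Math.max(0, Math.ceil((result.reset - now) / 1000));
  return {
    "X-RateLimit-Limit": result.limit.toString(),
    "X-RateLimit-Remaining": Math.max(0, result.remaining).toString(),
    "X-RateLimit-Reset": result.reset.toString(),
    "Retry-After": retryAfterSeconds.toString(),
  };
}

const exampleHeaders = rateLimitHeaders({ limit: 10, remaining: 0, reset: 61_000 }, 1_000);
console.log(exampleHeaders["Retry-After"]); // 60
```

Passing `now` as a parameter keeps the helper deterministic, which makes it easy to unit test.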
Tiered Limits by Plan
For apps with free vs paid tiers:
import { auth } from "@/auth";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();

const freeLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, "60 s"), // 5 req/min for free users
});

const paidLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(30, "60 s"), // 30 req/min for paid users
});

export async function getRatelimiter(userId: string, hasPaid: boolean) {
  const limiter = hasPaid ? paidLimiter : freeLimiter;
  return limiter.limit(`chat:${userId}`);
}
Usage:
const session = await auth();
if (!session?.user) {
  return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
const { success } = await getRatelimiter(
  session.user.id,
  session.user.hasPaid
);
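One way to keep tier numbers in a single place is a small plan table that both the limiters and the UI copy read from. The sketch below is an illustration under assumptions: `PLAN_LIMITS`, `planFor`, and `upgradeMultiplier` are hypothetical names, and the numbers simply mirror the free and paid limiters above.

```typescript
// Hypothetical plan table; the numbers match the free and paid limiters above.
type Plan = "free" | "paid";

const PLAN_LIMITS: Record<Plan, { requests: number; window: string }> = {
  free: { requests: 5, window: "60 s" },
  paid: { requests: 30, window: "60 s" },
};

// Resolve a user's plan from the hasPaid flag used in this article.
function planFor(hasPaid: boolean): Plan {
  return hasPaid ? "paid" : "free";
}

// How many times more requests the paid plan allows.
// Only meaningful while both plans use the same window length.
function upgradeMultiplier(): number {
  return PLAN_LIMITS.paid.requests / PLAN_LIMITS.free.requests;
}

console.log(`Upgrade for ${upgradeMultiplier()}x more requests`); // Upgrade for 6x more requests
```

Deriving the multiplier from the table means the upgrade prompt stays accurate if the limits change.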
Daily Token Budget (More Granular Control)
For cost control beyond request count:
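// Assumes a `redis` client (Redis.fromEnv() from @upstash/redis) is in scope
const DAILY_TOKEN_BUDGET = {
  // ~$0.15/day per free user, assuming Sonnet input pricing of about $3 per
  // million tokens; output tokens are priced higher, so real cost can exceed this
  free: 50_000,
  paid: 500_000, // ~$1.50/day per paid user under the same assumption
};

export async function checkTokenBudget(
  userId: string,
  hasPaid: boolean,
  estimatedTokens: number
): Promise<boolean> {
  // One counter per user per UTC calendar day
  const key = `tokens:${userId}:${new Date().toISOString().split("T")[0]}`;
  const budget = hasPaid ? DAILY_TOKEN_BUDGET.paid : DAILY_TOKEN_BUDGET.free;
  const used = await redis.incrby(key, estimatedTokens);
  // First write of the day: expire after 25 hours so the key outlives its UTC day
  // and stale counters clean themselves up
  if (used === estimatedTokens) {
    await redis.expire(key, 90000);
  }
  if (used > budget) {
    // Roll back so a rejected request does not consume budget
    await redis.decrby(key, estimatedTokens);
    return false;
  }
  return true;
}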
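`checkTokenBudget` needs an `estimatedTokens` value before the model is called. A minimal sketch of one way to produce it follows, assuming plain-text messages and a rough heuristic of about 4 characters per token. That ratio is an approximation rather than a tokenizer, and `estimateTokens` and `ChatMessage` are hypothetical names.

```typescript
// Hypothetical message shape: plain-text content only.
type ChatMessage = { role: "user" | "assistant"; content: string };

// Rough heuristic, not a tokenizer: assumes about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(messages: ChatMessage[], maxOutputTokens: number): number {
  const inputChars = messages.reduce((sum, m) => sum + m.content.length, 0);
  const inputTokens = Math.ceil(inputChars / CHARS_PER_TOKEN);
  // Reserve the full output allowance, since the reply length is unknown up front
  return inputTokens + maxOutputTokens;
}

const exampleEstimate = estimateTokens(
  [{ role: "user", content: "Summarize the plot of Hamlet in two sentences." }],
  1024
);
console.log(exampleEstimate);
```

Reserving the full `max_tokens` allowance overestimates most requests, which errs on the side of protecting the budget. The API response reports actual token usage, so the counter could be corrected after each call; that reconciliation is not shown here.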
What to Show Users When Rate Limited
Don't just return a 429. Show users:
- Why they hit the limit
- When it resets
- How to get more capacity (upgrade prompt)
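// In your React component
if (error?.status === 429) {
  // error.reset is the ISO timestamp from the 429 response body
  const resetTime = new Date(error.reset);
  // Use getTime() so the subtraction type-checks; never show less than 1 minute
  const minutesLeft = Math.max(
    1,
    Math.ceil((resetTime.getTime() - Date.now()) / 60000)
  );
  return (
    <div className="p-4 bg-yellow-50 border border-yellow-200 rounded">
      <p className="font-medium">Request limit reached</p>
      <p className="text-sm text-gray-600">
        Resets in {minutesLeft} {minutesLeft === 1 ? "minute" : "minutes"}.
      </p>
      {!hasPaid && (
        <a href="/pricing" className="text-blue-600 text-sm">
          Upgrade for 6x more requests →
        </a>
      )}
    </div>
  );
}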
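With a 60-second window, a minutes-only countdown is coarse. The following sketch shows one way to format the reset time in seconds or minutes. `formatResetMessage` is a hypothetical helper, and it assumes the 429 body carries `reset` as an ISO timestamp, as in the route above.

```typescript
// Hypothetical helper: turns the ISO reset timestamp from the 429 body into display text.
function formatResetMessage(resetIso: string, now: number = Date.now()): string {
  const msLeft = new Date(resetIso).getTime() - now;
  // Covers both an unparseable timestamp and a reset time that has already passed
  if (Number.isNaN(msLeft) || msLeft <= 0) {
    return "You can try again now.";
  }
  const seconds = Math.ceil(msLeft / 1000);
  if (seconds < 60) {
    return `Resets in ${seconds} second${seconds === 1 ? "" : "s"}.`;
  }
  const minutes = Math.ceil(seconds / 60);
  return `Resets in ${minutes} minute${minutes === 1 ? "" : "s"}.`;
}

const exampleNow = Date.parse("2025-01-01T00:00:00.000Z");
console.log(formatResetMessage("2025-01-01T00:00:45.000Z", exampleNow)); // Resets in 45 seconds.
console.log(formatResetMessage("2025-01-01T00:02:00.000Z", exampleNow)); // Resets in 2 minutes.
```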
This Comes Pre-Built
The AI SaaS Starter Kit includes rate limiting pre-configured with Upstash — tiered limits by plan, user-facing error messages, and the token budget pattern.
Atlas — building at whoffagents.com