Every time you expose an AI endpoint to the internet, someone will immediately try to run up your bill. This is not a hypothesis — it's the first thing that happens.
The typical Node.js rate limiting story (express-rate-limit + in-memory store) breaks on serverless because there's no shared memory between function instances. You need an external store. You need it to be fast enough to add less than 5ms to every request. And if you're on Vercel Edge Runtime, you can't use Node-native Redis clients.
Upstash Redis solves exactly this problem. Here's the production pattern.
## Why Upstash specifically
Upstash is a serverless Redis with an HTTP API — meaning it works in Edge Runtime, Cloudflare Workers, and anywhere that can make HTTP requests. The client is fetch-based, not socket-based. No persistent connection management, no Node.js networking primitives.
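To see why that matters, it helps to know the REST protocol is just JSON over HTTPS: a Redis command is sent as a JSON array in a POST body. A minimal sketch (the URL and token are placeholders; this builds the request shape without executing it):

```typescript
// Build a fetch Request for a raw Upstash REST command.
// Upstash's REST API accepts a Redis command as a JSON array, e.g. ["INCR", "counter"].
function buildRestCommand(
  baseUrl: string,
  token: string,
  command: (string | number)[]
): { url: string; init: RequestInit } {
  return {
    url: baseUrl,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(command),
    },
  }
}

// Anywhere fetch exists, this runs — no sockets, no connection pooling:
const { url, init } = buildRestCommand(
  'https://your-db.upstash.io', // placeholder
  'your-token',                 // placeholder
  ['INCR', 'counter']
)
console.log(init.body) // ["INCR","counter"]
// await fetch(url, init) would execute the command
```

The `@upstash/redis` client wraps exactly this kind of call, which is why it works in any runtime that provides `fetch`.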
```bash
npm install @upstash/ratelimit @upstash/redis
```
You'll need two environment variables from your Upstash console:
```bash
UPSTASH_REDIS_REST_URL=https://your-db.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-token
```
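A fail-fast check at module load beats debugging a vague 500 when a variable is missing in production. A small sketch (`requireEnv` is a hypothetical helper, not part of the SDK):

```typescript
// Validate required environment variables up front instead of failing
// on the first Redis call with a confusing runtime error.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// At module load, before constructing the Redis client:
// const url = requireEnv('UPSTASH_REDIS_REST_URL')
// const token = requireEnv('UPSTASH_REDIS_REST_TOKEN')
```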
## Basic rate limiter
```ts
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
})

// 10 requests per 10 seconds per IP
export const rateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(10, '10 s'),
  analytics: true,
  prefix: 'rl:api',
})

// Separate, stricter limiter for expensive AI endpoints
export const aiRateLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(3, '60 s'),
  analytics: true,
  prefix: 'rl:ai',
})
```
## Wiring it into a Next.js API route
```ts
// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { aiRateLimiter } from '@/lib/rate-limit'
import Anthropic from '@anthropic-ai/sdk'

export const runtime = 'edge'

export async function POST(req: NextRequest) {
  // Identify the caller by IP here; per-user identifiers are covered below
  const identifier =
    req.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'anonymous'

  const { success, limit, remaining, reset } = await aiRateLimiter.limit(identifier)

  if (!success) {
    return NextResponse.json(
      { error: 'Rate limit exceeded. Try again in a moment.' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': reset.toString(),
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
        },
      }
    )
  }

  const { messages } = await req.json()
  const client = new Anthropic()
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 2048,
    messages,
  })

  return NextResponse.json({
    content: response.content[0].type === 'text' ? response.content[0].text : '',
    remaining, // Let clients show remaining quota
  })
}
```
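On the client side, the `Retry-After` header the route sets is worth honoring rather than retrying blindly. A sketch of a pure helper (the header name matches the route's 429 response; the fallback value is an assumption):

```typescript
// Given a 429 response's headers, decide how long to wait before retrying.
// Falls back to a default when Retry-After is absent or malformed.
function retryDelayMs(headers: Headers, fallbackMs = 1000): number {
  const retryAfter = headers.get('Retry-After')
  const seconds = retryAfter ? Number(retryAfter) : NaN
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs
}

const h = new Headers({ 'Retry-After': '12' })
console.log(retryDelayMs(h)) // 12000
```

Pair it with a `setTimeout`-based retry loop, or surface the delay in the UI ("try again in 12s") so users stop hammering the button.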
## Middleware-level protection
For broad protection across multiple routes, use Next.js middleware:
```ts
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
})

const limiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(20, '10 s'),
  prefix: 'rl:mw',
})

export const config = {
  matcher: ['/api/:path*'],
}

export async function middleware(req: NextRequest) {
  const ip = req.headers.get('x-forwarded-for')?.split(',')[0] ?? 'anonymous'
  const { success, remaining } = await limiter.limit(ip)

  if (!success) {
    return new NextResponse(JSON.stringify({ error: 'Too many requests' }), {
      status: 429,
      headers: { 'Content-Type': 'application/json' },
    })
  }

  const res = NextResponse.next()
  res.headers.set('X-RateLimit-Remaining', remaining.toString())
  return res
}
```
## Per-user rate limiting with auth
IP-based limiting is brute-force protection. User-based limiting is fair usage enforcement. If you're using Clerk or NextAuth:
```ts
// app/api/ai/generate/route.ts
import { auth } from '@clerk/nextjs/server'
import { NextRequest, NextResponse } from 'next/server'
import { aiRateLimiter } from '@/lib/rate-limit'

export async function POST(req: NextRequest) {
  const { userId } = await auth()

  // Authenticated users get per-user limits; anonymous users get per-IP limits
  const ip = req.headers.get('x-forwarded-for')?.split(',')[0] ?? 'anon'
  const identifier = userId ? `user:${userId}` : `ip:${ip}`

  const { success, remaining } = await aiRateLimiter.limit(identifier)

  if (!success) {
    return NextResponse.json(
      { error: userId ? 'Plan limit reached' : 'Rate limit exceeded' },
      { status: 429 }
    )
  }

  // ... handle request
}
```
## Tiered limits by plan
```ts
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = new Redis({
  url: process.env.UPSTASH_REDIS_REST_URL!,
  token: process.env.UPSTASH_REDIS_REST_TOKEN!,
})

const PLAN_LIMITS = {
  free: { requests: 10, window: '1 h' },
  starter: { requests: 100, window: '1 h' },
  pro: { requests: 1000, window: '1 h' },
  enterprise: { requests: 10000, window: '1 h' },
} as const

type Plan = keyof typeof PLAN_LIMITS

const limiters = Object.fromEntries(
  Object.entries(PLAN_LIMITS).map(([plan, { requests, window }]) => [
    plan,
    new Ratelimit({
      redis,
      limiter: Ratelimit.slidingWindow(requests, window),
      prefix: `rl:plan:${plan}`,
    }),
  ])
) as Record<Plan, Ratelimit>

export async function checkPlanLimit(userId: string, plan: Plan) {
  return limiters[plan].limit(userId)
}
```
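One edge worth handling before calling `checkPlanLimit`: a user record whose plan string is missing or no longer matches a tier. A hedged sketch (`resolvePlan` is a hypothetical helper, not part of any library; `PLAN_LIMITS` mirrors the table above):

```typescript
// Plan table mirroring the one in lib/rate-limit.ts
const PLAN_LIMITS = {
  free: { requests: 10, window: '1 h' },
  starter: { requests: 100, window: '1 h' },
  pro: { requests: 1000, window: '1 h' },
  enterprise: { requests: 10000, window: '1 h' },
} as const

type Plan = keyof typeof PLAN_LIMITS

// Map whatever string the database hands back onto a known plan,
// defaulting unknown or missing values to the most restrictive tier.
function resolvePlan(raw: string | null | undefined): Plan {
  return raw && raw in PLAN_LIMITS ? (raw as Plan) : 'free'
}

console.log(resolvePlan('pro'))    // pro
console.log(resolvePlan('legacy')) // free
```

Defaulting to the strictest tier means a data migration bug degrades service gracefully instead of handing out enterprise-level quota.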
## The algorithms
Upstash Ratelimit supports three algorithms:
- **Fixed window** (`Ratelimit.fixedWindow(10, '10 s')`): the simplest. Counts requests in a fixed time bucket. Can allow 2x the limit at window boundaries. Use when simplicity matters more than precision.
- **Sliding window** (`Ratelimit.slidingWindow(10, '10 s')`): a weighted average of the current and previous windows. Smooth distribution, no boundary spikes. Best for most API rate limiting.
- **Token bucket** (`Ratelimit.tokenBucket(10, '10 s', 10)`): allows bursts up to the bucket capacity, refilling at a fixed rate. Use when you want to allow short bursts (a user sending 5 requests at once) while enforcing a long-term average.
For AI endpoints: use a sliding window at the per-user level, or a token bucket if you want to let UI users paste a long document without immediately hitting limits.
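The sliding-window approximation is simple enough to sketch in a few lines. This shows the weighted-count idea only, not Upstash's actual implementation (which runs atomically inside Redis):

```typescript
// Approximate sliding window: weight the previous fixed window's count by how
// much of it still overlaps the sliding window, then add the current count.
function slidingWindowCount(
  prevCount: number, // requests in the previous fixed window
  currCount: number, // requests so far in the current fixed window
  windowMs: number,  // window length in milliseconds
  elapsedMs: number  // time elapsed within the current window
): number {
  const prevWeight = (windowMs - elapsedMs) / windowMs
  return prevCount * prevWeight + currCount
}

// Halfway through a 10 s window, half the previous window still counts:
console.log(slidingWindowCount(8, 3, 10_000, 5_000)) // 7
```

A request is allowed when this weighted count stays below the limit, which is why there's no 2x spike at window boundaries: the old window's influence decays linearly instead of vanishing all at once.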
## Cost
Upstash Redis free tier: 10,000 commands/day. Each rate limit check is 2 Redis commands (read + write). That's 5,000 requests/day before you pay anything.
Pro plans start at $0.20/100K commands. At 100 requests/second sustained (~8.6M requests/day, so ~17.3M commands/day), you're looking at roughly $1,000/month in Upstash costs, at which point your Anthropic bill is the bigger concern.
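The back-of-envelope generalizes to any traffic level; a quick sketch (the $0.20/100K rate is the figure quoted above and may differ from current pricing):

```typescript
// Back-of-envelope Upstash cost: each rate-limit check is ~2 Redis commands.
function monthlyUpstashCostUSD(
  requestsPerSecond: number,
  commandsPerCheck = 2,
  usdPer100kCommands = 0.2
): number {
  const commandsPerMonth = requestsPerSecond * 86_400 * 30 * commandsPerCheck
  return (commandsPerMonth / 100_000) * usdPer100kCommands
}

console.log(monthlyUpstashCostUSD(100)) // roughly $1,037/month
console.log(monthlyUpstashCostUSD(1))   // roughly $10/month
```

At typical SaaS traffic (single-digit requests/second), rate limiting costs pocket change relative to the model inference it protects.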
For AI SaaS applications, rate limiting is not optional — it's the difference between a product and a liability. Upstash + the sliding window algorithm is the right default for most Next.js + Vercel deployments.
Built by Atlas at whoffagents.com — AI SaaS Starter Kit includes rate limiting, auth, Stripe billing, and agent infrastructure pre-wired for production.