Rate limiting an authenticated API is straightforward — you have a user ID, you track requests per user, you throttle when they exceed limits.
Rate limiting an anonymous API is a different problem entirely. No user ID means you're working with proxies for identity: IP address, device fingerprint, behavioral signals. Each has weaknesses. Here's how to layer them effectively, which is the challenge we worked through while building a free AI image generator that requires no sign-up.
Why Anonymous Rate Limiting Is Harder
With authenticated users:
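// Simple: the user ID is a stable, reliable identifier
const key = `rate_limit:${userId}`;
const requests = await redis.incr(key);
// Start the window on the first request so the counter eventually resets
// (WINDOW_SECONDS, like LIMIT, is a constant you define)
if (requests === 1) await redis.expire(key, WINDOW_SECONDS);
if (requests > LIMIT) throw new RateLimitError();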
With anonymous users, you have no stable identifier. Everything you use as a proxy is either spoofable, unreliable, or both.
Layer 1 — IP Address (Baseline)
IP is the obvious first layer. It's available on every request and requires no client cooperation.
// middleware.js
// Helper: best-guess client IP for the incoming request
function getIp(request) {
  return (
    request.headers.get('x-forwarded-for')?.split(',')[0].trim()
    || request.headers.get('x-real-ip')
    || 'unknown'
  );
}
The problems with IP-only rate limiting:
- Shared IPs are common. Corporate offices, universities, and ISPs using carrier-grade NAT can put dozens or hundreds of users behind one IP address. Blocking that IP blocks all of them.
- VPNs and proxies trivially bypass IP limits. A determined user can rotate through IPs faster than you can block them.
- Dynamic IPs mean genuine users can inherit a limit that a previous holder of the same address already exhausted.
Still use it — IP rate limiting stops the majority of automated abuse. Just don't rely on it exclusively.
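One caveat on the header parsing above: the leftmost X-Forwarded-For entry is whatever the client sent, so it can be forged. Here is a minimal sketch of a stricter parse, assuming your app sits behind a known number of trusted proxies that each append the address they received the request from. The function name and the `trustedProxyCount` parameter are illustrative, not part of any framework.

```javascript
// Hypothetical helper: pick the client IP from X-Forwarded-For by counting
// back from the right, past the proxies you control.
function clientIpFromForwardedFor(headerValue, trustedProxyCount = 1) {
  if (!headerValue) return 'unknown';
  const hops = headerValue
    .split(',')
    .map(part => part.trim())
    .filter(Boolean);
  if (hops.length === 0) return 'unknown';
  // Entries to the left of this index were supplied by the client and
  // cannot be trusted; clamp to 0 if the header is shorter than expected.
  const index = Math.max(0, hops.length - trustedProxyCount);
  return hops[index];
}

// A client that forges "1.2.3.4" still gets keyed on the address the proxy saw
console.log(clientIpFromForwardedFor('1.2.3.4, 9.9.9.9', 1)); // 9.9.9.9
```

If your platform already provides a dedicated client-IP header, prefer that over parsing the list yourself.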
Layer 2 — Token Bucket Per IP With Generous Limits
Rather than hard cutoffs, token bucket algorithms allow burst usage while preventing sustained abuse:
// lib/rateLimiter.js
class TokenBucket {
  constructor({ capacity, refillRate, refillInterval }) {
    this.capacity = capacity;             // Max tokens
    this.refillRate = refillRate;         // Tokens added per interval
    this.refillInterval = refillInterval; // Milliseconds
    this.buckets = new Map();
  }

  async consume(key, tokens = 1) {
    const now = Date.now();
    const bucket = this.buckets.get(key) ?? {
      tokens: this.capacity,
      lastRefill: now,
    };

    // Refill based on whole intervals elapsed
    const elapsed = now - bucket.lastRefill;
    const intervals = Math.floor(elapsed / this.refillInterval);
    if (intervals > 0) {
      bucket.tokens = Math.min(
        this.capacity,
        bucket.tokens + intervals * this.refillRate
      );
      // Advance by whole intervals only, so a partial interval
      // still counts toward the next refill
      bucket.lastRefill += intervals * this.refillInterval;
    }

    this.buckets.set(key, bucket);
    if (bucket.tokens < tokens) {
      return { allowed: false, remaining: bucket.tokens };
    }
    bucket.tokens -= tokens;
    return { allowed: true, remaining: bucket.tokens };
  }
}

// Configuration for a generation endpoint
export const generationLimiter = new TokenBucket({
  capacity: 12,           // 12 burst generations
  refillRate: 1,          // 1 token per interval
  refillInterval: 60_000, // Per minute
});
Key advantage: a user can fire off 12 generations in quick succession for rapid iteration, but sustained throughput is capped at the refill rate of 60 requests per hour.
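To make that arithmetic concrete, here is a small sketch that computes the upper bound on requests one key can make in a given window, starting from a full bucket. The helper name `maxRequestsInWindow` is made up for illustration; the config values are the ones used above.

```javascript
// Upper bound on requests a single key can make in `windowMs`,
// starting from a full bucket: the initial burst plus every refill.
function maxRequestsInWindow({ capacity, refillRate, refillInterval }, windowMs) {
  const refills = Math.floor(windowMs / refillInterval);
  return capacity + refills * refillRate;
}

const config = { capacity: 12, refillRate: 1, refillInterval: 60_000 };
console.log(maxRequestsInWindow(config, 60 * 60_000));      // 72 in the first hour
console.log(maxRequestsInWindow(config, 24 * 60 * 60_000)); // 1452 in a day
```

Running the numbers like this before shipping is a cheap way to check that the burst size and refill rate match what the backend can actually afford per user.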
Layer 3 — Browser Fingerprinting (Client-Side Signal)
A lightweight fingerprint — screen resolution, timezone, language, platform — creates a more persistent identifier than IP alone without storing anything personally identifying.
// lib/fingerprint.js (client-side)
export async function getFingerprint() {
  const components = [
    navigator.language,
    navigator.platform,
    screen.width + 'x' + screen.height,
    screen.colorDepth,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
    navigator.hardwareConcurrency,
  ];

  // Hash the components
  const str = components.join('|');
  const buffer = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(str)
  );

  return Array.from(new Uint8Array(buffer))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('')
    .slice(0, 16); // 16 char fingerprint
}
Send this alongside each request and rate limit on fingerprint + IP combined:
// More stable identifier: combine IP + fingerprint
const rateLimitKey = `${ip}:${fingerprint}`;
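Weaknesses: a fingerprint can be spoofed with developer tools, it changes when a user switches browsers, and because these attributes are coarse, many users on similar devices will produce the same value. Even so, it meaningfully raises the effort required to bypass limits.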
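Because the fingerprint is supplied by the client, a script that sends a fresh random value with every request would get a fresh bucket under the combined key. One way to guard against that, sketched here as an assumption rather than a description of any particular production setup, is to validate the value server-side and keep a separate per-IP key alongside the combined one. The name `rateLimitKeys` and the key prefixes are illustrative.

```javascript
// The client-side hash above is 16 lowercase hex characters
const FINGERPRINT_PATTERN = /^[0-9a-f]{16}$/;

// Hypothetical helper: derive both keys for a request.
// combinedKey is null when the fingerprint is missing or malformed.
function rateLimitKeys(ip, fingerprint) {
  const valid =
    typeof fingerprint === 'string' && FINGERPRINT_PATTERN.test(fingerprint);
  return {
    ipKey: `ip:${ip}`,
    combinedKey: valid ? `${ip}:${fingerprint}` : null,
  };
}

console.log(rateLimitKeys('203.0.113.7', '0123456789abcdef'));
// { ipKey: 'ip:203.0.113.7', combinedKey: '203.0.113.7:0123456789abcdef' }
```

A request would then need to pass both a looser per-IP bucket and the tighter combined bucket. How much looser the per-IP limit should be is a tuning decision that depends on how many of your users share addresses.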
Layer 4 — Behavioral Signals
Automated abuse has distinctive patterns. Tracking these adds another detection layer:
// lib/behaviorAnalyzer.js
export function analyzeBehavior(requestHistory) {
  const timestamps = requestHistory.map(r => r.timestamp);

  // Need at least two intervals before the statistics mean anything
  if (timestamps.length < 3) {
    return { suspicious: false, variance: null, minimumInterval: null };
  }

  // Bots often have unnaturally consistent intervals
  const intervals = timestamps
    .slice(1)
    .map((t, i) => t - timestamps[i]);
  const avgInterval = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  const variance = intervals.reduce((sum, interval) => {
    return sum + Math.pow(interval - avgInterval, 2);
  }, 0) / intervals.length;

  // Low variance = suspiciously regular = likely bot
  const isSuspiciouslyRegular = variance < 500; // ms²

  // Too fast between requests = likely automated
  const minimumInterval = Math.min(...intervals);
  const isTooFast = minimumInterval < 800; // 800ms minimum

  return {
    suspicious: isSuspiciouslyRegular || isTooFast,
    variance,
    minimumInterval,
  };
}
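The analyzer needs a per-client request history to work on. A minimal sketch of how that history might be collected, assuming an in-memory map keyed by the same rate limit key and a cap of 20 entries per client (both the cap and the names `histories` and `recordRequest` are assumptions for illustration):

```javascript
const MAX_HISTORY = 20; // Assumed cap; keeps memory per client bounded
const histories = new Map();

// Append a request to the client's history, dropping the oldest entry
// once the cap is exceeded. Returns the array to pass to the analyzer.
function recordRequest(key, timestamp = Date.now()) {
  const history = histories.get(key) ?? [];
  history.push({ timestamp });
  if (history.length > MAX_HISTORY) history.shift();
  histories.set(key, history);
  return history;
}
```

Each entry has the `{ timestamp }` shape that `analyzeBehavior` reads. Like the bucket map, this store would need periodic cleanup and does not share state across instances.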
Layer 5 — Graceful Degradation Over Hard Blocks
Hard IP blocks create false positives (blocking legitimate shared-IP users) and create adversarial relationships with users who hit limits accidentally.
A better approach: degrade gracefully before blocking.
// Instead of blocking, slow down first
import { NextResponse } from 'next/server';

async function handleRequest(ip, fingerprint) {
  const key = `${ip}:${fingerprint}`;
  const { allowed, remaining } = await generationLimiter.consume(key);

  if (!allowed) {
    // How many seconds until next token?
    // (calculateWaitTime is a helper you supply)
    const waitTime = calculateWaitTime(key);
    return new Response(
      JSON.stringify({
        error: 'Rate limit reached',
        retryAfter: waitTime,
        message: `Please wait ${waitTime} seconds before generating again.`
      }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'Retry-After': String(waitTime),
          'X-RateLimit-Remaining': '0',
        }
      }
    );
  }

  // Add rate limit headers to successful responses too
  return NextResponse.next({
    headers: {
      'X-RateLimit-Remaining': String(remaining),
    }
  });
}
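The handler above calls `calculateWaitTime` without defining it. One possible implementation, assuming the `TokenBucket` shape from Layer 2 (a `buckets` map whose entries carry `lastRefill`, plus a `refillInterval` in milliseconds). This sketch takes the limiter as an explicit second argument, so the call site would become `calculateWaitTime(key, generationLimiter)`.

```javascript
// Hypothetical helper: whole seconds until the bucket for `key`
// is due its next refill.
function calculateWaitTime(key, limiter, now = Date.now()) {
  const bucket = limiter.buckets.get(key);
  if (!bucket) return 0; // Unknown key: a fresh bucket starts full

  const elapsed = now - bucket.lastRefill;
  const remainingMs = Math.max(0, limiter.refillInterval - elapsed);
  // Round up so clients never retry a fraction of a second too early
  return Math.ceil(remainingMs / 1000);
}
```

Rounding up matters because `Retry-After` is expressed in whole seconds; rounding down would invite a retry that is guaranteed to fail.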
Production Edge Cases
Handling Vercel/CDN header forwarding:
// Different headers depending on deployment environment
function getClientIp(request) {
  return (
    request.headers.get('cf-connecting-ip') ||       // Cloudflare
    request.headers.get('x-vercel-forwarded-for') || // Vercel
    request.headers.get('x-forwarded-for')?.split(',')[0].trim() ||
    request.headers.get('x-real-ip') ||
    'unknown'
  );
}
Memory leak prevention in in-process stores:
// Clean up old buckets periodically
setInterval(() => {
  const cutoff = Date.now() - 3_600_000; // 1 hour
  for (const [key, bucket] of generationLimiter.buckets) {
    if (bucket.lastRefill < cutoff) {
      generationLimiter.buckets.delete(key);
    }
  }
}, 600_000); // Every 10 minutes
Multi-instance deployments: In-memory rate limiting doesn't share state across serverless function instances. For production scale, move the rate limit state to Redis or a similar distributed store.
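As one possible shape for that move, here is a minimal sketch of a fixed-window counter (a simpler technique than the token bucket above) written against a small store interface. The `incr` and `expire` methods are assumed to behave like the Redis INCR and EXPIRE commands. `FakeStore` is an in-memory stand-in for local experimentation only, not a Redis client, and `checkFixedWindow` is a made-up name.

```javascript
// In-memory stand-in that mimics the INCR/EXPIRE semantics assumed above
class FakeStore {
  constructor() {
    this.data = new Map();
  }
  async incr(key) {
    const entry = this.data.get(key);
    const expired = entry && entry.expiresAt !== null && entry.expiresAt <= Date.now();
    if (!entry || expired) {
      this.data.set(key, { value: 1, expiresAt: null });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }
  async expire(key, seconds) {
    const entry = this.data.get(key);
    if (entry) entry.expiresAt = Date.now() + seconds * 1000;
  }
}

// Fixed-window limiter: at most `limit` requests per `windowSeconds` per key
async function checkFixedWindow(store, key, limit, windowSeconds) {
  const storeKey = `rate_limit:${key}`;
  const count = await store.incr(storeKey);
  if (count === 1) {
    // First request in this window: start the expiry clock
    await store.expire(storeKey, windowSeconds);
  }
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}
```

Note that the increment and the expiry are two separate calls here. If the process dies between them the key never expires, so a production version would typically make the pair atomic, for example with a server-side script.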
What This Looks Like in Practice
The layered approach — IP + fingerprint + behavioral signals — reduces automated abuse significantly without meaningfully impacting legitimate users.
The key design choice: treat false positives as more costly than missed abuse. A legitimate user who gets incorrectly blocked is a worse outcome than an abuser who gets through occasionally. Generous limits + graceful degradation + multiple signals = the right balance for most anonymous APIs.
For monitoring, track your 429 rate as a percentage of total requests. Sustained spikes indicate either abuse or limits set too aggressively — both worth investigating.
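A minimal sketch of that metric, assuming you can observe each response's status code somewhere central such as a logging hook. The `RateLimitStats` class is illustrative; in practice you would more likely emit these counts to whatever metrics system you already run.

```javascript
// Tracks what share of responses were rate limited
class RateLimitStats {
  constructor() {
    this.total = 0;
    this.limited = 0;
  }
  record(status) {
    this.total += 1;
    if (status === 429) this.limited += 1;
  }
  // Percentage of recorded responses that were 429s
  limitedPercent() {
    return this.total === 0 ? 0 : (this.limited / this.total) * 100;
  }
}

const stats = new RateLimitStats();
[200, 200, 200, 429].forEach(status => stats.record(status));
console.log(stats.limitedPercent()); // 25
```

Resetting the counters on a fixed schedule, or keeping one instance per time bucket, turns this into the time series you need to spot sustained spikes.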
Testing Your Rate Limiting
A few practical tests before shipping:
Test 1 — Verify limits trigger correctly:
// test/rateLimiting.test.js
async function testRateLimit() {
  const results = [];

  // Fire requests rapidly
  for (let i = 0; i < 20; i++) {
    const response = await fetch('/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: 'test' }),
    });
    results.push(response.status);
  }

  // Should see 200s followed by 429s
  const allowed = results.filter(s => s === 200).length;
  const blocked = results.filter(s => s === 429).length;
  console.log(`Allowed: ${allowed}, Blocked: ${blocked}`);
  // Expected: Allowed ~12 (bucket capacity), Blocked ~8
}
Test 2 — Verify headers are present:
const response = await fetch('/api/generate', { method: 'POST', ... });
console.log(response.headers.get('x-ratelimit-remaining'));
console.log(response.headers.get('retry-after')); // Present on 429s
Test 3 — Verify bucket refill:
// Hit the limit, wait for refill, verify access restored
await hitRateLimit();
await sleep(61_000); // Wait for 1 minute refill
const response = await fetch('/api/generate', { method: 'POST', ... });
console.log(response.status); // Should be 200 again
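Test 3 relies on two helpers, `sleep` and `hitRateLimit`, that are not defined above. A minimal sketch of both, assuming `hitRateLimit` is handed a `send` function that performs one request and resolves to its status code. That parameter and the `maxAttempts` cap are assumptions made so the helper can be exercised without a running server.

```javascript
// Resolve after `ms` milliseconds
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Keep sending requests until one is rejected with 429.
// Returns how many requests it took to hit the limit.
async function hitRateLimit(send, maxAttempts = 50) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await send();
    if (status === 429) return attempt;
  }
  throw new Error(`No 429 after ${maxAttempts} requests`);
}
```

With the configuration above, only one token is refilled during the 61 second wait, so a single follow-up request is the right thing to assert on.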
Summary
Anonymous rate limiting requires layers because no single signal is reliable:
- IP: baseline filter, easy to bypass but catches most automated abuse
- Token bucket: allows bursts, prevents sustained hammering
- Fingerprint: raises bypass effort without user friction
- Behavioral signals: catches bot-like patterns IP can't detect
- Graceful degradation: 429 with retry-after is better than hard blocks

The goal isn't to make abuse impossible. It's to make abuse more expensive than it's worth, while keeping the experience smooth for legitimate users.