Aon infotech

Posted on Jun 20

Rate Limiting Without User Accounts — Strategies for Anonymous APIs

#api #backend #systemdesign #javascript

Rate limiting an authenticated API is straightforward — you have a user ID, you track requests per user, you throttle when they exceed limits.

Rate limiting an anonymous API is a different problem entirely. Without a user ID, you work with proxies for identity: IP address, device fingerprint, behavioral signals. Each has weaknesses. Here's how to layer them effectively; it's the same problem we had to solve when building a free AI image generator that works without sign-up.

Why Anonymous Rate Limiting Is Harder

With authenticated users:

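// Simple: the user ID is a stable, reliable identifier.
// Fixed-window counter: the key's TTL resets the count after WINDOW_SECONDS.
const key = `rate_limit:${userId}`;
const requests = await redis.incr(key);
if (requests === 1) await redis.expire(key, WINDOW_SECONDS); // first hit starts the window
if (requests > LIMIT) throw new RateLimitError();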
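The Redis snippet is a fixed-window counter: count requests per key and let the key's TTL reset the count. For local development or a single-process server, the same logic can be sketched in memory. `FixedWindowCounter`, its options, and the injectable clock are hypothetical names for illustration, not a library API.

```javascript
// In-memory fixed-window counter: a single-process sketch of the Redis INCR + EXPIRE pattern.
// Hypothetical helper for local testing; state is not shared across processes.
class FixedWindowCounter {
  constructor({ limit, windowMs, now = () => Date.now() }) {
    this.limit = limit;       // max requests allowed per window
    this.windowMs = windowMs; // window length in milliseconds (the Redis TTL)
    this.now = now;           // injectable clock so tests don't have to wait
    this.windows = new Map(); // key -> { count, expiresAt }
  }

  hit(key) {
    const t = this.now();
    let entry = this.windows.get(key);
    // No window yet, or the old one expired: start fresh (what the TTL does in Redis)
    if (!entry || t >= entry.expiresAt) {
      entry = { count: 0, expiresAt: t + this.windowMs };
      this.windows.set(key, entry);
    }
    entry.count += 1; // the INCR step
    return { allowed: entry.count <= this.limit, count: entry.count };
  }
}

// Usage: 3 requests per minute per user, with a fake clock
let fakeNow = 0;
const counter = new FixedWindowCounter({ limit: 3, windowMs: 60_000, now: () => fakeNow });
const firstFour = [1, 2, 3, 4].map(() => counter.hit('user:42').allowed); // [true, true, true, false]
fakeNow = 60_000; // the window has expired
const afterReset = counter.hit('user:42').allowed; // true
```

One known property of fixed windows: a client can send up to twice the limit in a short span by straddling a window boundary, which is one reason token buckets are a popular alternative.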

With anonymous users, you have no stable identifier. Everything you use as a proxy is either spoofable, unreliable, or both.

Layer 1 — IP Address (Baseline)

IP is the obvious first layer. It's available on every request and requires no client cooperation.

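// lib/getIp.js
// Returns the client IP as a string. x-forwarded-for may hold a chain
// ("client, proxy1, proxy2"); its first entry is client-supplied unless a proxy you control overwrites it.
export function getIp(request) {
  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0].trim() || // || so an empty value falls through
    request.headers.get('x-real-ip') ||
    'unknown';

  return ip;
}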
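Taking the first entry of `x-forwarded-for` trusts whatever the client put there unless your edge proxy overwrites the header. If you know how many proxies you operate in front of the app, a sturdier approach is to count from the right. The sketch below assumes every trusted proxy appends to the header (the common convention); `clientIpFromForwardedFor` and `trustedProxyCount` are hypothetical names, and managed platforms may document a dedicated header instead.

```javascript
// Sketch: pick the client IP from an X-Forwarded-For value when you know how many
// trusted proxies (CDN, load balancer) sit in front of the app. Each proxy appends the
// address of the peer that connected to it, so only the rightmost `trustedProxyCount`
// entries were written by your infrastructure; anything further left may be forged.
function clientIpFromForwardedFor(headerValue, trustedProxyCount = 1) {
  if (!headerValue) return null;
  const hops = headerValue
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  if (hops.length === 0) return null;

  // The first trusted proxy appended the real client address, which sits
  // `trustedProxyCount` positions from the right end of the list.
  const index = hops.length - trustedProxyCount;
  // Fewer hops than expected suggests a misconfiguration; fall back to the leftmost entry.
  return hops[Math.max(index, 0)];
}

// A client that sends its own "X-Forwarded-For: 6.6.6.6" behind one trusted proxy:
const ip = clientIpFromForwardedFor('6.6.6.6, 203.0.113.7', 1); // '203.0.113.7'
```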

The problems with IP-only rate limiting:

Shared IPs are common. Corporate offices, universities, ISPs using carrier-grade NAT — dozens or hundreds of users sharing one IP address. Blocking an IP blocks all of them.

VPNs and proxies trivially bypass IP limits. A determined user can rotate through IPs faster than you can block them.

Dynamic IPs mean genuine users get rate limited because a previous user of that IP address hit the limit.

Still use it: IP rate limiting stops most unsophisticated automated abuse, such as scripts hammering the API from a single server. Just don't rely on it exclusively.

Layer 2 — Token Bucket Per IP With Generous Limits

Rather than hard cutoffs, token bucket algorithms allow burst usage while preventing sustained abuse:

// lib/rateLimiter.js
class TokenBucket {
  constructor({ capacity, refillRate, refillInterval }) {
    this.capacity = capacity;           // Max tokens
    this.refillRate = refillRate;       // Tokens added per interval
    this.refillInterval = refillInterval; // Milliseconds
    this.buckets = new Map();
  }

  async consume(key, tokens = 1) {
    const now = Date.now();
    let bucket = this.buckets.get(key) ?? {
      tokens: this.capacity,
      lastRefill: now,
    };

    // Refill based on elapsed time
    const elapsed = now - bucket.lastRefill;
    const intervals = Math.floor(elapsed / this.refillInterval);
    if (intervals > 0) {
      bucket.tokens = Math.min(
        this.capacity,
        bucket.tokens + intervals * this.refillRate
      );
      bucket.lastRefill = now;
    }

    if (bucket.tokens < tokens) {
      this.buckets.set(key, bucket);
      return { allowed: false, remaining: 0 };
    }

    bucket.tokens -= tokens;
    this.buckets.set(key, bucket);
    return { allowed: true, remaining: bucket.tokens };
  }
}

// Configuration for a generation endpoint
export const generationLimiter = new TokenBucket({
  capacity: 12,          // 12 burst generations
  refillRate: 1,         // 1 token per interval
  refillInterval: 60_000, // Per minute
});

Key advantage: a user can generate 12 images in quick succession for rapid iteration, but sustained throughput is capped at one generation per minute (about 60 per hour after the initial burst of 12).
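The class above refills in whole-minute steps. A common variant credits fractional tokens continuously, which also makes it easy to compute an exact retry time. This is a swapped-in variant rather than the code above, and the `now` clock parameter is an assumption added so the behavior can be checked without waiting.

```javascript
// Token bucket with continuous refill: a variant of the interval-based class above.
// Tokens accrue in proportion to elapsed time and are capped at `capacity`.
// The injectable `now` clock is an addition for testability.
class ContinuousTokenBucket {
  constructor({ capacity, refillPerMinute, now = () => Date.now() }) {
    this.capacity = capacity;
    this.msPerToken = 60_000 / refillPerMinute; // time to earn one token
    this.now = now;
    this.buckets = new Map(); // key -> { tokens, updatedAt }
  }

  consume(key, cost = 1) {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: t };
    // Credit the (possibly fractional) tokens earned since the last update
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + (t - bucket.updatedAt) / this.msPerToken
    );
    bucket.updatedAt = t;
    this.buckets.set(key, bucket);

    if (bucket.tokens < cost) {
      // Exact wait until enough tokens have accrued, usable for a Retry-After header
      const retryAfterMs = Math.ceil((cost - bucket.tokens) * this.msPerToken);
      return { allowed: false, remaining: Math.floor(bucket.tokens), retryAfterMs };
    }
    bucket.tokens -= cost;
    return { allowed: true, remaining: Math.floor(bucket.tokens), retryAfterMs: 0 };
  }
}

// Same settings as generationLimiter: burst of 12, one token per minute
let clock = 0;
const limiter = new ContinuousTokenBucket({ capacity: 12, refillPerMinute: 1, now: () => clock });
```

The sustained rate is the same one token per minute; the difference is that a caller who waits 30 seconds has earned half a token instead of nothing, and `retryAfterMs` says exactly how long is left.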

Layer 3 — Browser Fingerprinting (Client-Side Signal)

A lightweight fingerprint (screen resolution, timezone, language, platform) gives you an identifier that persists across IP changes, without requiring cookies or an account. Because many devices share these common settings, treat it as a coarse signal rather than a unique ID.

// lib/fingerprint.js (client-side)
export async function getFingerprint() {
  const components = [
    navigator.language,
    navigator.platform,
    screen.width + 'x' + screen.height,
    screen.colorDepth,
    Intl.DateTimeFormat().resolvedOptions().timeZone,
    navigator.hardwareConcurrency,
  ];

  // Hash the components
  const str = components.join('|');
  const buffer = await crypto.subtle.digest(
    'SHA-256',
    new TextEncoder().encode(str)
  );

  return Array.from(new Uint8Array(buffer))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('').slice(0, 16); // 16 char fingerprint
}

Send this with each request (for example in a custom request header) and rate limit on IP and fingerprint combined:

// More specific identifier: users behind one shared IP get separate buckets
const rateLimitKey = `${ip}:${fingerprint}`;

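Weaknesses: the fingerprint is computed client-side, so it can be spoofed with developer tools or a script, and it changes when users switch browsers. Because the client chooses it, a caller can also send a fresh fingerprint with every request to get a new bucket, so keep a looser per-IP limit alongside the combined key. Even so, it meaningfully raises the effort required to bypass limits.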
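To make the fingerprint layer additive rather than a bypass, a request can be required to pass both a per-browser limit and a looser per-IP limit. A minimal sketch, assuming a `counts` map that some windowing mechanism resets (expiry is omitted here); the limits of 60 per IP and 12 per fingerprint and the helper names are illustrative.

```javascript
// Sketch: enforce two limits per request. The combined IP + fingerprint key separates
// browsers behind a shared IP; the looser per-IP key caps what one IP can get by
// sending a new fingerprint with every request. `counts` is assumed to hold
// per-window counts (window expiry is omitted; pair this with a counter that resets).
const FINGERPRINT_PATTERN = /^[0-9a-f]{16}$/; // shape of getFingerprint()'s output

function rateLimitKeys(ip, fingerprint) {
  // Reject malformed values so clients can't create oversized or arbitrary keys
  const fp =
    typeof fingerprint === 'string' && FINGERPRINT_PATTERN.test(fingerprint)
      ? fingerprint
      : 'none';
  return { ipKey: `ip:${ip}`, comboKey: `combo:${ip}:${fp}` };
}

function checkLayered(counts, ip, fingerprint, limits = { perIp: 60, perCombo: 12 }) {
  const { ipKey, comboKey } = rateLimitKeys(ip, fingerprint);
  const ipCount = counts.get(ipKey) ?? 0;
  const comboCount = counts.get(comboKey) ?? 0;

  // A request must pass both limits; rejected requests are not counted
  if (ipCount >= limits.perIp) return { allowed: false, reason: 'ip' };
  if (comboCount >= limits.perCombo) return { allowed: false, reason: 'fingerprint' };

  counts.set(ipKey, ipCount + 1);
  counts.set(comboKey, comboCount + 1);
  return { allowed: true };
}
```

Requests without a valid fingerprint all share the `none` bucket for their IP, so clients that omit it are effectively held to the stricter per-fingerprint limit.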

Layer 4 — Behavioral Signals

Automated abuse has distinctive patterns. Tracking these adds another detection layer:

// lib/behaviorAnalyzer.js
// requestHistory: recent requests for one client key, sorted oldest first
const MIN_INTERVALS = 5; // a single interval always has zero variance, so require a few

export function analyzeBehavior(requestHistory) {
  const timestamps = requestHistory.map(r => r.timestamp);

  // Bots often have unnaturally consistent intervals
  const intervals = timestamps
    .slice(1)
    .map((t, i) => t - timestamps[i]);

  // Not enough data to judge (also avoids 0/0 = NaN below)
  if (intervals.length < MIN_INTERVALS) {
    return { suspicious: false, variance: null, minimumInterval: null };
  }

  const avgInterval = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  const variance = intervals.reduce((sum, interval) => {
    return sum + Math.pow(interval - avgInterval, 2);
  }, 0) / intervals.length;

  // Low variance = suspiciously regular = likely bot
  const isSuspiciouslyRegular = variance < 500; // ms²

  // Too fast between requests = likely automated
  const minimumInterval = Math.min(...intervals);
  const isTooFast = minimumInterval < 800; // 800ms minimum

  return {
    suspicious: isSuspiciouslyRegular || isTooFast,
    variance,
    minimumInterval,
  };
}
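`analyzeBehavior` needs a per-client request history, and that history should be bounded so a busy key can't grow without limit. A sketch of both, with one deliberate change: it uses the coefficient of variation (standard deviation divided by mean) instead of the fixed 500 ms² variance threshold, because the same relative jitter produces a far larger variance at a 30-second pace than at a 1-second pace. The constants and function names are assumptions.

```javascript
// Sketch: bounded per-key request history plus a regularity check.
// Swaps the fixed 500 ms² variance threshold for the coefficient of variation
// (standard deviation / mean), which doesn't depend on the client's pace.
// MAX_HISTORY, MIN_INTERVALS, and the 0.05 threshold are illustrative assumptions.
const MAX_HISTORY = 20;  // keep only the most recent timestamps per key
const MIN_INTERVALS = 5; // don't judge regularity from fewer intervals than this

const histories = new Map(); // key -> timestamps in ms, oldest first

function recordRequest(key, timestamp = Date.now()) {
  const history = histories.get(key) ?? [];
  history.push(timestamp);
  if (history.length > MAX_HISTORY) history.shift(); // drop the oldest entry
  histories.set(key, history);
  return history;
}

function looksAutomated(timestamps, { maxCv = 0.05, minIntervalMs = 800 } = {}) {
  const intervals = [];
  for (let i = 1; i < timestamps.length; i++) {
    intervals.push(timestamps[i] - timestamps[i - 1]);
  }
  if (intervals.length < MIN_INTERVALS) return false; // not enough data yet

  const mean = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  const variance =
    intervals.reduce((sum, x) => sum + (x - mean) ** 2, 0) / intervals.length;
  // Zero mean means every request landed at the same instant: treat as perfectly regular
  const cv = mean > 0 ? Math.sqrt(variance) / mean : 0;
  const tooFast = Math.min(...intervals) < minIntervalMs;
  return cv < maxCv || tooFast;
}
```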

Layer 5 — Graceful Degradation Over Hard Blocks

Hard IP blocks create false positives (blocking legitimate shared-IP users) and create adversarial relationships with users who hit limits accidentally.

A better approach: degrade gracefully before blocking.

import { NextResponse } from 'next/server';

// Instead of a hard block, return 429 with a clear retry time
async function handleRequest(ip, fingerprint) {
  const key = `${ip}:${fingerprint}`;
  const { allowed, remaining } = await generationLimiter.consume(key);

  if (!allowed) {
    // Seconds until the current refill interval completes and a token is added
    const bucket = generationLimiter.buckets.get(key);
    const msLeft = generationLimiter.refillInterval - (Date.now() - bucket.lastRefill);
    const waitTime = Math.max(1, Math.ceil(msLeft / 1000));

    return new Response(
      JSON.stringify({
        error: 'Rate limit reached',
        retryAfter: waitTime,
        message: `Please wait ${waitTime} seconds before generating again.`
      }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'Retry-After': String(waitTime),
          'X-RateLimit-Remaining': '0',
        }
      }
    );
  }

  // Add rate limit headers to successful responses too
  const response = NextResponse.next();
  response.headers.set('X-RateLimit-Remaining', String(remaining));
  return response;
}
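The handler above answers with 429 as soon as the bucket is empty. To actually slow callers down before that point, one option is to delay responses as the bucket drains. A hypothetical sketch that takes the `{ allowed, remaining }` result of `consume`; the half-full threshold and 2-second ceiling are illustrative, not tuned values.

```javascript
// Hypothetical "slow down first" policy: serve normally while the bucket is at least
// half full, add a delay that grows toward MAX_DELAY_MS as it drains, and return 429
// only once consume() reports the request is not allowed.
const MAX_DELAY_MS = 2000;

function degradationStep({ allowed, remaining }, capacity) {
  if (!allowed) return { action: 'reject', status: 429 };
  const used = 1 - remaining / capacity; // 0 = untouched bucket, 1 = last token just spent
  if (used < 0.5) return { action: 'allow', delayMs: 0 };
  // Scale linearly from 0 ms at half full to MAX_DELAY_MS when the last token is spent
  const delayMs = Math.round(((used - 0.5) / 0.5) * MAX_DELAY_MS);
  return { action: 'allow', delayMs };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// In a handler: wait out the delay before doing the expensive work
async function withDegradation(result, capacity, work) {
  const step = degradationStep(result, capacity);
  if (step.action === 'reject') return { status: 429 };
  if (step.delayMs > 0) await sleep(step.delayMs);
  return work();
}
```

Delays hold a connection open, and on serverless platforms that bill by execution time they cost money, so keep the ceiling short.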

Production Edge Cases

Handling Vercel/CDN header forwarding:

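// Different headers depending on deployment environment.
// Only trust headers your platform sets: a client can send any of these itself, so
// reading cf-connecting-ip when you're not behind Cloudflare lets callers choose their IP.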
function getClientIp(request) {
  return (
    request.headers.get('cf-connecting-ip') ||     // Cloudflare
    request.headers.get('x-vercel-forwarded-for') || // Vercel
    request.headers.get('x-forwarded-for')?.split(',')[0].trim() ||
    request.headers.get('x-real-ip') ||
    'unknown'
  );
}
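The fallback chain above accepts the first header it finds, so a client can pick its own IP by sending `cf-connecting-ip` to an app that isn't behind Cloudflare. A sketch that reads exactly one header chosen from configuration; the `platform` setting and the mapping are assumptions, and `headers` can be anything with a `get(name)` method, such as Fetch `Headers`.

```javascript
// Hypothetical mapping: the one header each platform sets for the client IP.
// Read only the header for the platform you actually deploy on.
const IP_HEADER_BY_PLATFORM = {
  cloudflare: 'cf-connecting-ip',
  vercel: 'x-vercel-forwarded-for',
  nginx: 'x-real-ip', // only trustworthy if your nginx config sets it
};

function getTrustedClientIp(headers, platform) {
  const headerName = IP_HEADER_BY_PLATFORM[platform];
  if (!headerName) throw new Error(`Unknown platform: ${platform}`);
  const value = headers.get(headerName);
  // If the value is a comma-separated list, keep the first entry, as getClientIp does
  return value ? value.split(',')[0].trim() : 'unknown';
}

// e.g. getTrustedClientIp(request.headers, process.env.DEPLOY_PLATFORM)
// (DEPLOY_PLATFORM is a hypothetical environment variable)
```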

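Memory leak prevention in in-process stores. Deleting idle buckets loses no state: with the settings above, an empty bucket refills to full capacity within 12 minutes, so a bucket idle for an hour is equivalent to a new one: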

// Clean up old buckets periodically
setInterval(() => {
  const cutoff = Date.now() - 3_600_000; // 1 hour
  for (const [key, bucket] of generationLimiter.buckets) {
    if (bucket.lastRefill < cutoff) {
      generationLimiter.buckets.delete(key);
    }
  }
}, 600_000); // Every 10 minutes
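A periodic sweep frees idle buckets, but it doesn't bound memory if an attacker rotates IPs or fingerprints faster than the sweep runs. A hard cap on entries does. This hypothetical `BoundedStore` relies on the fact that a JavaScript `Map` iterates keys in insertion order, so re-inserting a key on every access keeps the least recently used key at the front for eviction.

```javascript
// Hypothetical size-capped store with least-recently-used eviction.
// Map preserves insertion order, so delete + set moves a key to the back.
class BoundedStore {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // move to the back (most recently used)
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    // Evict from the front (least recently used) until back under the cap
    while (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

The trade-off: an evicted bucket comes back full, so under heavy key churn a rate-limited client can occasionally get a fresh burst. Size the cap well above your normal number of active keys.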

Multi-instance deployments: In-memory rate limiting doesn't share state across serverless function instances. For production scale, move the rate limit state to Redis or a similar distributed store.

What This Looks Like in Practice

The layered approach — IP + fingerprint + behavioral signals — reduces automated abuse significantly without meaningfully impacting legitimate users.

The key design choice: treat false positives as more costly than missed abuse. A legitimate user who gets incorrectly blocked is a worse outcome than an abuser who gets through occasionally. Generous limits + graceful degradation + multiple signals = the right balance for most anonymous APIs.

For monitoring, track your 429 rate as a percentage of total requests. Sustained spikes indicate either abuse or limits set too aggressively — both worth investigating.
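A minimal in-process sketch of that metric: the 429 ratio over a sliding time window. `RateLimitMonitor` and the 5-minute default window are assumptions; in production you would more likely export counters to your metrics system.

```javascript
// Hypothetical monitor: the share of responses that were 429 over a sliding time window.
class RateLimitMonitor {
  constructor(windowMs = 5 * 60_000) {
    this.windowMs = windowMs;
    this.events = []; // { at, limited }, appended in time order
  }

  record(status, at = Date.now()) {
    this.events.push({ at, limited: status === 429 });
    this.prune(at);
  }

  // Drop events that have fallen out of the window (oldest are at the front)
  prune(now) {
    const cutoff = now - this.windowMs;
    while (this.events.length > 0 && this.events[0].at <= cutoff) {
      this.events.shift();
    }
  }

  ratio(now = Date.now()) {
    this.prune(now);
    if (this.events.length === 0) return 0;
    const limited = this.events.filter((e) => e.limited).length;
    return limited / this.events.length;
  }
}
```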

Testing Your Rate Limiting

A few practical tests before shipping:

Test 1 — Verify limits trigger correctly:

// test/rateLimiting.test.js
// Node's fetch needs an absolute URL: point BASE_URL at a running dev server
const BASE_URL = process.env.BASE_URL ?? 'http://localhost:3000';

async function testRateLimit() {
  const results = [];

  // Fire requests back to back from one client
  for (let i = 0; i < 20; i++) {
    const response = await fetch(`${BASE_URL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: 'test' }),
    });
    results.push(response.status);
  }

  // Should see 200s followed by 429s
  const allowed = results.filter(s => s === 200).length;
  const blocked = results.filter(s => s === 429).length;

  console.log(`Allowed: ${allowed}, Blocked: ${blocked}`);
  // Expected on a fresh bucket: Allowed ~12 (bucket capacity), Blocked ~8
}

testRateLimit();

Test 2 — Verify headers are present:

const response = await fetch('/api/generate', { method: 'POST', ... });
console.log(response.headers.get('x-ratelimit-remaining'));
console.log(response.headers.get('retry-after')); // Present on 429s

Test 3 — Verify bucket refill:

// Hit the limit, wait for refill, verify access restored
await hitRateLimit();
await sleep(61_000); // Wait for 1 minute refill
const response = await fetch('/api/generate', { method: 'POST', ... });
console.log(response.status); // Should be 200 again

Summary

Anonymous rate limiting requires layers because no single signal is reliable:

IP — baseline filter, easy to bypass but catches most automated abuse
Token bucket — allows bursts, prevents sustained hammering
Fingerprint — raises bypass effort without user friction
Behavioral signals — catches bot-like patterns IP can't detect
Graceful degradation — 429 with Retry-After is better than hard blocks

The goal isn't to make abuse impossible. It's to make abuse more expensive than it's worth, while keeping the experience smooth for legitimate users.

DEV Community