Rate Limiting in Node.js: The Complete Guide for Production APIs (2026)

TL;DR: Rate limiting protects your API from abuse and ensures fair usage. This guide covers algorithms, implementation patterns, distributed systems, and common pitfalls—with practical code examples.


Every production API needs rate limiting. Without it, a single bad actor (or a bug in a client) can take down your entire service. This guide covers everything you need to implement rate limiting correctly in Node.js.

Why Rate Limiting Matters

Rate limiting serves three purposes:

1. Protection Against Abuse

Bots, scrapers, and attackers can flood your API with requests. Rate limiting caps how much damage they can do.

2. Fair Resource Allocation

Without limits, one aggressive client can starve others. Rate limiting ensures everyone gets a fair share.

3. API Monetization

SaaS products use tiered rate limits to differentiate pricing tiers. Free users get 100 req/hour, paid users get 10,000.

Real Examples of What Goes Wrong

  • Twitter (2013): API abuse caused widespread outages. They introduced aggressive rate limiting.
  • GitHub: Rate limits all API calls. Unauthenticated: 60 req/hour. Authenticated: 5,000 req/hour.
  • Stripe: 100 read operations/sec, 25 write operations/sec per API key.

If these companies need rate limiting, so do you.

Rate Limiting Algorithms Explained

Four main algorithms power rate limiters. Each has tradeoffs.

Fixed Window

Divide time into fixed slots (0:00-1:00, 1:00-2:00). Count requests per slot. Reset count at slot boundary.

Window 1 (0:00-1:00): ████████░░ 80/100 ✓
Window 2 (1:00-2:00): ██████████ 100/100 ✓
Window 3 (2:00-3:00): ███░░░░░░░ 30/100 ✓

Pros:

  • Simple to implement
  • O(1) time and space complexity
  • Easy to understand and debug

Cons:

  • Burst at boundaries (100 at 0:59 + 100 at 1:01 = 200 in 2 seconds)

Best for: Most general-purpose APIs where exact precision isn't critical.
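
To make the mechanics concrete, here is a minimal in-memory sketch (single process; the names are illustrative, and old slot keys still need sweeping, as covered under pitfalls later):

// Fixed window: one counter per key per time slot
const counters = new Map()

function fixedWindowAllow(key, limit, windowMs) {
  const slot = Math.floor(Date.now() / windowMs)  // index of the current window
  const slotKey = `${key}:${slot}`
  const count = (counters.get(slotKey) || 0) + 1
  counters.set(slotKey, count)  // NOTE: old slot keys are never removed here
  return count <= limit
}

// 100 requests per 60-second window
fixedWindowAllow('ip:1.2.3.4', 100, 60_000)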

Sliding Window Log

Store the timestamp of every request. Count requests in the last N seconds.

Requests: [0:15, 0:22, 0:45, 0:58, 1:03, 1:15]
Window (last 60s at 1:20): [0:22, 0:45, 0:58, 1:03, 1:15] = 5 requests

Pros:

  • Precise rate limiting
  • No burst at boundaries
  • Smooth traffic shaping

Cons:

  • Memory grows with traffic (store every timestamp)
  • O(n) lookups
  • Not practical for high-traffic APIs

Best for: Low-traffic, precision-critical scenarios (billing, security).
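
A minimal sketch of the log variant (in-memory, illustrative names); the per-request filter is where the O(n) cost and the memory growth come from:

// Sliding window log: keep every timestamp, evict the ones outside the window
const logs = new Map()

function slidingLogAllow(key, limit, windowMs) {
  const now = Date.now()
  // This filter is the O(n) cost: every request rescans the key's history
  const log = (logs.get(key) || []).filter((t) => t > now - windowMs)
  if (log.length >= limit) {
    logs.set(key, log)
    return false
  }
  log.push(now)
  logs.set(key, log)
  return true
}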

Sliding Window Counter

Hybrid approach. Use weighted average of current and previous window counts.

Previous window (0:00-1:00): 80 requests
Current window (1:00-2:00): 30 requests so far
Time into current window: 20 seconds (33%)

Estimated count = (80 * 0.67) + 30 = 83.6 requests

Pros:

  • Smooths the burst problem
  • O(1) operations
  • Good accuracy without storing every timestamp

Cons:

  • Approximation, not exact
  • Slightly more complex to implement

Best for: High-traffic APIs needing smoother limiting than fixed window.
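
A minimal sketch of the weighted estimate (in-memory, illustrative names); plugging in the numbers above (previous 80, current 30, 33% elapsed) gives the same 83.6:

// Sliding window counter: weight the previous window by how much of it still overlaps
const windows = new Map()  // key -> { slot, current, previous }

function slidingCounterAllow(key, limit, windowMs) {
  const now = Date.now()
  const slot = Math.floor(now / windowMs)
  let w = windows.get(key)

  if (!w || w.slot !== slot) {
    // New slot: the old current count becomes "previous" (0 if a slot was skipped)
    w = { slot, current: 0, previous: w && w.slot === slot - 1 ? w.current : 0 }
    windows.set(key, w)
  }

  const elapsed = (now % windowMs) / windowMs  // fraction of current window elapsed
  const estimated = w.previous * (1 - elapsed) + w.current

  if (estimated >= limit) return false
  w.current++
  return true
}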

Token Bucket

Tokens added at fixed rate. Each request consumes one token. Requests denied when bucket empty.

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

Time 0: 100 tokens
Burst of 100 requests: 0 tokens
After 5 seconds: 50 tokens (refilled)

Pros:

  • Allows controlled bursts
  • Smooth rate limiting
  • Flexible configuration

Cons:

  • More complex to implement
  • Requires tracking token count and last refill time

Best for: APIs that want to allow short bursts while limiting sustained traffic.
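
A minimal sketch with lazy refill (in-memory, illustrative names), matching the 100-capacity, 10-tokens/second example above:

// Token bucket: refill lazily based on time elapsed since the last request
const buckets = new Map()  // key -> { tokens, last }

function tokenBucketAllow(key, capacity, refillPerSec) {
  const now = Date.now()
  const b = buckets.get(key) || { tokens: capacity, last: now }

  // Top up for the elapsed time, never exceeding capacity
  b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSec)
  b.last = now
  buckets.set(key, b)

  if (b.tokens < 1) return false
  b.tokens -= 1
  return true
}

// Capacity 100, refill 10 tokens/second
tokenBucketAllow('ip:1.2.3.4', 100, 10)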

Decision Tree

Need precision? → Sliding Window Log
High traffic + allow bursts? → Token Bucket
High traffic + simple? → Fixed Window
High traffic + smooth? → Sliding Window Counter

Most APIs? Start with Fixed Window. It's simple, fast, and good enough.

Implementing Rate Limiting in Node.js

The Naive Approach (Don't Do This)

// Don't do this in production
import express from 'express'

const app = express()
const requests = {}  // one entry per IP, never cleaned up

app.use((req, res, next) => {
  const ip = req.ip
  const now = Date.now()

  if (!requests[ip] || now - requests[ip].startTime > 60000) {
    // First request from this IP, or the 60s window expired: start a new window
    requests[ip] = { count: 1, startTime: now }
  } else {
    requests[ip].count++
  }

  if (requests[ip].count > 100) {
    return res.status(429).json({ error: 'Too many requests' })
  }

  next()
})

Problems:

  • Memory leak (entries never cleaned up)
  • No headers (clients don't know their limits)
  • No persistence (resets on server restart)
  • Race conditions in production

Using HitLimit (Recommended)

HitLimit handles all the edge cases:

import express from 'express'
import { hitlimit } from '@joint-ops/hitlimit'

const app = express()

// Zero-config: 100 requests per minute per IP
app.use(hitlimit())

// Or with custom limits
app.use(hitlimit({
  limit: 100,
  window: '15m',   // 15 minutes
  // Adds these headers to responses:
  // RateLimit-Limit: 100
  // RateLimit-Remaining: 95
  // RateLimit-Reset: 1706547600
}))

app.listen(3000)

Protecting Specific Routes

Don't rate limit everything equally. Health checks and static assets don't need limits. Auth endpoints need strict limits.

// Global default
app.use(hitlimit({ limit: 1000, window: '1h' }))

// Strict limit on login
app.use('/auth/login', hitlimit({
  limit: 5,
  window: '15m',
  // After 5 attempts, clients must wait out the 15-minute window
}))

// Strict limit on registration
app.use('/auth/register', hitlimit({
  limit: 3,
  window: '1h'
}))

// Skip health checks entirely
app.use(hitlimit({
  skip: (req) => req.path === '/health'
}))

Handling 429 Responses

When rate limited, return useful information:

app.use(hitlimit({
  limit: 100,
  window: '1h',
  response: (info) => ({
    error: 'RATE_LIMITED',
    message: 'Too many requests. Please slow down.',
    limit: info.limit,
    remaining: 0,
    resetAt: new Date(info.resetAt).toISOString(),
    retryAfter: info.resetIn  // seconds until reset
  })
}))

Clients receive:

{
  "error": "RATE_LIMITED",
  "message": "Too many requests. Please slow down.",
  "limit": 100,
  "remaining": 0,
  "resetAt": "2026-01-30T15:00:00.000Z",
  "retryAfter": 1847
}
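
On the client side, honor retryAfter instead of hammering the endpoint. A minimal sketch using fetch (the field names follow the response shape above):

// Retry once after the server-provided delay
async function fetchWithRetry(url) {
  const res = await fetch(url)
  if (res.status !== 429) return res

  const body = await res.json()
  // Wait the number of seconds the server asked for, then try again
  await new Promise((resolve) => setTimeout(resolve, body.retryAfter * 1000))
  return fetch(url)
}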

Tiered Rate Limits

SaaS products need different limits for different users. HitLimit has this built-in:

app.use(hitlimit({
  tiers: {
    anonymous: { limit: 10, window: '1h' },
    free: { limit: 100, window: '1h' },
    pro: { limit: 5000, window: '1h' },
    enterprise: { limit: Infinity }
  },
  tier: (req) => {
    if (!req.user) return 'anonymous'
    return req.user.plan || 'free'
  }
}))

Now your API automatically applies the right limits based on user context.

Custom Rate Limit Keys

By default, rate limiting uses IP address. But IPs aren't always the right key:

  • Shared IPs: Corporate networks might share one IP for thousands of users
  • API keys: You want to limit by API key, not IP
  • User accounts: Logged-in users should be limited by user ID

app.use(hitlimit({
  key: (req) => {
    // 1. API key takes precedence
    if (req.headers['x-api-key']) {
      return `api:${req.headers['x-api-key']}`
    }

    // 2. Logged-in user ID
    if (req.user?.id) {
      return `user:${req.user.id}`
    }

    // 3. Fall back to IP
    return `ip:${req.ip}`
  }
}))

Distributed Rate Limiting

Single-server rate limiting is easy. Rate limiting across multiple servers is hard.

The Problem

You have 5 servers. Each has its own in-memory rate limiter set to 100 req/min.

Result? Users can make 500 req/min (100 per server).

Solution: Shared Store

Use Redis as a shared counter:

import { hitlimit } from '@joint-ops/hitlimit'
import { redisStore } from '@joint-ops/hitlimit/stores/redis'

app.use(hitlimit({
  limit: 100,
  window: '1m',
  store: redisStore({
    url: 'redis://localhost:6379',
    prefix: 'rl:'  // Key prefix in Redis
  })
}))

All servers increment the same counter. True distributed rate limiting.
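
Under the hood, the pattern is an atomic increment on a per-window key. A minimal sketch of that pattern using ioredis directly (an assumed dependency here, not HitLimit's internals):

import Redis from 'ioredis'

const redis = new Redis('redis://localhost:6379')

// Fixed-window counter shared by every server: INCR is atomic in Redis
async function allow(key, limit, windowSec) {
  const slot = Math.floor(Date.now() / (windowSec * 1000))
  const redisKey = `rl:${key}:${slot}`

  const count = await redis.incr(redisKey)
  if (count === 1) {
    // First hit in this window: expire the key so Redis cleans up old windows
    await redis.expire(redisKey, windowSec)
  }
  return count <= limit
}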

Handling Redis Failures

Redis goes down. What happens?

Fail-closed: Reject all requests. Safe but your API stops working.

Fail-open: Allow all requests. Risky but service continues.

HitLimit lets you decide:

app.use(hitlimit({
  store: redisStore({ url: 'redis://localhost:6379' }),
  onStoreError: (error, req) => {
    console.error('Redis error:', error)

    // Protect critical routes even if Redis is down
    if (req.path.startsWith('/admin')) return 'deny'

    // Allow other traffic to continue
    return 'allow'
  }
}))

SQLite: Persistence Without Redis

Don't want to run Redis? SQLite gives you persistence without the ops overhead:

import { hitlimit } from '@joint-ops/hitlimit'
import { sqliteStore } from '@joint-ops/hitlimit/stores/sqlite'

app.use(hitlimit({
  store: sqliteStore({
    path: './rate-limits.db'  // File-based persistence
  })
}))

Good for single-server deployments where you want limits to survive restarts.

Common Pitfalls

1. Trusting X-Forwarded-For Blindly

Behind a proxy, req.ip might be wrong. But don't blindly trust X-Forwarded-For—it can be spoofed.

// Configure Express to trust your proxy
app.set('trust proxy', 1)  // Trust first proxy

// Or be specific about which proxies to trust
app.set('trust proxy', 'loopback, 10.0.0.0/8')

Only trust headers from proxies you control.

2. Not Handling Proxy Chains

Multiple proxies? The header looks like:

X-Forwarded-For: client, proxy1, proxy2

The leftmost IP is the client. But if you trust the wrong number of proxies, you'll rate limit the wrong IP.
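
If you need to resolve the client IP yourself, count hops from the right, since only the entries appended by your own proxies are trustworthy. A minimal sketch (trustedProxies is an illustrative parameter, not an Express setting):

// Resolve the client IP from X-Forwarded-For, trusting a known number of proxies
function clientIp(req, trustedProxies = 1) {
  const header = req.headers['x-forwarded-for']
  if (!header) return req.socket.remoteAddress

  const ips = header.split(',').map((ip) => ip.trim())
  // remoteAddress is the first trusted hop; each additional trusted proxy
  // consumes one entry from the right, so the client sits trustedProxies
  // positions from the end of the list
  return ips[Math.max(ips.length - trustedProxies, 0)]
}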

3. Overly Strict Limits

Too strict = frustrated legitimate users. Start generous and tighten based on data:

// Start here
{ limit: 1000, window: '1h' }

// Tighten if you see abuse
{ limit: 500, window: '1h' }

// But monitor for false positives

4. Memory Leaks from Unbounded Stores

Custom in-memory stores must clean up old entries. Otherwise, memory grows forever.

HitLimit handles this automatically with setTimeout-based cleanup.
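
If you do roll your own in-memory store, a periodic sweep keeps it bounded. A minimal sketch (illustrative, not HitLimit's internals):

// In-memory counters with an interval sweep so the Map stays bounded
const store = new Map()
const WINDOW_MS = 60_000

function hit(key) {
  const now = Date.now()
  const entry = store.get(key)
  if (!entry || now - entry.start > WINDOW_MS) {
    store.set(key, { count: 1, start: now })
    return 1
  }
  return ++entry.count
}

// Without this sweep, the Map grows by one entry per unique key, forever
setInterval(() => {
  const now = Date.now()
  for (const [key, entry] of store) {
    if (now - entry.start > WINDOW_MS) store.delete(key)
  }
}, WINDOW_MS).unref()  // unref() so the timer doesn't keep the process alive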

5. Not Rate Limiting Before Authentication

Attackers can brute-force login endpoints. Rate limit BEFORE authentication middleware:

// Good: Rate limit hits before auth check
app.use('/auth/login', hitlimit({ limit: 5, window: '15m' }))
app.post('/auth/login', authenticateUser)

// Bad: Auth runs first, then rate limit (attacker already hit your DB)
app.post('/auth/login', authenticateUser, hitlimit(...))

Testing Your Rate Limiter

Verify Limits Work

// test/rate-limit.test.js (assumes Jest/Vitest globals: it, expect)
import request from 'supertest'
import app from '../app.js'

it('blocks after limit exceeded', async () => {
  // Make 100 requests (the limit)
  for (let i = 0; i < 100; i++) {
    await request(app).get('/api/test')
  }

  // Request 101 should be blocked
  const response = await request(app).get('/api/test')
  expect(response.status).toBe(429)
})

Load Testing

Use tools like autocannon or k6:

npx autocannon -c 100 -d 30 http://localhost:3000/api/test

Watch for:

  • 429 responses appearing at expected rates
  • Memory not growing unbounded
  • Response times staying consistent

Monitor in Production

Track these metrics (see the sketch after this list):

  • 429 response rate
  • Rate limit header values
  • Store latency (if using Redis)
  • Memory usage of rate limit store
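
The first metric is easy to get without a metrics library. A minimal sketch that logs the 429 rate once a minute (swap the console.log for your metrics backend):

// Count responses as they finish and log the share of 429s
let total = 0
let limited = 0

app.use((req, res, next) => {
  res.on('finish', () => {
    total++
    if (res.statusCode === 429) limited++
  })
  next()
})

setInterval(() => {
  const pct = ((limited / Math.max(total, 1)) * 100).toFixed(2)
  console.log(`429 rate over the last minute: ${pct}%`)
  total = 0
  limited = 0
}, 60_000).unref()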

Quick Start with HitLimit

Installation

npm install @joint-ops/hitlimit

Basic Setup

import express from 'express'
import { hitlimit } from '@joint-ops/hitlimit'

const app = express()

// Zero-config default: 100 req/min per IP
app.use(hitlimit())

// Your routes
app.get('/api/data', (req, res) => {
  res.json({ message: 'Hello!' })
})

app.listen(3000)

That's it. Your API is now rate limited with sensible defaults.


Summary

Rate limiting is essential for production APIs. The key decisions:

  1. Algorithm: Start with fixed window. It's simple and works.
  2. Limits: Start generous, tighten based on data.
  3. Keys: IP for anonymous, user ID or API key for authenticated.
  4. Distribution: Use Redis for multi-server. SQLite for single-server persistence.
  5. Failures: Usually fail-open. Your API working matters more than perfect rate limiting.

HitLimit handles all of this in 7KB with zero configuration required.



Written by Muhammad Ali, Full-Stack & Web3 Gaming Engineer at JointOps. We build production-grade APIs and open-source tools.

Tags: #nodejs #ratelimiting #api #backend #javascript #expressjs #tutorial #security
