TL;DR: Rate limiting protects your API from abuse and ensures fair usage. This guide covers algorithms, implementation patterns, distributed systems, and common pitfalls—with practical code examples.
Every production API needs rate limiting. Without it, a single bad actor (or a bug in a client) can take down your entire service. This guide covers everything you need to implement rate limiting correctly in Node.js.
Why Rate Limiting Matters
Rate limiting serves three purposes:
1. Protection Against Abuse
Bots, scrapers, and attackers can flood your API with requests. Rate limiting caps how much damage they can do.
2. Fair Resource Allocation
Without limits, one aggressive client can starve others. Rate limiting ensures everyone gets a fair share.
3. API Monetization
SaaS products use tiered rate limits to differentiate pricing tiers. Free users get 100 req/hour, paid users get 10,000.
Real Examples of What Goes Wrong
- Twitter (2013): after years of API abuse and instability, API v1.1 shipped with much stricter per-endpoint rate limits.
- GitHub: Rate limits all API calls. Unauthenticated: 60 req/hour. Authenticated: 5,000 req/hour.
- Stripe: 100 read and 100 write operations/sec per account in live mode (25 each in test mode).
If these companies need rate limiting, so do you.
Rate Limiting Algorithms Explained
Four main algorithms power rate limiters. Each has tradeoffs.
Fixed Window
Divide time into fixed slots (0:00-1:00, 1:00-2:00). Count requests per slot. Reset count at slot boundary.
Window 1 (0:00-1:00): ████████░░ 80/100 ✓
Window 2 (1:00-2:00): ██████████ 100/100 ✓
Window 3 (2:00-3:00): ███░░░░░░░ 30/100 ✓
Pros:
- Simple to implement
- O(1) time and space complexity
- Easy to understand and debug
Cons:
- Burst at boundaries (100 at 0:59 + 100 at 1:01 = 200 in 2 seconds)
Best for: Most general-purpose APIs where exact precision isn't critical.
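A minimal in-memory sketch of the idea (illustrative only, not any particular library's internals):
// Fixed window: one counter per key, reset when a new window starts
function fixedWindow(limit, windowMs) {
  const counters = new Map() // key -> { count, windowStart }
  return function isAllowed(key, now = Date.now()) {
    const entry = counters.get(key)
    if (!entry || now - entry.windowStart >= windowMs) {
      counters.set(key, { count: 1, windowStart: now })
      return true
    }
    entry.count++
    return entry.count <= limit
  }
}

const allow = fixedWindow(100, 60_000)
allow('203.0.113.7') // true for the first 100 calls in a window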
Sliding Window Log
Store the timestamp of every request. Count requests in the last N seconds.
Requests: [0:15, 0:22, 0:45, 0:58, 1:03, 1:15]
Window (last 60s at 1:20): [0:22, 0:45, 0:58, 1:03, 1:15] = 5 requests
Pros:
- Precise rate limiting
- No burst at boundaries
- Smooth traffic shaping
Cons:
- Memory grows with traffic (store every timestamp)
- O(n) lookups
- Not practical for high-traffic APIs
Best for: Low-traffic, precision-critical scenarios (billing, security).
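A sketch of the log approach (illustrative; note the per-request array scan that makes it O(n)):
// Sliding window log: keep every timestamp, prune ones older than the window
function slidingWindowLog(limit, windowMs) {
  const logs = new Map() // key -> array of timestamps (ms)
  return function isAllowed(key, now = Date.now()) {
    const recent = (logs.get(key) || []).filter(t => now - t < windowMs)
    if (recent.length >= limit) {
      logs.set(key, recent)
      return false
    }
    recent.push(now)
    logs.set(key, recent)
    return true
  }
}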
Sliding Window Counter
Hybrid approach. Weight the previous window's count by how much of it still overlaps the sliding window, then add the current window's count.
Previous window (0:00-1:00): 80 requests
Current window (1:00-2:00): 30 requests so far
Time into current window: 20 seconds (33%)
Estimated count = (80 * 0.67) + 30 = 83.6 requests (0.67 = the 67% of the previous window still inside the sliding window)
Pros:
- Smooths the burst problem
- O(1) operations
- Good accuracy without storing every timestamp
Cons:
- Approximation, not exact
- Slightly more complex to implement
Best for: High-traffic APIs needing smoother limiting than fixed window.
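A sketch of the weighted estimate in code (illustrative):
// Sliding window counter: the previous window's count is weighted by how
// much of it still overlaps the sliding window
function slidingWindowCounter(limit, windowMs) {
  const buckets = new Map() // key -> { windowStart, current, previous }
  return function isAllowed(key, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs
    let b = buckets.get(key)
    if (!b || b.windowStart !== windowStart) {
      const previous =
        b && b.windowStart === windowStart - windowMs ? b.current : 0
      b = { windowStart, current: 0, previous }
      buckets.set(key, b)
    }
    const elapsed = (now - windowStart) / windowMs // e.g. 0.33
    const estimated = b.previous * (1 - elapsed) + b.current
    if (estimated >= limit) return false
    b.current++
    return true
  }
}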
Token Bucket
Tokens added at fixed rate. Each request consumes one token. Requests denied when bucket empty.
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Time 0: 100 tokens
Burst of 100 requests: 0 tokens
After 5 seconds: 50 tokens (refilled)
Pros:
- Allows controlled bursts
- Smooth rate limiting
- Flexible configuration
Cons:
- More complex to implement
- Requires tracking token count and last refill time
Best for: APIs that want to allow short bursts while limiting sustained traffic.
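A sketch with lazy refill (illustrative), so no timer is needed per bucket:
// Token bucket: refill based on time elapsed since the last request
function tokenBucket(capacity, refillPerSecond) {
  const buckets = new Map() // key -> { tokens, lastRefill }
  return function isAllowed(key, now = Date.now()) {
    const b = buckets.get(key) || { tokens: capacity, lastRefill: now }
    const elapsedSec = (now - b.lastRefill) / 1000
    b.tokens = Math.min(capacity, b.tokens + elapsedSec * refillPerSecond)
    b.lastRefill = now
    buckets.set(key, b)
    if (b.tokens < 1) return false
    b.tokens -= 1
    return true
  }
}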
Decision Tree
Need precision? → Sliding Window Log
High traffic + allow bursts? → Token Bucket
High traffic + simple? → Fixed Window
High traffic + smooth? → Sliding Window Counter
Most APIs? Start with Fixed Window. It's simple, fast, and good enough.
Implementing Rate Limiting in Node.js
The Naive Approach (Don't Do This)
// Don't do this in production
const requests = {}
app.use((req, res, next) => {
const ip = req.ip
const now = Date.now()
if (!requests[ip]) {
requests[ip] = { count: 1, startTime: now }
} else if (now - requests[ip].startTime > 60000) {
requests[ip] = { count: 1, startTime: now }
} else {
requests[ip].count++
}
if (requests[ip].count > 100) {
return res.status(429).json({ error: 'Too many requests' })
}
next()
})
Problems:
- Memory leak (entries never cleaned up)
- No headers (clients don't know their limits)
- No persistence (resets on server restart)
- Race conditions in production
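If you do roll your own, at least fix the first two problems: evict stale entries and send the standard headers. A sketch (persistence and multi-server coordination still need a shared store):
const requests = new Map()
const LIMIT = 100
const WINDOW_MS = 60_000

// Evict expired entries so memory stays bounded
setInterval(() => {
  const now = Date.now()
  for (const [ip, entry] of requests) {
    if (now - entry.startTime > WINDOW_MS) requests.delete(ip)
  }
}, WINDOW_MS).unref()

app.use((req, res, next) => {
  const now = Date.now()
  let entry = requests.get(req.ip)
  if (!entry || now - entry.startTime > WINDOW_MS) {
    entry = { count: 0, startTime: now }
    requests.set(req.ip, entry)
  }
  entry.count++
  res.set('RateLimit-Limit', String(LIMIT))
  res.set('RateLimit-Remaining', String(Math.max(0, LIMIT - entry.count)))
  res.set('RateLimit-Reset', String(Math.ceil((entry.startTime + WINDOW_MS - now) / 1000)))
  if (entry.count > LIMIT) {
    return res.status(429).json({ error: 'Too many requests' })
  }
  next()
})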
Using HitLimit (Recommended)
HitLimit handles all the edge cases:
import express from 'express'
import { hitlimit } from '@joint-ops/hitlimit'
const app = express()
// Zero-config: 100 requests per minute per IP
app.use(hitlimit())
// Or with custom limits
app.use(hitlimit({
limit: 100,
window: '15m', // 15 minutes
// Adds these headers to responses:
// RateLimit-Limit: 100
// RateLimit-Remaining: 95
// RateLimit-Reset: 1706547600
}))
app.listen(3000)
Protecting Specific Routes
Don't rate limit everything equally. Health checks and static assets don't need limits. Auth endpoints need strict limits.
// Global default (health checks skipped entirely)
app.use(hitlimit({
  limit: 1000,
  window: '1h',
  skip: (req) => req.path === '/health'
}))
// Strict limit on login
app.use('/auth/login', hitlimit({
  limit: 5,
  window: '15m'
  // After 5 attempts, wait 15 minutes
}))
// Strict limit on registration
app.use('/auth/register', hitlimit({
  limit: 3,
  window: '1h'
}))
Handling 429 Responses
When rate limited, return useful information:
app.use(hitlimit({
limit: 100,
window: '1h',
response: (info) => ({
error: 'RATE_LIMITED',
message: 'Too many requests. Please slow down.',
limit: info.limit,
remaining: 0,
resetAt: new Date(info.resetAt).toISOString(),
retryAfter: info.resetIn // seconds until reset
})
}))
Clients receive:
{
"error": "RATE_LIMITED",
"message": "Too many requests. Please slow down.",
"limit": 100,
"remaining": 0,
"resetAt": "2026-01-30T15:00:00.000Z",
"retryAfter": 1847
}
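A well-behaved client can use that retryAfter field to back off before retrying. A sketch using fetch (field names assume the response shape above):
// Retry once the server says it's safe, instead of hammering the API
async function fetchWithBackoff(url, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url)
    if (res.status !== 429) return res
    const body = await res.json()
    const waitMs = (body.retryAfter ?? 1) * 1000
    await new Promise(resolve => setTimeout(resolve, waitMs))
  }
  throw new Error('Still rate limited after retries')
}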
Tiered Rate Limits
SaaS products need different limits for different users. HitLimit has this built-in:
app.use(hitlimit({
tiers: {
anonymous: { limit: 10, window: '1h' },
free: { limit: 100, window: '1h' },
pro: { limit: 5000, window: '1h' },
enterprise: { limit: Infinity }
},
tier: (req) => {
if (!req.user) return 'anonymous'
return req.user.plan || 'free'
}
}))
Now your API automatically applies the right limits based on user context.
Custom Rate Limit Keys
By default, rate limiting uses IP address. But IPs aren't always the right key:
- Shared IPs: Corporate networks might share one IP for thousands of users
- API keys: You want to limit by API key, not IP
- User accounts: Logged-in users should be limited by user ID
app.use(hitlimit({
key: (req) => {
// 1. API key takes precedence
if (req.headers['x-api-key']) {
return `api:${req.headers['x-api-key']}`
}
// 2. Logged-in user ID
if (req.user?.id) {
return `user:${req.user.id}`
}
// 3. Fall back to IP
return `ip:${req.ip}`
}
}))
Distributed Rate Limiting
Single-server rate limiting is easy. Multi-server is hard.
The Problem
You have 5 servers. Each has its own in-memory rate limiter set to 100 req/min.
Result? Users can make 500 req/min (100 per server).
Solution: Shared Store
Use Redis as a shared counter:
import { hitlimit } from '@joint-ops/hitlimit'
import { redisStore } from '@joint-ops/hitlimit/stores/redis'
app.use(hitlimit({
limit: 100,
window: '1m',
store: redisStore({
url: 'redis://localhost:6379',
prefix: 'rl:' // Key prefix in Redis
})
}))
All servers increment the same counter. True distributed rate limiting.
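Under the hood, a fixed-window counter in Redis can be as simple as an atomic INCR plus an expiry. A sketch with the node-redis client (illustrative, not necessarily how HitLimit's store works internally):
import { createClient } from 'redis'

const redis = createClient({ url: 'redis://localhost:6379' })
await redis.connect()

// One key per client per window; INCR is atomic across all app servers
async function isAllowed(key, limit, windowSeconds) {
  const windowId = Math.floor(Date.now() / 1000 / windowSeconds)
  const redisKey = `rl:${key}:${windowId}`
  const count = await redis.incr(redisKey)
  if (count === 1) await redis.expire(redisKey, windowSeconds)
  return count <= limit
}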
Handling Redis Failures
Redis goes down. What happens?
Fail-closed: Reject all requests. Safe but your API stops working.
Fail-open: Allow all requests. Risky but service continues.
HitLimit lets you decide:
app.use(hitlimit({
store: redisStore({ url: 'redis://localhost:6379' }),
onStoreError: (error, req) => {
console.error('Redis error:', error)
// Protect critical routes even if Redis is down
if (req.path.startsWith('/admin')) return 'deny'
// Allow other traffic to continue
return 'allow'
}
}))
SQLite: Persistence Without Redis
Don't want to run Redis? SQLite gives you persistence without the ops overhead:
import { hitlimit } from '@joint-ops/hitlimit'
import { sqliteStore } from '@joint-ops/hitlimit/stores/sqlite'
app.use(hitlimit({
store: sqliteStore({
path: './rate-limits.db' // File-based persistence
})
}))
Good for single-server deployments where you want limits to survive restarts.
Common Pitfalls
1. Trusting X-Forwarded-For Blindly
Behind a proxy, req.ip is the proxy's address unless you configure trust proxy. But don't blindly trust X-Forwarded-For either: it can be spoofed.
// Configure Express to trust your proxy
app.set('trust proxy', 1) // Trust first proxy
// Or be specific about which proxies to trust
app.set('trust proxy', 'loopback, 10.0.0.0/8')
Only trust headers from proxies you control.
2. Not Handling Proxy Chains
Multiple proxies? The header looks like:
X-Forwarded-For: client, proxy1, proxy2
The leftmost IP is supposedly the client, but clients can prepend whatever they like, so count trusted proxies from the right. Trust the wrong number of hops and you'll rate limit the wrong IP.
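For example, with two proxies you control in front of the app:
// Setup: client → proxy1 → proxy2 → app
// proxy2 opens the socket; X-Forwarded-For arrives as "client, proxy1"
app.set('trust proxy', 2) // trust the two hops closest to the app
// Express walks the chain right-to-left past the trusted hops,
// so req.ip resolves to the client's real address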
3. Overly Strict Limits
Too strict = frustrated legitimate users. Start generous and tighten based on data:
// Start here
{ limit: 1000, window: '1h' }
// Tighten if you see abuse
{ limit: 500, window: '1h' }
// But monitor for false positives
4. Memory Leaks from Unbounded Stores
Custom in-memory stores must clean up old entries. Otherwise, memory grows forever.
HitLimit handles this automatically with setTimeout-based cleanup.
5. Not Rate Limiting Before Authentication
Attackers can brute-force login endpoints. Rate limit BEFORE authentication middleware:
// Good: Rate limit hits before auth check
app.use('/auth/login', hitlimit({ limit: 5, window: '15m' }))
app.post('/auth/login', authenticateUser)
// Bad: Auth runs first, then rate limit (attacker already hit your DB)
app.post('/auth/login', authenticateUser, hitlimit(...))
Testing Your Rate Limiter
Verify Limits Work
// test/rate-limit.test.js
import request from 'supertest'
import app from '../app.js'
it('blocks after limit exceeded', async () => {
// Make 100 requests (the limit)
for (let i = 0; i < 100; i++) {
await request(app).get('/api/test')
}
// Request 101 should be blocked
const response = await request(app).get('/api/test')
expect(response.status).toBe(429)
})
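It's also worth asserting the headers, so clients can self-regulate (assumes the RateLimit-* headers shown earlier; supertest lowercases header names):
it('exposes rate limit headers', async () => {
  const response = await request(app).get('/api/test')
  expect(response.headers['ratelimit-limit']).toBe('100')
  expect(Number(response.headers['ratelimit-remaining'])).toBeGreaterThanOrEqual(0)
})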
Load Testing
Use tools like autocannon or k6:
npx autocannon -c 100 -d 30 http://localhost:3000/api/test
Watch for:
- 429 responses appearing at expected rates
- Memory not growing unbounded
- Response times staying consistent
Monitor in Production
Track these metrics:
- 429 response rate
- Rate limit header values
- Store latency (if using Redis)
- Memory usage of rate limit store
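Even without a metrics stack, you can track the 429 rate with a small middleware (sketch; wire the counters into Prometheus or similar in production):
// Register before your routes so it sees every response
let total = 0
let rateLimited = 0

app.use((req, res, next) => {
  res.on('finish', () => {
    total++
    if (res.statusCode === 429) rateLimited++
  })
  next()
})

app.get('/internal/rate-limit-stats', (req, res) => {
  res.json({ total, rateLimited, ratio: total ? rateLimited / total : 0 })
})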
Quick Start with HitLimit
Installation
npm install @joint-ops/hitlimit
Basic Setup
import express from 'express'
import { hitlimit } from '@joint-ops/hitlimit'
const app = express()
// Zero-config default: 100 req/min per IP
app.use(hitlimit())
// Your routes
app.get('/api/data', (req, res) => {
res.json({ message: 'Hello!' })
})
app.listen(3000)
That's it. Your API is now rate limited with sensible defaults.
Summary
Rate limiting is essential for production APIs. The key decisions:
- Algorithm: Start with fixed window. It's simple and works.
- Limits: Start generous, tighten based on data.
- Keys: IP for anonymous, user ID or API key for authenticated.
- Distribution: Use Redis for multi-server. SQLite for single-server persistence.
- Failures: Usually fail-open. Your API working matters more than perfect rate limiting.
HitLimit handles all of this in 7KB with zero configuration required.
Links
- GitHub: github.com/jointops/hitlimit-monorepo
- NPM: @joint-ops/hitlimit
- Documentation: hitlimit.jointops.dev
Written by Muhammad Ali, Full-Stack & Web3 Gaming Engineer at JointOps. We build production-grade APIs and open-source tools.
Tags: #nodejs #ratelimiting #api #backend #javascript #expressjs #tutorial #security