Bill Tu

Building a High-Performance Rate Limiter for Node.js: Architecture, Algorithms, and Benchmarks

Rate limiting is one of those things every production API needs but few teams get right. Most reach for an Express middleware, bolt it on, and move on. That works — until you're running multiple servers, or your rate limiter itself becomes the bottleneck.

I built nodejs-rate-limiter to solve this properly: two algorithms (Token Bucket and Sliding Window), in-memory and Redis-backed stores, Express middleware, and performance that's 10x faster than the popular alternatives. This article walks through the design decisions, algorithm internals, and the benchmarks that back it all up.

The Problem with Existing Solutions

The most widely used Node.js rate limiter is express-rate-limit. It's simple, well-documented, and gets the job done for small apps. But it has real limitations:

  1. Single-process only — no built-in way to share state across servers
  2. Fixed window algorithm — requests at window boundaries can effectively double the allowed rate
  3. Middleware coupling — the rate limiting logic is tangled with Express, making it hard to use in other contexts (WebSocket handlers, queue consumers, CLI tools)
  4. Performance ceiling — at ~185K ops/sec, it becomes a measurable cost in high-throughput systems

I wanted something that separates the rate limiting logic from the transport layer, supports distributed deployments out of the box, and doesn't compromise on speed.

Architecture

The design follows a clean separation of concerns:

┌─────────────────────────────────────────────┐
│              RateLimiter (API)              │
│  .consume()   .reset()   .middleware()      │
└──────────────────────┬──────────────────────┘
                       │
           ┌───────────┴───────────┐
           │                       │
   ┌───────▼───────┐       ┌───────▼───────┐
   │  MemoryStore  │       │  RedisStore   │
   │  in-process   │       │  Lua scripts, │
   │               │       │  atomic ops   │
   └───────┬───────┘       └───────┬───────┘
           │                       │
   ┌───────▼───────┐       ┌───────▼───────┐
   │   Algorithm   │       │   Algorithm   │
   │  (TypeScript, │       │  (Lua, runs   │
   │  in-process)  │       │  on Redis)    │
   └───────────────┘       └───────────────┘

Three layers:

  • RateLimiter — the public API. Consumers call .consume(key) and get back a result. They don't know or care whether state lives in memory or Redis.
  • Store — implements the Store interface (consume, reset, close). MemoryStore delegates to in-process algorithm classes. RedisStore runs equivalent logic as Lua scripts.
  • Algorithm — the actual math. Token Bucket and Sliding Window, implemented both in TypeScript (for memory) and Lua (for Redis).

This means you can swap from in-memory to Redis by adding a single client property to the constructor options; everything else, including the API, stays exactly the same.

// In-memory
const limiter = new RateLimiter({ limit: 100, window: 60_000 });

// Redis — same API, just add the client
const limiter = new RateLimiter({ limit: 100, window: 60_000, client: redis });

Algorithm Deep Dive

Token Bucket

The Token Bucket is the simpler of the two. Each key gets a bucket that holds up to maxTokens tokens. Tokens refill continuously over time. Each request consumes one token. If the bucket is empty, the request is rejected.

The implementation stores two values per key: the current token count and the last refill timestamp.

// Refill tokens based on elapsed time
const elapsed = now - bucket.lastRefill;
const tokensToAdd = (elapsed / this.refillIntervalMs) * this.maxTokens;
bucket.tokens = Math.min(this.maxTokens, bucket.tokens + tokensToAdd);
bucket.lastRefill = now;

This is a lazy refill — tokens aren't added on a timer. Instead, every time a request comes in, we calculate how many tokens should have been added since the last request. This means zero background work and O(1) time per operation.

The tradeoff: Token Bucket allows bursts. If a bucket has been idle and accumulated full capacity, a client can send maxTokens requests instantly. This is a feature for traffic shaping, but a liability if you need strict per-second enforcement.
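Putting the refill snippet into context, here is a compact sketch of the whole lazy-refill flow. This is illustrative code in the spirit of the article, not the library's actual class; the names TokenBucket and tryConsume are mine:

```typescript
// Minimal lazy-refill token bucket (illustrative sketch, not the
// library's real class).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillIntervalMs: number, // time to go from empty to full
    now: number = Date.now(),
  ) {
    this.tokens = maxTokens;
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Lazy refill: credit the tokens that accrued since the last call.
    const elapsed = now - this.lastRefill;
    const tokensToAdd = (elapsed / this.refillIntervalMs) * this.maxTokens;
    this.tokens = Math.min(this.maxTokens, this.tokens + tokensToAdd);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A full bucket absorbs a burst of maxTokens requests, then rejects
// until enough time has passed to refill.
const bucket = new TokenBucket(3, 1000, 0);
const burst = [0, 0, 0, 0].map((t) => bucket.tryConsume(t));
// burst → [true, true, true, false]
```

Passing now explicitly keeps the sketch deterministic; in real use the Date.now() defaults apply.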

Sliding Window Counter

The naive sliding window approach stores every request timestamp and filters out expired ones on each check. This is O(n) in the number of requests within the window — fine for small limits, catastrophic for high-throughput scenarios. With a limit of 1,000,000 requests per minute, you'd be scanning a million-element array on every call.

The solution is the Sliding Window Counter algorithm. Instead of tracking individual timestamps, it maintains two counters: one for the current sub-window and one for the previous sub-window. The estimated request count is a weighted blend:

// Weight = how much of the previous window still overlaps
const elapsedInCurrent = now - state.currStart;
const weight = Math.max(0, 1 - elapsedInCurrent / this.windowMs);
const estimatedCount = state.prevCount * weight + state.currCount;

If we're 30% into the current window, the previous window contributes 70% of its count. This gives O(1) time and O(1) space per key, with accuracy that's close enough to exact sliding window for all practical purposes.
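As a quick sanity check of the blend, here is the estimate as a standalone function with the 30%/70% numbers worked through (an illustrative helper, not library code):

```typescript
// Standalone version of the weighted sliding-window estimate.
function estimateCount(
  prevCount: number,
  currCount: number,
  currStart: number,
  now: number,
  windowMs: number,
): number {
  const elapsedInCurrent = now - currStart;
  const weight = Math.max(0, 1 - elapsedInCurrent / windowMs);
  return prevCount * weight + currCount;
}

// 18s into a 60s window (30%): the previous window's 100 requests
// contribute at weight 0.7, so the estimate is 100 * 0.7 + 40 ≈ 110.
const estimate = estimateCount(100, 40, 0, 18_000, 60_000);
```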

The window rotation logic handles three cases:

  1. Within current window — just update the counter
  2. One window elapsed — current becomes previous, reset current
  3. Two or more windows elapsed — everything expired, reset both

const elapsed = now - state.currStart;
if (elapsed >= this.windowMs * 2) {
  state.prevCount = 0;
  state.currCount = 0;
  state.currStart = now;
} else if (elapsed >= this.windowMs) {
  state.prevCount = state.currCount;
  state.currCount = 0;
  state.currStart = state.currStart + this.windowMs;
}

This is the same approach used by Cloudflare and other CDN providers at scale.
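Tying the rotation and the weighted estimate together, a single-key version of the counter fits in a few lines. This is a sketch under the definitions above; the library's real implementation also tracks per-key state:

```typescript
// Single-key sliding window counter (illustrative sketch).
class SlidingWindowCounter {
  private prevCount = 0;
  private currCount = 0;
  private currStart: number;

  constructor(
    private limit: number,
    private windowMs: number,
    now: number = Date.now(),
  ) {
    this.currStart = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Rotate windows if one or more full windows have elapsed.
    const elapsed = now - this.currStart;
    if (elapsed >= this.windowMs * 2) {
      this.prevCount = 0;
      this.currCount = 0;
      this.currStart = now;
    } else if (elapsed >= this.windowMs) {
      this.prevCount = this.currCount;
      this.currCount = 0;
      this.currStart += this.windowMs;
    }

    // Weighted estimate over the rolling window.
    const weight = Math.max(0, 1 - (now - this.currStart) / this.windowMs);
    const estimated = this.prevCount * weight + this.currCount;

    if (estimated < this.limit) {
      this.currCount += 1;
      return true;
    }
    return false;
  }
}
```

Note how a burst at a window boundary stays blocked: immediately after rotation the previous window still carries full weight, so a client that exhausted the last window cannot instantly spend a fresh limit, which is exactly the fixed-window flaw called out earlier.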

Redis: Going Distributed with Lua Scripts

In-memory rate limiting breaks down the moment you have more than one server. If your API runs behind a load balancer with 4 instances, each instance tracks its own counters — a client could effectively get 4x the intended rate limit.

The Redis store solves this by moving the algorithm logic into Lua scripts that execute atomically on the Redis server. A single EVAL call does the entire read-check-write cycle with no race conditions.

Sliding Window (Redis)

The Redis sliding window uses a sorted set (ZSET) where each member is a unique request ID and the score is the timestamp:

local windowStart = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', windowStart)  -- evict expired
local count = redis.call('ZCARD', key)                     -- count remaining

if count < limit then
  redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
  redis.call('PEXPIRE', key, window)
  return {1, limit - count - 1, 0}  -- allowed
end
return {0, 0, retryAfter}            -- blocked

Note the random suffix on the member value (now .. '-' .. math.random(1000000)). Sorted set members must be unique — without this, two requests arriving at the exact same millisecond would collide and only one would be recorded.

Token Bucket (Redis)

The Redis token bucket uses a hash with two fields (tokens and lastRefill), mirroring the in-memory implementation:

local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(data[1]) or maxTokens      -- first request: full bucket
local lastRefill = tonumber(data[2]) or now

-- Lazy refill
local elapsed = now - lastRefill
local tokensToAdd = (elapsed / refillInterval) * maxTokens
tokens = math.min(maxTokens, tokens + tokensToAdd)

Both scripts set PEXPIRE on the key so Redis automatically cleans up idle keys. No background garbage collection needed.

Why Lua?

Redis Lua scripts are atomic — they block the Redis event loop for the duration of the script. This means no other client can read or write the key between our read and write. It's the equivalent of a database transaction, but with zero overhead beyond the script execution itself.

The alternative — using MULTI/EXEC transactions or optimistic locking with WATCH — requires multiple round trips and retry logic. A single EVAL call is both simpler and faster.

Express Middleware

The library includes a middleware factory that handles the HTTP ceremony:

app.use(limiter.middleware({
  keyFn: (req) => req.headers['x-api-key'] || req.ip,
  onLimited: (req, res) => {
    res.status(429).json({ error: 'Slow down!' });
  },
}));

It sets standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) automatically. But the middleware is just a thin wrapper — the core consume() API works anywhere: WebSocket handlers, GraphQL resolvers, queue consumers, CLI tools.
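For illustration, here is roughly how a result object maps onto those headers in any transport. The toHeaders helper and its rounding to whole seconds are my own assumptions, not the library's documented behavior:

```typescript
// Illustrative helper mapping a rate-limit result onto the standard
// headers the middleware sets. The result shape mirrors the library's
// RateLimitResult; the header math (seconds rounding) is assumed.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  resetAt: number;    // epoch ms when the window resets
  retryAfter: number; // ms to wait; 0 when allowed
}

function toHeaders(result: RateLimitResult): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(result.limit),
    'X-RateLimit-Remaining': String(result.remaining),
    'X-RateLimit-Reset': String(Math.ceil(result.resetAt / 1000)), // epoch seconds
  };
  if (!result.allowed) {
    // Retry-After is conventionally expressed in whole seconds.
    headers['Retry-After'] = String(Math.ceil(result.retryAfter / 1000));
  }
  return headers;
}
```

The same mapping works verbatim in a WebSocket close frame or a GraphQL error extension, which is the point of keeping consume() transport-agnostic.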

Benchmarks

Performance claims without numbers are just marketing. Here are the actual results from a sustained 5-second test on Node.js v24:

================================================================================
  BENCHMARK RESULTS
================================================================================
Name                                     ops/sec   avg (μs)   p99 (μs)
--------------------------------------------------------------------------------
Sliding Window (memory)              2,347,456          0          1
Token Bucket (memory)                2,213,067          0          1
Sliding Window (multi-key)           1,499,662          1          1
Token Bucket (multi-key)             1,514,288          1          1
Sliding Window (10x concurrent)        177,720          5         14
================================================================================

Both algorithms sustain over 2 million operations per second on a single key, with sub-microsecond average latency. The multi-key scenario (10,000 distinct keys) drops to ~1.5M ops/sec due to Map lookup overhead, which is still far beyond what any single server would need.

vs express-rate-limit

======================================================================
  COMPARISON RESULTS
======================================================================
Library                                        ops/sec   avg (μs)
----------------------------------------------------------------------
nodejs-rate-limiter (sliding-window)         1,968,296          0
nodejs-rate-limiter (token-bucket)           1,813,408          0
express-rate-limit                             185,412          5
======================================================================

~10x faster. The gap comes from three things:

  1. O(1) algorithms — the sliding window counter does constant work per request, no array scanning
  2. Zero allocation hot path — no objects or arrays created per consume() call, just arithmetic on existing Map entries
  3. No middleware overhead — the core logic doesn't touch HTTP concepts; express-rate-limit does validation, header parsing, and response formatting on every call

A Note on Methodology

These benchmarks measure raw algorithm throughput, not end-to-end HTTP performance. In a real Express app, the rate limiter is a tiny fraction of total request time — network I/O, JSON parsing, database queries, and response serialization dominate. The 10x difference matters most in high-throughput scenarios where you're processing thousands of requests per second and every microsecond counts.

The benchmark code is included in the repository (npm run benchmark and npm run benchmark:compare). Run it on your own hardware to get numbers that reflect your environment.

Design Decisions Worth Noting

The Store interface doesn't own the Redis connection

async close(): Promise<void> {
  // Don't disconnect — the user owns the client lifecycle
}

The RedisStore accepts an ioredis client but never disconnects it. The caller created the connection, the caller should close it. This avoids surprises in applications where the same Redis client is shared across multiple subsystems.

ioredis is an optional peer dependency

If you only need in-memory rate limiting, you don't install ioredis. The library has zero runtime dependencies in memory-only mode. The store selection happens at construction time based on whether a client property is present in the options.
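That construction-time branching can be pictured like this (a sketch with stub store classes; the library's internals may differ):

```typescript
// Sketch of construction-time store selection with stub classes.
interface Store {
  close(): Promise<void>;
}

class MemoryStore implements Store {
  async close(): Promise<void> {} // purely in-process, nothing to release
}

class RedisStore implements Store {
  constructor(private readonly client: unknown) {}
  async close(): Promise<void> {} // intentionally does not disconnect the client
}

interface Options {
  limit: number;
  window: number;
  client?: unknown; // an ioredis instance, when provided
}

// The presence of `client` alone picks the backend; no separate flag.
function selectStore(options: Options): Store {
  return options.client ? new RedisStore(options.client) : new MemoryStore();
}
```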

The result object is always the same shape

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  resetAt: number;
  retryAfter: number;
}

Whether you're using Token Bucket or Sliding Window, memory or Redis, the result is identical. retryAfter is 0 when allowed, and a positive millisecond value when blocked. This makes it trivial to switch algorithms without changing consuming code.
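In practice, consuming code only ever touches that shape, so it reads the same whichever algorithm or store sits behind it. A small illustrative helper (summarize is mine, not part of the library):

```typescript
// Caller logic depends only on the shared result shape, never on the
// algorithm or store that produced it.
type Result = { allowed: boolean; remaining: number; limit: number; retryAfter: number };

function summarize(result: Result): string {
  return result.allowed
    ? `ok: ${result.remaining}/${result.limit} remaining`
    : `blocked: retry in ${result.retryAfter}ms`;
}
```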

When to Use Which Algorithm

Scenario                   Algorithm        Why
----------------------------------------------------------------------
API rate limiting          Sliding Window   Prevents boundary bursts, fair distribution
Login attempt protection   Sliding Window   Strict count within any rolling window
Traffic shaping            Token Bucket     Allows natural bursts, smooth average rate
Webhook delivery           Token Bucket     Burst tolerance for batch events
Multi-tenant SaaS          Either           Depends on your fairness requirements

Getting Started

npm install nodejs-rate-limiter

import { RateLimiter } from 'nodejs-rate-limiter';

// 100 requests per minute, sliding window
const limiter = new RateLimiter({
  algorithm: 'sliding-window',
  limit: 100,
  window: 60_000,
});

const result = await limiter.consume('user:123');
console.log(result.allowed, result.remaining);

The full source, examples, and benchmarks are on GitHub: iwtxokhtd83/nodejs-rate-limiter.


If you found this useful, give the repo a star. Contributions and issues are welcome.
