Bill Tu

Building a High-Performance Rate Limiter for Node.js: Architecture, Algorithms, and Benchmarks

Rate limiting is one of those things every production API needs but few teams get right. Most reach for an Express middleware, bolt it on, and move on. That works — until you're running multiple servers, or your rate limiter itself becomes the bottleneck.

I built nodejs-rate-limiter to solve this properly: two algorithms (Token Bucket and Sliding Window), in-memory and Redis-backed stores, Express middleware, and performance that's 10x faster than the popular alternatives. This article walks through the design decisions, algorithm internals, and the benchmarks that back it all up.

The Problem with Existing Solutions

The most widely used Node.js rate limiter is express-rate-limit. It's simple, well-documented, and gets the job done for small apps. But it has real limitations:

  1. Single-process only — no built-in way to share state across servers
  2. Fixed window algorithm — requests at window boundaries can effectively double the allowed rate
  3. Middleware coupling — the rate limiting logic is tangled with Express, making it hard to use in other contexts (WebSocket handlers, queue consumers, CLI tools)
  4. Performance ceiling — at ~185K ops/sec, it becomes a measurable cost in high-throughput systems

I wanted something that separates the rate limiting logic from the transport layer, supports distributed deployments out of the box, and doesn't compromise on speed.

Architecture

The design follows a clean separation of concerns:

┌─────────────────────────────────────────────┐
│              RateLimiter (API)              │
│  .consume()   .reset()   .middleware()      │
└──────────────────────┬──────────────────────┘
                       │
           ┌───────────┴───────────┐
           │                       │
   ┌───────▼───────┐       ┌───────▼───────┐
   │  MemoryStore  │       │  RedisStore   │
   │  in-process   │       │  Lua scripts, │
   │               │       │  atomic ops   │
   └───────┬───────┘       └───────┬───────┘
           │                       │
   ┌───────▼───────┐       ┌───────▼───────┐
   │   Algorithm   │       │   Algorithm   │
   │  (TypeScript, │       │  (Lua, runs   │
   │  in-process)  │       │  on Redis)    │
   └───────────────┘       └───────────────┘

Three layers:

  • RateLimiter — the public API. Consumers call .consume(key) and get back a result. They don't know or care whether state lives in memory or Redis.
  • Store — implements the Store interface (consume, reset, close). MemoryStore delegates to in-process algorithm classes. RedisStore runs equivalent logic as Lua scripts.
  • Algorithm — the actual math. Token Bucket and Sliding Window, implemented both in TypeScript (for memory) and Lua (for Redis).

This means you can swap from in-memory to Redis by adding a single client property to the constructor options; everything else, including the API, stays exactly the same.

// In-memory
const limiter = new RateLimiter({ limit: 100, window: 60_000 });

// Redis — same API, just add the client
const limiter = new RateLimiter({ limit: 100, window: 60_000, client: redis });

Algorithm Deep Dive

Token Bucket

The Token Bucket is the simpler of the two. Each key gets a bucket that holds up to maxTokens tokens. Tokens refill continuously over time. Each request consumes one token. If the bucket is empty, the request is rejected.

The implementation stores two values per key: the current token count and the last refill timestamp.

// Refill tokens based on elapsed time
const elapsed = now - bucket.lastRefill;
const tokensToAdd = (elapsed / this.refillIntervalMs) * this.maxTokens;
bucket.tokens = Math.min(this.maxTokens, bucket.tokens + tokensToAdd);
bucket.lastRefill = now;

This is a lazy refill — tokens aren't added on a timer. Instead, every time a request comes in, we calculate how many tokens should have been added since the last request. This means zero background work and O(1) time per operation.

The tradeoff: Token Bucket allows bursts. If a bucket has been idle and accumulated full capacity, a client can send maxTokens requests instantly. This is a feature for traffic shaping, but a liability if you need strict per-second enforcement.
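Putting the refill snippet into context, here is a compact sketch of the whole lazy-refill flow. This is illustrative code in the spirit of the article, not the library's actual class; the names TokenBucket and tryConsume are mine:

```typescript
// Minimal lazy-refill token bucket (illustrative sketch, not the
// library's real class).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillIntervalMs: number, // time to go from empty to full
    now: number = Date.now(),
  ) {
    this.tokens = maxTokens;
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Lazy refill: credit the tokens that accrued since the last call.
    const elapsed = now - this.lastRefill;
    const tokensToAdd = (elapsed / this.refillIntervalMs) * this.maxTokens;
    this.tokens = Math.min(this.maxTokens, this.tokens + tokensToAdd);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A full bucket absorbs a burst of maxTokens requests, then rejects
// until enough time has passed to refill.
const bucket = new TokenBucket(3, 1000, 0);
const burst = [0, 0, 0, 0].map((t) => bucket.tryConsume(t));
// burst → [true, true, true, false]
```

Passing now explicitly keeps the sketch deterministic; in real use the Date.now() defaults apply.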

Sliding Window Counter

The naive sliding window approach stores every request timestamp and filters out expired ones on each check. This is O(n) in the number of requests within the window — fine for small limits, catastrophic for high-throughput scenarios. With a limit of 1,000,000 requests per minute, you'd be scanning a million-element array on every call.

The solution is the Sliding Window Counter algorithm. Instead of tracking individual timestamps, it maintains two counters: one for the current sub-window and one for the previous sub-window. The estimated request count is a weighted blend:

// Weight = how much of the previous window still overlaps
const elapsedInCurrent = now - state.currStart;
const weight = Math.max(0, 1 - elapsedInCurrent / this.windowMs);
const estimatedCount = state.prevCount * weight + state.currCount;

If we're 30% into the current window, the previous window contributes 70% of its count. This gives O(1) time and O(1) space per key, with accuracy that's close enough to exact sliding window for all practical purposes.
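As a quick sanity check of the blend, here is the estimate as a standalone function with the 30%/70% numbers worked through (an illustrative helper, not library code):

```typescript
// Standalone version of the weighted sliding-window estimate.
function estimateCount(
  prevCount: number,
  currCount: number,
  currStart: number,
  now: number,
  windowMs: number,
): number {
  const elapsedInCurrent = now - currStart;
  const weight = Math.max(0, 1 - elapsedInCurrent / windowMs);
  return prevCount * weight + currCount;
}

// 18s into a 60s window (30%): the previous window's 100 requests
// contribute at weight 0.7, so the estimate is 100 * 0.7 + 40 ≈ 110.
const estimate = estimateCount(100, 40, 0, 18_000, 60_000);
```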

The window rotation logic handles three cases:

  1. Within current window — just update the counter
  2. One window elapsed — current becomes previous, reset current
  3. Two or more windows elapsed — everything expired, reset both

const elapsed = now - state.currStart;
if (elapsed >= this.windowMs * 2) {
  state.prevCount = 0;
  state.currCount = 0;
  state.currStart = now;
} else if (elapsed >= this.windowMs) {
  state.prevCount = state.currCount;
  state.currCount = 0;
  state.currStart = state.currStart + this.windowMs;
}

This is the same approach used by Cloudflare and other CDN providers at scale.
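Tying the rotation and the weighted estimate together, a single-key version of the counter fits in a few lines. This is a sketch under the definitions above; the library's real implementation also tracks per-key state:

```typescript
// Single-key sliding window counter (illustrative sketch).
class SlidingWindowCounter {
  private prevCount = 0;
  private currCount = 0;
  private currStart: number;

  constructor(
    private limit: number,
    private windowMs: number,
    now: number = Date.now(),
  ) {
    this.currStart = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Rotate windows if one or more full windows have elapsed.
    const elapsed = now - this.currStart;
    if (elapsed >= this.windowMs * 2) {
      this.prevCount = 0;
      this.currCount = 0;
      this.currStart = now;
    } else if (elapsed >= this.windowMs) {
      this.prevCount = this.currCount;
      this.currCount = 0;
      this.currStart += this.windowMs;
    }

    // Weighted estimate over the rolling window.
    const weight = Math.max(0, 1 - (now - this.currStart) / this.windowMs);
    const estimated = this.prevCount * weight + this.currCount;

    if (estimated < this.limit) {
      this.currCount += 1;
      return true;
    }
    return false;
  }
}
```

Note how a burst at a window boundary stays blocked: immediately after rotation the previous window still carries full weight, so a client that exhausted the last window cannot instantly spend a fresh limit, which is exactly the fixed-window flaw called out earlier.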

Redis: Going Distributed with Lua Scripts

In-memory rate limiting breaks down the moment you have more than one server. If your API runs behind a load balancer with 4 instances, each instance tracks its own counters — a client could effectively get 4x the intended rate limit.

The Redis store solves this by moving the algorithm logic into Lua scripts that execute atomically on the Redis server. A single EVAL call does the entire read-check-write cycle with no race conditions.

Sliding Window (Redis)

The Redis sliding window uses a sorted set (ZSET) where each member is a unique request ID and the score is the timestamp:

local windowStart = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', windowStart)  -- evict expired
local count = redis.call('ZCARD', key)                     -- count remaining

if count < limit then
  redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
  redis.call('PEXPIRE', key, window)
  return {1, limit - count - 1, 0}  -- allowed
end
return {0, 0, retryAfter}            -- blocked

Note the random suffix on the member value (now .. '-' .. math.random(1000000)). Sorted set members must be unique — without this, two requests arriving at the exact same millisecond would collide and only one would be recorded.

Token Bucket (Redis)

The Redis token bucket uses a hash with two fields (tokens and lastRefill), mirroring the in-memory implementation:

local data = redis.call('HMGET', key, 'tokens', 'lastRefill')
local tokens = tonumber(data[1]) or maxTokens      -- first request: full bucket
local lastRefill = tonumber(data[2]) or now

-- Lazy refill
local elapsed = now - lastRefill
local tokensToAdd = (elapsed / refillInterval) * maxTokens
tokens = math.min(maxTokens, tokens + tokensToAdd)

Both scripts set PEXPIRE on the key so Redis automatically cleans up idle keys. No background garbage collection needed.

Why Lua?

Redis Lua scripts are atomic — they block the Redis event loop for the duration of the script. This means no other client can read or write the key between our read and write. It's the equivalent of a database transaction, but with zero overhead beyond the script execution itself.

The alternative — using MULTI/EXEC transactions or optimistic locking with WATCH — requires multiple round trips and retry logic. A single EVAL call is both simpler and faster.

Express Middleware

The library includes a middleware factory that handles the HTTP ceremony:

app.use(limiter.middleware({
  keyFn: (req) => req.headers['x-api-key'] || req.ip,
  onLimited: (req, res) => {
    res.status(429).json({ error: 'Slow down!' });
  },
}));

It sets standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After) automatically. But the middleware is just a thin wrapper — the core consume() API works anywhere: WebSocket handlers, GraphQL resolvers, queue consumers, CLI tools.
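For illustration, here is roughly how a result object maps onto those headers in any transport. The toHeaders helper and its rounding to whole seconds are my own assumptions, not the library's documented behavior:

```typescript
// Illustrative helper mapping a rate-limit result onto the standard
// headers the middleware sets. The result shape mirrors the library's
// RateLimitResult; the header math (seconds rounding) is assumed.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  resetAt: number;    // epoch ms when the window resets
  retryAfter: number; // ms to wait; 0 when allowed
}

function toHeaders(result: RateLimitResult): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(result.limit),
    'X-RateLimit-Remaining': String(result.remaining),
    'X-RateLimit-Reset': String(Math.ceil(result.resetAt / 1000)), // epoch seconds
  };
  if (!result.allowed) {
    // Retry-After is conventionally expressed in whole seconds.
    headers['Retry-After'] = String(Math.ceil(result.retryAfter / 1000));
  }
  return headers;
}
```

The same mapping works verbatim in a WebSocket close frame or a GraphQL error extension, which is the point of keeping consume() transport-agnostic.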

Benchmarks

Performance claims without numbers are just marketing. Here are the actual results from a sustained 5-second test on Node.js v24:

================================================================================
  BENCHMARK RESULTS
================================================================================
Name                                     ops/sec   avg (μs)   p99 (μs)
--------------------------------------------------------------------------------
Sliding Window (memory)              2,347,456          0          1
Token Bucket (memory)                2,213,067          0          1
Sliding Window (multi-key)           1,499,662          1          1
Token Bucket (multi-key)             1,514,288          1          1
Sliding Window (10x concurrent)        177,720          5         14
================================================================================

Both algorithms sustain over 2 million operations per second on a single key, with sub-microsecond average latency. The multi-key scenario (10,000 distinct keys) drops to ~1.5M ops/sec due to Map lookup overhead, which is still far beyond what any single server would need.

vs express-rate-limit

======================================================================
  COMPARISON RESULTS
======================================================================
Library                                        ops/sec   avg (μs)
----------------------------------------------------------------------
nodejs-rate-limiter (sliding-window)         1,968,296          0
nodejs-rate-limiter (token-bucket)           1,813,408          0
express-rate-limit                             185,412          5
======================================================================

~10x faster. The gap comes from three things:

  1. O(1) algorithms — the sliding window counter does constant work per request, no array scanning
  2. Zero allocation hot path — no objects or arrays created per consume() call, just arithmetic on existing Map entries
  3. No middleware overhead — the core logic doesn't touch HTTP concepts; express-rate-limit does validation, header parsing, and response formatting on every call

A Note on Methodology

These benchmarks measure raw algorithm throughput, not end-to-end HTTP performance. In a real Express app, the rate limiter is a tiny fraction of total request time — network I/O, JSON parsing, database queries, and response serialization dominate. The 10x difference matters most in high-throughput scenarios where you're processing thousands of requests per second and every microsecond counts.

The benchmark code is included in the repository (npm run benchmark and npm run benchmark:compare). Run it on your own hardware to get numbers that reflect your environment.

Design Decisions Worth Noting

The Store interface doesn't own the Redis connection

async close(): Promise<void> {
  // Don't disconnect — the user owns the client lifecycle
}

The RedisStore accepts an ioredis client but never disconnects it. The caller created the connection, the caller should close it. This avoids surprises in applications where the same Redis client is shared across multiple subsystems.

ioredis is an optional peer dependency

If you only need in-memory rate limiting, you don't install ioredis. The library has zero runtime dependencies in memory-only mode. The store selection happens at construction time based on whether a client property is present in the options.
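That construction-time branching can be pictured like this (a sketch with stub store classes; the library's internals may differ):

```typescript
// Sketch of construction-time store selection with stub classes.
interface Store {
  close(): Promise<void>;
}

class MemoryStore implements Store {
  async close(): Promise<void> {} // purely in-process, nothing to release
}

class RedisStore implements Store {
  constructor(private readonly client: unknown) {}
  async close(): Promise<void> {} // intentionally does not disconnect the client
}

interface Options {
  limit: number;
  window: number;
  client?: unknown; // an ioredis instance, when provided
}

// The presence of `client` alone picks the backend; no separate flag.
function selectStore(options: Options): Store {
  return options.client ? new RedisStore(options.client) : new MemoryStore();
}
```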

The result object is always the same shape

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  limit: number;
  resetAt: number;
  retryAfter: number;
}

Whether you're using Token Bucket or Sliding Window, memory or Redis, the result is identical. retryAfter is 0 when allowed, and a positive millisecond value when blocked. This makes it trivial to switch algorithms without changing consuming code.
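In practice, consuming code only ever touches that shape, so it reads the same whichever algorithm or store sits behind it. A small illustrative helper (summarize is mine, not part of the library):

```typescript
// Caller logic depends only on the shared result shape, never on the
// algorithm or store that produced it.
type Result = { allowed: boolean; remaining: number; limit: number; retryAfter: number };

function summarize(result: Result): string {
  return result.allowed
    ? `ok: ${result.remaining}/${result.limit} remaining`
    : `blocked: retry in ${result.retryAfter}ms`;
}
```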

When to Use Which Algorithm

Scenario                   Algorithm        Why
----------------------------------------------------------------------
API rate limiting          Sliding Window   Prevents boundary bursts, fair distribution
Login attempt protection   Sliding Window   Strict count within any rolling window
Traffic shaping            Token Bucket     Allows natural bursts, smooth average rate
Webhook delivery           Token Bucket     Burst tolerance for batch events
Multi-tenant SaaS          Either           Depends on your fairness requirements

Getting Started

npm install nodejs-rate-limiter

import { RateLimiter } from 'nodejs-rate-limiter';

// 100 requests per minute, sliding window
const limiter = new RateLimiter({
  algorithm: 'sliding-window',
  limit: 100,
  window: 60_000,
});

const result = await limiter.consume('user:123');
console.log(result.allowed, result.remaining);

The full source, examples, and benchmarks are on GitHub: iwtxokhtd83/nodejs-rate-limiter.


If you found this useful, give the repo a star. Contributions and issues are welcome.
