In backend development, rate limiting is a strategy for controlling how many requests clients can make to an API endpoint within a given time window before being rejected.
Rate limiting is essential for:
- Protection from abuse / DoS — A client or attacker might try to flood your API and tie up resources.
- Fairness & QoS — You ensure a single client can’t hog all your capacity and degrade performance for everyone else.
- Cost control — External services might have rate limits or incur cost per request.
- User experience — Better to throttle politely than crash or slow to a crawl.
But rate limiting isn't just about stopping "bad guys" — it's also about making your API predictable and reliable under load.
In this post, I'll walk you through three classic rate-limiting strategies: fixed window, sliding window, and token bucket.
💡 New to backend programming in Node.js?
Check out the Learn Backend Mastery Program — a complete zero-to-hired roadmap that takes you from complete beginner to job-ready Node.js backend developer in 12 months.
Strategy 1: Fixed Window
The fixed window strategy consists of dividing time into fixed slots (e.g., 1 minute) and counting how many requests a client makes within each slot.
Once the client exceeds the defined maximum amount of requests per slot, all further requests are blocked until the slot resets.
For example, a fixed window of 10 requests per minute means that a client can only make 10 requests between 12:00:00–12:00:59, then another 10 between 12:01:00–12:01:59, and so on.
This strategy is often used for internal/admin endpoints.
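The core trick behind fixed windows is mapping a timestamp to the start of its slot with integer division. Here's a quick standalone sketch (timestamps in milliseconds, with 12:00:00 taken as time 0 for illustration):

```javascript
// Map a timestamp to the start of its fixed window slot.
const windowMs = 60_000; // 1-minute slots

function slotStart(timestampMs) {
  return Math.floor(timestampMs / windowMs) * windowMs;
}

// Two timestamps 1 second apart can land in different slots...
console.log(slotStart(59_500)); // 0      (the 12:00:00–12:00:59 slot)
console.log(slotStart(60_500)); // 60000  (the 12:01:00–12:01:59 slot)
// ...while timestamps 59 seconds apart can share one:
console.log(slotStart(500) === slotStart(59_500)); // true
```

All requests whose timestamps map to the same slot start share one counter, which is what makes the bookkeeping so cheap.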
Pros and cons
✅ Very simple to implement.
✅ Minimal memory usage.
❌ Bursty at window boundaries — a client could send 10 requests at 12:00:59 and another 10 at 12:01:00, effectively making 20 requests in 2 seconds.
Implementation example
Here's an implementation example of the fixed window strategy using a function that takes as parameters the maximum number of requests a client can make (limit) and the duration in milliseconds of the time slot (windowMs), and returns an Express middleware.
function fixedWindow(limit, windowMs) {
  // Create an in-memory store, where:
  // - key: the client IP address
  // - value: { count: the number of requests in the current window, windowStart: the window start timestamp }
  const store = new Map();
  // Return an Express middleware
  return (req, res, next) => {
    // The client's IP address
    const key = req.ip;
    // The current time in milliseconds
    const now = Date.now();
    // The start time of the current fixed window slot
    const slotStart = Math.floor(now / windowMs) * windowMs;
    // The client's entry
    let entry = store.get(key);
    // If no entry exists yet, or we've moved into a new time slot,
    // initialize/reset the counter for the new window
    if (!entry || entry.windowStart !== slotStart) {
      entry = {
        count: 0,
        windowStart: slotStart
      };
      store.set(key, entry);
    }
    // Increment the request counter
    entry.count++;
    // If the request count exceeds the allowed limit for this window,
    // reject with HTTP 429 Too Many Requests
    if (entry.count > limit) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    // Allow and forward the request to the next middleware
    next();
  };
}
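To see the boundary-burst weakness in action, here's a standalone sketch. The middleware above is repeated so the snippet runs on its own, `req`/`res`/`next` are stubbed so no Express server is needed, and `Date.now` is stubbed to control the clock (the IP and timings are illustrative):

```javascript
// The fixedWindow middleware from above, repeated so this snippet runs standalone.
function fixedWindow(limit, windowMs) {
  const store = new Map();
  return (req, res, next) => {
    const slotStart = Math.floor(Date.now() / windowMs) * windowMs;
    let entry = store.get(req.ip);
    if (!entry || entry.windowStart !== slotStart) {
      entry = { count: 0, windowStart: slotStart };
      store.set(req.ip, entry);
    }
    entry.count++;
    if (entry.count > limit) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    next();
  };
}

// Stubbed request/response objects — no real server needed.
const limiter = fixedWindow(10, 60_000);
const req = { ip: '203.0.113.7' };
const res = { status: () => ({ json: () => {} }) };
let allowed = 0;

const realNow = Date.now;
Date.now = () => 59_000; // 12:00:59 — the last second of the first window
for (let i = 0; i < 10; i++) limiter(req, res, () => allowed++);

Date.now = () => 60_000; // 12:01:00 — the first second of the next window
for (let i = 0; i < 10; i++) limiter(req, res, () => allowed++);
Date.now = realNow;

console.log(allowed); // 20 — twice the "10 per minute" limit, in about 1 second
```

In a real app you'd mount the middleware with something like `app.use('/admin', fixedWindow(10, 60_000))` (the route and numbers here are just examples).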
Strategy 2: Sliding Window
The sliding window strategy consists of looking back over the last N seconds (or milliseconds) from the current moment and counting how many requests a client has made within that rolling window.
Once the client exceeds the defined maximum amount of requests in that rolling period, all further requests are blocked until enough time has passed for older requests to fall outside of the window.
For example, a sliding window of 10 requests per minute means that if a client makes 10 requests between 12:00:15 and 12:00:45, an additional request at 12:01:10 will be blocked (all 10 are still within the last 60 seconds), but a request at 12:01:16 will be allowed because the first request (from 12:00:15) is no longer within the last 60 seconds.
This strategy is often used for public APIs.
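The bookkeeping behind this is just a list of recent timestamps per client: on each request, drop everything older than the window, then count what's left. A quick sketch of that pruning step (times in milliseconds, with 12:00:00 taken as time 0):

```javascript
// Count how many of a client's recent requests still fall inside
// the rolling window ending at `now` (all times in ms).
function countInWindow(timestamps, now, windowMs) {
  const cutoff = now - windowMs;
  return timestamps.filter(t => t > cutoff).length;
}

// Requests at 12:00:15, 12:00:30, and 12:00:45:
const times = [15_000, 30_000, 45_000];

console.log(countInWindow(times, 60_000, 60_000)); // 3 — all within the last minute
console.log(countInWindow(times, 76_000, 60_000)); // 2 — 12:00:15 has fallen out
```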
Pros and Cons
✅ No boundary burst exploits.
✅ Matches the natural meaning of "10 per minute".
❌ Slightly more memory and compute (tracking timestamps).
Implementation example
Here's an implementation example of the sliding window strategy using a function that takes as parameters the maximum number of requests a client can make (limit) and the duration in milliseconds of the rolling window (windowMs), and returns an Express middleware.
function slidingWindow(limit, windowMs) {
  // In-memory store mapping a client IP to its recent request timestamps
  const store = new Map();
  // Return an Express middleware
  return (req, res, next) => {
    // The client's IP address
    const key = req.ip;
    // The current time in milliseconds
    const now = Date.now();
    // The client's recent request times (empty array if none yet)
    let times = store.get(key) || [];
    // Compute the oldest timestamp we still care about for the rolling window
    const cutoff = now - windowMs;
    // Keep only timestamps within the last `windowMs`
    times = times.filter(time => time > cutoff);
    // Save the pruned list back to the store
    store.set(key, times);
    // If the client is already at the limit, reject with HTTP 429 Too Many Requests
    if (times.length >= limit) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    // Record the request's timestamp
    times.push(now);
    // Allow and forward the request to the next middleware
    next();
  };
}
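To confirm this closes the boundary-burst loophole, here's the same stubbed-clock experiment used for the fixed window: 10 requests at 12:00:59, then 10 more at 12:01:00. The middleware above is repeated so the snippet runs standalone, and `req`/`res` are stubs (the IP and timings are illustrative):

```javascript
// The slidingWindow middleware from above, repeated so this snippet runs standalone.
function slidingWindow(limit, windowMs) {
  const store = new Map();
  return (req, res, next) => {
    const now = Date.now();
    const cutoff = now - windowMs;
    const times = (store.get(req.ip) || []).filter(t => t > cutoff);
    store.set(req.ip, times);
    if (times.length >= limit) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    times.push(now);
    next();
  };
}

const limiter = slidingWindow(10, 60_000);
const req = { ip: '203.0.113.7' };
const res = { status: () => ({ json: () => {} }) };
let allowed = 0;

const realNow = Date.now;
Date.now = () => 59_000; // 10 requests at 12:00:59
for (let i = 0; i < 10; i++) limiter(req, res, () => allowed++);

Date.now = () => 60_000; // 10 more at 12:01:00 — all 10 earlier ones still in the window
for (let i = 0; i < 10; i++) limiter(req, res, () => allowed++);
Date.now = realNow;

console.log(allowed); // 10 — the boundary burst the fixed window allowed is blocked
```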
Strategy 3: Token Bucket
The token bucket strategy consists of assigning each client a bucket that fills with tokens at a steady rate over time (e.g., 10 tokens per second).
Each request the client makes consumes one token from the bucket. If the bucket has tokens available, the request is allowed; if it's empty, the request is blocked until new tokens are added.
For example, a token bucket of 60 tokens refilled at a rate of 1 token per second means a client can make short bursts of up to 60 requests at once, but after that, they will only be able to send 1 new request per second as the bucket refills.
This strategy can be used for all types of endpoints, as it combines a steady sustained rate with short bursts.
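The refill arithmetic is simple: tokens accrue linearly with elapsed time but are capped at the bucket's capacity. A quick sketch of that calculation (the function name and numbers are illustrative):

```javascript
// Tokens available after `elapsedSec` seconds, given a refill rate of
// `refillPerSec` tokens per second and a capacity of `burst` tokens.
function tokensAfter(current, elapsedSec, refillPerSec, burst) {
  return Math.min(burst, current + elapsedSec * refillPerSec);
}

// A 60-token bucket refilled at 1 token/second, drained to 0 by a burst:
console.log(tokensAfter(0, 5, 1, 60));   // 5  — one request per second is sustainable
console.log(tokensAfter(0, 120, 1, 60)); // 60 — capped at capacity; idle time isn't banked
```

The cap is what distinguishes the token bucket from a plain credit system: a client that stays idle for an hour doesn't accumulate an hour's worth of burst.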
Pros and Cons
✅ Best balance of fairness and flexibility.
✅ Supports controlled bursts.
❌ A bit more logic to implement and maintain.
Implementation example
Here's an implementation example of the token bucket strategy using a function that takes as parameters the number of tokens added per refill (refill), the refill interval in seconds (rate), and the maximum number of tokens a bucket can hold (burst), and returns an Express middleware.
function tokenBucket(refill, rate, burst) {
  // In-memory store mapping a client IP to { tokens, lastRefill }
  const store = new Map();
  // Return an Express middleware
  return (req, res, next) => {
    // The client's IP address
    const key = req.ip;
    // The current time in milliseconds
    const now = Date.now();
    // The client's bucket
    let bucket = store.get(key);
    if (!bucket) {
      // Create a new bucket filled to capacity (burst)
      bucket = {
        tokens: burst,
        lastRefill: now
      };
      store.set(key, bucket);
    }
    // The refill interval in milliseconds
    const intervalMs = rate * 1000;
    // The number of full intervals elapsed since the last refill
    const intervalsElapsed = Math.floor((now - bucket.lastRefill) / intervalMs);
    // If at least one full interval has passed, add `refill` tokens per
    // elapsed interval, capped at the bucket's capacity
    if (intervalsElapsed > 0) {
      bucket.tokens = Math.min(
        burst,
        bucket.tokens + intervalsElapsed * refill
      );
      bucket.lastRefill += intervalsElapsed * intervalMs;
    }
    // Reject the request if the bucket is empty
    if (bucket.tokens < 1) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    // Consume one token
    bucket.tokens -= 1;
    // Allow and forward the request to the next middleware
    next();
  };
}
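Here's the burst-then-steady behavior in a standalone sketch. The middleware above is repeated so the snippet runs on its own, `req`/`res` are stubs, and `Date.now` is stubbed to control the clock; a small 5-token bucket is used just to keep the demo readable:

```javascript
// The tokenBucket middleware from above, repeated so this snippet runs standalone.
function tokenBucket(refill, rate, burst) {
  const store = new Map();
  return (req, res, next) => {
    const now = Date.now();
    let bucket = store.get(req.ip);
    if (!bucket) {
      bucket = { tokens: burst, lastRefill: now };
      store.set(req.ip, bucket);
    }
    const intervalMs = rate * 1000;
    const intervalsElapsed = Math.floor((now - bucket.lastRefill) / intervalMs);
    if (intervalsElapsed > 0) {
      bucket.tokens = Math.min(burst, bucket.tokens + intervalsElapsed * refill);
      bucket.lastRefill += intervalsElapsed * intervalMs;
    }
    if (bucket.tokens < 1) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    bucket.tokens -= 1;
    next();
  };
}

// 1 token per 1-second interval, bucket holds 5.
const limiter = tokenBucket(1, 1, 5);
const req = { ip: '203.0.113.7' };
const res = { status: () => ({ json: () => {} }) };
let allowed = 0;

const realNow = Date.now;
Date.now = () => 0; // 8 rapid-fire requests at t = 0
for (let i = 0; i < 8; i++) limiter(req, res, () => allowed++);
const afterBurst = allowed; // 5 — the initial burst drains the bucket

Date.now = () => 3_000; // 3 seconds later: 3 tokens have dripped back in
for (let i = 0; i < 8; i++) limiter(req, res, () => allowed++);
Date.now = realNow;

console.log(afterBurst, allowed); // 5 8
```

The first burst gets the full bucket (5 requests), and after that the client is held to the refill rate: 3 seconds of waiting buys exactly 3 more requests.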
Want to Learn Backend the Right Way?
If you enjoyed this article and want to go beyond snippets, check out the Learn Backend Mastery program — the complete zero-to-hired roadmap to becoming a job-ready Node.js backend developer, including:
- 136 premium lessons across CLI, JavaScript, Node.js, MySQL, Express, Git, and more
- 25 full-scale projects + commented solutions
- Visual progress tracking
- Exclusive student Discord channel
- Lifetime access and all future updates