Rate limiters typically treat every request the same. One request, one token. But in the real world, a request that fetches a single user profile and a request that exports 10,000 records are not the same thing. Charging them the same rate limit token is like charging the same toll for a bicycle and a semi-truck.
We just shipped variable cost support in node-rate-limiter-pro v1.2.0. This post covers why it matters, how we implemented it across two algorithms and two storage backends, and the one subtle bug that almost shipped with it.
The Problem
Consider a batch API endpoint:
POST /api/users/export
Body: { "ids": ["u1", "u2", ..., "u500"] }
With a flat rate limiter set to 100 requests per minute, a client can export 50,000 users per minute by sending 100 requests of 500 IDs each. But a client making simple single-user lookups gets the same 100 requests per minute — 100 users total.
That's a 500x difference in actual resource consumption for the same rate limit.
Variable cost fixes this. Instead of consume('user:123') always deducting 1 token, you can now write:
// Each ID in the batch costs one token
const result = await limiter.consume('user:123', ids.length);
The batch export of 500 IDs now costs 500 tokens. The single lookup costs 1. The rate limiter finally reflects reality.
The API
The change is minimal — one optional parameter:
// Before (still works, backward compatible)
await limiter.consume('user:123');
// After: consume 5 tokens
await limiter.consume('user:123', 5);
For Express middleware, we added a costFn option:
app.use(limiter.middleware({
costFn: (req) => req.body?.items?.length || 1,
}));
Implementation: Four Layers Deep
The cost parameter touches every layer of the stack: the public API, two algorithm implementations, two store backends, and two Redis Lua scripts. Here's how each one changed.
Token Bucket
The Token Bucket was the easy one. The algorithm already works with fractional tokens — we just replaced the hardcoded 1 with cost:
// Before
if (bucket.tokens >= 1) {
bucket.tokens -= 1;
// ...
}
const retryAfter = Math.ceil(
((1 - bucket.tokens) / this.maxTokens) * this.refillIntervalMs
);
// After
if (bucket.tokens >= cost) {
bucket.tokens -= cost;
// ...
}
const retryAfter = Math.ceil(
((cost - bucket.tokens) / this.maxTokens) * this.refillIntervalMs
);
The retryAfter calculation scales naturally. If you need 5 tokens and only have 2, you need to wait for 3 tokens to refill. The math is (cost - currentTokens) / refillRate.
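The same arithmetic in a runnable sketch. This is an illustrative stand-in, not the library's actual implementation: `createBucket` and the injectable clock are assumptions made so the example is self-contained and deterministic.

```javascript
// Minimal in-memory token bucket with variable cost (a sketch, not the
// library's real code). maxTokens refill evenly over refillIntervalMs;
// the injectable `now`/`at` timestamps exist only for deterministic testing.
function createBucket(maxTokens, refillIntervalMs, now = Date.now()) {
  let tokens = maxTokens;
  let lastRefill = now;
  return {
    consume(cost = 1, at = Date.now()) {
      // Refill proportionally to elapsed time, capped at capacity
      tokens = Math.min(maxTokens, tokens + ((at - lastRefill) / refillIntervalMs) * maxTokens);
      lastRefill = at;
      if (tokens >= cost) {
        tokens -= cost;
        return { allowed: true, remaining: Math.floor(tokens), retryAfterMs: 0 };
      }
      // Wait for the missing (cost - tokens) tokens to refill
      const retryAfterMs = Math.ceil(((cost - tokens) / maxTokens) * refillIntervalMs);
      return { allowed: false, remaining: Math.floor(tokens), retryAfterMs };
    },
  };
}
```

With maxTokens = 10 and a 1000ms refill interval, asking for 7 tokens when only 6 remain yields a retryAfterMs of 100: one missing token at a rate of 10 tokens per second.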
Sliding Window — Where It Got Interesting
Our sliding window uses a weighted counter approach. Instead of storing every timestamp, it keeps two sub-window counters and calculates a weighted estimate:
estimatedCount = prevWindowCount × weight + currentWindowCount
where weight decays linearly from 1 to 0 as time progresses through the current window.
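In code, the estimate is just a couple of lines. This is a sketch with assumed parameter names, not the library's exact internals:

```javascript
// Weighted sliding-window estimate, as described above (parameter names
// are assumptions for illustration).
function estimateCount(prevCount, currCount, elapsedInCurrentMs, windowMs) {
  const weight = 1 - elapsedInCurrentMs / windowMs; // decays linearly 1 -> 0
  return prevCount * weight + currCount;
}
```

For example, 100ms into a 1000ms window with 3 requests in the previous sub-window, the estimate is 3 × 0.9 = 2.7, the value that drives the bug walkthrough later in this post.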
The original admission check was:
if (estimatedCount < this.maxRequests) {
state.currCount++;
// allowed
}
The naive variable-cost version would be:
if (estimatedCount + cost <= this.maxRequests) {
state.currCount += cost;
// allowed
}
This is where the bug hid. More on that in a moment. The correct implementation:
const used = Math.floor(estimatedCount);
const remaining = Math.max(0, this.maxRequests - used);
if (cost <= remaining) {
state.currCount += cost;
// allowed
}
We floor the estimated count to get integer remaining capacity, then check if the cost fits. The remaining field in the response now accurately reflects how many tokens are available — even when the request is denied:
// Denied response now shows actual remaining capacity
return {
allowed: false,
remaining, // e.g., 2 tokens left, but you asked for 5
// ...
};
This is a meaningful improvement over the old code, which always returned remaining: 0 on denial regardless of actual capacity.
Redis Lua Scripts
Both Redis Lua scripts needed the same treatment. The cost arrives as an additional ARGV parameter.
Token Bucket in Lua — straightforward substitution:
local cost = tonumber(ARGV[4])
if tokens >= cost then
tokens = tokens - cost
-- ...
return {1, math.floor(tokens), 0}
end
local retryAfter = math.ceil(((cost - tokens) / maxTokens) * refillInterval)
return {0, math.floor(tokens), retryAfter}
Sliding Window in Lua — the ZSET approach needs to add cost members:
local cost = tonumber(ARGV[4])
if count + cost <= limit then
for i = 1, cost do
local seq = redis.call('INCR', key .. ':seq')
redis.call('ZADD', key, now, now .. '-' .. seq)
end
return {1, limit - count - cost, 0}
end
-- retryAfter is computed earlier in the script (elided from this excerpt)
return {0, limit - count, retryAfter}
The loop adds cost individual entries to the sorted set. Each entry gets a unique member via the atomic INCR counter (which we introduced in v1.1.0 to fix a collision bug — see our previous post). This keeps the ZSET accurate for future window calculations.
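To see why one member per unit of cost keeps the window accurate, here's a pure-JavaScript simulation of the same bookkeeping. An array stands in for the ZSET and a counter for the INCR-based suffix; this is a sketch of the invariant, not the library's actual Redis path:

```javascript
// Pure-JS simulation of the ZSET bookkeeping: one entry per unit of cost,
// trimmed to the window before each admission check (illustrative only).
function makeWindow(limit, windowMs) {
  const members = []; // one entry per unit of cost, like the ZSET members
  let seq = 0;        // stand-in for the key:seq INCR counter
  return {
    consume(cost, now) {
      // Equivalent of ZREMRANGEBYSCORE: drop entries older than the window
      while (members.length && members[0].score <= now - windowMs) members.shift();
      if (members.length + cost > limit) {
        return { allowed: false, remaining: limit - members.length };
      }
      for (let i = 0; i < cost; i++) {
        members.push({ score: now, member: `${now}-${++seq}` });
      }
      return { allowed: true, remaining: limit - members.length };
    },
  };
}
```

A batch of cost 3 admitted now occupies 3 slots until it ages out of the window, so later requests are throttled as if 3 separate requests had been made.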
Express Middleware
The middleware layer adds a costFn option that mirrors the existing keyFn pattern:
middleware(options?: {
keyFn?: (req: any) => string;
costFn?: (req: any) => number;
onLimited?: (req: any, res: any) => void;
}) {
const costFn = options?.costFn;
return async (req, res, next) => {
const key = keyFn(req);
const cost = costFn ? costFn(req) : 1;
const result = await this.consume(key, cost);
// ...
};
}
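Here's a self-contained sketch of what that middleware does with the result. The limiter below is a stub, and the response header name and 429 fallback are illustrative assumptions, not necessarily the library's documented output:

```javascript
// Sketch of the middleware wiring (stubbed limiter; the X-RateLimit-Remaining
// header and plain-429 fallback are assumptions for illustration).
function makeMiddleware(limiter, options = {}) {
  const keyFn = options.keyFn || ((req) => req.ip);
  return async (req, res, next) => {
    const key = keyFn(req);
    const cost = options.costFn ? options.costFn(req) : 1;
    const result = await limiter.consume(key, cost);
    res.setHeader('X-RateLimit-Remaining', String(result.remaining));
    if (result.allowed) return next();
    if (options.onLimited) return options.onLimited(req, res);
    res.statusCode = 429; // no onLimited handler: plain 429 response
    res.end('Too Many Requests');
  };
}
```

The key point is that costFn runs once per request, before the consume call, so a cheap request and a 500-item batch hit the same limiter with different costs.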
The Floating Point Bug That Almost Shipped
Remember the sliding window admission check? Here's the bug we caught in testing.
Setup: limit = 3, window = 1000ms. Consume 3 tokens, wait 1100ms (just past the window boundary), then try to consume 1 more.
After 1100ms, the algorithm advances the sub-windows:
- prevCount = 3, currCount = 0
- elapsedInCurrent ≈ 100ms
- weight = 1 - 100/1000 = 0.9
- estimatedCount = 3 × 0.9 + 0 = 2.7
The old check estimatedCount < maxRequests → 2.7 < 3 → true ✅
The naive variable-cost check estimatedCount + cost <= maxRequests → 2.7 + 1 = 3.7 <= 3 → false ❌
Same scenario, different result. The request that should be allowed gets blocked.
The root cause: the weighted counter produces fractional values. The old code compared a float against an integer with <, which implicitly gave ~0.99 tokens of headroom. The new code with <= eliminated that headroom.
The fix: floor the estimate before comparing, converting the float back to integer capacity:
const used = Math.floor(estimatedCount); // floor(2.7) = 2
const remaining = Math.max(0, limit - used); // 3 - 2 = 1
if (cost <= remaining) // 1 <= 1 → true ✅
This preserves the original behavior for cost = 1 while correctly handling higher costs. If estimatedCount were 2.3 and cost were 2, we'd get remaining = 1, and 2 <= 1 → false — correctly blocking a request that would exceed the limit.
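The whole boundary scenario fits in a few lines of plain arithmetic, comparing the three checks side by side on the numbers above:

```javascript
// Boundary scenario: limit 3, 100ms into a 1000ms window, 3 requests in the
// previous sub-window, one new request of cost 1.
const limit = 3;
const cost = 1;
const estimatedCount = 3 * (1 - 100 / 1000); // ~2.7 (a float, not exactly 2.7)

const oldCheck = estimatedCount < limit;            // original cost-1 check
const naiveCheck = estimatedCount + cost <= limit;  // naive variable-cost check
const remaining = Math.max(0, limit - Math.floor(estimatedCount));
const fixedCheck = cost <= remaining;               // shipped fix
```

The old check admits the request, the naive port rejects it, and the floored version admits it again while still rejecting anything that would genuinely exceed the limit.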
Real-World Usage Patterns
Here are some patterns we've seen (or expect to see) with variable cost:
Batch APIs:
app.post('/api/batch', limiter.middleware({
costFn: (req) => req.body?.operations?.length || 1,
}));
File uploads by size:
app.post('/upload', limiter.middleware({
costFn: (req) => Math.ceil(Number(req.headers['content-length'] || 1) / 1_000_000),
}));
Tiered pricing (premium users get cheaper requests):
app.use(limiter.middleware({
costFn: (req) => req.user?.plan === 'premium' ? 1 : 3,
}));
GraphQL complexity:
app.use('/graphql', limiter.middleware({
costFn: (req) => calculateQueryComplexity(req.body.query),
}));
Design Decisions
A few choices we made along the way:
Why cost as a parameter, not a constructor option? Because cost varies per request, not per limiter. A single limiter instance might handle requests with different costs. Making it a per-call parameter is the only design that works.
Why default to 1? Backward compatibility. Every existing consume('key') call continues to work without changes. The cost parameter is purely additive.
Why floor the sliding window estimate? The weighted counter is an approximation. Flooring rounds the estimate down to an integer count, which is the permissive direction for admission control: when in doubt, allow rather than block. This matches the original behavior for cost = 1 and avoids surprising users with stricter-than-expected limits.
Why not validate cost > 0? We considered it, but decided against adding runtime validation on the hot path. If you pass cost = 0, you get a free peek at the current state (which is actually useful). If you pass a negative number, you're doing something wrong, but we're not going to add a branch to every consume() call to catch it.
What's Next
Variable cost was the most requested feature in our issue tracker. With it shipped, we're looking at:
- get(key) — peek at rate limit state without consuming (#6)
- Fixed Window Counter algorithm (#7)
- IETF standard rate limit headers (#8)
The full implementation is in v1.2.0. MIT licensed, zero dependencies for in-memory mode, optional Redis for distributed deployments.