Making Rate Limiting Correct Under Concurrency
Most rate limiting tutorials stop at the single-instance case.
That’s fine for learning, but it breaks quickly in production.
Once you have multiple instances and real traffic patterns, the problem changes.
It’s no longer just about picking an algorithm — it’s about correctness under concurrency.
This article walks through what actually goes wrong and how to fix it.
The In-Memory Trap
The first implementation most people write looks like this:
- keep a counter in memory
- increment on each request
- reject when the limit is reached
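The steps above can be sketched in a few lines of TypeScript. The names and limit values here are illustrative, not from any particular library:

```typescript
// A naive fixed-window limiter: one counter per client, held in process memory.
const LIMIT = 100;        // max requests per window (illustrative value)
const WINDOW_MS = 60_000; // window length in milliseconds

type Window = { count: number; resetAt: number };
const counters = new Map<string, Window>();

function allow(clientId: string, now = Date.now()): boolean {
  const w = counters.get(clientId);
  if (!w || now >= w.resetAt) {
    // start a fresh window for this client
    counters.set(clientId, { count: 1, resetAt: now + WINDOW_MS });
    return true;
  }
  if (w.count >= LIMIT) return false; // limit reached: reject
  w.count++;
  return true;
}
```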
This works perfectly in a single instance.
Now deploy two instances.
Each instance has its own counter. A client can exceed your intended limit just by hitting different instances.
At that point, you don’t have a rate limiter anymore.
You have a suggestion.
Redis Fixes Distribution, Not Concurrency
The next step is moving state to Redis.
Now all instances share the same counters. Good.
A typical implementation looks like this:
- Read current count from Redis
- Check against limit
- Increment and write back
This seems correct, but it isn’t.
These are separate operations. Under concurrent load:
- two requests read the same value
- both pass the check
- both increment
Now your limit is no longer strict. It’s approximate.
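This lost update is easy to reproduce without a real Redis. The sketch below fakes the network round trips with an async in-memory store (the class and key names are illustrative) and fires two concurrent requests against a limit of 1:

```typescript
// Fake remote store: each call is a separate async round trip,
// like an individual Redis command over the network.
class FakeStore {
  private data = new Map<string, number>();
  async get(key: string): Promise<number> {
    await new Promise((r) => setTimeout(r, 1)); // simulated latency
    return this.data.get(key) ?? 0;
  }
  async incr(key: string): Promise<number> {
    await new Promise((r) => setTimeout(r, 1));
    const next = (this.data.get(key) ?? 0) + 1;
    this.data.set(key, next);
    return next;
  }
}

// Read, check, then write: the decision spans several round trips.
async function allowRacy(store: FakeStore, key: string, limit: number): Promise<boolean> {
  const current = await store.get(key); // step 1: read
  if (current >= limit) return false;   // step 2: check
  await store.incr(key);                // step 3: write
  return true;
}

async function demoRace(): Promise<boolean[]> {
  const store = new FakeStore();
  // Two concurrent requests, limit of 1: both read 0 before either writes.
  return Promise.all([
    allowRacy(store, "client:42", 1),
    allowRacy(store, "client:42", 1),
  ]);
}
```

Both requests read the counter before either increment lands, so both pass the check and the "limit of 1" admits two requests.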
The Real Problem: Atomicity
The issue isn’t Redis.
It’s that the decision is split across multiple steps.
What you need is:
a single, atomic operation that reads state, applies logic, and updates state
The Fix: Lua Scripts in Redis
Redis supports Lua scripts that execute atomically.
No other command runs between the start and end of the script.
Instead of multiple round trips:
- read state
- apply limiter logic
- update state
- return decision
You do everything inside one script.
Example (simplified):
```lua
-- KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds
local current = tonumber(redis.call("GET", KEYS[1])) or 0
if current >= tonumber(ARGV[1]) then
  return {0, current}
end
current = redis.call("INCR", KEYS[1])
if current == 1 then
  -- set the TTL only on the first request in the window;
  -- calling EXPIRE on every hit would push the window forward forever
  redis.call("EXPIRE", KEYS[1], ARGV[2])
end
return {1, current}
```
This ensures:
- no race conditions
- consistent decisions across instances
- predictable behavior under load
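To see the effect in miniature without a Redis server, here is an in-process analogue: the entire read-check-write decision runs as one uninterruptible synchronous method, standing in for one Lua script. The names are illustrative:

```typescript
// In-process analogue of the atomic script: the whole decision is one
// uninterruptible step, so concurrent callers cannot interleave inside it.
class AtomicCounter {
  private counts = new Map<string, number>();

  // Read, check, and increment in a single synchronous call -- the moral
  // equivalent of running the limiter logic inside one Redis Lua script.
  checkAndIncr(key: string, limit: number): boolean {
    const current = this.counts.get(key) ?? 0;
    if (current >= limit) return false;
    this.counts.set(key, current + 1);
    return true;
  }
}

async function demoAtomic(): Promise<boolean[]> {
  const counter = new AtomicCounter();
  // Two "concurrent" requests against a limit of 1.
  return Promise.all([
    Promise.resolve().then(() => counter.checkAndIncr("client:42", 1)),
    Promise.resolve().then(() => counter.checkAndIncr("client:42", 1)),
  ]);
}
```

With the decision made atomically, exactly one of the two requests is admitted, no matter how the callers interleave.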
Where Algorithms Fit In
At this point, you can plug in different strategies:
- Token Bucket → allows bursts, smooths over time
- Sliding Window → more accurate but heavier
- Leaky Bucket → enforces steady flow
But here’s the key point:
The algorithm matters less than where the decision happens.
If your logic isn’t atomic, the algorithm won’t save you.
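As one concrete example, a token bucket fits the same atomic shape. This TypeScript sketch runs in-process with an injected clock for clarity; the capacity and rate are illustrative:

```typescript
// Token bucket: the bucket refills at `ratePerSec` tokens per second up to
// `capacity`; each request spends one token. Bursts up to `capacity` pass,
// then traffic is smoothed to the refill rate.
class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private capacity: number,
    private ratePerSec: number,
    now = 0, // timestamp in ms; injected so the logic is easy to test
  ) {
    this.tokens = capacity; // start full
    this.last = now;
  }

  take(now: number): boolean {
    // Refill based on elapsed time, capped at capacity.
    const elapsedSec = (now - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.last = now;
    if (this.tokens < 1) return false; // bucket empty: reject
    this.tokens -= 1;
    return true;
  }
}
```

In Redis, the same state (token count plus last-refill timestamp) would live in a hash, with the refill-and-spend step inside one Lua script so it stays atomic.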
Static Limits Miss Real Traffic Behavior
Even with correct enforcement, static limits are too rigid.
Real traffic looks like:
- legitimate bursts
- scrapers probing endpoints
- repeated identical requests
- denial loops
A fixed limit treats all of these the same.
Adding a Behavior Layer
A simple improvement is to track short-term behavior:
- request volume over a short window (burst detection)
- repeated request fingerprints
- number of unique routes hit (scan detection)
- repeated denials
This produces a basic risk score.
That score maps to tiers:
- normal
- elevated
- suspicious
- blocked
The important part is separation:
- Limiter → enforces limits
- Policy → decides how strict to be
This keeps the system easier to reason about and tune.
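A minimal sketch of that separation, assuming hypothetical signal names, weights, and thresholds (none of these values are from a real deployment):

```typescript
type Tier = "normal" | "elevated" | "suspicious" | "blocked";

// Short-window behavior signals for one client.
// The fields and weights below are illustrative assumptions.
interface Signals {
  burstRequests: number;        // requests in the last few seconds
  repeatedFingerprints: number; // identical request fingerprints seen
  uniqueRoutes: number;         // distinct endpoints hit (scan detection)
  recentDenials: number;        // how often the limiter already said no
}

function riskScore(s: Signals): number {
  let score = 0;
  if (s.burstRequests > 50) score += 2;
  if (s.repeatedFingerprints > 10) score += 2;
  if (s.uniqueRoutes > 20) score += 3;
  score += Math.min(3, s.recentDenials); // repeated denials escalate
  return score;
}

function tierFor(score: number): Tier {
  if (score >= 8) return "blocked";
  if (score >= 5) return "suspicious";
  if (score >= 3) return "elevated";
  return "normal";
}

// The policy only chooses how strict to be; the limiter still
// enforces the resulting limit atomically.
function limitFor(baseLimit: number, tier: Tier): number {
  const factor = { normal: 1, elevated: 0.5, suspicious: 0.2, blocked: 0 }[tier];
  return Math.floor(baseLimit * factor);
}
```

The policy output is just a number fed into the limiter, so the two layers can be tuned and tested independently.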
Tradeoffs
This approach is not free.
- Lua scripts add complexity
- debugging moves closer to Redis
- Redis becomes a critical dependency
But for systems that need consistency under concurrency, the tradeoff is worth it.
Key Takeaway
The biggest lesson is not about token buckets or sliding windows.
It’s this:
Correctness in rate limiting comes from atomic decision-making.
Once you ensure:
- a single source of truth
- atomic execution
- consistent state across instances
the rest becomes much easier.
Closing
I built this approach into a small system to explore the problem end-to-end.
If you’re interested in seeing a full implementation (TypeScript + Redis + Lua), you can check it out here:
👉 https://github.com/debjit450/arce
If you’ve dealt with this problem in production, I’d be interested to hear how you approached it.