The Four Rate-Limiting Algorithms You Should Know (I Built a PHP Service That Runs All Five)
A Slim 4 service that exposes fixed window, sliding window log, sliding window counter, token bucket, and leaky bucket as five separate endpoints, with the same config and the same headers, plus a
/compare endpoint that simulates N requests through all of them at once and returns a side-by-side breakdown. Every limiter takes an injected clock, so the tests can demonstrate the fixed-window edge-burst flaw in milliseconds instead of sleeping 10 seconds.
Almost every backend eventually needs rate limiting. In practice, most teams adopt whichever algorithm they find first — usually a crude fixed window — and learn about its gaps the hard way, after a real client burst or a real abuse report. The gap between "I know the names of these algorithms" and "I know what they do to real traffic" is wider than it looks.
🔗 GitHub: https://github.com/sen-ltd/rate-limit-demo
I built a small Slim 4 service that runs all five of the textbook algorithms against the same configuration — 10 requests per 10 seconds, or bucket capacity 10 with a 1-token-per-second refill — and lets you watch them make different decisions by hitting different endpoints with the same curl loop. There's also a /compare endpoint that fires N simulated requests at all five at once and returns a structured comparison. The point wasn't to ship yet another rate limiter. It was to make each algorithm's behavior visible next to its source code.
The four (or five) you keep hearing about
The folklore list is fixed window, sliding window, token bucket, leaky bucket. That's really four families but five implementations, because "sliding window" splits into two subtly different algorithms — the exact log version and the approximate counter version — and the distinction matters a lot when you think about storage. So this service exposes five endpoints:
| Endpoint | Algorithm | Memory per key |
|---|---|---|
| /fixed-window | Fixed window counter | O(1) |
| /sliding-window-log | Sliding log of timestamps | O(limit) |
| /sliding-window-counter | Two-bucket approximation | O(1) |
| /token-bucket | Token bucket (lazy refill) | O(1) |
| /leaky-bucket | Leaky bucket as queue | O(1) |
All five share one interface:
interface LimiterInterface
{
    public function name(): string;
    public function check(string $key): Decision;
    public function peek(string $key): Decision;
    public function reset(?string $key = null): void;
    public function storedKeys(): int;
    public function exportState(): array;
    public function importState(array $state): void;
}
check() is the mutating operation: it decides whether the request from $key is allowed, records the attempt, and returns a Decision value object that the HTTP layer turns into both a JSON body and a set of X-RateLimit-* headers. peek() is the read-only counterpart for the /compare output. exportState() / importState() exist because PHP's built-in dev server reloads the script on every request, so I persist state through a single JSON file between requests — more on that later, it's not the interesting part.
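From those six-argument constructor calls, the Decision value object has a shape roughly like this. This is a sketch inferred from the snippets, not the repo's actual class, and the headers() helper is my own illustration of how the HTTP layer might map a Decision onto response headers:

```php
// Plausible shape of the Decision value object, inferred from the
// constructor calls in the limiter snippets. headers() is hypothetical,
// added here only to show the Decision-to-header mapping.
final class Decision
{
    public function __construct(
        public readonly bool $allowed,      // was the request let through?
        public readonly string $algorithm,  // which limiter decided
        public readonly int $limit,         // configured limit / capacity
        public readonly int $remaining,     // requests left before a 429
        public readonly float $resetAt,     // unix timestamp when headroom returns
        public readonly int $count,         // requests counted so far
    ) {}

    /** Hypothetical: map the decision onto X-RateLimit-* response headers. */
    public function headers(): array
    {
        return [
            'X-RateLimit-Limit'     => (string) $this->limit,
            'X-RateLimit-Remaining' => (string) $this->remaining,
            'X-RateLimit-Reset'     => (string) (int) ceil($this->resetAt),
        ];
    }
}
```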
The interesting part is the body of each check().
Design: inject the clock, or your tests will hate you
Rate limiting is fundamentally about time: "how many requests in the last N seconds." If your limiter calls microtime(true) directly, your tests can only observe its behavior by actually sleeping, and the test suite gets both slow and flaky. So every limiter in this service takes a Clock interface:
interface Clock
{
    public function now(): float;
}
And the tests inject a FakeClock that starts at a known epoch and only advances when advance() is called. That's how the fixed-window edge-burst test below can demonstrate a real flaw in milliseconds instead of running for 10 seconds.
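A FakeClock satisfying that interface is only a few lines. This is a sketch of the idea, not necessarily the repo's exact code (the Clock interface is repeated so the snippet stands alone):

```php
interface Clock
{
    public function now(): float;
}

// Test double: time is frozen at the given epoch and moves
// only when advance() is called.
final class FakeClock implements Clock
{
    public function __construct(private float $now) {}

    public function now(): float
    {
        return $this->now;
    }

    public function advance(float $seconds): void
    {
        $this->now += $seconds;
    }
}

// The production counterpart just reads the real clock.
final class SystemClock implements Clock
{
    public function now(): float
    {
        return microtime(true);
    }
}
```

Production wiring injects SystemClock; tests inject FakeClock and call advance() wherever real code would wait.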
Fixed window — the one everyone starts with
Wall-clock time is partitioned into windows of width $windowSeconds, and each window has its own counter per key. On each request you figure out which window you're in, and if the counter is below the limit, increment and allow.
public function check(string $key): Decision
{
    $now = $this->clock->now();
    $bucket = (int) floor($now / $this->windowSeconds);
    $entry = $this->state[$key] ?? ['bucket' => $bucket, 'count' => 0];
    if ($entry['bucket'] !== $bucket) {
        $entry = ['bucket' => $bucket, 'count' => 0];
    }
    $resetAt = (float) (($bucket + 1) * $this->windowSeconds);
    if ($entry['count'] >= $this->limit) {
        $this->state[$key] = $entry;
        return new Decision(false, $this->name(), $this->limit, 0, $resetAt, $entry['count']);
    }
    $entry['count']++;
    $this->state[$key] = $entry;
    return new Decision(true, $this->name(), $this->limit,
        $this->limit - $entry['count'], $resetAt, $entry['count']);
}
One integer per key, one comparison per request. It is easy to understand and cheap to run, and it has a very famous flaw: a client aligned to a window boundary can serve 2 × limit requests inside a span of a few hundred milliseconds. The test for this flaw is the one that convinced me the injectable clock was worth it:
public function testEdgeBurstFlaw20RequestsIn10Seconds(): void
{
    $clock = new FakeClock(1_000_009.5); // 0.5s before boundary
    $fw = new FixedWindow($clock, 10, 10);
    $allowed = 0;
    for ($i = 0; $i < 10; $i++) {
        if ($fw->check('k')->allowed) $allowed++;
    }
    // Cross the boundary.
    $clock->advance(0.6);
    for ($i = 0; $i < 10; $i++) {
        if ($fw->check('k')->allowed) $allowed++;
    }
    $this->assertSame(
        20,
        $allowed,
        'fixed window allows 20 requests inside a 10s span at the boundary — this is the flaw',
    );
}
That is a real 10 req / 10 s limiter serving 20 requests inside 1.1 seconds of wall clock. Nothing is buggy about this code — it is doing exactly what fixed window says — it's just that fixed window is the wrong algorithm for this concern.
Sliding window log — exact, expensive
The sliding log fixes the edge burst by giving up the window boundary entirely. For each key, keep the timestamps of the requests that have happened in the last $windowSeconds. Drop the old ones, count what remains.
public function check(string $key): Decision
{
    $now = $this->clock->now();
    $cutoff = $now - $this->windowSeconds;
    $log = $this->state[$key] ?? [];

    // Drop expired timestamps (the log is already time-ordered).
    $drop = 0;
    foreach ($log as $t) {
        if ($t > $cutoff) break;
        $drop++;
    }
    if ($drop > 0) $log = array_slice($log, $drop);

    if (count($log) >= $this->limit) {
        $this->state[$key] = $log;
        return new Decision(false, $this->name(), $this->limit, 0,
            $log[0] + $this->windowSeconds, count($log));
    }
    $log[] = $now;
    $this->state[$key] = $log;
    return new Decision(true, $this->name(), $this->limit,
        $this->limit - count($log),
        $log[0] + $this->windowSeconds, count($log));
}
This is mathematically "the right answer" — at any wall-clock instant, no more than limit requests have been served in the preceding windowSeconds. The price is memory: one float per served request per key. For limit=10 that's trivial; for limit=10,000 spread over millions of keys on a busy API gateway, it is not.
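To see the fix in action, here is the fixed-window edge-burst scenario run through a stripped-down sliding log. This is a standalone sketch for the demo, not the repo's class:

```php
// Standalone sliding-log check: records $now and returns true only if
// fewer than $limit timestamps fall inside the trailing $window seconds.
function slidingLogAllow(array &$log, float $now, int $limit, float $window): bool
{
    $cutoff = $now - $window;
    $log = array_values(array_filter($log, fn ($t) => $t > $cutoff));
    if (count($log) >= $limit) {
        return false;
    }
    $log[] = $now;
    return true;
}

// Same scenario as the fixed-window edge-burst test: 10 requests 0.5s
// before a window boundary, then 10 more 0.6s later.
$log = [];
$allowed = 0;
foreach ([1_000_009.5, 1_000_010.1] as $burstTime) {
    for ($i = 0; $i < 10; $i++) {
        if (slidingLogAllow($log, $burstTime, 10, 10.0)) {
            $allowed++;
        }
    }
}
echo $allowed; // 10 — the second burst is still inside the trailing window
```

Where fixed window served 20, the log serves exactly 10: the first burst's timestamps have not aged out yet, so the second burst is denied in full.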
Sliding window counter — the Cloudflare approximation
This is the "best of both worlds" algorithm commonly attributed to Cloudflare's engineering blog: the O(1) storage of fixed window, without the edge-burst flaw, at the cost of giving up exactness. You keep two counters per key — the current bucket and the immediately previous bucket — and estimate the sliding window using a weighted average:
$bucket = (int) floor($now / $this->windowSeconds);
$bucketStart = (float) ($bucket * $this->windowSeconds);
$elapsed = $now - $bucketStart;
$weight = 1.0 - ($elapsed / $this->windowSeconds);
// The core approximation formula.
$estimate = (int) floor($entry['prev'] * $weight + $entry['count']);
if ($estimate >= $this->limit) {
    // deny
}
If you're 1% into the current bucket, weight = 0.99, so the previous bucket's count is almost fully accounted for, and the estimate is close to "what the sliding window would actually say." If you're 99% into the current bucket, weight = 0.01, and the previous bucket barely contributes. The approximation assumes the previous bucket's requests were uniformly distributed across that window, which isn't exactly true, but in practice — for the kinds of traffic you're actually trying to rate limit — the estimate is within a few percent of the exact answer and uses two integers instead of a list.
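A few worked values make the formula concrete. This wraps the estimate from the snippet above into a pure function for the demo:

```php
// The core approximation from the snippet above, as a pure function:
// previous bucket's count weighted by how much of it still overlaps
// the trailing window, plus the current bucket's count.
function slidingEstimate(int $prev, int $count, float $elapsed, float $window): int
{
    $weight = 1.0 - ($elapsed / $window);
    return (int) floor($prev * $weight + $count);
}

// Previous bucket saturated at 10, 10-second windows:
echo slidingEstimate(10, 0, 0.1, 10.0), "\n"; // 9 — just past the boundary, prev still counts almost fully
echo slidingEstimate(10, 0, 5.0, 10.0), "\n"; // 5 — halfway in, prev counts half
echo slidingEstimate(10, 0, 9.9, 10.0), "\n"; // 0 — prev has nearly aged out
```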
The edge-burst test that passes 20 requests against fixed window passes only 10 against sliding window counter, because the weighted previous bucket pushes the estimate above the limit immediately.
Token bucket — the one you want for "normal" APIs
A bucket of $capacity tokens. Tokens are added at $refillRate per second up to the capacity. Every request costs one token. A fresh client with a full bucket can burst up to capacity requests instantly; after that it's throttled to the refill rate.
"Lazy refill" is the idiomatic trick: you don't run a background timer, you compute how many tokens should have been added the next time you look at the bucket.
public function check(string $key): Decision
{
    $now = $this->clock->now();
    $entry = $this->state[$key] ?? ['tokens' => (float) $this->capacity, 'last' => $now];
    $elapsed = $now - $entry['last'];
    if ($elapsed > 0) {
        $entry['tokens'] = min(
            (float) $this->capacity,
            $entry['tokens'] + $elapsed * $this->refillRate,
        );
    }
    $entry['last'] = $now;
    if ($entry['tokens'] >= 1.0) {
        $entry['tokens'] -= 1.0;
        $this->state[$key] = $entry;
        // ...build and return the "allowed" Decision
    }
    $this->state[$key] = $entry;
    // ...build and return the "denied" Decision
}
Tokens are a float, not an int, for a real reason: between two requests spaced 0.4 seconds apart at a 1 token/sec refill rate, the bucket gains exactly 0.4 tokens. Rounding that to zero would make the sustained rate strictly less than the configured rate, because fractional credit would keep leaking. The min($capacity, ...) clamp is how "extra tokens are discarded" is actually enforced — that's where the rate limit lives.
Token bucket is the most common choice for "normal" API throttling because it matches how real humans actually use APIs: you're idle for a while, then you make a burst of requests, then you're idle again. A token bucket with capacity = 10, refill = 1/sec accepts that pattern gracefully. It does not try to smooth your traffic; it only bounds the long-run average.
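That burst-then-throttle shape is easy to watch with a stripped-down bucket. This is a standalone sketch mirroring the check() logic without the Decision plumbing, not the repo's class:

```php
// Minimal token bucket with lazy refill, reduced to an allow()/deny() bool.
final class MiniTokenBucket
{
    private float $tokens;
    private float $last;

    public function __construct(private int $capacity, private float $refillRate, float $now)
    {
        $this->tokens = (float) $capacity; // fresh clients start with a full bucket
        $this->last = $now;
    }

    public function allow(float $now): bool
    {
        // Lazy refill: credit the tokens that accrued since the last look.
        $this->tokens = min((float) $this->capacity,
            $this->tokens + ($now - $this->last) * $this->refillRate);
        $this->last = $now;
        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return true;
        }
        return false;
    }
}

$tb = new MiniTokenBucket(10, 1.0, 0.0);

// A fresh client bursts 15 requests at t=0: the first 10 drain the bucket.
$burst = 0;
for ($i = 0; $i < 15; $i++) {
    if ($tb->allow(0.0)) $burst++;
}
echo $burst, "\n"; // 10

// After the burst it is pinned to the refill rate: one request per second succeeds.
$steady = 0;
for ($t = 1; $t <= 5; $t++) {
    if ($tb->allow((float) $t)) $steady++;
}
echo $steady, "\n"; // 5
```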
Leaky bucket — the one people confuse with token bucket
Same two-word name, very different semantics. Conceptually, a leaky bucket is a fixed-size FIFO queue that drains at a constant rate. A request is allowed if it fits in the queue after draining has been accounted for.
public function check(string $key): Decision
{
    $now = $this->clock->now();
    $entry = $this->state[$key] ?? ['size' => 0.0, 'last' => $now];
    $elapsed = $now - $entry['last'];
    if ($elapsed > 0) {
        $entry['size'] = max(0.0, $entry['size'] - $elapsed * $this->leakRate);
    }
    $entry['last'] = $now;
    if ($entry['size'] + 1.0 > (float) $this->capacity) {
        $this->state[$key] = $entry;
        // queue full → ...build and return the "denied" Decision
    }
    $entry['size'] += 1.0;
    $this->state[$key] = $entry;
    // ...build and return the "allowed" Decision
}
The difference from token bucket, in one sentence: token bucket is a credit card, leaky bucket is a water pipe. Idle time accumulates credit in a token bucket (you can burst later); idle time drains the leaky bucket (it has no way to turn idleness into future headroom). If you hand a fresh, long-idle client to a token bucket configured at capacity 10, it can instantly fire 10 requests. Hand that same idle client to a leaky bucket configured at capacity 10 and it can instantly fire at most 10 requests — which sounds the same, but the next 10 have to arrive at the drain rate, not all at once.
The point where this bites people in production: a leaky bucket set up as a traffic shaper returns 429 not because a client is abusing it, but because the client's burst pattern doesn't match the shaper's drain rate even when the average is well within the limit. If you want a service that tolerates bursty clients, token bucket is the one you want.
The /compare endpoint is the actual product
Reading five implementations isn't the same as watching them behave differently on the same input, so the service ships a /compare?n=15&delay=0.1 endpoint that runs a simulation and returns a structured comparison:
{
  "input": { "key": "0.0.0.0", "n": 15, "delay": 0.1 },
  "results": {
    "fixed_window":           { "allowed": 10, "denied": 5, "sequence": [...] },
    "sliding_window_log":     { "allowed": 10, "denied": 5, "sequence": [...] },
    "sliding_window_counter": { "allowed": 10, "denied": 5, "sequence": [...] },
    "token_bucket":           { "allowed": 11, "denied": 4, "sequence": [...] },
    "leaky_bucket":           { "allowed": 11, "denied": 4, "sequence": [...] }
  }
}
Internally, /compare runs each limiter against a fresh FakeClock that starts at the same instant and advances by exactly delay seconds between requests. No real sleeping. Each limiter sees the same input sequence, which is the only way the comparison is fair. (If you sleep for real, slow PHP processes make fast limiters look different from each other for reasons that have nothing to do with the algorithm.) The HTML page at / has a tiny JavaScript front end that renders the sequence as a colored row of rectangles so you can see at a glance which requests each algorithm allowed and denied.
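The simulation loop itself is conceptually tiny. This is a hypothetical sketch of the idea, with runSimulation and the closure-based limiter being illustrative names, not the repo's actual API:

```php
// Hypothetical sketch of the /compare simulation loop. $check stands in
// for one limiter's check(): it takes the virtual "now" and returns
// whether the request was allowed.
function runSimulation(callable $check, int $n, float $delay): array
{
    $now = 1_000_000.0;      // every limiter starts at the same instant
    $sequence = [];
    for ($i = 0; $i < $n; $i++) {
        $sequence[] = $check($now);
        $now += $delay;      // virtual time only, no real sleeping
    }
    $allowed = count(array_filter($sequence));
    return ['allowed' => $allowed, 'denied' => $n - $allowed, 'sequence' => $sequence];
}

// A 10-per-10s fixed window, reduced to a closure for the demo.
$state = ['bucket' => null, 'count' => 0];
$fixedWindow = function (float $now) use (&$state): bool {
    $bucket = (int) floor($now / 10);
    if ($state['bucket'] !== $bucket) {
        $state = ['bucket' => $bucket, 'count' => 0];
    }
    if ($state['count'] >= 10) {
        return false;
    }
    $state['count']++;
    return true;
};

$result = runSimulation($fixedWindow, 15, 0.1);
echo $result['allowed'], '/', $result['denied']; // 10/5
```

Running the same input sequence through all five closures is what makes the side-by-side counts comparable.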
The storage detour, in one paragraph
PHP's built-in -S server reloads public/index.php on every request, so nothing in memory survives across calls. I punt on that with a single JSON file under /tmp: hydrate at app boot, flush at end of request. This is emphatically not how you'd deploy it for real — in production every one of these algorithms has a well-known Redis Lua implementation (INCR for fixed window, ZADD + ZREMRANGEBYSCORE for sliding log, a short script for token/leaky bucket). The math is identical; only the "where does the state live" answer changes. The demo is about the algorithms, not the storage layer.
Trade-offs that bit me
A few things worth knowing before you pick one:
- In-memory only. Single process, single host. The moment you have two app servers you have two rate limiters and the effective limit is 2× what you configured. Redis fixes this.
- Per-IP only. I key on the remote address. That's the wrong key for most real APIs — you usually want per-user or per-API-key — but it's trivial to swap in.
- No per-endpoint config. All five limiters are hard-wired at 10 / 10 s. A real service would pull config per route.
- Leaky bucket surprises people. Specifically, its lack of burst credit. If you put a leaky bucket in front of a SPA that fires 5 requests in parallel on page load, you'll get 429s for some of them under load patterns that a token bucket would have allowed without flinching.
- Sliding window counter is an approximation. If your limits are legally binding (billing, compliance), use the sliding log. The extra memory cost is usually cheaper than explaining to auditors why the counter sometimes says 11.
Try it in 30 seconds
git clone https://github.com/sen-ltd/rate-limit-demo
cd rate-limit-demo
docker build -t rate-limit-demo .
docker run --rm -p 8000:8000 rate-limit-demo
# In another shell:
for i in $(seq 1 12); do
curl -sS -o /dev/null -w "%{http_code} " http://localhost:8000/fixed-window
done; echo
# → 200 200 200 200 200 200 200 200 200 200 429 429
curl -sS 'http://localhost:8000/compare?n=15&delay=0.1' | jq '.results | keys'
51 PHPUnit tests cover each limiter's happy path, the fixed-window edge-burst flaw, the sliding-counter weight decay, the token bucket's fractional refill, the leaky bucket's lack of burst credit, the simulator, and the full HTTP surface. Everything runs on the fake clock, so the whole suite finishes in 14 milliseconds.
If you remember one thing from all of this, let it be: pick your algorithm based on whether your clients are allowed to burst. Fixed window lets them. Sliding log and counter don't. Token bucket lets them, up to the bucket size. Leaky bucket doesn't let them at all. Everything else is an implementation detail.
