Timevolt

Posted on Jun 18

Designing a Rate Limiter for a Bit.ly Clone – My Jedi‑Level Token Bucket Adventure

#systemdesign #architecture #backend #programming

The Quest Begins (The “Why”)

Picture this: I’m hunched over my laptop at 2 a.m., sipping cold brew, trying to get a tiny URL‑shortening service to survive a sudden traffic spike from a viral TikTok clip. The service works fine when a handful of devs ping it, but as soon as a few thousand requests hit per second, my naïve “count‑requests‑per‑IP‑and‑reset‑every‑minute” limiter starts throwing 429s like confetti at a New Year’s party—except the confetti is actually angry users.

I felt like Neo in The Matrix when he first sees the green code rain—except my code was a tangled mess of if‑else statements and I was about to get Agent Smith’d by a DDoS‑sized burst of traffic. The dragon I needed to slay wasn’t just “too many requests”; it was “bursty traffic that looks fine on average but destroys my service in spikes.”

After a couple of frustrating hours (and a meme‑worthy amount of coffee), I realized the problem wasn’t the amount of traffic—I needed a smarter way to smooth it out. Enter the token bucket algorithm, the Jedi mind trick of rate limiting.

The Revelation (The Insight)

The token bucket is simple, yet profoundly powerful:

Imagine a bucket that holds a fixed number of tokens (say, 100).
Tokens drizzle into the bucket at a steady rate (e.g., 10 tokens per second).
Each incoming request grabs a token; if the bucket is empty, the request gets throttled.

Why does this beat the old fixed‑window counter?

Approach	How it Handles Burst	Memory Cost	Implementation Complexity
Fixed‑window (reset per minute)	Allows up to N requests instantly at the start of each window → huge burst possible	O(1) per client (just a counter)	Very easy, but naive
Token bucket	Allows a burst up to bucket size, then smooths to the refill rate → natural throttling	O(1) per client (counter + timestamp)	Slightly more code, but still trivial

The key insight? Buckets absorb bursts while still enforcing an average rate. It’s like having a lightsaber that can deflect a barrage of blaster bolts (the burst) but only lets a steady stream through (the average rate).

If you’ve ever watched Mad Max: Fury Road and seen the war rig plow through a sandstorm, you’ll get the analogy: the bucket is the rig’s fuel reserve—you can surge ahead when you have fuel, but you can’t exceed the rate at which you can refill it.

Wielding the Power (Code & Examples)

Let’s see the theory turn into code. I’ll use Node.js with Redis (because it’s fast, atomic, and shared across instances—perfect for a horizontally‑scaled shortener).

The Struggle (What NOT to Do)

// ❌ Naive fixed‑window limiter – the trap!
const LIMIT = 100;   // requests per minute
const WINDOW_MS = 60_000;

async function allowRequest(ip) {
  const key = `rate:${ip}`;
  const count = await redis.incr(key);
  if (count === 1) {
    await redis.expire(key, Math.ceil(WINDOW_MS / 1000)); // set TTL
  }
  return count <= LIMIT; // true = allow, false = block
}

Why this fails:

At second 0 of a minute, 100 requests slip through instantly.
At second 30, another 100 can slip through because the counter reset hasn’t happened yet.
Result: a client can burst 200 requests in 30 seconds—double the intended rate.

I spent three hours debugging why my service kept falling over during a load test, only to realize the counter was giving me a false sense of security. When I finally saw the pattern, I felt like Luke discovering the Force—“I feel it… the rate limiter is strong with this one.”

The Victory (Token Bucket Implementation)

// ✅ Token bucket limiter – the Jedi way
const BUCKET_SIZE = 100;   // max burst
const REFILL_RATE = 10;    // tokens per second

async function allowRequest(ip) {
  const key = `bucket:${ip}`;          // stores {tokens, lastRefill}
  const now = Date.now() / 1000;       // seconds as float

  // Lua script ensures atomic read‑modify‑write
  const lua = `
    local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens'))
    local last   = tonumber(redis.call('HGET', KEYS[1], 'last'))

    if tokens == nil then
      tokens = tonumber(ARGV[2])   // start full
      last   = tonumber(ARGV[1])
    end

    local refill = (tonumber(ARGV[1]) - last) * tonumber(ARGV[3])
    tokens = math.min(tonumber(ARGV[2]), tokens + refill)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', KEYS[1], 'tokens', tokens, 'last', tonumber(ARGV[1]))
    redis.call('EXPIRE', KEYS[1], 3600) // optional TTL to clean idle keys
    return allowed
  `;

  const result = await redis.eval(lua, 1, key, now, BUCKET_SIZE, REFILL_RATE);
  return result === 1; // true = allow
}

What’s happening?

We fetch (or initialize) the bucket’s token count and the timestamp of the last refill.
We compute how many tokens should have been added since last: (now - last) * REFILL_RATE.
We cap the token count at BUCKET_SIZE (the burst limit).
If there’s at least one token, we consume it and allow the request; otherwise we reject.
We store the updated token count and timestamp back atomically via a Lua script—critical because multiple instances could race on the same key.

Trade‑offs I considered:

Redis vs. In‑memory maps: Going pure in‑memory would be faster per node, but would break consistency when you scale out. Redis gives us a single source of truth with sub‑millisecond latency—worth the extra network hop.
Lua script overhead: Some folks shy away from Lua because it feels “heavy.” In reality, the script runs in a few microseconds and guarantees atomicity without locking. The peace of mind is priceless.
Bucket size choice: Too small and you throttle legitimate bursts (think a user pasting 50 links at once). Too large and you lose smoothing power. I tuned ours by watching real traffic patterns—starting with BUCKET_SIZE = 2 * REFILL_RATE gave a nice balance.

Common Pitfalls (The Traps)

Trap	Symptom	Fix
Forgetting to update `last` timestamp	Tokens never refill → permanent block after first burst	Always write back `last = now` after refill calculation
Using separate `GET`/`SET` calls	Race condition → token count can drift above bucket size	Wrap logic in a Lua script (or use Redis transactions)
Setting a TTL that’s too short	Idle users get their bucket wiped prematurely, causing unfair throttling on return	Use a relatively long TTL (e.g., 1 hour) or refresh on each request

When I first missed the last update, my limiter acted like a lightsaber that turned off after the first swing—every request after the initial burst got 429’d. Fixing that felt like finally pulling the sword from the stone: click, and the power flowed freely again.

Why This New Power Matters

Now that I’ve got a token bucket guarding my shortener, the service laughs at traffic spikes that would have melted a fixed‑window limiter. I can:

Handle viral bursts without dropping legitimate requests.
Predict cost – the average request rate is bounded by REFILL_RATE, making capacity planning straightforward.
Scale horizontally – each node talks to the same Redis, so adding more API servers doesn’t require sharding the limiter logic.

In essence, I’ve turned my URL shortener from a fragile glass vase into a resilient, battle‑ready droid (think R2‑D2 dodging laser fire in Star Wars). The insight—absorb bursts with a bucket, then leak at a steady rate—isn’t just academic; it’s the difference between a service that survives the internet’s chaotic weather and one that cries for help at the first gust.

Your Turn – Embark on Your Own Quest

Grab a language of choice, spin up a Redis instance (or even a simple in‑memory map if you’re just experimenting), and implement a token bucket. Try to break it with a burst generator (hey, hey or wrk works). Then, tweak the bucket size and refill rate and watch how the behavior changes.

Challenge:

Build a tiny URL‑shortener endpoint (/shorten) that calls your limiter before hitting the database.
Add a dashboard that shows the current token count for a few test IPs (just for fun—don’t expose this in prod!).
Share your results: what bucket size gave you the smoothest traffic flow? Did you hit any surprising edge cases?

Drop a link to your repo in the comments, and let’s compare notes. May your buckets stay full and your rate limits stay just right—happy coding! 🚀

DEV Community