Timevolt

Posted on Jul 2

The Rate Limiter Awakens: A CAP Theorem Tale

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

Ever tried to protect an API from a sudden traffic spike and ended up throttling legit users while letting the bots slip through? I’ve been there. Last month I was on-call for a micro‑service that sat behind a public endpoint. The traffic graph looked like a rollercoaster—steady most of the day, then a massive burst that looked like a Godzilla stomping through Tokyo. My first instinct was to slap a simple in‑memory counter on each instance: “if requestCount > limit, reject”.

It worked… until we added a second node behind a load balancer. Suddenly the limit was per node, not global. A burst of 10k requests could hit 5k on each node and sail right through, while a lonely user on a third node got 429’d after just a few calls. The system felt like it was playing whack‑a‑mole with itself.

That’s when the CAP theorem stopped being a dusty textbook diagram and became the real‑world monster I needed to slay. If I wanted a single logical rate limit across all nodes, I had to decide what I was willing to sacrifice: consistency, availability, or partition tolerance. Spoiler: you can’t have all three, but you can pick the two that matter most for your use case.

The Revelation (The Insight)

The CAP theorem says a distributed data store can only guarantee two of the following three properties at any moment:

Consistency – every read sees the most recent write.
Availability – every request gets a response (though it might be stale).
Partition Tolerance – the system keeps working despite network splits.

Think of it like choosing a superhero team: you can have Iron Man (consistency) and Captain America (availability) but if the city splits into two halves (a network partition), you’ll lose one of them unless you bring in Thor (partition tolerance) and let the other slide.

For a rate limiter, the “data” we’re storing is a counter of requests per key (user ID, IP, API token). The critical insight is:

If you need a hard, global limit (no user can exceed X requests per second), you must favor Consistency over Availability during a partition.

If you’d rather stay up and maybe let a few extra requests slip, you favor Availability and accept eventual consistency.

I chose the former because our business rule was strict: “no more than 100 req/s per API key, period.” That meant we needed a CP system—when the network hiccups, we’d rather reject requests (lose availability) than risk breaking the limit.

Here’s the mental model I keep on my desk (feel free to copy it):

        +-------------------+
        |   Partition Tolerance (P)  |
        +-------------------+
               /             \
    Consistency (C)         Availability (A)
   (strong limit)          (best‑effort limit)

When the network is healthy, you can have all three (the ideal case). When a partition appears, you slide to the edge that matches your priority.

Wielding the Power (Code & Examples)

The Struggle: Naïve Per‑Node Limiter

// simple in-memory limiter (per instance)
type limiter struct {
    mu      sync.Mutex
    count   int
    last    time.Time
    limit   int
    window  time.Duration
}

func (l *limiter) Allow() bool {
    l.mu.Lock()
    defer l.mu.Unlock()
    now := time.Now()
    if now.Sub(l.last) > l.window {
        l.count = 0
        l.last = now
    }
    if l.count >= l.limit {
        return false // reject
    }
    l.count++
    return true
}

Trap #1: Assuming the counter is shared. In a cluster of three nodes behind a round‑robin LB, each node tracks its own count. A burst of 300 requests with limit=100 per second gets split 100/100/100 → all pass, even though the global limit is 100.

Trap #2: Using Redis INCR without a TTL and expecting atomicity across partitions. If the Redis node you talk to becomes unreachable, your limiter either blocks forever (killing availability) or you fall back to a local counter (breaking consistency).

The Victory: CP‑Style Distributed Limiter with Redis

We’ll use a single Redis instance (or a strongly consistent Redis cluster with WAIT/REPLICA ACK) as the source of truth. The algorithm is a fixed‑window counter stored under a key like rl:{APIKey}:{timestamp}.

// redis-backed limiter (CP choice)
type redisLimiter struct {
    client  *redis.Client
    limit   int
    window  time.Duration
    keyFunc func(string) string // e.g. fmt.Sprintf("rl:%s:%d", apiKey, epoch)
}

func (r *redisLimiter) Allow(apiKey string) bool {
    now := time.Now()
    key := r.keyFunc(apiKey)
    // Lua script ensures atomic increment + expire
    script := redis.NewScript(`
        local current = redis.call('INCR', KEYS[1])
        if current == 1 then
            redis.call('EXPIRE', KEYS[1], ARGV[1])
        end
        return current
    `)
    // run script, get current count
    cnt, err := script.Run(r.client, []string{key}, []string{strconv.Itoa(int(r.window.Seconds()))}).Result()
    if err != nil {
        // On Redis failure we choose Consistency: reject rather than risk over‑limit
        log.Printf("Redis error: %v – failing closed", err)
        return false
    }
    return cnt.(int64) <= int64(r.limit)
}

Why this is CP:

If Redis is unreachable, the script errors and we fail closed (return false). Availability takes a hit, but we never let a client exceed the limit.
When the network is healthy, the Lua script guarantees that the increment and expiry happen atomically, giving us a strongly consistent view of the counter across all clients.

If you flip the failure mode to return true (fail open), you’d get an AP limiter: the service stays up, but during a partition you might accidentally allow more requests than the limit—acceptable for some use‑cases (e.g., non‑critical analytics) but not for our strict quota.

Quick Comparison Table

Design	Consistency	Availability	Partition Behaviour
Per‑node in‑memory	❌ (local)	✅	Each node independent → global limit broken
Redis CP (fail‑closed)	✅	❌ (on ↓)	Rejects when Redis unreachable
Redis AP (fail‑open)	❌ (eventual)	✅	May over‑limit during partition

Why This New Power Matters

Now that I’ve got a CP rate limiter in place, our API behaves like a fortress with a drawbridge: when the network is sound, the bridge stays down and legit traffic flows; when a partition appears, the bridge lifts and we keep the enemy (excess traffic) out, even if it means turning away a few good folks temporarily.

The biggest win? Predictability. Our SLA now guarantees “no more than 100 req/s per key” every second, not just on average. That makes downstream services (billing, analytics, downstream APIs) far simpler because they can trust the inbound rate.

On the flip side, if you’re building something like a “like” counter where a few extra increments during a network glitch won’t break the bank, you’d pick the AP variant and enjoy higher uptime. The CAP theorem isn’t a judgment—it’s a lens that helps you pick the right tool for the job.

So next time you’re staring at a traffic spike, ask yourself: Do I need a hard guardrail or a friendly suggestion? Your answer will point you straight to the sweet spot on the CAP triangle.

Your Turn

Grab a language of choice, spin up a Redis (or even a mock in‑memory store for testing), and try implementing both the fail‑closed and fail‑open versions. Throw in a simulated network partition (e.g., block Redis port with iptables or use a toxicproxy) and watch how each behaves.

Which version feels more like your product’s personality? Drop a comment or a tweet with your findings—I’d love to hear what you discovered!

Happy rate‑limiting! 🚀

DEV Community