Microservices vs Monolith: The Rate Limiter Strikes Back

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

I still remember the night our API started returning 429s like confetti at a parade. Users were complaining, the support ticket queue was exploding, and I was staring at a monolithic codebase that had grown a rate‑limiter the way a houseplant grows toward the window—slowly, unevenly, and with a lot of dead leaves. Every time we tried to tweak the limit for a new feature, we had to redeploy the whole service, run a full test suite, and pray nothing broke in the billing module. It felt like trying to change the tires on a moving train while blindfolded.

That’s when the question hit me: Is there a better way to isolate this cross‑cutting concern? Could we pull the rate limiter out of the monolith and let it live its own life? Or would we just be trading one set of headaches for another? I decided to treat the problem like a mini‑adventure—gather gear, map the terrain, and see which path leads to the treasure.

The Revelation (The Insight)

The “aha!” moment came when I drew two simple pictures on a whiteboard.

Monolith Rate Limiter
+-------------------+
|   API Service     |
|  (handles auth,   |
|   business logic, |
|   rate limiting)  |
+-------------------+
        |
        v
   Shared In‑Memory Counter
        |
        v
   Redis (optional fallback)

In the monolith, the limiter lives inside the same process as everything else. That means:

Latency: Every request pays the cost of a lock or atomic increment, even if the endpoint doesn’t need throttling.
Scalability: To handle more traffic you have to scale the entire API, wasting CPU on parts that don’t care about limits.
Deploy‑risk: Changing the limit algorithm requires a full redeploy; a bug can take down auth, payments, and the limiter all at once.

Now look at the microservice alternative:

Microservice Rate Limiter
+-------------------+      +-------------------+
|   API Service     |      |   Limiter Service |
| (auth, biz logic) |<---->| (token bucket,    |
+-------------------+      |  config via HTTP) |
        |                  +-------------------+
        v                         |
   HTTP/gRPC call          Async fallback
        |                         |
        v                         v
   Shared Redis Store   (local cache for hot keys)

The insight? Decouple the concern, not the data. By moving the limiter into its own lightweight service that talks to a shared store (Redis, for example), we gain:

Independent scaling – spin up more limiter instances when traffic spikes, without touching the API.
Fault isolation – a crash in the limiter won’t bring down login or checkout; the API can degrade gracefully (e.g., by opening a circuit breaker around the limiter and letting requests through).
Rapid iteration – we can experiment with sliding windows, burst allowances, or feature‑flagged limits and deploy them in seconds.
Clear contract – the API just asks “Am I allowed?” and gets a boolean; the limiter owns the algorithm.

Of course, there’s a trade‑off: every request pays for an extra network hop, and you take on the operational overhead of running another service. For a high‑traffic, latency‑tolerant gateway (think API gateway or edge layer), that hop is usually small next to the gains in flexibility and resilience.

Wielding the Power (Code & Examples)

The Struggle: Monolithic Token Bucket (Pseudo‑Go)

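// rateLimiter.go – lives inside the monolith
package ratelimit

import (
    "math"
    "sync"
    "time"
)

// Bucket is an in-process token bucket guarded by a single mutex.
type Bucket struct {
    mu       sync.Mutex
    capacity float64       // maximum tokens (burst size)
    tokens   float64       // float so fractional refills are not truncated away
    fill     time.Duration // time to refill an empty bucket completely (must be > 0)
    last     time.Time     // time of the previous refill
}

func NewBucket(capacity int64, fill time.Duration) *Bucket {
    return &Bucket{
        capacity: float64(capacity),
        tokens:   float64(capacity),
        fill:     fill,
        last:     time.Now(),
    }
}

// Allow refills the bucket for the time elapsed since the last call,
// then consumes one token if at least one is available.
func (b *Bucket) Allow() bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    now := time.Now()
    // refill proportionally: a full bucket's worth of tokens per b.fill
    refill := now.Sub(b.last).Seconds() / b.fill.Seconds() * b.capacity
    b.tokens = math.Min(b.capacity, b.tokens+refill)
    b.last = now
    if b.tokens >= 1 {
        b.tokens--
        return true
    }
    return false
}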

Traps I fell into:

Forgetting to reset last after a refill burst → tokens would accumulate incorrectly.
Using a plain sync.Mutex under high QPS caused contention spikes; latency went from 1 ms to 12 ms at 10k rps.
Changing the bucket size required a full redeploy; a mis‑configured limit once took down the checkout flow for 15 minutes.

The Victory: Microservice Limiter (Go + Redis)

Limiter Service API (/allow?key=user:123&cost=1)

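// limiterHandler.go
package limiter

import (
    "log"
    "net/http"
    "strconv"
    "time"

    "github.com/redis/go-redis/v9"
)

// Bucket parameters; in practice these come from a config map.
var (
    capacity    = 100.0 // maximum burst size (tokens)
    fillRate    = 10.0  // tokens added per second
    redisClient = redis.NewClient(&redis.Options{Addr: "localhost:6379"})
)

// Lua script for atomic token-bucket refill + consumption.
// KEYS[1] = bucket key; ARGV = capacity, fill_rate, now (seconds), cost.
// redis.NewScript lets go-redis use EVALSHA and fall back to EVAL.
var tokenBucket = redis.NewScript(`
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local fill_rate = tonumber(ARGV[2])   -- tokens per second
    local now = tonumber(ARGV[3])
    local cost = tonumber(ARGV[4])

    local data = redis.call('HMGET', key, 'tokens', 'last')
    local tokens = tonumber(data[1]) or capacity
    local last = tonumber(data[2]) or now

    local delta = (now - last) * fill_rate
    if delta > 0 then
        tokens = math.min(capacity, tokens + delta)
        last = now
    end

    local allowed = 0
    if tokens >= cost then
        tokens = tokens - cost
        allowed = 1
    end
    redis.call('HSET', key, 'tokens', tokens, 'last', last)
    redis.call('EXPIRE', key, 3600)  -- idle buckets expire after an hour
    return allowed
`)

func AllowHandler(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Query().Get("key")
    if key == "" {
        http.Error(w, "missing key", http.StatusBadRequest)
        return
    }
    cost := 1 // default when the cost parameter is omitted
    if c := r.URL.Query().Get("cost"); c != "" {
        n, err := strconv.Atoi(c)
        if err != nil || n < 1 {
            http.Error(w, "invalid cost", http.StatusBadRequest)
            return
        }
        cost = n
    }

    // seconds with sub-second precision; limiter pods need NTP-synced clocks
    now := float64(time.Now().UnixNano()) / 1e9
    result, err := tokenBucket.Run(r.Context(), redisClient, []string{key},
        capacity, fillRate, now, cost).Int()
    if err != nil {
        // fail-open: allow the request if Redis is unavailable
        log.Printf("limiter: redis error for key %q: %v; failing open", key, err)
        w.WriteHeader(http.StatusOK)
        return
    }

    if result == 0 {
        w.WriteHeader(http.StatusTooManyRequests)
        return
    }
    w.WriteHeader(http.StatusOK)
}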
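The script's refill-then-consume step is easy to get subtly wrong, so it can help to keep a pure-Go mirror of the same arithmetic for unit tests. This is a sketch: `refill` is a hypothetical helper, not part of go-redis or Redis. For a key with no hash yet, the script starts from `tokens = capacity` and `last = now`, which a caller of `refill` would pass explicitly.

```go
package main

import "math"

// refill mirrors the Lua script's arithmetic so it can be tested without Redis.
// Times are in seconds; fillRate is tokens per second.
func refill(tokens, last, now, capacity, fillRate, cost float64) (newTokens, newLast float64, allowed bool) {
	delta := (now - last) * fillRate
	if delta > 0 {
		tokens = math.Min(capacity, tokens+delta) // never exceed the burst size
		last = now
	}
	if tokens >= cost {
		return tokens - cost, last, true
	}
	return tokens, last, false
}
```

As a worked example: with capacity 10, a fill rate of 2 tokens/s, an empty bucket last touched at t = 100 s and a request at t = 101.5 s, the refill adds 1.5 × 2 = 3 tokens and a cost-1 request leaves 2.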

Why this beats the monolith version:

Atomicity without locks – the Lua script runs inside Redis, eliminating contention on the API side.
Independent scaling – we can run 20 limiter pods behind a load‑balancer; each handles a slice of the key space.
Zero‑downtime deploys – update the Lua script or change capacity/fill_rate via a config map; no need to touch the API.
Observability – we expose Prometheus metrics (limiter_allowed_total, limiter_denied_total) directly from the limiter service.
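The "slice of the key space" in the scaling point implies some key-to-pod routing. One way to do it, sketched here as an assumption rather than a description of the author's setup, is rendezvous (highest-random-weight) hashing: score every pod against the key and pick the highest score. Removing a pod only moves the keys that pod owned, which helps keep any per-pod local caches warm. `pickPod` and the pod names are hypothetical.

```go
package main

import "hash/fnv"

// pickPod returns the pod that owns key under rendezvous hashing.
// Ties go to the pod listed first, so the choice is deterministic.
func pickPod(key string, pods []string) string {
	var best string
	var bestScore uint64
	for i, pod := range pods {
		h := fnv.New64a()
		h.Write([]byte(pod))
		h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
		h.Write([]byte(key))
		if s := h.Sum64(); i == 0 || s > bestScore {
			best, bestScore = pod, s
		}
	}
	return best
}
```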

Common pitfalls to avoid:

Network timeout traps – set a tight, reasonable deadline (e.g., 5 ms) on the Redis call; if it exceeds, fail‑open or fall back to a local in‑memory bucket for that instance.
Key explosion – avoid keys with unbounded cardinality (like per‑request UUIDs), since every distinct key becomes its own Redis hash; stick to user IDs, API keys, or IP ranges.
Cache stampede – if many instances miss the local cache simultaneously, they’ll hammer Redis; use a small probabilistic local cache (e.g., tlru) or a token‑bucket with a short TTL to smooth traffic.

Why This New Power Matters

Now that the rate limiter lives its own life, our API team can ship features faster. Want to burst‑allow premium users for a flash sale? Just tweak the limiter’s config and roll it out in under a minute—no need to coordinate a monolith release across three teams. When a noisy neighbor tries to scrape our endpoints, the limiter service isolates the impact; the login and payment paths keep humming because they’re not stuck waiting on a locked counter.
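A premium burst allowance like that is mostly a config change. As a hypothetical sketch (the JSON shape, `Config`, `Limits`, and `LimitsFor` are assumptions, not the author's actual schema), the limiter could load per-tier bucket parameters from a config map and fall back to defaults for unknown tiers:

```go
package main

import "encoding/json"

// Limits are token-bucket parameters for one class of caller.
type Limits struct {
	Capacity float64 `json:"capacity"`  // burst size in tokens
	FillRate float64 `json:"fill_rate"` // tokens per second
}

// Config is a hypothetical limiter config, e.g. mounted from a config map.
type Config struct {
	Default Limits            `json:"default"`
	Tiers   map[string]Limits `json:"tiers"`
}

func parseConfig(raw []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(raw, &c)
	return c, err
}

// LimitsFor returns the tier's limits, or the defaults for unknown tiers.
func (c Config) LimitsFor(tier string) Limits {
	if l, ok := c.Tiers[tier]; ok {
		return l
	}
	return c.Default
}

// exampleConfig gives premium users a 10x burst for a flash sale.
const exampleConfig = `{
  "default": {"capacity": 100, "fill_rate": 10},
  "tiers":   {"premium": {"capacity": 1000, "fill_rate": 100}}
}`
```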

And the best part? The pattern scales beyond rate limiting. Any cross‑cutting concern—authentication token validation, request logging, feature‑flag evaluation—can follow the same “decouple the logic, share the state” principle. It’s like discovering a cheat code in Celeste that lets you bypass a frustrating spike section without losing the feeling of mastery.

So, next time you feel the monolith tightening its grip around a shared concern, ask yourself: Can I extract this into its own service? If the answer is yes, you’ll likely unlock smoother deploys, cleaner failure domains, and the satisfaction of watching your system evolve like a well‑crafted RPG character—gaining new abilities without breaking the story.

Your turn: Grab a piece of paper (or a digital whiteboard) and draw the monolith vs. microservice diagram for a concern you’ve wrestled with lately. Where would you extract the service? What would the contract look like? Share your sketch or thoughts in the comments—I’d love to see what quests you embark on next! 🚀