Timevolt

Posted on Jul 1

Rate Limiting Like a Jedi: Microservices vs Monolith – Choose Your Path Wisely

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

I was tasked with protecting a brand‑new API from being hammered by overeager clients. Think of it like guarding the Death Star’s exhaust port – if too many requests slip through, the whole thing blows up. My first instinct? Slap a simple counter in the service code, increment it on every request, and return 429 Too Many Requests when the count exceeds a threshold.

It worked… until traffic spiked during a promo launch. Suddenly we had dozens of instances of the service behind a load balancer, each with its own private counter. The limit was being applied per instance, not globally, and clients could bypass the guard by spreading their requests across pods. I felt like a Stormtrooper who’d just missed the target – embarrassing and a little terrifying.

That moment sparked the classic debate: Should I keep the rate limiter inside the monolith, or extract it into its own microservice? I needed a clear insight that would tell me when each approach actually shines.

The Revelation (The Insight)

After a few sleepless nights (and way too much coffee), the realization hit me like a lightsaber duel: the decision isn’t about microservices vs monolith in the abstract; it’s about where the state lives and how much independent scaling you truly need.

A rate limiter fundamentally needs shared state – a counter (or token bucket) that all request paths can see and update atomically. If you keep that state inside each service instance, you’re fighting physics. If you externalize the state (think Redis, Consul, or a dedicated store), the limiter becomes a thin wrapper that can live anywhere.

So the real question becomes:

Scenario	Monolith‑friendly?	Microservice‑friendly?
Low to moderate traffic, single deployment	✅ Simple, no extra moving parts	❌ Overkill
High traffic, many replicas, need independent scaling	❌ State sharing becomes a bottleneck	✅ Limiter can be scaled separately
Team owns the limiter as a shared concern (multiple services)	❌ Duplication, inconsistency risk	✅ Single source of truth
Operational overhead tolerance low	✅ One deploy, one monitor	❌ Extra service to monitor, version, secure

The critical insight is this: Extract the rate limiter into its own service only when you need to scale or manage it independently of the business logic. Otherwise, a lightweight in‑process limiter backed by a fast shared store (Redis) inside the monolith is more than enough – and it saves you a bunch of operational overhead.

Think of it like choosing whether to bring a lightsaber or a blaster to a fight. If you’re dueling a single Sith Lord, the lightsaber (monolith) is elegant and sufficient. If you’re facing an army of droids that need rapid, distributed firepower, you bring in the blaster squad (microservice).
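To make "where the state lives" concrete, here is a minimal sketch in which the limiter is only a thin wrapper around a counter store. The names `CounterStore`, `Limiter`, and `memStore` are hypothetical and not taken from any library; `memStore` is an in-memory stand-in for Redis so the sketch runs without a server.

```go
package main

import (
	"sync"
	"time"
)

// CounterStore is a hypothetical abstraction over wherever the counter lives.
// A Redis-backed implementation could satisfy it with INCR plus EXPIRE.
type CounterStore interface {
	// Incr bumps the counter for key and returns the new value. The counter
	// starts over once window has elapsed since the first increment.
	Incr(key string, window time.Duration) (int64, error)
}

// Limiter keeps no state of its own; every decision goes through Store.
type Limiter struct {
	Store  CounterStore
	Limit  int64
	Window time.Duration
}

// Allow reports whether key is still within Limit for the current window.
func (l Limiter) Allow(key string) (bool, error) {
	n, err := l.Store.Incr(key, l.Window)
	if err != nil {
		return false, err
	}
	return n <= l.Limit, nil
}

// memStore is an in-memory stand-in for a shared store (assumption: used
// here only so the sketch is runnable without Redis).
type memStore struct {
	mu      sync.Mutex
	counts  map[string]int64
	expires map[string]time.Time
}

func newMemStore() *memStore {
	return &memStore{
		counts:  make(map[string]int64),
		expires: make(map[string]time.Time),
	}
}

func (m *memStore) Incr(key string, window time.Duration) (int64, error) {
	m.mu.Lock()
	defer m.mu.Unlock()

	now := time.Now()
	if exp, ok := m.expires[key]; !ok || now.After(exp) {
		// First hit of a new window: reset the counter and start the clock.
		m.counts[key] = 0
		m.expires[key] = now.Add(window)
	}
	m.counts[key]++
	return m.counts[key], nil
}
```

Two `Limiter` values that point at the same store share one budget, which is the whole point: the limiter can sit in a monolith handler or a separate service, and only the store decides whether the limit is global.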

Wielding the Power (Code & Examples)

The Struggle: In‑Process Counter (the trap)

// bad-rate-limiter.go  (inside each API instance)
package ratelimit

import (
    "sync"
    "time"
)

var (
    mu      sync.Mutex
    count   int
    limit   = 100
    window  = time.Minute
    reset   = time.Now().Add(window)
)

func Allow() bool {
    mu.Lock()
    defer mu.Unlock()

    if time.Now().After(reset) {
        count = 0
        reset = time.Now().Add(window)
    }

    if count >= limit {
        return false // 429
    }
    count++
    return true
}

Why this fails:

Each replica has its own count.
Under a load balancer, the effective limit becomes limit * replicas.
A single client can slip up to limit requests through each replica’s bucket.
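The multiplication is easy to see in a small simulation. This is a simplified sketch, not the production code: `replica` and `allowedRoundRobin` are hypothetical names, the window reset is left out, and the load balancer is assumed to be plain round-robin.

```go
package main

// replica models one API instance with its own private counter, as in
// bad-rate-limiter.go (window reset omitted for brevity).
type replica struct {
	count int
	limit int
}

func (r *replica) allow() bool {
	if r.count >= r.limit {
		return false
	}
	r.count++
	return true
}

// allowedRoundRobin sends n requests from one client across the replicas in
// round-robin order and returns how many of them were allowed.
func allowedRoundRobin(replicas, limit, n int) int {
	pods := make([]replica, replicas)
	for i := range pods {
		pods[i].limit = limit
	}

	allowed := 0
	for i := 0; i < n; i++ {
		if pods[i%replicas].allow() {
			allowed++
		}
	}
	return allowed
}
```

With one replica and a limit of 100, `allowedRoundRobin(1, 100, 1000)` returns 100. With five replicas, `allowedRoundRobin(5, 100, 1000)` returns 500: each pod happily hands out its own 100.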

The Victory: Externalized Fixed‑Window Counter (microservice or monolith with Redis)

First, the shared store – we’ll use Redis for its atomic INCR and expiration.

# 127.0.0.1:6379> SET rate-limit:<user-id> 0 EX 60 NX
# (if key doesn't exist, set to 0 with 60‑second TTL)

Now the limiter logic (same code works whether you call it from a monolith handler or a separate microservice):

// good-rate-limiter.go
package ratelimit

import (
    "context"
    "fmt"

    "github.com/go-redis/redis/v8"
)

var rdb = redis.NewClient(&redis.Options{
    Addr:     "localhost:6379",
    Password: "", // no password set
    DB:       0,
})

const (
    limit  = 100   // requests per window
    window = 60    // seconds
)

// incrWithExpiry increments the counter and, on the first hit of a window,
// sets its TTL. Running both in one Lua script makes the pair atomic.
var incrWithExpiry = redis.NewScript(`
    local current = redis.call('INCR', KEYS[1])
    if current == 1 then
        redis.call('EXPIRE', KEYS[1], ARGV[1])
    end
    return current
`)

// Allow reports whether userID is still within limit for the current window.
func Allow(ctx context.Context, userID string) (bool, error) {
    key := fmt.Sprintf("rate-limit:%s", userID)

    current, err := incrWithExpiry.Run(ctx, rdb, []string{key}, window).Int64()
    if err != nil {
        return false, err
    }

    return current <= limit, nil
}

Why this wins:

The INCR operation is atomic across all clients, guaranteeing a global limit.
The key auto‑expires after the window, so we don’t need manual cleanup.
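If traffic grows, we can spin up more API instances without the limit multiplying – the counter lives in Redis, which can be clustered and scaled independently of the API. (Each key still lives on one Redis node, so a single very hot key can become a bottleneck of its own.)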
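To wire a limiter like this into an HTTP service, one option is a small middleware that returns 429 Too Many Requests. The sketch below makes several assumptions: `AllowFunc`, `RateLimit`, and `demoStatuses` are hypothetical names, the `X-User-ID` header is an invented way to identify the caller, and failing closed with a 503 when the store errors is just one possible policy. The demo swaps in a stand-in limiter so it runs without Redis.

```go
package main

import (
	"context"
	"net/http"
	"net/http/httptest"
)

// AllowFunc has the same shape as Allow in good-rate-limiter.go.
type AllowFunc func(ctx context.Context, userID string) (bool, error)

// RateLimit wraps next and rejects requests once allow says the caller is
// over the limit. The X-User-ID header is a hypothetical caller identity.
func RateLimit(allow AllowFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ok, err := allow(r.Context(), r.Header.Get("X-User-ID"))
		if err != nil {
			// Assumption: fail closed when the shared store is unreachable.
			http.Error(w, "rate limiter unavailable", http.StatusServiceUnavailable)
			return
		}
		if !ok {
			http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoStatuses sends three requests from one user through the middleware,
// using a stand-in limiter that allows only the first two calls.
func demoStatuses() []int {
	calls := 0
	allow := func(ctx context.Context, userID string) (bool, error) {
		calls++
		return calls <= 2, nil
	}
	next := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	h := RateLimit(allow, next)

	var codes []int
	for i := 0; i < 3; i++ {
		req := httptest.NewRequest(http.MethodGet, "/", nil)
		req.Header.Set("X-User-ID", "user-1")
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, req)
		codes = append(codes, rec.Code)
	}
	return codes
}
```

In the demo the first two requests come back 200 and the third 429. Because the middleware only depends on the function signature, the same wrapper works whether the check runs in-process against Redis or calls out to a separate limiter service.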

Common traps to avoid

Forgetting the expiry – If you only INCR without setting a TTL, the counter never resets: once a client reaches the limit, every later request is rejected until someone deletes the key.
Using a separate Redis instance per service – That defeats the purpose; you need a single logical store (or a Redis Cluster) that all instances talk to.
Ignoring network latency – A remote call adds a few milliseconds. For ultra‑low‑latency APIs, consider an embedded proxy like Envoy’s rate‑limit filter or a local cache that periodically syncs with Redis (still safer than per‑instance counters).

Why This New Power Matters

By externalizing the state, you turned a fragile, per‑instance hack into a robust, globally‑consistent guardrail. Now you can:

Scale your API horizontally without worrying about the limit multiplying.
Tune the limiter independently – bump the Redis memory, add replicas, or switch to a faster store without touching your business logic.
Share the limiter across multiple services (auth, payments, etc.) with a single source of truth – no more “why is service A allowing 150 req/s while service B caps at 100?”

In short, you’ve earned the Jedi mastery of rate limiting: you know when to wield the lightsaber (keep it simple) and when to call in the blaster squad (extract the service).

Your Turn

Grab a service you’re building right now. Ask yourself: Does it truly need an independent rate‑limiting decision, or can a fast shared store handle it?

If you’re running a modest side‑project with a single deployment, try the in‑process version first – you’ll be surprised how far it gets you.
If you’re anticipating spikes, multiple replicas, or you want to hand the limiter off to a platform team, give the Redis‑backed version a spin.

Drop a comment below with your choice and any gotchas you hit. May the force (and the limit) be with you! 🚀

DEV Community