The Quest Begins (The "Why")
I was tasked with protecting a brand‑new API from being hammered by overeager clients. Think of it like guarding the Death Star’s exhaust port – if too many requests slip through, the whole thing blows up. My first instinct? Slap a simple counter in the service code, increment it on every request, and return 429 Too Many Requests when the count exceeds a threshold.
It worked… until traffic spiked during a promo launch. Suddenly we had dozens of instances of the service behind a load balancer, each with its own private counter. The limit was being applied per instance, not globally, and clients could bypass the guard by spreading their requests across pods. I felt like a Stormtrooper who’d just missed the target – embarrassing and a little terrifying.
That moment sparked the classic debate: Should I keep the rate limiter inside the monolith, or extract it into its own microservice? I needed a clear insight that would tell me when each approach actually shines.
The Revelation (The Insight)
After a few sleepless nights (and way too much coffee), the realization hit me like a lightsaber duel: the decision isn’t about microservices vs monolith in the abstract; it’s about where the state lives and how much independent scaling you truly need.
A rate limiter fundamentally needs shared state – a counter (or token bucket) that all request paths can see and update atomically. If you keep that state inside each service instance, you’re fighting physics. If you externalize the state (think Redis, Consul, or a dedicated store), the limiter becomes a thin wrapper that can live anywhere.
So the real question becomes:
| Scenario | Monolith‑friendly? | Microservice‑friendly? |
|---|---|---|
| Low to moderate traffic, single deployment | ✅ Simple, no extra moving parts | ❌ Overkill |
| High traffic, many replicas, need independent scaling | ❌ State sharing becomes a bottleneck | ✅ Limiter can be scaled separately |
| Team owns the limiter as a shared concern (multiple services) | ❌ Duplication, inconsistency risk | ✅ Single source of truth |
| Operational overhead tolerance low | ✅ One deploy, one monitor | ❌ Extra service to monitor, version, secure |
The critical insight is this: Extract the rate limiter into its own service only when you need to scale or manage it independently of the business logic. Otherwise, a lightweight in‑process limiter backed by a fast shared store (Redis) inside the monolith is more than enough – and it saves you a bunch of operational overhead.
Think of it like choosing whether to bring a lightsaber or a blaster to a fight. If you’re dueling a single Sith Lord, the lightsaber (monolith) is elegant and sufficient. If you’re facing an army of droids that need rapid, distributed firepower, you bring in the blaster squad (microservice).
Wielding the Power (Code & Examples)
The Struggle: In‑Process Counter (the trap)
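// bad-rate-limiter.go (inside each API instance)
package badlimiter

import (
    "sync"
    "time"
)

// All of this state lives in one process; other replicas never see it.
var (
    mu     sync.Mutex
    count  int
    limit  = 100
    window = time.Minute
    reset  = time.Now().Add(window)
)

// Allow reports whether this instance has capacity left in the current window.
func Allow() bool {
    mu.Lock()
    defer mu.Unlock()
    if time.Now().After(reset) {
        // The window has elapsed, so start a fresh one.
        count = 0
        reset = time.Now().Add(window)
    }
    if count >= limit {
        return false // caller should respond with 429
    }
    count++
    return true
}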
Why this fails:
- Each replica has its own count.
- Under a load balancer, the effective limit becomes limit * replicas.
- A single client can slip requests through every replica’s bucket, multiplying its allowance.
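To make the multiplication concrete, here is a small simulation, offered as a sketch rather than a benchmark. The names replicaLimiter and simulate are made up for this illustration, and the round-robin dispatch is an assumption standing in for a real load balancer.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaLimiter mimics the per-instance counter: each replica keeps
// a private count that no other replica can see. (Hypothetical type.)
type replicaLimiter struct {
	mu    sync.Mutex
	count int
	limit int
}

// Allow admits a request until this replica's private count hits its limit.
func (r *replicaLimiter) Allow() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.count >= r.limit {
		return false
	}
	r.count++
	return true
}

// simulate spreads `requests` round-robin over `replicas` limiters
// (a stand-in for a load balancer) and returns how many were allowed.
func simulate(replicas, limit, requests int) int {
	pool := make([]*replicaLimiter, replicas)
	for i := range pool {
		pool[i] = &replicaLimiter{limit: limit}
	}
	allowed := 0
	for i := 0; i < requests; i++ {
		if pool[i%replicas].Allow() {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(simulate(1, 100, 1000)) // prints 100: one instance enforces the limit
	fmt.Println(simulate(3, 100, 1000)) // prints 300: three instances triple it
}
```

With one replica the limit holds at 100. With three, the same client gets 300 requests through in the same window.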
The Victory: Externalized Counter (microservice or monolith with Redis)
First, the shared store – we’ll use Redis for its atomic INCR and expiration.
# 127.0.0.1:6379> SET rate-limit:<user-id> 0 EX 60 NX
# (if key doesn't exist, set to 0 with 60‑second TTL)
Now the limiter logic (same code works whether you call it from a monolith handler or a separate microservice):
// good-rate-limiter.go
package ratelimit

import (
    "context"
    "fmt"

    "github.com/go-redis/redis/v8"
)

var rdb = redis.NewClient(&redis.Options{
    Addr:     "localhost:6379",
    Password: "", // no password set
    DB:       0,
})

const (
    limit  = 100 // requests per window
    window = 60  // seconds
)

// incrScript bumps the counter and, on the first hit of a window, sets its TTL.
// Running both steps in one Lua script keeps them atomic.
var incrScript = redis.NewScript(`
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
`)

// Allow reports whether userID is still within its limit for the current window.
func Allow(ctx context.Context, userID string) (bool, error) {
    key := fmt.Sprintf("rate-limit:%s", userID)
    current, err := incrScript.Run(ctx, rdb, []string{key}, window).Int64()
    if err != nil {
        return false, err
    }
    return current <= limit, nil
}
Why this wins:
- The INCR operation is atomic across all clients, guaranteeing a global limit.
- The key auto‑expires after the window, so we don’t need manual cleanup.
- If traffic grows, we can spin up more API instances without worrying about the limiter becoming a bottleneck – Redis can be clustered and scaled independently.
Common traps to avoid
- Forgetting the expiry – If you only INCR without setting a TTL, the counter never resets, so a client that hits the limit once stays blocked for good.
- Using a separate Redis instance per service – That defeats the purpose; you need a single logical store (or a Redis Cluster) that all instances talk to.
- Ignoring network latency – A remote call adds a few milliseconds. For ultra‑low‑latency APIs, consider an embedded proxy like Envoy’s rate‑limit filter or a local cache that periodically syncs with Redis (still safer than per‑instance counters).
Why This New Power Matters
By externalizing the state, you turned a fragile, per‑instance hack into a robust, globally‑consistent guardrail. Now you can:
- Scale your API horizontally without worrying about the limit multiplying.
- Tune the limiter independently – bump the Redis memory, add replicas, or switch to a faster store without touching your business logic.
- Share the limiter across multiple services (auth, payments, etc.) with a single source of truth – no more “why is service A allowing 150 req/s while service B caps at 100?”
In short, you’ve earned the Jedi mastery of rate limiting: you know when to wield the lightsaber (keep it simple) and when to call in the blaster squad (extract the service).
Your Turn
Grab a service you’re building right now. Ask yourself: Does it truly need an independent rate‑limiting decision, or can a fast shared store handle it?
- If you’re running a modest side‑project with a single deployment, try the in‑process version first – you’ll be surprised how far it gets you.
- If you’re anticipating spikes, multiple replicas, or you want to hand the limiter off to a platform team, give the Redis‑backed version a spin.
Drop a comment below with your choice and any gotchas you hit. May the force (and the limit) be with you! 🚀