Timevolt

Posted on Jul 3

Microservices vs Monolith: Picking Your Weapon Like Neo in The Matrix

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

I still remember the first time I tried to ship a rate limiter for our public API. We were a small team, the codebase was a tidy monolith, and the feature felt like a simple “add a counter, reject if over limit” kind of thing. I dropped the logic into the request‑handling middleware, wrote a quick unit test, and pushed it.

At first, everything looked fine. Then traffic spiked during a product launch. The monolith started to choke: every request, whether it needed rate limiting or not, made a round trip to the same Redis instance, which burned CPU and caused latency spikes across unrelated endpoints. The ops team was paged at 2 a.m., and I felt like I’d just triggered a boss fight I wasn’t prepared for.

That night, after a lot of coffee and a few too many debugging sessions, I asked myself: Why am I coupling a cross‑cutting concern like rate limiting to the core business logic? The answer hit me like a plot twist: the limiter deserved its own stage.

The Revelation (The Insight)

The big insight was this: rate limiting is an infrastructure concern, not a domain concern. When you treat it as a first‑class service, you gain three super‑powers:

Independent scaling – you can throw more instances at the limiter without scaling the whole app.
Fault isolation – if the limiter misbehaves, your core services stay up (think circuit breaker pattern).
Rapid iteration – you can swap algorithms (token bucket → leaky bucket → sliding window) without redeploying the monolith.

In a monolith, every change to the limiter means a full rebuild and redeploy of the entire application. In a microservice, you push a new limiter container and the rest of the system keeps humming.

ASCII picture of the shift

Monolith (before)                         Microservice (after)
+-------------------+                     +-------------------+
|   API Handler     |                     |   API Handler     |
|  + Rate Limiter   |                     |   + HTTP Client   |
|   (inline logic)  |                     |   to Limiter Svc  |
+-------------------+                     +-------------------+
          |                                         |
          v                                         v  (calls)
   +--------------+                         +----------------+
   |   Shared DB  |                         |  Rate Limiter  |
   |   (Redis)    |                         |  Service       |
   +--------------+                         +----------------+
                                            (own scaling,
                                             own deploys)

That little arrow from the API handler to the limiter service is where the magic lives.

Wielding the Power (Code & Examples)

The Struggle: Inline Rate Limiter (Monolith)

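# app/middleware.py
import time
import uuid

import redis
from flask import Flask, request, abort

app = Flask(__name__)
r = redis.Redis(host='redis', port=6379, db=0)

def rate_limit(key, limit=100, window=60):
    """Sliding-window log: one sorted-set member per request, scored by time."""
    now = time.time()
    pipeline = r.pipeline()
    # Unique member per request; str(now) alone would collapse hits
    # that land in the same second into a single entry and undercount.
    pipeline.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    pipeline.zremrangebyscore(key, 0, now - window)  # drop entries older than the window
    pipeline.zcard(key)                              # count what is left
    pipeline.expire(key, window)                     # let idle keys expire
    _, _, count, _ = pipeline.execute()
    if count > limit:
        abort(429, description="Too many requests")

@app.before_request
def check_limit():
    key = f"rl:{request.remote_addr}:{request.endpoint}"
    rate_limit(key, limit=20, window=10)   # 20 req/10s per client IP and endpoint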

Traps I fell into:

Blocking Redis calls – every request waited for the pipeline, adding latency even for endpoints that didn’t need limiting.
Hard‑coded limits – changing the limit meant a new deploy of the whole API.
No observability – hitting the limit just returned 429; we had no metrics on how often we were throttling.
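The last two traps can be softened even before any extraction. Here is a minimal sketch, assuming limits arrive as JSON in a hypothetical RATE_LIMITS environment variable and that a plain in-process counter is enough to start with; the names load_rules, rule_for, and record are made up for illustration, not part of the original middleware.

```python
import json
import os
from collections import Counter

# Hypothetical env var holding per-endpoint limits as JSON, for example:
# RATE_LIMITS='{"search": {"limit": 20, "window": 10}}'
DEFAULT_RULE = {"limit": 100, "window": 60}

def load_rules(raw=None):
    """Parse per-endpoint rules from JSON; an unset variable means no overrides."""
    if raw is None:
        raw = os.environ.get("RATE_LIMITS", "{}")
    return json.loads(raw)

def rule_for(rules, endpoint):
    """Return the rule for an endpoint, filling gaps from the default rule."""
    return {**DEFAULT_RULE, **rules.get(endpoint, {})}

# In-process tally of limiter decisions, keyed by (endpoint, outcome).
decisions = Counter()

def record(endpoint, allowed):
    """Count one decision so throttling frequency is visible somewhere."""
    decisions[(endpoint, "allowed" if allowed else "denied")] += 1
```

Changing a limit this way still needs a process restart, just not a rebuild, and the counter disappears with the process. It is a stopgap, which is part of why the extraction below was worth doing.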

The Victory: Extracted Rate Limiter Microservice

Now the limiter lives in its own tiny service, reachable over HTTP (gRPC would work just as well). The API makes one quick call, waits for the verdict, and moves on.

Limiter service (FastAPI + Redis, token bucket):

# limiter/app.py
import time

import redis
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
r = redis.Redis(host='redis', port=6379, db=0)

# Token bucket, executed atomically inside Redis.
TOKEN_BUCKET_LUA = """
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local last = redis.call('HGET', key, 'last')
    local tokens = redis.call('HGET', key, 'tokens')
    if last == false then
        -- first request for this key: start with a full bucket
        tokens = limit
    else
        -- refill for the elapsed time, capped at the bucket size
        tokens = math.min(limit, tonumber(tokens) + (now - tonumber(last)) * rate)
    end
    local allowed = 0
    if tokens >= 1 then
        tokens = tokens - 1
        allowed = 1
    end
    -- HSET replaces the deprecated HMSET
    redis.call('HSET', key, 'tokens', tokens, 'last', now)
    if rate > 0 then
        -- a bucket idle long enough to refill completely can be dropped
        redis.call('EXPIRE', key, math.ceil(limit / rate) + 1)
    end
    return allowed
"""

class CheckRequest(BaseModel):
    key: str
    limit: int          # bucket capacity (maximum burst)
    refill_rate: float  # tokens added per second

@app.post("/allow")
def allow(req: CheckRequest):
    now = time.time()
    allowed = r.eval(TOKEN_BUCKET_LUA, 1, req.key, req.limit, req.refill_rate, now)
    if allowed == 0:
        raise HTTPException(status_code=429, detail="Rate limited")
    return {"allowed": True}
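If the Lua is hard to read, the same arithmetic can be sketched in plain Python. This is an in-memory illustration only, not a replacement for the Redis script: the TokenBucket class and its explicit now argument are assumptions made so the refill logic can be followed and tested without a server.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    tokens: float  # tokens currently available
    last: float    # timestamp of the previous check

class TokenBucket:
    """In-memory token bucket using the same refill arithmetic as the Lua script."""

    def __init__(self, limit, refill_rate):
        self.limit = limit              # bucket capacity (maximum burst)
        self.refill_rate = refill_rate  # tokens added per second
        self.buckets = {}

    def allow(self, key, now):
        bucket = self.buckets.get(key)
        if bucket is None:
            # First request for this key: start with a full bucket.
            bucket = Bucket(tokens=float(self.limit), last=now)
            self.buckets[key] = bucket
        else:
            # Refill for the elapsed time, capped at the capacity.
            elapsed = now - bucket.last
            bucket.tokens = min(self.limit, bucket.tokens + elapsed * self.refill_rate)
            bucket.last = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

With limit=2 and refill_rate=1.0, two calls at t=0 pass and a third is refused; one second later a single token has been refilled, so one more call passes.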

API side (thin client):

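# app/gateway.py
import httpx
from fastapi import Request, HTTPException

LIMITER_URL = "http://limiter-service:8000/allow"

# One shared client so connections are pooled; the short timeout (an
# illustrative value) keeps a slow limiter from stalling every request.
client = httpx.AsyncClient(timeout=0.2)

async def limiter_dependency(request: Request):
    key = f"{request.client.host}:{request.url.path}"
    payload = {
        "key": key,
        "limit": 20,          # bucket capacity: bursts of up to 20
        "refill_rate": 2.0,   # 2 tokens per second sustained
    }
    try:
        resp = await client.post(LIMITER_URL, json=payload)
    except httpx.HTTPError:
        return  # limiter unreachable or timed out: fail open
    if resp.status_code == 429:
        raise HTTPException(status_code=429, detail="Rate limited")
    # Any other status (for example a limiter 5xx) also fails open here.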

Why this beats the inline version:

Bounded impact on the API – the call adds a network hop, but it is async and can be timed out or circuit-broken, so a slow limiter no longer stalls the request thread.
Dynamic configuration – you can tweak limit and refill_rate via a config service or feature flag without redeploying the API.
Observability – the limiter service emits its own metrics (requests allowed, denied, latency) which you can scrape with Prometheus.
Failure isolation – if the limiter crashes, the API can fall back to an “open” mode (allow all) or return a friendly error, while the rest of the system stays up.
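Failure isolation only holds if the caller decides what to do when the limiter is down. Below is a minimal sketch of a fail-open circuit breaker, assuming a hypothetical call_limiter callable that returns True or False and raises on errors. The thresholds are illustrative, and a production breaker would usually add a proper half-open state.

```python
import time

class CircuitBreaker:
    """Opens after max_failures consecutive errors; closes again after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the circuit so the next call probes the limiter.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def record_success(self):
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()

def check(breaker, call_limiter):
    """Return True if the request may proceed; fails open when the limiter is unavailable."""
    if breaker.is_open():
        return True  # skip the limiter entirely while the circuit is open
    try:
        allowed = call_limiter()
    except Exception:
        breaker.record_failure()
        return True  # fail open on a limiter error
    breaker.record_success()
    return allowed
```

Failing open trades protection for availability. For an endpoint where abuse is costly, returning an error while the circuit is open may be the safer choice.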

A common mistake I see now is over‑engineering the contract—trying to pass the whole request object to the limiter. Keep it minimal: just an identifier, a limit, and a refill rate. The less data you shuffle, the faster the call.

Why This New Power Matters

By extracting the rate limiter into its own microservice, I turned a fragile, tightly‑coupled hack into a reusable, scalable building block. The team now treats limiters like LEGO bricks: snap them in where you need them, swap them out when the game changes, and never worry about bringing down the whole castle when you remodel a single room.

The same pattern works for other cross‑cutting concerns—authentication, logging, feature flags—each gaining the same three super‑powers: independent scaling, fault isolation, and rapid iteration.

Your Turn

Grab a piece of your monolith that feels like a “global helper” (maybe a cache wrapper, a logging middleware, or yes, another rate limiter). Spend an afternoon extracting it into a tiny service with a clear contract. Deploy it behind a lightweight API gateway, add a health check, and watch how the rest of your system breathes easier.

Challenge: After you’ve extracted the service, try swapping the algorithm (token bucket → sliding window) without touching any of your consumers. If you can do that, you’ve officially leveled up your architecture game.
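One way to approach the challenge, sketched under an assumption the service above does not make: the limits live inside the limiter rather than in each request, so the caller-facing contract is only "key in, decision out". The Limiter protocol, SlidingWindowLimiter, and handle_allow below are hypothetical names used for illustration.

```python
from collections import defaultdict, deque
from typing import Protocol

class Limiter(Protocol):
    """The only thing consumers depend on: a key and a time go in, a decision comes out."""
    def allow(self, key: str, now: float) -> bool: ...

class SlidingWindowLimiter:
    """Allows at most `limit` requests per key in any trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key: str, now: float) -> bool:
        q = self.hits[key]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

def handle_allow(limiter: Limiter, key: str, now: float) -> int:
    """Stand-in for the /allow endpoint: maps the decision to an HTTP status code."""
    return 200 if limiter.allow(key, now) else 429
```

Any object with a matching allow method can be passed to handle_allow, so a token bucket and a sliding window become interchangeable behind the same 200 or 429 response.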

Now go forth, brave developer—your next microservice adventure awaits! 🚀
