One client shouldn't be able to drown your API: rate limiting with Bucket4j and Redis

#springboot #java #redis #backend

Day 13 of building OrderHub in the open. We have a monolith with real persistence, validation, exception handling, OpenAPI docs, and a Redis cache. Today we add the thing that keeps all of that standing up when someone points a firehose at it: a rate limiter.

The problem is simple to state. Without a limit, one buggy retry loop, one scraper, or one genuinely malicious client can fire thousands of requests a second and starve everyone else — or knock your database over. You want a fair ceiling on how fast any single client can call you. The token bucket is the classic way to draw that ceiling, and it's nicer than a plain counter because it allows short bursts while still capping the sustained rate.

Here's the mental model. Imagine a bucket that holds up to a fixed capacity of tokens and refills steadily over time. Every request has to spend one token to get in. If the bucket has tokens, the request is allowed and the count drops by one. If it's empty, the request is turned away. Because the bucket can be full when traffic arrives, a client can spend a burst of saved-up tokens all at once, but once it's drained, they can only go as fast as the bucket refills. Capacity is the burst size; the refill rate is the long-run limit. "20 requests per minute per client" is just a bucket of 20 tokens that refills 20 tokens over 60 seconds. Note the consequence: a client that arrives to a full bucket can fit close to 40 requests into its first minute (20 saved up plus 20 refilled), and only after that is it held to 20 per minute.
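To make the model concrete, here is a minimal hand-rolled sketch of a token bucket using only the JDK. It is not how Bucket4j is implemented; the class name SimpleTokenBucket and the injectable nanosecond clock are assumptions made for illustration and testability.

```java
import java.util.function.LongSupplier;

/** Illustrative token bucket (hypothetical class, not Bucket4j's implementation). */
final class SimpleTokenBucket {
    private final long capacity;          // burst size
    private final long refillTokens;      // tokens restored per refill period
    private final long refillPeriodNanos; // length of the refill period
    private final LongSupplier clock;     // nanosecond clock, injectable for tests
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillTokens, long refillPeriodNanos, LongSupplier clock) {
        this.capacity = capacity;
        this.refillTokens = refillTokens;
        this.refillPeriodNanos = refillPeriodNanos;
        this.clock = clock;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillNanos = clock.getAsLong();
    }

    /** Spends one token if available; returns false when the bucket is empty. */
    synchronized boolean tryConsume() {
        refill();
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    private void refill() {
        long now = clock.getAsLong();
        if (tokens >= capacity) {
            lastRefillNanos = now;        // a full bucket accrues nothing
            return;
        }
        long earned = (now - lastRefillNanos) * refillTokens / refillPeriodNanos;
        if (earned <= 0) {
            return;
        }
        tokens = Math.min(capacity, tokens + earned);
        // advance only by the time that paid for whole tokens; the remainder carries over
        lastRefillNanos += earned * refillPeriodNanos / refillTokens;
    }
}
```

With a capacity of 20 and 20 tokens per 60 seconds, the first 20 calls to tryConsume() succeed back to back, the 21st is rejected, and one more token becomes available every 3 seconds after that.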

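On the JVM, Bucket4j is the library for this. Defining the limit is two statements:

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import java.time.Duration;

// burst of up to 20 tokens, refilled continuously at 20 tokens per minute
Bandwidth limit = Bandwidth.builder()
    .capacity(20)
    .refillGreedy(20, Duration.ofMinutes(1))
    .build();
Bucket bucket = Bucket.builder().addLimit(limit).build();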

refillGreedy matters: it drips tokens back continuously rather than dumping the whole allowance at the end of the window. A client recovers smoothly, and you avoid the synchronized bursts you'd get if every bucket topped up on the same tick.
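The difference is easy to see with a little arithmetic. The sketch below is a simplified model of the two refill styles, not Bucket4j's code; RefillMath and its method names are made up for this example. The second method models the alternative that Bucket4j exposes as refillIntervally.

```java
/** Simplified refill arithmetic (hypothetical helper, for illustration only). */
final class RefillMath {
    private RefillMath() {}

    /** Greedy: tokens trickle back in proportion to elapsed time. */
    static long greedy(long elapsedNanos, long tokensPerPeriod, long periodNanos) {
        return elapsedNanos * tokensPerPeriod / periodNanos;
    }

    /** Interval: nothing until a full period has passed, then the whole allowance at once. */
    static long intervally(long elapsedNanos, long tokensPerPeriod, long periodNanos) {
        return (elapsedNanos / periodNanos) * tokensPerPeriod;
    }
}
```

Half a minute into a 20-per-minute window, the greedy model has already handed back 10 tokens while the interval model has handed back none; at the full minute both reach 20.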

One bucket isn't enough, though. A single global bucket would throttle everyone together, so one noisy client would still starve the rest. The limit has to be per client. So we resolve a key for each request: prefer a stable identity like an X-API-Key header, and fall back to the caller's IP (taking the first, leftmost entry of X-Forwarded-For when we're behind a proxy we trust, since a client can put anything it likes in that header). Each key gets its own bucket, cached so repeat requests reuse it. Skip that caching and every request gets a fresh, full bucket and the limit never bites.
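A rough sketch of that key resolution and per-key cache, using plain strings instead of the servlet request so it runs on its own. The class names, the "key:" and "ip:" prefixes, and the unbounded map are assumptions for illustration; a real cache would also need eviction so abandoned keys don't pile up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical key resolver: API key first, then X-Forwarded-For, then the socket address. */
final class ClientKeys {
    private ClientKeys() {}

    static String resolve(String apiKey, String forwardedFor, String remoteAddr) {
        if (apiKey != null && !apiKey.isBlank()) {
            return "key:" + apiKey.trim();
        }
        if (forwardedFor != null && !forwardedFor.isBlank()) {
            // header looks like "client, proxy1, proxy2"; the leftmost entry is the original caller
            String first = forwardedFor.split(",", 2)[0].trim();
            if (!first.isEmpty()) {
                return "ip:" + first;
            }
        }
        return "ip:" + remoteAddr;
    }
}

/** Hypothetical per-client cache: one bucket per key, created once and then reused. */
final class PerClientCache<B> {
    private final Map<String, B> buckets = new ConcurrentHashMap<>();
    private final Function<String, B> factory;

    PerClientCache(Function<String, B> factory) {
        this.factory = factory;
    }

    B bucketFor(String key) {
        // computeIfAbsent creates the bucket atomically on first sight of the key
        return buckets.computeIfAbsent(key, factory);
    }
}
```

The important property is the last line: two requests with the same key get the same bucket instance back, so their token spending adds up.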

The enforcement lives in a servlet filter. A filter runs at the very edge of the request, before Spring MVC even dispatches to a controller, so a throttled request costs very little: it never reaches the service layer or Postgres. Extending Spring's OncePerRequestFilter guarantees the check runs exactly once per request. The whole decision is one atomic call:

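// try to spend one token and learn how many are left in the same call
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
if (probe.isConsumed()) {
    response.setHeader("X-RateLimit-Remaining", String.valueOf(probe.getRemainingTokens()));
    chain.doFilter(request, response);   // allowed: continue down the filter chain
} else {
    // throttled: 429 with Retry-After (derived from probe.getNanosToWaitForRefill())
    // plus a problem+json body; the chain is not invoked
}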

When we throttle, we return 429 Too Many Requests with a Retry-After header in whole seconds, rounded up, so a client that waits that long finds a token available (provided nothing else spent it under the same key in the meantime). The body is an RFC 7807 ProblemDetail, the same problem+json envelope the rest of the API already uses, so clients parse one consistent error shape everywhere. On success we advertise X-RateLimit-Remaining so a well-behaved client can pace itself before it ever hits the wall.
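The rounding is the part that is easy to get subtly wrong, so here is a small sketch of it, starting from the nanosecond wait that Bucket4j's probe reports through getNanosToWaitForRefill(). ThrottleResponse is a hypothetical helper, the floor of one second is my own choice, and the JSON field values are placeholders rather than what OrderHub actually returns.

```java
/** Hypothetical helper for the throttled branch; values are illustrative. */
final class ThrottleResponse {
    private ThrottleResponse() {}

    /** Rounds a wait in nanoseconds up to whole seconds, never returning less than 1. */
    static long retryAfterSeconds(long nanosToWait) {
        long seconds = (nanosToWait + 999_999_999L) / 1_000_000_000L; // ceiling division
        return Math.max(1L, seconds);
    }

    /** Builds a small RFC 7807 style body by hand; a real app would serialize a ProblemDetail. */
    static String problemJson(long retryAfterSeconds) {
        return "{\"type\":\"about:blank\","
             + "\"title\":\"Too Many Requests\","
             + "\"status\":429,"
             + "\"detail\":\"Rate limit exceeded. Retry in " + retryAfterSeconds + " second(s).\"}";
    }
}
```

Rounding down would tell a client that is 2.1 seconds away from a token to come back in 2 seconds, and it would be rejected again. Rounding up costs the client at most one extra second and avoids that wasted round trip.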

Now the interesting part: making it distributed. An in-JVM bucket only limits traffic hitting that one instance. Run three replicas behind a load balancer and a client effectively gets three times the limit — the counts never talk to each other. Since we already run Redis for the cache, we let the buckets live there. Bucket4j's LettuceBasedProxyManager stores each client's bucket state under a Redis key and mutates it with an atomic compare-and-swap Lua script, so every instance decrements the same counter. The limit is enforced once across the whole cluster:

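// UTF_8 is StandardCharsets.UTF_8 (static import); config is the BucketConfiguration holding the limit
ProxyManager<byte[]> pm = Bucket4jLettuce.casBasedBuilder(connection).build();
Bucket bucket = pm.getProxy(("ratelimit:" + key).getBytes(UTF_8), () -> config);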

Same algorithm, shared storage. If Redis is unreachable at startup we fall back to per-instance buckets rather than failing to boot — correct enough for a single node, and the app never dies just because Redis blinked.
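The compare-and-swap idea is worth seeing in miniature. In the sketch below an AtomicLong stands in for the Redis key that every instance shares; this is an analogy for the technique, not Bucket4j's Lua script, and refill is left out to keep it short. SharedBucketState is a name invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for one shared Redis key holding a client's remaining tokens (refill omitted). */
final class SharedBucketState {
    private final AtomicLong tokens;

    SharedBucketState(long capacity) {
        this.tokens = new AtomicLong(capacity);
    }

    /** Read, compute, compare-and-swap; retry if another writer changed the value first. */
    boolean tryConsume() {
        while (true) {
            long current = tokens.get();
            if (current == 0) {
                return false;             // empty: reject without writing
            }
            if (tokens.compareAndSet(current, current - 1)) {
                return true;              // the swap won, so the token is ours
            }
            // lost the race: someone else wrote between our read and our swap, so loop
        }
    }
}
```

If three threads play the role of three replicas and each tries 20 requests against a shared capacity of 20, exactly 20 requests get through in total. Give each thread its own private counter instead and all 60 pass, which is the three-times-the-limit problem described above.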

Everything is configurable via @ConfigurationProperties — capacity and refill period, bound per profile. Dev gets a roomy 100/min so you don't fight your own tests; prod gets a tighter 60/min. And it's proven, not assumed: an integration test on a real Postgres and real Redis fires capacity requests as one client (all allowed), then one more, and asserts a 429 with a Retry-After. A second client with a different key still gets a full allowance — the limit really is per client.
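As a sketch of what those two knobs might look like once bound, here is a plain record with the validation you would want regardless of how the values arrive. RateLimitSettings is a hypothetical name, the Spring annotation is deliberately left off so the snippet compiles with only the JDK, and it assumes the refill amount equals the capacity, as in the 20-per-minute example earlier.

```java
import java.time.Duration;

/** Hypothetical settings holder; in Spring it would be bound via @ConfigurationProperties. */
record RateLimitSettings(long capacity, Duration refillPeriod) {

    RateLimitSettings {
        // fail fast on values that would make the limiter meaningless
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        if (refillPeriod == null || refillPeriod.isZero() || refillPeriod.isNegative()) {
            throw new IllegalArgumentException("refillPeriod must be positive");
        }
    }

    /** Sustained rate implied by the settings, assuming capacity tokens refill per period. */
    double requestsPerSecond() {
        return capacity / (refillPeriod.toNanos() / 1_000_000_000.0);
    }
}
```

With the prod values from above, new RateLimitSettings(60, Duration.ofMinutes(1)) works out to a sustained one request per second per client, with bursts of up to 60.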

Try the live token-bucket demo (drag the sliders, mash the button until it 429s) and read the backend and React walkthroughs here: https://dev48v.infy.uk/orderhub.php

Code: https://github.com/dev48v/order-hub-from-zero

Next up: Resilience4j circuit breakers.