BAOFUFAN

Scaling Rate Limiting from Single‑Node to a Distributed Go+Redis Token Bucket — 10x Throughput Under Load (with Degradation Strategy)

At 2 AM, an alert pulled me out of bed — the database connection pool of our order service was exhausted, and most requests were returning 504. It turned out a marketing campaign was driving triple the usual traffic. Our in‑memory per‑instance token bucket rate limiter, deployed across three replicas, operated in isolation; global rate limiting was effectively non‑existent. In that moment I realized: if the state is not shared, rate limiting is just an illusion.

Breaking Down the Problem

This is distressingly common in microservices. To protect downstream services, teams set a per‑instance limit — and because load balancing is rarely perfectly even, they size it with headroom above the fair share. Suppose the intended global cap is 600 QPS across three instances. A strict 200 QPS per instance means the hottest instance starts rejecting legitimate traffic while the other two still have spare quota, so teams bump each instance to, say, 300 QPS — and now an evenly spread burst pushes up to 900 QPS at the downstream. Either way you lose: the limit is simultaneously too tight locally and too loose globally. This is the fatal flaw of per‑instance rate limiting at scale: the limiting logic is chopped up by instance boundaries, becoming “paper‑only” rate limiting from a global perspective.

The root cause is simple: the token bucket’s current token count and last refill timestamp live purely in memory and are not shared across instances. A typical Redis fixed‑window counter (INCR + EXPIRE) can share state, but it suffers from boundary spikes — a burst in the last 100 ms of one window plus a burst in the first 100 ms of the next squeezes twice the per‑second allowance into a 200 ms span, still dangerous for downstream systems. We needed a solution that both shares state and smooths traffic — a distributed token bucket.
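For concreteness, here is roughly what that fixed‑window counter looks like — a minimal sketch with illustrative names (not code from our service), using the same github.com/redis/go-redis/v9 client as the rest of this post:

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// fixedWindowAllow is the classic INCR + EXPIRE counter: one key per
// wall-clock second, incremented on every request.
func fixedWindowAllow(ctx context.Context, rdb *redis.Client, key string, limit int64) (bool, error) {
    windowKey := fmt.Sprintf("%s:%d", key, time.Now().Unix())

    count, err := rdb.Incr(ctx, windowKey).Result()
    if err != nil {
        return false, err
    }
    if count == 1 {
        // First request in this window: attach a TTL so stale counters expire.
        rdb.Expire(ctx, windowKey, 2*time.Second)
    }
    // The boundary flaw: `limit` requests at t=0.9s and another `limit`
    // at t=1.1s are both allowed, yet they hit the downstream within 200 ms.
    return count <= limit, nil
}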

Design

Choice: Go + Redis + Lua script for a distributed token bucket.

Why not the other options?

  • Nginx/gateway‑level rate limiting: adds a proxy hop and sits far from business context, making fine‑grained policies (e.g., combined per‑user and per‑API limits) hard to express.
  • Pure Redis sliding window: doable with sorted sets, but every check must first evict expired members, paying memory plus O(log N) bookkeeping on each request — a real bottleneck at high QPS (see the sketch after this list).
  • Go distributed rate‑limiting libraries: Many are unmaintained or only support simple fixed‑window counters, lacking the flexibility of a token bucket.
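To make the sliding‑window overhead concrete, here is a minimal sketch (names are illustrative; it even ZADDs rejected requests, which a production version would clean up). The per‑request eviction is the cost in question:

import (
    "context"
    "strconv"
    "time"

    "github.com/redis/go-redis/v9"
)

func slidingWindowAllow(ctx context.Context, rdb *redis.Client, key string, limit int64, window time.Duration) (bool, error) {
    now := time.Now()
    nowMs := now.UnixMilli()
    windowStart := nowMs - window.Milliseconds()

    pipe := rdb.TxPipeline()
    // Evict everything older than the window — this cleanup runs on every request.
    pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart, 10))
    // Members must be unique per request; nanoseconds is a cheap approximation.
    pipe.ZAdd(ctx, key, redis.Z{Score: float64(nowMs), Member: now.UnixNano()})
    countCmd := pipe.ZCard(ctx, key)
    pipe.Expire(ctx, key, window+time.Second)
    if _, err := pipe.Exec(ctx); err != nil {
        return false, err
    }
    return countCmd.Val() <= limit, nil
}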

The final architecture is straightforward: move the token bucket’s core state (tokens, last_refill) into Redis, and use a Lua script to atomically calculate and update it. Thanks to Redis’s single‑threaded execution of scripts, concurrent requests from any number of instances are serialized safely. The application side wraps this in a DistributedTokenBucket struct with a built‑in degradation strategy: when Redis is unavailable (timeout, disconnection), it automatically falls back to a local golang.org/x/time/rate token bucket. Even if Redis goes down completely, downstream services are not left unprotected — we degrade to per‑instance rate limiting, preserving the fundamental protection.

Core Implementation

The following Lua script handles the atomic “token refill + consumption check” step. It takes the timestamp as an argument rather than reading the clock inside the script, which keeps the script deterministic; the tradeoff is sensitivity to clock skew between application instances. (Using redis.call('TIME') instead makes every instance share Redis’s clock — pick according to your consistency paranoia.)

// What this code solves: making "compute refill -> check sufficiency -> deduct" atomic in a single Lua script
const tokenBucketLua = `
local key       = KEYS[1]              -- token bucket key
local rate      = tonumber(ARGV[1])    -- tokens generated per second
local capacity  = tonumber(ARGV[2])    -- bucket capacity
local now       = tonumber(ARGV[3])    -- current timestamp (milliseconds)
local requested = tonumber(ARGV[4])    -- tokens requested

local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1])
local last_refill = tonumber(bucket[2])

if tokens == nil then
    -- first access: initialize a full bucket
    tokens = capacity
    last_refill = now
end

-- compute elapsed time and refill accordingly; tokens stays fractional so
-- intervals shorter than one token's worth of time still accumulate
-- (flooring here while resetting last_refill would starve low-rate buckets)
local delta = math.max(0, now - last_refill)
tokens = math.min(capacity, tokens + delta * rate / 1000)

local allowed = 0
if tokens >= requested then
    tokens = tokens - requested
    allowed = 1
end

-- persist the state and set a reasonable TTL so cold keys don't linger
redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 60)

return {allowed, tokens}
`

Next, the Go struct and the core Take method. Its responsibility is to execute the Lua script, handle Redis errors, and trigger the fallback path when Redis is not healthy.

// What this code solves: wrapping the Redis call in a single rate-limit entry point, degrading to a local token bucket when Redis is unavailable
import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
    "golang.org/x/time/rate"
)

type DistributedTokenBucket struct {
    rdb        *redis.Client
    script     *redis.Script
    key        string
    rate       float64       // tokens per second
    capacity   int           // bucket capacity
    fallback   *rate.Limiter // local fallback limiter
}

func NewDistributedTokenBucket(rdb *redis.Client, key string, ratePerSec float64, capacity int) *DistributedTokenBucket {
    // Local fallback limiter, shown here with the full global rate and
    // capacity for simplicity; with N replicas, dividing both by N keeps a
    // Redis outage from admitting up to N times the intended global limit.
    fallbackLimiter := rate.NewLimiter(rate.Limit(ratePerSec), capacity)
    return &DistributedTokenBucket{
        rdb:      rdb,
        script:   redis.NewScript(tokenBucketLua),
        key:      key,
        rate:     ratePerSec,
        capacity: capacity,
        fallback: fallbackLimiter,
    }
}

func (b *DistributedTokenBucket) Take(ctx context.Context) bool {
    now := time.Now().UnixMilli()
    result, err := b.script.Run(ctx, b.rdb, []string{b.key}, b.rate, b.capacity, now, 1).Result()
    if err != nil {
        // Redis unavailable (timeout, connection dropped): degrade to the local bucket
        return b.fallback.Allow()
    }

    values, ok := result.([]interface{})
    if !ok || len(values) < 1 {
        return b.fallback.Allow()
    }

    allowed, ok := values[0].(int64)
    if !ok {
        return b.fallback.Allow()
    }

    return allowed == 1
}
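Wiring it into a handler is then one call per request. A minimal sketch (the address, key name, and the 600/1000 figures are illustrative); note the short per‑call timeout, which bounds how long a sick Redis can stall a request before the fallback takes over:

import (
    "context"
    "net/http"
    "time"

    "github.com/redis/go-redis/v9"
)

func main() {
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    // One shared key = one global bucket across all replicas.
    bucket := NewDistributedTokenBucket(rdb, "ratelimit:order-service", 600, 1000)

    http.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
        // Bound the Redis round trip; on timeout, Take falls back locally.
        ctx, cancel := context.WithTimeout(r.Context(), 50*time.Millisecond)
        defer cancel()

        if !bucket.Take(ctx) {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        w.Write([]byte("order accepted"))
    })
    http.ListenAndServe(":8080", nil)
}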

This design keeps the happy path fully distributed and cooperative, while the unhappy path keeps the system alive: a Redis outage degrades the rate limiting, but it never removes it entirely.

In our load tests, replacing the old per‑instance token bucket with this distributed implementation allowed us to safely absorb a 10x increase in global QPS without crashing the downstream. The fallback kicked in seamlessly during Redis failover, proving that the “paper‑only” days were over.
