<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: amogh tyagi</title>
    <description>The latest articles on DEV Community by amogh tyagi (@amogh_tyagi).</description>
    <link>https://dev.to/amogh_tyagi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870596%2F0725c781-8fc9-4374-93ff-d77728b22d10.png</url>
      <title>DEV Community: amogh tyagi</title>
      <link>https://dev.to/amogh_tyagi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amogh_tyagi"/>
    <language>en</language>
    <item>
      <title>Rate Limiting at Scale: Building Fixed Window and Token Bucket in Go</title>
      <dc:creator>amogh tyagi</dc:creator>
      <pubDate>Fri, 10 Apr 2026 03:51:26 +0000</pubDate>
      <link>https://dev.to/amogh_tyagi/rate-limiting-at-scale-building-fixed-window-and-token-bucket-in-go-3m7i</link>
      <guid>https://dev.to/amogh_tyagi/rate-limiting-at-scale-building-fixed-window-and-token-bucket-in-go-3m7i</guid>
      <description>&lt;p&gt;Rate limiting is one of those things every backend engineer knows they need but few actually build from scratch. Most reach for a library. I built mine — two algorithms, Redis-backed, with Lua scripting for atomicity. Here's what the tradeoffs actually look like when you're writing the implementation instead of just configuring it.&lt;/p&gt;

&lt;p&gt;Why build it instead of using a library?&lt;/p&gt;

&lt;p&gt;Mostly to understand what the library is doing. Rate limiting looks simple until you think about concurrent requests hitting the same counter at the same millisecond. That's where the interesting problems live — and a library abstracts all of that away from you.&lt;/p&gt;

&lt;p&gt;I implemented two algorithms: fixed window and token bucket. They solve the same problem differently, and the difference matters depending on your traffic pattern.&lt;/p&gt;

&lt;p&gt;Fixed window&lt;/p&gt;

&lt;p&gt;The simplest mental model: you get N requests per time window. Window resets, counter resets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fw&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;FixedWindow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windowSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem with naive fixed window is the boundary attack. If your window resets every minute, a client can send 100 requests just before the reset at 11:59:59 and another 100 just after it at 12:00:00: 200 requests in about two seconds, double your intended limit. This is a well-known flaw, and it's why sliding window algorithms exist. Fixed window is fast and simple, but you need to know what you're trading off.&lt;/p&gt;

&lt;p&gt;Token bucket&lt;/p&gt;

&lt;p&gt;Token bucket is more nuanced. You have a bucket with a maximum capacity. Tokens refill at a constant rate. Each request consumes a token. If the bucket is empty, the request is rejected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tb&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenBucket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refillRate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles bursts gracefully. A client that's been idle accumulates tokens up to the bucket capacity, then can burst at full speed until empty. It's a better model for real API traffic, which arrives in bursts rather than at a perfectly uniform rate.&lt;/p&gt;

&lt;p&gt;The complexity cost: you need to track both token count and last refill timestamp, and compute the refill delta on every request. That's two reads and a write per check, which is where atomicity becomes critical.&lt;/p&gt;

&lt;p&gt;The Redis + Lua atomicity problem&lt;/p&gt;

&lt;p&gt;Here's the race condition that bites you if you're not careful. With token bucket:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read current token count
Compute new count based on elapsed time
Write new count back
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If two requests hit simultaneously, both read the same token count, both compute independently, and both write — one of the writes gets lost. You've now allowed more requests than you should have.&lt;/p&gt;

&lt;p&gt;The fix is making the read-compute-write a single atomic operation. Redis supports this via Lua scripts, which execute atomically on the Redis server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'HMGET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_refill'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;last_refill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_refill&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;math.min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'HMSET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_refill'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No locks. No transactions. The entire check-and-decrement happens in one Redis round trip, atomically. This is the right way to do distributed rate limiting.&lt;/p&gt;

&lt;p&gt;Fixed window Lua is simpler&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'INCR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'EXPIRE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;INCR is already atomic in Redis, but wrapping it with the EXPIRE logic in Lua ensures the TTL gets set exactly once on the first request — no race between increment and expire.&lt;/p&gt;

&lt;p&gt;CI with GitHub Actions&lt;/p&gt;

&lt;p&gt;Every push runs the test suite automatically. The workflow is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;6379:6379&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-go@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;go-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.21'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;go test ./...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key part is spinning up a real Redis instance in the CI environment as a service container. Testing against a real Redis rather than a mock means your Lua scripts actually get executed and validated — mocks won't catch scripting errors.&lt;/p&gt;

&lt;p&gt;Fixed window vs token bucket: when to use which&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Fixed window&lt;/th&gt;&lt;th&gt;Token bucket&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Implementation&lt;/td&gt;&lt;td&gt;Simple&lt;/td&gt;&lt;td&gt;More complex&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Burst handling&lt;/td&gt;&lt;td&gt;Poor&lt;/td&gt;&lt;td&gt;Good&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Boundary vulnerability&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Redis ops per request&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1 (via Lua)&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Best for&lt;/td&gt;&lt;td&gt;Internal services, simple APIs&lt;/td&gt;&lt;td&gt;Public APIs, user-facing rate limits&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For most public APIs, token bucket is the right default. Fixed window is fine for internal service-to-service limits where you control both sides and traffic is predictable.&lt;/p&gt;

&lt;p&gt;What I'd add next&lt;/p&gt;

&lt;p&gt;Sliding window log — the theoretically correct algorithm that tracks individual request timestamps. More memory-intensive than either of these, but eliminates the boundary problem of fixed window without the refill complexity of token bucket. Also a sliding window counter, which approximates it cheaply using two fixed windows.&lt;/p&gt;

&lt;p&gt;The full source&lt;/p&gt;

&lt;p&gt;Both algorithms, Redis store, Lua scripts, and CI config are on GitHub. The code is designed to be readable — if you're implementing your own, the Lua scripts are the part worth studying.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>backend</category>
      <category>go</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Building a Production WebSocket Chat Server in Go — What I Learned</title>
      <dc:creator>amogh tyagi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 22:02:43 +0000</pubDate>
      <link>https://dev.to/amogh_tyagi/building-a-production-websocket-chat-server-in-go-what-i-learned-5969</link>
      <guid>https://dev.to/amogh_tyagi/building-a-production-websocket-chat-server-in-go-what-i-learned-5969</guid>
      <description>&lt;p&gt;When I decided to build WireRoom, I had two goals: learn how real deployment works end to end, and understand WebSockets beyond the "it's like HTTP but persistent" explanation everyone gives.&lt;br&gt;
Here's what I actually ran into.&lt;/p&gt;

&lt;p&gt;What WireRoom does&lt;br&gt;
Users sign in — via Google, GitHub, or a plain username/password — and get dropped into shareable rooms with short alphanumeric codes. The first person in becomes the host, can kick participants, and can transfer host privileges. Messages are real-time. The whole thing runs on Go + WebSockets in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkh1c1g85sttoj7l7otd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkh1c1g85sttoj7l7otd.png" alt="WireRoom login screen showing OAuth options for Google and GitHub alongside username/password auth" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why Go for a chat server?&lt;br&gt;
Go's concurrency model is a natural fit for WebSocket servers. Each connection is a long-lived, stateful thing — you need to read from it, write to it, and track it. Goroutines make this straightforward: spin one up per client, let the runtime handle scheduling. Compare this to Node.js where you're managing an event loop and callback chains the moment things get complex.&lt;br&gt;
I used Gorilla WebSocket — the de facto Go library for this. It wraps the upgrade handshake cleanly and gives you a Conn type you can read and write on directly.&lt;/p&gt;

&lt;p&gt;The architecture: goroutine per client&lt;br&gt;
Every client gets two goroutines: one for reading, one for writing. The reader blocks on conn.ReadMessage() and forwards messages to a central hub. The writer blocks on a channel and flushes messages out as they arrive.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight go"&gt;&lt;code&gt;func (c *Client) readPump() {
    defer func() {
        c.hub.unregister &amp;lt;- c
        c.conn.Close()
    }()
    for {
        _, message, err := c.conn.ReadMessage()
        if err != nil {
            break
        }
        c.hub.broadcast &amp;lt;- message
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The hub is a single goroutine that owns all shared state: the client map, room assignments, host tracking, everything. All mutations go through it via channels. This is the design decision that kept the code race-free without fighting mutexes everywhere.&lt;/p&gt;

&lt;p&gt;The host system&lt;br&gt;
This was the most interesting backend problem. WireRoom has room ownership — first user in becomes host, with the ability to kick others or transfer the crown. That means the state isn't just "who's connected" but "who owns this room" and "what can they do."&lt;br&gt;
The hub handles this entirely. A kick event isn't just closing a connection — it's updating the host map, broadcasting a system message to the room, and gracefully closing the target client's goroutines in the right order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs32psh2a2k0tw9yxdmbt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs32psh2a2k0tw9yxdmbt.png" alt="Participants panel showing two users, host crown icon, and kick/make host dropdown menu" width="364" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;System events like joins, leaves, and host transfers get broadcast to the room as distinct message types so the frontend can render them differently from chat messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx586ghkkkn2e4mldxvb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx586ghkkkn2e4mldxvb9.png" alt="WireRoom chat room with live messages, system events for join and host assignment, and participant sidebar" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auth: OAuth plus passwords&lt;br&gt;
I implemented two auth paths — OAuth via Google and GitHub, and a plain username/password fallback. The OAuth flow uses a state token for CSRF protection: generate a random token, store it in a cookie, send it to the provider, verify it matches on the redirect back. Without this, an attacker can trick a user into completing an OAuth flow the attacker initiated, linking the wrong account.&lt;br&gt;
The password path stores credentials in Supabase PostgreSQL and validates them on login. Both paths converge on the same session: once you're in, the WebSocket layer doesn't care how you authenticated.&lt;/p&gt;

&lt;p&gt;Deployment: Railway over Render&lt;br&gt;
Render's free tier spins down services after inactivity. For a WebSocket server that's a dealbreaker — the first user after idle gets a cold start, and persistent connections can't survive a process restart. Railway keeps services alive and deploys straight from GitHub. Supabase handled PostgreSQL with connection pooling, so I didn't have to think about database connections under concurrent load.&lt;/p&gt;

&lt;p&gt;What I'd add next&lt;br&gt;
Emoji reactions. The infrastructure handles it — adding a new message type to the hub is trivial. It's just not shipped yet.&lt;/p&gt;

&lt;p&gt;What building this taught me&lt;br&gt;
The Go race detector (go test -race) is not optional. Use it from day one. The goroutine-per-client model with a central hub absorbed everything I threw at it — dozens of concurrent connections, rapid room switching, abrupt disconnects, host transfers mid-session. None of it caused issues once the hub ownership model was right.&lt;br&gt;
The full source is on GitHub.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>go</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
