If you’ve ever implemented rate limiting, quotas, or high‑churn counters, you’ve probably felt the pain of per‑request writes to Redis/DB. It’s simple, but it gets slow and expensive fast.
Here’s a small idea that fixes that: keep two numbers per key in memory — a stable scalar (what’s already persisted) and a volatile vector (net changes not yet persisted). Make decisions from both, and only persist when the vector crosses a threshold or at shutdown.
I call this the Vector–Scalar Accumulator (VSA). It’s simple, fast, and easy to drop into real services.
## The motivation (in one minute)
- Most updates cancel out quickly (add/remove, like/unlike, reserve/release). Persisting every micro‑event wastes I/O.
- With VSA, you:
  - Decide admits/denies in nanoseconds from in‑memory state.
  - Defer and batch persistence (e.g., every 50 net updates).
  - Flush leftovers on graceful shutdown.
Result: thousands of requests turn into a handful of writes — typically a 95–99% reduction — without changing the correctness of the decision path.
## The mental model
- `scalar (S)`: the durable base (e.g., 1000 requests allowed).
- `vector (V)`: in‑memory net usage since the last commit.
- Availability: `Available = S - |V|`.
- When `|V| >= threshold`, persist the net and apply `Commit(V)`, which preserves availability but resets `V` to zero.
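To make that concrete, here is a minimal sketch of the state involved. The field and method names are assumptions for illustration; the actual types in the repo may differ.

```go
// vsa_sketch.go — a minimal sketch of the two-number model (assumed names).
package vsa

import "sync"

type VSA struct {
	mu     sync.Mutex
	scalar int64 // S: durable base already persisted (e.g., 1000)
	vector int64 // V: net in-memory usage since the last commit
}

func New(scalar int64) *VSA { return &VSA{scalar: scalar} }

// Available returns S - |V|, the capacity callers may still consume.
func (v *VSA) Available() int64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.scalar - abs(v.vector)
}

// CheckCommit reports whether |V| has reached the threshold and, if so,
// the amount that should be persisted and then passed to Commit.
func (v *VSA) CheckCommit(threshold int64) (bool, int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if abs(v.vector) < threshold {
		return false, 0
	}
	return true, v.vector
}

// Commit folds an already-persisted amount out of the vector and into the
// scalar (S -= amount, V -= amount), so Available() is unchanged by it.
func (v *VSA) Commit(amount int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.scalar -= amount
	v.vector -= amount
}

func abs(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}
```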
## Architecture at a glance
```mermaid
flowchart LR
    Client --> API[/HTTP /check?api_key=.../]
    API --> Store[Per-key VSA Store]
    Store -->|TryConsume(1)| API
    Store --> Worker
    Worker -->|commitLoop (threshold)| Persister
    Worker -->|evictionLoop| Persister
    Worker -->|final flush on Stop| Persister
    Persister --> DB[(Durable sink)]
```
- API path is zero‑hop: no network I/O per request.
- Background worker does the slow stuff (batch commits, eviction, final flush).
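The per-key store can be as simple as a mutex-guarded map. Here is a hedged sketch of what `GetOrCreate` might look like, in the same hypothetical package as the sketch above; the real store in the repo may use `sync.Map`, sharding, or eviction metadata.

```go
// Store maps API keys to their VSA instances.
type Store struct {
	mu        sync.Mutex
	limit     int64 // initial scalar for new keys (e.g., 1000)
	instances map[string]*VSA
}

func NewStore(limit int64) *Store {
	return &Store{limit: limit, instances: make(map[string]*VSA)}
}

// GetOrCreate returns the existing VSA for key, or creates one seeded with
// the configured limit. The lock only guards the map lookup, not the hot
// admission path itself.
func (s *Store) GetOrCreate(key string) *VSA {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.instances[key]; ok {
		return v
	}
	v := New(s.limit)
	s.instances[key] = v
	return v
}
```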
## Request timeline (what actually happens)
```mermaid
sequenceDiagram
    participant U as User
    participant A as API
    participant V as VSA (S,V)
    participant W as Worker
    participant P as Persister
    U->>A: GET /check?api_key=alice
    A->>V: TryConsume(1)
    V-->>A: ok (remaining = S - |V|)
    A-->>U: 200 OK + X-RateLimit-Remaining
    Note over W: every commitInterval
    W->>V: CheckCommit(threshold)
    alt |V| >= threshold
        W->>P: CommitBatch(key, vector)
        P-->>W: ok
        W->>V: Commit(vector) // preserves availability
    end
    Note over A,W,P: On shutdown: Worker runs final flush for any non-zero V
```
## Minimal code you can reason about
### 1) Atomic, fair admission (no last‑token race)
```go
// Given: v := vsa.New(1000) // S=1000
if !v.TryConsume(1) {
    // Deny (429): no tokens left
} else {
    // Allow (200): remaining = v.Available()
}
```
`TryConsume(1)` atomically checks `Available` and increments the in‑memory vector. Two concurrent calls cannot both grab the last token.
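A minimal sketch of that check-and-increment, building on the `VSA` type assumed earlier (the repo's version may use atomics or a different locking scheme):

```go
// TryConsume atomically reserves n units if, and only if, enough capacity
// remains. Holding the lock across the check and the increment is what keeps
// two racing callers from both taking the last unit.
func (v *VSA) TryConsume(n int64) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.scalar-abs(v.vector) < n {
		return false // deny: would overshoot the limit
	}
	v.vector += n // record net usage in memory only; nothing is persisted here
	return true
}
```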
### 2) A tiny HTTP handler
```go
func (s *Server) handleCheckRateLimit(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Query().Get("api_key")
    if key == "" { http.Error(w, "API key is required", 400); return }
    userVSA := s.store.GetOrCreate(key)
    if !userVSA.TryConsume(1) {
        w.Header().Set("X-RateLimit-Status", "Exceeded")
        w.Header().Set("Retry-After", "60")
        http.Error(w, "Too Many Requests", 429)
        return
    }
    w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", s.rateLimit))
    w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", userVSA.Available()))
    w.Header().Set("X-RateLimit-Status", "OK")
    w.WriteHeader(200)
    fmt.Fprint(w, "OK")
}
```
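For context, the handler might be wired up roughly like this (a sketch with assumed constructor names; the repo's `cmd/ratelimiter-api/main.go` is the authoritative version). The important detail is that `worker.Stop()` runs before exit so the final flush happens:

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"time"
)

func main() {
	store := NewStore(1000)                                // assumed: 1000 tokens per key
	persister := NewLogPersister()                         // assumed durable sink
	worker := NewWorker(store, persister, 50, time.Second) // assumed: threshold, interval
	worker.Start()

	srv := &Server{store: store, rateLimit: 1000}
	mux := http.NewServeMux()
	mux.HandleFunc("/check", srv.handleCheckRateLimit)
	httpSrv := &http.Server{Addr: ":8080", Handler: mux}
	go func() { _ = httpSrv.ListenAndServe() }()

	// Block until Ctrl+C, then shut down in order: stop accepting requests,
	// then stop the worker so it runs the final flush of non-zero vectors.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	<-ctx.Done()
	_ = httpSrv.Shutdown(context.Background())
	worker.Stop()
}
```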
### 3) The worker that saves your I/O budget
```go
func (w *Worker) commitLoop() {
    ticker := time.NewTicker(w.commitInterval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            w.runCommitCycle() // commit any key with |vector| >= threshold
        case <-w.stopChan:
            w.runFinalFlush() // commit any non-zero vector (remainders)
            return
        }
    }
}
```
Inside `runCommitCycle()`:

```go
shouldCommit, vec := vsa.CheckCommit(threshold)
if shouldCommit {
    persister.CommitBatch([]Commit{{Key: key, Vector: vec}})
    vsa.Commit(vec) // S := S - vec; V := V - vec (preserves availability)
}
```
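Spelled out over all keys, `runCommitCycle` might look something like the sketch below. The `Snapshot` helper, the error-returning `CommitBatch`, and the worker fields are assumptions; the real worker in the repo may batch or lock differently.

```go
// Commit is one persisted delta for one key.
type Commit struct {
	Key    string
	Vector int64
}

// runCommitCycle walks every key, collects the vectors that crossed the
// threshold into one batch, persists the batch, and only then folds each
// vector into its scalar so Available() never moves.
func (w *Worker) runCommitCycle() {
	var batch []Commit
	var owners []*VSA
	for key, v := range w.store.Snapshot() { // assumed: copy of key -> *VSA
		if ok, vec := v.CheckCommit(w.commitThreshold); ok {
			batch = append(batch, Commit{Key: key, Vector: vec})
			owners = append(owners, v)
		}
	}
	if len(batch) == 0 {
		return
	}
	if err := w.persister.CommitBatch(batch); err != nil {
		return // leave vectors untouched; they will be retried next cycle
	}
	for i, v := range owners {
		v.Commit(batch[i].Vector)
	}
}
```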
## What the logs look like (realistic demo)
During load:
```
[2025-10-17T12:00:01-06:00] Persisting batch of 1 commits...
  - KEY: alice-key VECTOR: 50
[2025-10-17T12:00:02-06:00] Persisting batch of 1 commits...
  - KEY: alice-key VECTOR: 51
```
On shutdown (graceful final flush):
```
Shutting down server...
Stopping background worker...
[2025-10-17T18:23:22-06:00] Persisting batch of 2 commits...
  - KEY: alice-key VECTOR: 43
  - KEY: bob-key VECTOR: 1
Server gracefully stopped.
```
Note: `bob-key` didn’t reach the threshold during runtime, so it shows up in the final flush.
## Why this is fast and fair
- Zero per‑request network I/O: decisions are in memory only.
- Admission‑invariant commits: moving `vector → scalar` doesn’t change `Available`, so you don’t get oscillation bugs near batch boundaries. For example, with S=1000 and V=50, Available is 950; after `Commit(50)`, S=950 and V=0, and Available is still 950.
- Atomic last‑token: `TryConsume` prevents double‑spend of the final unit under high concurrency.
Typical outcome with `commitThreshold = 50`:
- 1001 requests → about 20 threshold commits during runtime (or a single final batch on shutdown).
- That’s ~98% fewer writes compared to “write every request”.
## How to try it (end‑to‑end quick test)
Find the project repo at https://github.com/etalazz/vsa.

Start the server:

```bash
go run ./cmd/ratelimiter-api/main.go
```

Drive traffic with the script (Bash/Git Bash/WSL):

```bash
./scripts/test_ratelimiter.sh
```
What you’ll see:
- Client hits `/check` until Alice gets a `429` on the 1001st request.
- Server prints periodic batched commits (`VECTOR: 50`/`51`).
- On Ctrl+C, a final flush (e.g., `bob-key: 1`).
Prefer manual poking?
```bash
curl -i "http://localhost:8080/check?api_key=alice-key"
```
## Trade‑offs (and easy mitigations)
- Crash before flush can lose up to `commitThreshold` per key.
  - Mitigate by lowering the threshold, shortening the interval, or pairing with a write‑ahead log (Kafka/Redis Streams) for critical keys.
- Single node by default.
  - For strict global limits across many nodes, add token leasing: lease chunks from a central `remaining` counter and serve locally from VSA. You keep the zero‑hop hot path; coordination happens only on lease boundaries. A sketch of that leasing idea follows below.
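To illustrate the leasing idea (purely a sketch, not something in the repo; the interface, `chunk` parameter, and `AddScalar` helper are all hypothetical):

```go
// LeaseSource is whatever central coordinator hands out chunks of the global
// budget (a Redis script, a database row, etc.).
type LeaseSource interface {
	// Lease tries to reserve up to n units of the global remaining budget
	// and returns how many it actually got.
	Lease(key string, n int64) (int64, error)
}

// refillFromLease tops up the local VSA when it runs low. The hot path still
// calls TryConsume locally; the network hop happens only at lease boundaries.
func refillFromLease(src LeaseSource, key string, v *VSA, chunk int64) error {
	if v.Available() > chunk/2 {
		return nil // plenty left locally, no coordination needed
	}
	granted, err := src.Lease(key, chunk)
	if err != nil || granted == 0 {
		return err // global budget exhausted or coordinator unreachable
	}
	v.AddScalar(granted) // hypothetical helper: S += granted, V unchanged
	return nil
}
```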
## When to use this
- Rate limits, quotas, usage metering
- Like/unlike, view counters, telemetry aggregation
- Reservations: cart holds, connection pools, job slots
Any place with commutative deltas and lots of short‑lived churn.
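For the like/unlike style of workload, the win comes from deltas cancelling in memory before they ever reach storage. A sketch, assuming the VSA exposes a signed `Add(delta)` (hypothetical; the rate-limiter example above only needs `TryConsume`):

```go
// Add records a signed delta in the in-memory vector. Opposite deltas cancel,
// so a like followed by an unlike nets to zero and never triggers a commit.
func (v *VSA) Add(delta int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.vector += delta
}

// Example: 10,000 like/unlike pairs produce zero persisted writes, because
// |V| never crosses the commit threshold:
//   for i := 0; i < 10000; i++ { counter.Add(+1); counter.Add(-1) }
```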
## Wrap‑up
The VSA pattern is tiny but powerful: keep two numbers per key, decide from both, and write only when it matters. You’ll cut write amplification dramatically, keep tail latency predictable, and preserve fairness under load.
If you build backends in Go, this is one of those practicality‑wins patterns you can implement in an afternoon — and reap benefits for years.