If you’ve ever implemented rate limiting, quotas, or high‑churn counters, you’ve probably felt the pain of per‑request writes to Redis/DB. It’s simple, but it gets slow and expensive fast.
Here’s a small idea that fixes that: keep two numbers per key in memory — a stable scalar (what’s already persisted) and a volatile vector (net changes not yet persisted). Make decisions from both, and only persist when the vector crosses a threshold or at shutdown.
I call this the Vector–Scalar Accumulator (VSA). It’s simple, fast, and easy to drop into real services.
## The motivation (in one minute)
- Most updates cancel out quickly (add/remove, like/unlike, reserve/release). Persisting every micro‑event wastes I/O.
- With VSA, you:
  - Decide admits/denies in nanoseconds from in‑memory state.
  - Defer and batch persistence (e.g., every 50 net updates).
  - Flush leftovers on graceful shutdown.
Result: thousands of requests turn into a handful of writes — typically a 95–99% reduction — without changing the correctness of the decision path.
## The mental model
- `scalar (S)`: the durable base (e.g., 1000 requests allowed).
- `vector (V)`: in‑memory net usage since the last commit.
- Availability: `Available = S - |V|`.
- When `|V| >= threshold`, persist the net and apply `Commit(V)`, which preserves availability but resets `V` to zero.
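To make that concrete, here is a minimal sketch of the state involved. The field and method names are assumptions for illustration; the actual types in the repo may differ.

```go
// vsa_sketch.go — a minimal sketch of the two-number model (assumed names).
package vsa

import "sync"

type VSA struct {
	mu     sync.Mutex
	scalar int64 // S: durable base already persisted (e.g., 1000)
	vector int64 // V: net in-memory usage since the last commit
}

func New(scalar int64) *VSA { return &VSA{scalar: scalar} }

// Available returns S - |V|, the capacity callers may still consume.
func (v *VSA) Available() int64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.scalar - abs(v.vector)
}

// CheckCommit reports whether |V| has reached the threshold and, if so,
// the amount that should be persisted and then passed to Commit.
func (v *VSA) CheckCommit(threshold int64) (bool, int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	if abs(v.vector) < threshold {
		return false, 0
	}
	return true, v.vector
}

// Commit folds an already-persisted amount out of the vector and into the
// scalar (S -= amount, V -= amount), so Available() is unchanged by it.
func (v *VSA) Commit(amount int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.scalar -= amount
	v.vector -= amount
}

func abs(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}
```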
## Architecture at a glance
```mermaid
flowchart LR
    Client --> API[/HTTP /check?api_key=.../]
    API --> Store[Per-key VSA Store]
    Store -->|TryConsume(1)| API
    Store --> Worker
    Worker -->|commitLoop (threshold)| Persister
    Worker -->|evictionLoop| Persister
    Worker -->|final flush on Stop| Persister
    Persister --> DB[(Durable sink)]
```
- API path is zero‑hop: no network I/O per request.
- Background worker does the slow stuff (batch commits, eviction, final flush).
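The per-key store can be as simple as a mutex-guarded map. Here is a hedged sketch of what `GetOrCreate` might look like, in the same hypothetical package as the sketch above; the real store in the repo may use `sync.Map`, sharding, or eviction metadata.

```go
// Store maps API keys to their VSA instances.
type Store struct {
	mu        sync.Mutex
	limit     int64 // initial scalar for new keys (e.g., 1000)
	instances map[string]*VSA
}

func NewStore(limit int64) *Store {
	return &Store{limit: limit, instances: make(map[string]*VSA)}
}

// GetOrCreate returns the existing VSA for key, or creates one seeded with
// the configured limit. The lock only guards the map lookup, not the hot
// admission path itself.
func (s *Store) GetOrCreate(key string) *VSA {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.instances[key]; ok {
		return v
	}
	v := New(s.limit)
	s.instances[key] = v
	return v
}
```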
## Request timeline (what actually happens)
```mermaid
sequenceDiagram
    participant U as User
    participant A as API
    participant V as VSA (S,V)
    participant W as Worker
    participant P as Persister
    U->>A: GET /check?api_key=alice
    A->>V: TryConsume(1)
    V-->>A: ok (remaining = S - |V|)
    A-->>U: 200 OK + X-RateLimit-Remaining
    Note over W: every commitInterval
    W->>V: CheckCommit(threshold)
    alt |V| >= threshold
        W->>P: CommitBatch(key, vector)
        P-->>W: ok
        W->>V: Commit(vector) // preserves availability
    end
    Note over A,W,P: On shutdown: Worker runs final flush for any non-zero V
```
## Minimal code you can reason about
### 1) Atomic, fair admission (no last‑token race)
```go
// Given: v := vsa.New(1000) // S=1000
if !v.TryConsume(1) {
    // Deny (429): no tokens left
} else {
    // Allow (200): remaining = v.Available()
}
```
`TryConsume(1)` atomically checks `Available` and increments the in‑memory vector. Two concurrent calls cannot both grab the last token.
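A minimal sketch of that check-and-increment, building on the `VSA` type assumed earlier (the repo's version may use atomics or a different locking scheme):

```go
// TryConsume atomically reserves n units if, and only if, enough capacity
// remains. Holding the lock across the check and the increment is what keeps
// two racing callers from both taking the last unit.
func (v *VSA) TryConsume(n int64) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.scalar-abs(v.vector) < n {
		return false // deny: would overshoot the limit
	}
	v.vector += n // record net usage in memory only; nothing is persisted here
	return true
}
```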
### 2) A tiny HTTP handler
```go
func (s *Server) handleCheckRateLimit(w http.ResponseWriter, r *http.Request) {
    key := r.URL.Query().Get("api_key")
    if key == "" { http.Error(w, "API key is required", 400); return }
    userVSA := s.store.GetOrCreate(key)
    if !userVSA.TryConsume(1) {
        w.Header().Set("X-RateLimit-Status", "Exceeded")
        w.Header().Set("Retry-After", "60")
        http.Error(w, "Too Many Requests", 429)
        return
    }
    w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", s.rateLimit))
    w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", userVSA.Available()))
    w.Header().Set("X-RateLimit-Status", "OK")
    w.WriteHeader(200)
    fmt.Fprint(w, "OK")
}
```
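For context, the handler might be wired up roughly like this (a sketch with assumed constructor names; the repo's `cmd/ratelimiter-api/main.go` is the authoritative version). The important detail is that `worker.Stop()` runs before exit so the final flush happens:

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"time"
)

func main() {
	store := NewStore(1000)                                // assumed: 1000 tokens per key
	persister := NewLogPersister()                         // assumed durable sink
	worker := NewWorker(store, persister, 50, time.Second) // assumed: threshold, interval
	worker.Start()

	srv := &Server{store: store, rateLimit: 1000}
	mux := http.NewServeMux()
	mux.HandleFunc("/check", srv.handleCheckRateLimit)
	httpSrv := &http.Server{Addr: ":8080", Handler: mux}
	go func() { _ = httpSrv.ListenAndServe() }()

	// Block until Ctrl+C, then shut down in order: stop accepting requests,
	// then stop the worker so it runs the final flush of non-zero vectors.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	<-ctx.Done()
	_ = httpSrv.Shutdown(context.Background())
	worker.Stop()
}
```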
### 3) The worker that saves your I/O budget
```go
func (w *Worker) commitLoop() {
    ticker := time.NewTicker(w.commitInterval)
    defer ticker.Stop()
    for {
        select {
        case <-ticker.C:
            w.runCommitCycle() // commit any key with |vector| >= threshold
        case <-w.stopChan:
            w.runFinalFlush() // commit any non-zero vector (remainders)
            return
        }
    }
}
```
Inside `runCommitCycle()`:

```go
shouldCommit, vec := vsa.CheckCommit(threshold)
if shouldCommit {
    persister.CommitBatch([]Commit{{Key: key, Vector: vec}})
    vsa.Commit(vec) // S := S - vec; V := V - vec (preserves availability)
}
```
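Spelled out over all keys, `runCommitCycle` might look something like the sketch below. The `Snapshot` helper, the error-returning `CommitBatch`, and the worker fields are assumptions; the real worker in the repo may batch or lock differently.

```go
// Commit is one persisted delta for one key.
type Commit struct {
	Key    string
	Vector int64
}

// runCommitCycle walks every key, collects the vectors that crossed the
// threshold into one batch, persists the batch, and only then folds each
// vector into its scalar so Available() never moves.
func (w *Worker) runCommitCycle() {
	var batch []Commit
	var owners []*VSA
	for key, v := range w.store.Snapshot() { // assumed: copy of key -> *VSA
		if ok, vec := v.CheckCommit(w.commitThreshold); ok {
			batch = append(batch, Commit{Key: key, Vector: vec})
			owners = append(owners, v)
		}
	}
	if len(batch) == 0 {
		return
	}
	if err := w.persister.CommitBatch(batch); err != nil {
		return // leave vectors untouched; they will be retried next cycle
	}
	for i, v := range owners {
		v.Commit(batch[i].Vector)
	}
}
```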
## What the logs look like (realistic demo)
During load:
```
[2025-10-17T12:00:01-06:00] Persisting batch of 1 commits...
  - KEY: alice-key VECTOR: 50
[2025-10-17T12:00:02-06:00] Persisting batch of 1 commits...
  - KEY: alice-key VECTOR: 51
```
On shutdown (graceful final flush):
```
Shutting down server...
Stopping background worker...
[2025-10-17T18:23:22-06:00] Persisting batch of 2 commits...
  - KEY: alice-key VECTOR: 43
  - KEY: bob-key VECTOR: 1
Server gracefully stopped.
```
Note: `bob-key` didn’t reach the threshold during runtime, so it shows up in the final flush.
## Why this is fast and fair
- Zero per‑request network I/O: decisions are in memory only.
- Admission‑invariant commits: moving `vector → scalar` doesn’t change `Available`, so you don’t get oscillation bugs near batch boundaries. For example, with S=1000 and V=50, Available is 950; after `Commit(50)`, S=950 and V=0, and Available is still 950.
- Atomic last‑token: `TryConsume` prevents double‑spend of the final unit under high concurrency.
Typical outcome with `commitThreshold = 50`:
- 1001 requests → about 20 threshold commits during runtime (or a single final batch on shutdown).
- That’s ~98% fewer writes compared to “write every request”.
## How to try it (end‑to‑end quick test)
Find the project repo at https://github.com/etalazz/vsa.

Start the server:

```bash
go run ./cmd/ratelimiter-api/main.go
```

Drive traffic with the script (Bash/Git Bash/WSL):

```bash
./scripts/test_ratelimiter.sh
```
What you’ll see:
- Client hits `/check` until Alice gets a `429` on the 1001st request.
- Server prints periodic batched commits (`VECTOR: 50`/`51`).
- On Ctrl+C, a final flush (e.g., `bob-key: 1`).
Prefer manual poking?
```bash
curl -i "http://localhost:8080/check?api_key=alice-key"
```
## Trade‑offs (and easy mitigations)
- Crash before flush can lose up to `commitThreshold` per key.
  - Mitigate by lowering the threshold, shortening the interval, or pairing with a write‑ahead log (Kafka/Redis Streams) for critical keys.
- Single node by default.
  - For strict global limits across many nodes, add token leasing: lease chunks from a central `remaining` counter and serve locally from VSA. You keep the zero‑hop hot path; coordination happens only on lease boundaries. A sketch of that leasing idea follows below.
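To illustrate the leasing idea (purely a sketch, not something in the repo; the interface, `chunk` parameter, and `AddScalar` helper are all hypothetical):

```go
// LeaseSource is whatever central coordinator hands out chunks of the global
// budget (a Redis script, a database row, etc.).
type LeaseSource interface {
	// Lease tries to reserve up to n units of the global remaining budget
	// and returns how many it actually got.
	Lease(key string, n int64) (int64, error)
}

// refillFromLease tops up the local VSA when it runs low. The hot path still
// calls TryConsume locally; the network hop happens only at lease boundaries.
func refillFromLease(src LeaseSource, key string, v *VSA, chunk int64) error {
	if v.Available() > chunk/2 {
		return nil // plenty left locally, no coordination needed
	}
	granted, err := src.Lease(key, chunk)
	if err != nil || granted == 0 {
		return err // global budget exhausted or coordinator unreachable
	}
	v.AddScalar(granted) // hypothetical helper: S += granted, V unchanged
	return nil
}
```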
## When to use this
- Rate limits, quotas, usage metering
- Like/unlike, view counters, telemetry aggregation
- Reservations: cart holds, connection pools, job slots
Any place with commutative deltas and lots of short‑lived churn.
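For the like/unlike style of workload, the win comes from deltas cancelling in memory before they ever reach storage. A sketch, assuming the VSA exposes a signed `Add(delta)` (hypothetical; the rate-limiter example above only needs `TryConsume`):

```go
// Add records a signed delta in the in-memory vector. Opposite deltas cancel,
// so a like followed by an unlike nets to zero and never triggers a commit.
func (v *VSA) Add(delta int64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.vector += delta
}

// Example: 10,000 like/unlike pairs produce zero persisted writes, because
// |V| never crosses the commit threshold:
//   for i := 0; i < 10000; i++ { counter.Add(+1); counter.Add(-1) }
```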
## Wrap‑up
The VSA pattern is tiny but powerful: keep two numbers per key, decide from both, and write only when it matters. You’ll cut write amplification dramatically, keep tail latency predictable, and preserve fairness under load.
If you build backends in Go, this is one of those practicality‑wins patterns you can implement in an afternoon — and reap benefits for years.