Timevolt

Posted on Jun 14

Indexing: The Force Awakens in My Rate Limiter Quest

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

Ever had one of those days where your API feels like it’s stuck in quicksand? I was building a simple rate‑limiter for a microservice that throttles requests per IP address. The idea was straightforward: every incoming hit checks a counter stored in Postgres, increments it, and if the counter exceeds the limit we return 429.

At first it worked like a charm on my laptop. Then we pushed to staging and… boom. Latency spiked from 2 ms to 200 ms under a modest load of 500 RPS. I felt like Neo in The Matrix before he sees the code—everything looked fine, but there was a hidden bug chewing up CPU cycles.

I dug into the query plan for the lookup:

SELECT count FROM rate_limits WHERE ip = $1 FOR UPDATE;

The planner was doing a sequential scan on a table that already held millions of rows. Every request was walking the whole table just to find one IP. It was like trying to find a lightsaber in a junkyard by shaking every piece of metal.

That moment was my “aha!”: the problem wasn’t the rate‑limiting logic—it was the way we looked up the key. I needed an index, and fast.

The Revelation (The Insight)

An index is essentially a lookup table that lets the database jump straight to the rows you want, instead of scanning everything. Think of it as the Jedi holocron that stores the map to every planet—once you have it, you can teleport instead of walking.

For our rate limiter we only ever query by the ip column, so a simple B‑tree index on that column is perfect. Why B‑tree?

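It keeps keys sorted, so range queries stay fast. Prefix matches such as “give me all IPs that start with 10.” can use it too, though on a TEXT column with a non‑C collation that requires the text_pattern_ops operator class.
Lookups, inserts, and updates are O(log n), which is tiny compared to the O(n) of a seq scan.
It lets SELECT ... FOR UPDATE find the single row it has to lock quickly; that row lock is what we need to avoid race conditions between concurrent requests.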

If we only ever did exact‑match lookups and never needed ordering, a hash index could be marginally faster for reads. Postgres hash indexes have been WAL‑logged and crash‑safe since version 10, but they only support equality comparisons: no ordering, no range scans, and no LIKE prefix matching. In practice, the B‑tree is the Swiss Army knife: reliable, versatile, and battle‑tested.
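To make the sorted‑keys point concrete, here is a minimal in‑memory sketch, not how Postgres implements its indexes. It assumes a byte‑wise sorted slice of IP strings standing in for B‑tree keys; the function names and sample addresses are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exactLookup reports whether key is present, using binary search over a
// byte-wise sorted slice (O(log n) comparisons instead of a full scan).
func exactLookup(sorted []string, key string) bool {
	i := sort.SearchStrings(sorted, key)
	return i < len(sorted) && sorted[i] == key
}

// prefixRange returns every key that starts with prefix. Because the slice
// is sorted, all matches are contiguous, so two binary searches find the
// start and end of the run without touching the other keys.
func prefixRange(sorted []string, prefix string) []string {
	lo := sort.SearchStrings(sorted, prefix)
	hi := sort.Search(len(sorted), func(i int) bool {
		return sorted[i] > prefix && !strings.HasPrefix(sorted[i], prefix)
	})
	return sorted[lo:hi]
}

func main() {
	// Hypothetical sample keys; sort.Strings orders them byte-wise.
	keys := []string{"192.168.1.5", "10.0.0.7", "172.16.0.9", "10.128.4.2", "10.0.0.1"}
	sort.Strings(keys)

	fmt.Println(exactLookup(keys, "172.16.0.9")) // true
	fmt.Println(prefixRange(keys, "10."))        // [10.0.0.1 10.0.0.7 10.128.4.2]
}
```

A hash‑style structure (a Go map, say) would answer the exact lookup just as well, but it has no ordering, so the prefix query would have to visit every key.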

Here’s a quick ASCII picture of what a B‑tree looks like for our IP keys:

          [ 10.0.0.0 – 10.255.255.255 ]
         /                            \
[ 10.0.0.0 – 10.127.255.255 ]    [ 10.128.0.0 – 10.255.255.255 ]
   /          \                         /          \
[10.0.0.0]   …                     [10.128.0.0]   …

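Each node narrows the range until we hit a leaf entry, which holds a pointer to the actual row in the table. Finding one key among N takes about log₂(N) comparisons, so even with 10 million IPs that is only about 24 comparisons, practically instant. The tree itself is much shallower than 24 levels, because each page typically holds hundreds of keys: those comparisons are spread over just a few levels, so a lookup touches only a handful of pages.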
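The arithmetic behind those numbers can be checked with a short sketch. The fanout of 256 keys per page is an assumed, illustrative figure, not something measured from Postgres; the function names are likewise made up for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// binaryComparisons returns ceil(log2(n)), the approximate number of key
// comparisons needed to locate one key among n sorted keys.
func binaryComparisons(n uint64) int {
	if n <= 1 {
		return 0
	}
	return bits.Len64(n - 1)
}

// treeLevels returns how many levels a tree needs to address n keys when
// every page fans out to `fanout` entries. The fanout is an assumption.
func treeLevels(n, fanout uint64) int {
	if fanout < 2 {
		return 0 // a fanout below 2 cannot form a tree
	}
	levels, capacity := 1, fanout
	for capacity < n {
		capacity *= fanout
		levels++
	}
	return levels
}

func main() {
	const rows = 10_000_000
	fmt.Println("comparisons:", binaryComparisons(rows)) // comparisons: 24
	fmt.Println("levels:", treeLevels(rows, 256))        // levels: 3
}
```

With these assumptions, 10 million keys fit in a three‑level tree, which is why the lookup cost barely moves as the table grows.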

Wielding the Power (Code & Examples)

The Struggle (Before)

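func allow(ip string) (bool, error) {
    var count int
    // Look up the current counter for this IP and lock its row.
    // NOTE: FOR UPDATE only holds the lock until the enclosing transaction ends;
    // run both statements in one transaction (db.Begin) to make check-and-increment atomic.
    err := db.QueryRow(
        `SELECT count FROM rate_limits WHERE ip = $1 FOR UPDATE`, ip).Scan(&count)
    if err != nil {
        return false, err
    }
    if count >= limit {
        return false, nil // 429
    }
    // Under the limit: record this request.
    _, err = db.Exec(
        `UPDATE rate_limits SET count = count + 1 WHERE ip = $1`, ip)
    if err != nil {
        return false, err // surface the failure instead of silently denying
    }
    return true, nil
}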

With no index, that SELECT performed a full table scan. I ran a quick benchmark with wrk -t12 -c400 -d30s http://localhost:8080/api and saw:

Latency: 182.34ms (avg)
Requests/sec: 420
CPU: 92% (db process)

The Victory (After)

First, we add the index once, during migration (on a live table, CREATE INDEX CONCURRENTLY builds it without blocking writes):

CREATE INDEX idx_rate_limits_ip ON rate_limits(ip);

Now the same function runs:

func allow(ip string) (bool, error) {
    var count int
    err := db.QueryRow(
        `SELECT count FROM rate_limits WHERE ip = $1 FOR UPDATE`, ip).Scan(&count)
    // ... same as before
}

The query plan now shows an Index Scan using idx_rate_limits_ip. Benchmark after the index:

Latency: 3.12ms (avg)
Requests/sec: 3120
CPU: 18% (db process)

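That’s roughly a 58× drop in average latency (182.34 ms to 3.12 ms) and about 7.4× the throughput (420 to 3120 requests/sec), and the DB now has headroom for other workloads.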

Traps to Avoid

Running on stale statistics. The planner relies on table stats; if rate_limits has never been analyzed, or has changed a lot since, it might still choose a seq scan. Run ANALYZE rate_limits; and make sure autovacuum (on by default) hasn’t been disabled.
Over‑indexing. Every index adds write overhead. If we started indexing every column (user_agent, path, etc.) just in case, our INSERT/UPDATE latency would creep up. Index only what you actually query on.
Using the wrong type. Storing IPs as TEXT works, but an INET column is semantically correct for single addresses (CIDR is meant for network ranges) and allows the planner to use operator classes that are more efficient. Switching to INET gave us another ~15% boost.
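One reason a text key is fragile: the same address can be spelled several ways, and as plain strings each spelling is a different key. The sketch below shows one way to normalize addresses in the application before they reach the database. It uses Go’s standard net/netip package; the canonicalIP helper is a hypothetical name for this example, not part of the original service.

```go
package main

import (
	"fmt"
	"net/netip"
)

// canonicalIP parses a textual IP address and returns its canonical
// spelling, so equivalent inputs map to the same rate-limit key.
func canonicalIP(raw string) (string, error) {
	addr, err := netip.ParseAddr(raw)
	if err != nil {
		return "", fmt.Errorf("invalid IP %q: %w", raw, err)
	}
	return addr.String(), nil
}

func main() {
	for _, raw := range []string{"::1", "0:0:0:0:0:0:0:1", "2001:DB8::1", "10.0.0.256"} {
		key, err := canonicalIP(raw)
		if err != nil {
			fmt.Println("rejected:", raw)
			continue
		}
		fmt.Printf("%s -> %s\n", raw, key)
	}
}
```

Here "::1" and "0:0:0:0:0:0:0:1" collapse to the same key, and the malformed "10.0.0.256" is rejected before it can become a row.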

Why This New Power Matters

With a proper index in place, our rate‑limiter went from a bottleneck to a non‑issue. We could now:

Scale horizontally—add more API nodes without worrying about the DB melting down.
Add richer limits—per‑endpoint, per‑API‑key, or even sliding‑window counters—all still backed by the same indexed lookup.
Sleep better at night—knowing that a sudden traffic spike won’t turn our service into a diesel‑guzzling monster.

The lesson? Indexing isn’t some dusty DBA chore; it’s a force multiplier for any data‑driven system. Treat it like you’d treat a lightsaber: respect its power, keep it clean (vacuum/analyze), and ignite it only when you need to cut through the noise.

Your Turn

Grab a table you’re querying by a column that isn’t indexed. Run EXPLAIN ANALYZE SELECT … and watch the cost drop after you add that index. Share your before/after numbers in the comments—let’s see who can shave the most milliseconds off their query!

May your indexes be ever balanced and your queries ever swift. 🚀
