Timevolt

Posted on Jul 3

Indexing Like a Jedi: Using the Force to Speed Up Your Queries

#systemdesign #architecture #backend #programming

The Quest Begins (The "Why")

Honestly, I was staring at a dashboard that looked like a scene from The Empire Strikes Back — the rebel alliance (our tiny startup) was getting crushed by the Imperial fleet (our users) every time they hit the “search” button. Our API would choke, latency would spike past 2 seconds, and the ops team started whispering about “scaling the DB”.

We had a simple PostgreSQL table called events that stored every click, view, and purchase. The query that powered our rate‑limiter looked like this:

SELECT COUNT(*) 
FROM events 
WHERE user_id = $1 
  AND occurred_at >= NOW() - INTERVAL '1 minute';

It seemed harmless — just count rows for a user in the last minute. But as traffic grew, the table hit 10 million rows, and the planner chose a sequential scan. Each request walked through the whole table, burning CPU and pushing our RDS instance into the red zone.

I spent three hours reproducing the problem locally, watching EXPLAIN ANALYZE scream “Seq Scan on events (cost=0.00..125000.00 rows=10000000 width=4)” and feeling like Luke stuck in the trash compactor. Something had to change — otherwise we’d keep losing users to timeout errors.

The Revelation (The Insight)

Here’s the thing: indexes aren’t magic fairy dust; they’re just sorted shortcuts. When the planner can locate rows by using an index, it avoids reading every block on disk. The critical insight for our rate‑limiter was simple: we needed an index that matched the exact shape of our WHERE clause — specifically, a composite index on (user_id, occurred_at).

Why composite? Because the query filters on two columns together. An index on just user_id would still have to scan all rows for that user to find the recent ones, and an index on just occurred_at would have to scan all recent rows across all users. The combo lets PostgreSQL jump straight to the slice of the index that belongs to a given user and falls inside the time window.

Think of the index as a lightsaber that lets you cut through the data forest in a single swing, rather than swinging a blunt axe at every tree.

ASCII view of a B‑tree index (simplified)

                [ (user_id, occurred_at) ]
                           |
   +-----------------------+-----------------------+
   |                                   |
[(1, t1)]                     [(1, t100)]          ← leaf nodes (sorted)
   |                                   |
 +---+---+                       +---+---+
 |   |   |                       |   |   |
(1,t2) (1,t3) (1,t4)            (1,t98) (1,t99) (1,t101)

Each leaf holds the actual row pointers (or the row itself if it’s a covering index). The tree stays balanced, so look‑ups are O(log N) instead of O(N).

Wielding the Power (Code & Examples)

The Struggle (Before)

-- No index → Seq Scan, slow as a snail in molasses
EXPLAIN ANALYZE
SELECT COUNT(*) 
FROM events 
WHERE user_id = 42 
  AND occurred_at >= NOW() - INTERVAL '1 minute';

Output (truncated):

Aggregate  (cost=125000.00..125000.01 rows=1 width=8)
  ->  Seq Scan on events  (cost=0.00..125000.00 rows=5000000 width=4)
        Filter: ((user_id = 42) AND (occurred_at >= now() - '00:01:00'::interval))

The Victory (After)

We added the composite index:

CREATE INDEX idx_events_user_occurred 
ON events (user_id, occurred_at);

Now the same query:

EXPLAIN ANALYZE
SELECT COUNT(*) 
FROM events 
WHERE user_id = 42 
  AND occurred_at >= NOW() - INTERVAL '1 minute';

Output:

Aggregate  (cost=8.42..8.43 rows=1 width=8)
  ->  Index Scan using idx_events_user_occurred on events  (cost=0.42..8.42 rows=5 width=4)
        Index Cond: ((user_id = 42) AND (occurred_at >= now() - '00:01:00'::interval))

Boom! The planner now uses an Index Scan, touching only a handful of index pages (~5 rows) instead of millions of table rows. Latency dropped from ~2 seconds to under 5 ms on the same hardware.

Common Traps (The “Boss Levels” to Avoid)

Index on the wrong column order

   CREATE INDEX idx_events_occurred_user ON events (occurred_at, user_id);

With this order, the planner can still use the index for the time range, but it must then filter user_id after scanning many rows — still expensive. The rule: put equality columns first (user_id), then range columns (occurred_at).

Forgetting to keep the index small

If you add a bunch of unnecessary columns to the index (e.g., INCLUDE (payload)), the index becomes larger, slowing inserts and consuming more memory. Only include what you need for the query.
Over‑indexing

Every index speeds reads but slows writes. In our rate‑limiter, writes are heavy (each event inserts a row). We benchmarked and found that adding a second index on (occurred_at) increased write latency by ~18 % with negligible read benefit, so we skipped it.

Why This New Power Matters

Now that we’ve wielded the indexing lightsaber, our rate‑limiter can handle bursts of 10 k requests per second without breaking a sweat. The DB CPU usage hovered at 30 % instead of pegging at 95 %, and our autoscaling group stopped launching new nodes every five minutes.

Beyond the limiter, this insight rippled through the whole product:

Analytics queries that filtered by user_id and a date range now finish in milliseconds.
Batch jobs that purge old events can quickly locate the cutoff point using the same index.
Feature flags that gate functionality per user benefit from lightning‑fast look‑ups.

In short, understanding how an index mirrors your query’s WHERE clause turned a scaling nightmare into a smooth, predictable system. It’s the difference between trying to find a needle in a haystack with your bare hands versus using a magnet — except the magnet is a B‑tree, and the needle is your user’s recent activity.

Your Turn

Grab a table that’s been giving you grief, run EXPLAIN ANALYZE on its most painful query, and ask yourself: Which columns are equality checks? Which are ranges? Build a composite index that matches that order, test it, and watch the query plan flip from a Seq Scan to an Index Scan.

What’s the slowest query in your app right now? Try the index trick and drop a comment with the before/after numbers — let’s celebrate those wins together! 🚀

DEV Community