DEV Community

Cover image for A 1000x Speedup From One Index — and Why It Sometimes Does Nothing
Dane Wu
Dane Wu

Posted on

A 1000x Speedup From One Index — and Why It Sometimes Does Nothing

Rails Performance: Lessons from Production — #2

"Slow query? Add an index" is something everyone has heard. But I once hit a more embarrassing situation: I added the index, and the query didn't get any faster. Debugging that forced me to actually understand how indexes work — when they're lightning fast, and when they simply won't be used. This post walks through it with one example: a shipments table with a few million rows.


🐌 I added an index and the query was still slow

The shipments table has 3 million rows. The front end looks up one row by tracking number: Shipment.where(tracking_no: "ABC123") — 12 seconds slow. I added an index and it dropped to 8 milliseconds. One line of CREATE INDEX, a thousand times faster. Felt great.

Until another query, where I added an index the same way, and EXPLAIN still showed a full table scan — no faster at all. That's when it clicked: an index isn't magic that makes things fast just by existing. It has its own rules — and if you don't understand them, adding one does nothing.


📖 Why an index is fast: start from the full table scan

Without an index, the DB can only find a row by scanning from the first row to the last, comparing tracking_no one by one. With 3 million rows that's 1.5 million on average — like a book with no index, where finding a word means flipping through the whole thing.

SELECT * FROM shipments WHERE tracking_no = 'ABC123';
-- No index: Seq Scan, scans all 3 million rows
Enter fullscreen mode Exit fullscreen mode

With an index, it's like the index page at the back of a book: it keeps tracking_no pre-sorted in a separate copy, each value noting "where this row lives in the main table." The DB doesn't flip through the whole book — it locates the value directly in the sorted index, then jumps to that row.

CREATE INDEX index_shipments_on_tracking_no ON shipments (tracking_no);
Enter fullscreen mode Exit fullscreen mode

One sentence: an index is fast because it's pre-sorted, so the DB can use "fast lookup on sorted data" instead of "scan every row."


🌳 Under the hood it's a B-tree (not a binary tree)

Common confusion: an index's B-tree is not a binary tree. A binary tree has at most 2 branches per node; a B-tree has hundreds of branches per node — a wide, shallow tree.

Why wide and shallow? Because reading from disk is slow. Every level the DB descends is one more disk read.

  • Binary tree: 3 million rows is ~21 levels deep → 21 disk reads.
  • B-tree: each level fans out hundreds of times, so 3 million rows is only 3–4 levels → 3–4 disk reads.

So the whole point of a B-tree is to minimize the number of disk reads. That's the physical reason an index is fast.


💰 Indexes aren't free, so you can't index every column

An index is "an extra sorted copy of the data." Two costs:

Writes get slower — every INSERT / UPDATE / DELETE doesn't just change the main table, it has to maintain every index (inserting the new value into the right spot in each B-tree). With 5 indexes, each write touches 6 places. For a table like shipments that constantly takes new orders, that cost is very real.

Space — an index really does store an extra copy on disk.

So an index is fundamentally a trade-off: slower writes and more storage in exchange for faster reads. The rule of thumb: only index a column that's actually queried (where / order / join). Read-heavy, write-light columns pay off the most; on write-heavy tables, be sparing.


🎯 So when do you actually add one?

Indexes come from two places:

1. The ones you add by design (predictable)

  • Foreign keys: shipments.courier_id — you'll definitely where(courier_id:) and join on it.
  • Unique lookup columns: tracking_no, email, slug — the ones you use to pinpoint a single row.
  • Obviously high-frequency filter/sort columns: ones you know at design time will be queried constantly.

Rails note: in a migration, t.references / add_reference builds an index on the foreign-key column by default; foreign_key: true adds a data-integrity constraint, which is a separate thing from an index — don't conflate them.

2. The ones you add after launch, driven by data (the important part)

Don't speculatively add the rest — you can't guess which will be slow. The real workflow is "measure first, then index where needed":

  1. Find the slow query from APM or the DB's slow query log.
  2. Use EXPLAIN to confirm it's actually doing a full table scan (Seq Scan in Postgres, type: ALL in MySQL).
  3. Add an index on that column, then EXPLAIN again to confirm it became an Index Scan.
  4. Confirm it's actually faster.

One thing often overlooked: small tables don't need indexes. A few thousand rows scans in milliseconds anyway; an index just adds write cost. The value of an index only shows up once the table is big enough.

In short: add the few predictable ones by design; for the rest, index against actual slow queries and verify with EXPLAIN; leave small tables alone.


🔑 Composite indexes and the leftmost prefix

Real queries are often multi-condition: where(courier_id: 1, status: "delivered"). For that, a single composite index can cover multiple columns:

CREATE INDEX idx ON shipments (courier_id, status, created_at);
Enter fullscreen mode Exit fullscreen mode

The key is how it's sorted: first by courier_id, then by status within the same courier_id, then by created_at within the same status. Just like a phone book — sorted by last name, then first name within a last name.

The leftmost-prefix rule: because it sorts starting from the left column, you can only use it contiguously from the left.

Query Uses the index?
where(courier_id: 1)
where(courier_id: 1, status: "x")
where(status: "delivered") ❌ skips the leftmost courier_id

Why doesn't where(status:) use it? The phone book is sorted by last name first. If you only know the first name "John" but not the last name, the Johns are scattered across every last name — not grouped by first name — so you still flip through the whole book. status is scattered in the index the same way.

So column order matters. Two principles:

  1. Put the column most often queried alone on the left (so it works both alone and combined).
  2. Equality (=) before range (>, <). A range "truncates" the index — created_at > x spans a big stretch across many different created_at values, and within that stretch the later columns are out of order, so the index can't continue.

⚠️ Why an index sometimes doesn't kick in (back to the opening trap)

EXPLAIN still shows a full table scan even though you added the index — usually one of these, all sharing one cause: they defeat the index's sort order.

1. Applying a function or operation to the column

where("DATE(created_at) = ?", "2026-06-25")          # ❌ DB must compute DATE() per row to compare
where(created_at: Date.parse("2026-06-25").all_day)  # ✅ use a range instead — touch the value, not the column
Enter fullscreen mode Exit fullscreen mode

The index sorts by the column's raw value; the moment you compute on the column, that sort order is useless.

2. LIKE '%xxx' with a leading wildcard

where("tracking_no LIKE ?", "ABC%")   # ✅ knows the prefix, can locate
where("tracking_no LIKE ?", "%123")   # ❌ no prefix, scans everything
Enter fullscreen mode Exit fullscreen mode

3. Type mismatchtracking_no is a string column but you pass a number:

where(tracking_no: 123)    # ❌ DB casts every row's column to a number to compare = operating on the column → index defeated
where(tracking_no: "123")  # ✅ types match, uses the index directly
Enter fullscreen mode Exit fullscreen mode

4. The condition matches too large a fraction — if a condition hits a big chunk of the table, the DB deliberately skips the index, because "locate + repeatedly jump back to the main table" ends up slower than just scanning once:

where(status: "active")    # ❌ if 90% of shipments are active → DB prefers a full scan
where(status: "returned")  # ✅ only 1%, a small slice → the index pays off
Enter fullscreen mode Exit fullscreen mode

Indexes are for grabbing a small slice, not a big chunk. So a column like status with only a few, unevenly distributed values often can't use an index.

5. OR across different columnscourier_id and tracking_no each have an index, but a single OR query struggles to use both at once and often degrades to a full scan. Splitting into two queries, each using its own index, then merging, is faster:

# ❌ one OR, hard to use both indexes
Shipment.where("courier_id = ? OR tracking_no = ?", 1, "ABC123")

# ✅ split into two, each uses its own index, then merge + dedupe
a = Shipment.where(courier_id: 1)
b = Shipment.where(tracking_no: "ABC123")
(a + b).uniq
Enter fullscreen mode Exit fullscreen mode
-- the SQL equivalent: UNION two queries that each use their index
SELECT * FROM shipments WHERE courier_id = 1
UNION
SELECT * FROM shipments WHERE tracking_no = 'ABC123';
Enter fullscreen mode Exit fullscreen mode

The shared principle: an index works off the sort order of the column's raw value. Anything that makes the DB "compute/cast the column per row" or "lose the ability to locate by prefix" defeats it. So — don't touch the column, touch the value side instead.


🏁 Wrap-up

An index isn't a switch that makes things fast just by flipping it. It's fast because it's pre-sorted and the B-tree minimizes disk reads; it has a cost, so you only index columns that get queried; and it has rules — a composite index is used contiguously from the left, and operating on a column defeats it.

That opening query that "had an index but wasn't faster" turned out to have a function wrapped around the column. The question was never "should I add an index" — it's whether you understand how it sorts and when it gets used. Next time a query is slow, don't rush to CREATE INDEX — run EXPLAIN first and see what it's actually doing. The answer is usually right there.


🧩 Appendix: (a + b).uniq and UNION / DISTINCT

About that (a + b).uniq from the OR-split above, in three parts:

a = Shipment.where(courier_id: 1)          # query 1
b = Shipment.where(tracking_no: "ABC123")  # query 2
(a + b).uniq
Enter fullscreen mode Exit fullscreen mode

① What a + b does — runs both queries, fetches each, and concatenates them into one array (not one combined SQL statement).

② What .uniq does — dedupes. A shipment might satisfy both conditions and appear on both sides, so concatenating creates duplicates; .uniq folds them into one.

③ Why + hits the DB — because a and b aren't arrays yet; they're "not-yet-executed queries" (ActiveRecord::Relation), lazy. (Back to the four layers from #1: .where only accumulates conditions, it hasn't touched the DB.) + is an Array method — it needs arrays, not relations — so it forces both to execute and become arrays before adding:

a + b
#  → a.to_a fires SELECT ... WHERE courier_id = 1
#  → b.to_a fires SELECT ... WHERE tracking_no = 'ABC123'
#  → then concatenates the two arrays
Enter fullscreen mode Exit fullscreen mode

So hitting the DB isn't done by + — it's a side effect of + forcing the relations to execute so it can build the array.

The whole (a + b).uniq is just simulating SQL's UNION in Ruby: two separate queries, merged, deduped.

Aside: UNION vs DISTINCT — both dedupe; the difference is "whose duplicates":

What it does
DISTINCT dedupes within one result set (one query)
UNION merges two queries + dedupes
UNION ALL merges two queries but does not dedupe (faster)

Top comments (0)