Part of the "SQL: Zero to Ninja" series, written for junior web devs.
Your app worked great with 200 users. Then it got popular, hit a million rows,
and one page started taking 8 whole seconds to load. Same query, same code,
nothing changed. So what happened? Your database was quietly reading every
single row, every single time. Let's fix that with indexes, and then squash the
sneaky N+1 bug that haunts almost every ORM app.
The idea in one line
An index is a sorted shortcut that lets the database jump straight to the
rows you want, instead of reading the whole table to find them.
The metaphor: a dictionary
Imagine you want the word "ninja" in a paper dictionary.
You do not start at page 1 and read every word until you hit "ninja". That
would take all day. Instead, you flip to the "N" section, then narrow down fast.
The dictionary is sorted, so you jump close and zoom in.
Now imagine a dictionary with the words in random order. To find "ninja" you
would have to read it cover to cover. Painful.
That is the whole story:
No index --> read the dictionary cover to cover (full table scan)
With index --> jump to "N", then zoom in (sorted lookup)
A table with no index is the random-order dictionary. An index puts things in
order so the database can flip right to the page.
Why a query gets slow
When you run this:
SELECT * FROM orders WHERE user_id = 42;
If there is no index on user_id, the database does a full table scan. It
literally checks every row: "is this user 42? no. is this one? no..." all the
way down.
Row 1 user_id=7 nope
Row 2 user_id=3 nope
Row 3 user_id=42 yes! keep it
...
Row 9,999,999 (still reading)
With 100 rows, that is instant. With 10 million rows, that is your 8-second page.
Adding an index
You create an index on the column you search by:
CREATE INDEX idx_orders_user_id ON orders (user_id);
Now the database keeps a sorted list of user_id values, each pointing to where
that row lives. The mental model is a B-tree: a sorted structure that lets
the database binary-search (split in half, again and again) instead of reading
everything.
Looking for user_id = 42 in a sorted index:
[ 50 ]
/ \
[ 25 ] [ 75 ]
|
42 is here-ish, jump straight to it
Each step throws away half the remaining rows. That is how it finds one row out
of millions in a blink.
When indexes help
Index the columns you actually look things up by:
- Columns in a
WHEREfilter:WHERE user_id = 42 - Columns in a
JOIN ... ON:ON orders.user_id = users.id - Columns in
ORDER BY:ORDER BY created_at
If you sort or filter or join on it a lot, it is a good index candidate.
The cost (indexes are not free)
An index is like keeping that sorted list updated. Every time you INSERT,
UPDATE, or DELETE, the database also has to fix the index. So:
- Writes get a little slower.
- Indexes take up disk space.
That is why you do not index every column. Index the ones you search on, and
leave the rest alone.
See it with EXPLAIN
You do not have to guess whether your query uses an index. Ask the database with
EXPLAIN (or EXPLAIN ANALYZE to actually run it and time it):
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
You are looking for one word:
Seq Scan on orders --> bad, it read the whole table
Index Scan on orders --> good, it used the index
"Seq Scan" (sequential scan) means cover-to-cover reading. "Index Scan" means it
jumped to the right page. If you added an index and still see a Seq Scan, your
query is hiding an index killer (more on that below).
The N+1 problem (read this twice)
This one bites almost every web dev who uses an ORM. Say you load 50 users, then
show each user's orders. The lazy ORM does this:
1 query: SELECT * FROM users LIMIT 50; -- get the users
then for EACH user:
SELECT * FROM orders WHERE user_id = 1; -- query #2
SELECT * FROM orders WHERE user_id = 2; -- query #3
...
SELECT * FROM orders WHERE user_id = 50; -- query #51
That is 1 + 50 = 51 queries for one page. Each one is a round trip to the
database. This is the N+1 problem: 1 query to get the list, then N more, one per
item.
The fix: ask once
Grab all the orders in a single query instead of 50:
-- One batched query for everyone's orders
SELECT * FROM orders WHERE user_id IN (1, 2, 3, ..., 50);
Or join them together:
SELECT users.name, orders.total
FROM users
LEFT JOIN orders ON orders.user_id = users.id;
In ORM land this is called eager loading. You tell the ORM "load the orders
with the users" up front:
// Sequelize
User.findAll({ include: Order })
// Prisma
prisma.user.findMany({ include: { orders: true } })
// Rails / ActiveRecord
User.includes(:orders)
// Laravel / Eloquent
User::with('orders')->get()
51 queries become 2. Your slow page is fast again.
Gotchas juniors hit
- Indexing a tiny table. A 50-row table is faster to just scan. An index there adds work for no win. Indexes shine on big tables.
-
Indexing a low-variety column. A column like
is_activethat is onlytrueorfalsebarely helps. The index can only split the table in two, so the database often just scans anyway. -
Leading wildcard
LIKE.WHERE name LIKE '%ohn'cannot use a normal index, because the index is sorted by the start of the word. It is like asking the dictionary for every word that ends in "ohn". Sorting by first letter does not help.WHERE name LIKE 'Joh%'is fine though.
Recap
- No index means a full table scan: the database reads every row.
- An index is a sorted shortcut (think B-tree) so it can jump straight in.
- Index columns used in
WHERE,JOIN ON, andORDER BY. - Indexes cost write speed and disk, so do not index everything.
- Use
EXPLAINto see "Index Scan" (good) vs "Seq Scan" (bad). - The N+1 problem is 1 + N queries. Fix it with a JOIN, a batched
WHERE ... IN (...), or your ORM's eager loading.
Your turn
You have a query: SELECT * FROM users WHERE email = 'a@b.com'. Which column
should you index, and why? And if a page loads 30 products then runs one extra
query per product to get its category, what is that bug called and how do you fix
it? Explain both to a friend and you have got it.
Next up is Part 12: Transactions, where we make several statements happen
all-or-nothing, so you never end up with a paid order that has no items.
Top comments (1)
Part 11 Practice: Indexes and Performance
Try these before peeking at the solutions. They mix "think about it" questions
with real SQL using our shared schema (
users,orders,products,order_items).1. Pick the column to index
Your app runs this query all the time and it has gotten slow:
Which column should you index, and write the statement.
Solution
Why: the column in the
WHEREfilter is what the database searches by. Indexinguser_idturns a full table scan into a fast sorted lookup.2. Read the EXPLAIN
You run
EXPLAINon a query and see this line:Did your query use an index? What does this tell you?
Solution
No. "Seq Scan" means a sequential scan, the database read the whole table
cover to cover. If you expected an index to be used, you are probably missing one
on the filtered column (or something is blocking it, like a leading-wildcard
LIKE). You want to see "Index Scan" instead.3. Spot the N+1
A teammate's code does this on the dashboard:
How many queries does this run for 50 users? What is the bug called?
Solution
It runs 51 queries: 1 to load the users, then 1 per user (50 more). This is
the N+1 query problem. Each loop iteration is a separate round trip to the
database, which gets slow fast.
4. Fix the N+1 with SQL
Rewrite the work above so it grabs every user's orders in one query instead
of 50.
Solution
Why: one JOIN (or one
WHERE user_id IN (...)batch) pulls all the orders atonce. 51 queries become 1 or 2. In an ORM this is eager loading, like
include: { orders: true }.5. Should you index this?
You are tempted to add an index on
orders.status, which is almost always one ofjust two values:
'pending'or'paid'. Good idea or not?Solution
Usually not worth it.
statusis a low-variety column, so an index can onlysplit the table into a couple of big buckets and the database often just scans
anyway. Save indexes for columns with many different values that you filter,
join, or sort on.
6. Why is this search slow even with an index?
You added an index on
products.name, but this query still does a full scan:Why?
Solution
A normal index is sorted by the start of the value. A leading wildcard
(
'%phone%') asks for matches anywhere in the middle, so the sorted order cannothelp, just like asking a dictionary for every word containing "phone". A search
like
LIKE 'phone%'(no leading%) can use the index. For real "contains"search you would reach for a full-text index instead.
Nice work. If you can say which column to index and why, and you can smell an
N+1 from across the room, you are thinking like a backend dev now. See you in
Part 12.