Tyson Cung

Posted on Mar 30

Database Indexing Explained: What's Actually Happening Under the Hood

#sql #performance #database #beginners

Your query is slow. You've checked the code, optimized the ORM, maybe even added caching. But the real fix takes one line of SQL — and it has nothing to do with your application code.

The Full Table Scan Problem

Without an index, every query does the same thing: scans every single row in the table. Got a million users and you're looking up one by email? The database reads all million rows, checks each one, and returns the match.

This is called a full table scan, and it's exactly as bad as it sounds. It scales linearly — double the rows, double the time. At small scale, you won't notice. At 10 million rows, your users definitely will.

The Book Analogy (That Actually Holds Up)

Think of a database table like a book with no table of contents and no page numbers. Finding a specific paragraph means reading page by page from the beginning. That's a full table scan.

An index is the table of contents. It tells you exactly where to look. Instead of reading 500 pages, you check the index, flip to page 347, done. Databases work the same way.

B-Trees: The Data Structure That Runs Your Database

Most database indexes use a B-tree (balanced tree). Here's the short version: data gets organized into a sorted, hierarchical structure where each node contains multiple keys and pointers to child nodes.

When you query WHERE email = 'user@example.com', the database starts at the root node. It compares your search value against the keys in that node, picks the right branch, and moves down. Each level eliminates a huge chunk of data. A B-tree with a million entries? About 20 comparisons to find your row. A full scan? Up to a million.

That's not a marginal improvement — it's orders of magnitude faster. PlanetScale's engineering blog breaks this down well: B-trees keep data sorted so the database can binary-search its way to any record without touching the rest of the table.

Clustered vs Non-Clustered: Pick Your Strategy

A clustered index determines the physical order of data on disk. The table is the index. In MySQL with InnoDB, your primary key is automatically a clustered index. There's only one per table, because data can only be physically sorted one way.

A non-clustered index is a separate structure that stores the indexed column values plus pointers back to the actual rows. You can have many of these. Think of it like multiple bookmark tabs in a book — each one gives you a different way to find content quickly.

When you query using a non-clustered index, the database first searches the index B-tree to find the pointer, then follows it to fetch the full row. Two lookups instead of a full scan — still fast.

Composite Indexes: Order Matters

A composite index covers multiple columns. If you index (last_name, first_name), the data sorts by last name first, then by first name within each last name group.

This means WHERE last_name = 'Smith' uses the index. WHERE last_name = 'Smith' AND first_name = 'Jane' uses it even better. But WHERE first_name = 'Jane' alone? The index is useless — it can't skip to first names without knowing the last name first.

This is called the leftmost prefix rule, and ignoring it is one of the most common indexing mistakes I see.

The Write Cost

Indexes aren't free. Every INSERT, UPDATE, or DELETE on an indexed column means the database has to update the index too. More indexes means slower writes.

For read-heavy workloads (most web apps), this tradeoff is almost always worth it. For write-heavy workloads (logging, analytics ingestion), be more selective. I've seen teams add 15 indexes to a table and wonder why their batch imports take hours.

When to Index

Good candidates:

Columns in WHERE clauses you run frequently
Columns used in JOIN conditions
Columns in ORDER BY (saves a sort operation)
High-cardinality columns (many unique values — email, user ID)

Bad candidates:

Tiny tables (full scan is already fast)
Low-cardinality columns (boolean flags, status with 3 values)
Columns you rarely query on

Use EXPLAIN — Seriously

Every major database has an EXPLAIN command. Run it before your slow query and it'll tell you exactly what the database is doing: full scan, index scan, which index it picked, how many rows it estimates.

EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

If you see type: ALL in MySQL, that's a full table scan. Add an index on that column and run EXPLAIN again. Watch the estimated rows drop from millions to single digits. That feeling never gets old.

The Takeaway

Indexing isn't complicated, but it's the single highest-leverage performance tool most developers underuse. One well-placed index can turn a 30-second query into a 5-millisecond one. Learn B-trees, understand your query patterns, and run EXPLAIN before you blame the database.

Your database isn't slow. Your indexes are just missing.

DEV Community