Why Indexes Exist (The Real Problem)
Relational databases are optimized to store large volumes of data reliably, but they are not necessarily optimized to search them efficiently by default. When you ask a database to find a specific row without an index, it often performs a Full Table Scan, looking at every row one by one. This works at a small scale but becomes painfully slow as data grows.
Indexes exist to avoid this brute-force approach. Instead of checking every row, an index allows the database to quickly narrow down where the matching data resides.
B-Tree indexes are the default choice in most SQL databases (PostgreSQL, MySQL/InnoDB, SQL Server, Oracle) because they keep data sorted while supporting inserts and updates in a controlled, predictable way.
The intuition: even with 10 million rows, a B-Tree index typically needs only a handful of comparison steps to find a single value. Locating a row is a matter of a few targeted jumps rather than a scan through millions of entries. Writes do carry overhead—every insert must maintain this structure—but that cost grows slowly, allowing B-Trees to scale exceptionally well.
What Problem a B-Tree Solves
A naive sorted array allows for binary search, but inserts are expensive because data must be shifted to make room (O(n)). A linked list allows cheap inserts, but search is slow (O(n)).
A B-Tree is a data structure designed specifically for storage systems that:
- Keeps data sorted for range queries and ordering
- Allows logarithmic search (O(log n)) complexity
- Minimizes disk I/O by being shallow and wide
- Supports efficient writes through localized node splits
The key constraint: databases read data in pages, not individual rows.
Pages: How Databases Actually Store Data
Before talking about B-Trees, it helps to understand one fundamental idea: databases work with pages.
A page is a fixed-size block of data (commonly 8 KB). When the database needs anything—a row, an index entry, or a pointer—it reads the entire page containing it.
This applies to:
- Table data (the actual rows)
- Index data (B-Tree nodes)
The performance goal is simple: do as much useful work as possible per page read. B-Tree nodes are designed to fit neatly inside one page, allowing the database to make massive progress with every single disk hit.
This page-based model is why B-Trees look very different from textbook binary trees.
B-Tree Structure (High Level)
A B-Tree consists of three layers:
- Root node: the starting point of every search
- Internal nodes: the signposts used for navigation
- Leaf nodes: the bottom layer where the actual data (or pointers) live
Properties
- Sorted keys: keys inside a node are always kept in order
- High fanout: each node contains many keys (often hundreds), not just two
- Balanced depth: all leaf nodes are at the exact same depth, guaranteeing predictable performance regardless of which key you are seeking
Internal vs. Leaf Nodes
- Internal nodes contain keys and pointers to child pages. They are used purely for navigation.
- Leaf nodes contain keys and pointers to the actual rows.
Implementation details differ:
- In MySQL (InnoDB), leaf nodes store the full row (clustered index).
- In PostgreSQL, leaf nodes store row pointers (heap TIDs).
The link: leaf nodes are usually linked together (left-to-right), allowing the database to perform range scans without going back up to the root.
How a Lookup Works
Consider the query:
SELECT * FROM users WHERE email = 'a@b.com';
- Start at the root page
- Compare the key with the keys in the node to find the correct range
- Follow the child pointer to the next page
- Repeat until a leaf node is reached
- Fetch the row pointer or the row itself
Each step is a page read, not a row-by-row operation.
Why B-Tree Height Stays Small
Because each node holds many keys, the tree grows wide, not tall.
Typical numbers:
- Page size: 8 KB
- Key + pointer: ~40 bytes
- Keys per node (fanout): ~200
A tree with height 3 can support:
200³ ≈ 8,000,000 entries
This is why B-Tree lookups remain extremely fast even at massive scale.
Inserts and Node Splits
When you insert a new row, the database must update the B-Tree:
- The key is placed in the correct leaf node
- If the node has space, the insert is trivial
- If the node is full, it splits into two nodes
- The middle key is promoted to the parent node
Splits can propagate upward to the root, but they are rare in practice. This balancing act is the tax paid to keep the tree efficient for future reads.
Range Queries and Ordering
B-Trees keep keys sorted, which enables efficient range scans:
SELECT * FROM orders
WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31';
After locating the first key ('2025-01-01') using the standard lookup, the database simply walks the linked leaf nodes sequentially until it hits the end of the range.
This is why:
- B-Trees natively support
ORDER BY - B-Trees support prefix scans (e.g.
LIKE 'abc%') - Hash indexes cannot do either of these
Composite Indexes
When you create an index on multiple columns, such as (user_id, created_at), the B-Tree sorts by user_id first, and then by created_at for tied values.
-
Efficient for:
user_id = ?oruser_id = ? AND created_at > ? -
Inefficient for:
created_at = ?alone
Think of it like a phone book sorted by (Last Name, First Name). It’s easy to find all Smiths, but impossible to find everyone named John without reading the whole book.
Common Misconceptions
-
"Indexes speed up all queries": False. They speed up reads but add overhead to every
INSERT,UPDATE, andDELETE. - "More indexes are always better": False. Every index increases write amplification and takes up disk space.
- "B-Trees are only for equality": False. Their primary strength is handling ranges and sorted output.
When a B-Tree Is the Wrong Choice
Avoid B-Tree indexes when:
-
Low cardinality: If 90% of rows have
is_active = true, the database will often prefer a full table scan because reading the index plus the table is more expensive. - High-frequency equality only: If you only ever do exact matches on unique keys, a hash index may be faster (though less flexible).
- Append-only / time-series data: Consider BRIN (Block Range Indexes) for massive tables where data is naturally ordered by time.
- Full-text search: Use GIN or specialized inverted indexes for searching words inside long text.
Practical Takeaways
- B-Trees are page-optimized, prioritizing fewer disk hits over CPU cycles
- Tree height matters more than total row count
- Composite index order is critical for query performance
- Understand your read vs. write ratio before adding new indexes
Top comments (1)
In your experience, where have B-Tree indexes hurt performance more than helped?