Aayush Jain
Understanding Database Indexes: How They Work and When They Hurt Performance

We’ve all been there. You build a feature, test it on your local machine with 50 rows of data, and it feels lightning-fast. You deploy to production. Three months later, the database CPU is pinned at 99%, and your users are staring at loading spinners.

The fix is usually a single command: CREATE INDEX.

But while indexes are the "magic wand" of database performance, they aren't free. If you treat them as a "set and forget" feature, you might actually be making your application slower.

In this guide, we’re going to look at what’s actually happening inside your database when you search for data, and when an index goes from being a savior to a bottleneck.

1. The Nightmare: The Full Table Scan

Imagine walking into a library with 100,000 books and looking for a specific title. If the books are just piled in the middle of the room in no particular order, you have to pick up every single book until you find the right one.

In database terms, this is a Full Table Scan.

If your table has 1 million rows and no index on the column you are searching, the database engine has to read every single row from the disk. This is O(N) complexity. It’s slow, it’s expensive in terms of I/O, and it doesn't scale.
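You can watch this happen with SQLite's EXPLAIN QUERY PLAN. A minimal sketch (the table, column, and index names are made up for the demo):

```python
import sqlite3

# An in-memory SQLite database with a users table and no index on email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"

# Without an index on email, the planner has no choice but a full scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan_before)  # something like "SCAN users"

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index, the same query becomes a targeted search.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan_after)  # something like "SEARCH users USING INDEX idx_users_email"
```

The exact plan text varies by SQLite version, but the shift from SCAN to SEARCH is the full-table-scan problem being solved.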

2. The "Dictionary" Analogy

The most common way to explain an index is the "Table of Contents" at the front of a book. It works, but a dictionary is actually a better comparison.

In a dictionary, words are stored in alphabetical order. Because they are sorted, you don't start at page one to find the word "Node." You jump to the middle, check whether "N" comes before or after the current page, and narrow your search from there.

This is what an index does: It creates a separate, sorted data structure that points to the actual location of the data.
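The dictionary's trick is binary search over sorted keys. A toy sketch in Python (every entry here is made up): the "index" is a sorted list of (key, row location) pairs, mirroring how a real index stores sorted keys plus pointers to rows.

```python
from bisect import bisect_left

# A toy "index": sorted (key, row_location) pairs.
entries = sorted([("node", 7), ("array", 2), ("query", 15), ("btree", 4)])
keys = [key for key, _ in entries]

def lookup(word):
    """Binary-search the sorted keys instead of scanning from the start."""
    pos = bisect_left(keys, word)
    if pos < len(keys) and keys[pos] == word:
        return entries[pos][1]  # the "pointer" to where the row lives
    return None

print(lookup("node"))      # 7
print(lookup("zeppelin"))  # None
```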

3. Under the Hood: The B-Tree

If you want to move from Junior to Senior, you need to know the name B-Tree (Balanced Tree). Most modern databases (PostgreSQL, MySQL, SQL Server) use B-Trees for their default indexing.

A B-Tree solves the write-speed problem of simple sorted lists by organizing data into a hierarchy of "nodes."

  • The Root: The entry point.

  • Internal Nodes: These act like signposts ("Go left for A-M, go right for N-Z").

  • Leaf Nodes: The bottom of the tree that contains the actual pointer to the row on your disk.

Because the tree is balanced, the distance from the root to any piece of data is always roughly the same. Searching a B-Tree is O(log N). To put that in perspective: a binary search over 1 million rows takes only about 20 comparisons, and because each B-Tree node holds hundreds of keys, a real B-Tree reaches any row in just a handful of "jumps."
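A quick sanity check of those numbers (the fanout of 500 keys per node is an illustrative assumption; real values depend on page and key sizes):

```python
import math

# A plain binary search over 1,000,000 sorted keys: ~20 comparisons.
print(math.ceil(math.log2(1_000_000)))      # 20

# A B-Tree node holds many keys. Assuming ~500 keys per node (illustrative),
# the tree reaches any of 1,000,000 rows in only a few levels.
print(math.ceil(math.log(1_000_000, 500)))  # 3
```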

Conceptual Search Example

Imagine we are searching for a user with email = 'chris@example.com' in a large users table, and we have an index on the email column.

Instead of reading a million rows, the database conceptually performs a few logical steps:

  1. Start at the Root Node (Jump 1): The root node holds boundaries (e.g., the lowest and highest values in each sub-branch). It determines that 'chris@example.com' falls within the range covered by the middle internal node.

  2. Move to an Internal Node (Jump 2): The database loads the middle node. This node might say:

     • Emails < F: go left
     • Emails F–O: go middle
     • Emails > O: go right

     Since "C" comes before "F", the database follows the left leaf node pointer.

  3. Move to a Leaf Node (Jump 3): The leaf node is loaded. It is a dense, sorted list containing the email and the primary key ID (the pointer). The database quickly performs a binary search within this small list:

    // Example content of the Leaf Node (sorted list)
    [
      { email: 'adam@example.com', userId: 101 },
      { email: 'ben@example.com', userId: 450 },
      { email: 'chris@example.com', userId: 221 }, // Found it!
      { email: 'diana@example.com', userId: 890 }
    ]

  4. Fetch the Data: The database takes userId: 221 and uses it to look up the full user record in the main table.

The search completes in roughly four disk operations (O(log N)) instead of a million. That is the power of a B-Tree index.
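The walkthrough above can be sketched as a toy two-level tree in Python. Real B-Trees have more levels and live in fixed-size disk pages; every name and ID below is illustrative:

```python
from bisect import bisect_left, bisect_right

# Leaves: sorted (email, user_id) pairs. Separators in the root decide which
# leaf a key belongs to: < "f" -> leaf 0, "f".."o" -> leaf 1, >= "o" -> leaf 2.
separators = ["f", "o"]
leaves = [
    [("adam@example.com", 101), ("ben@example.com", 450),
     ("chris@example.com", 221), ("diana@example.com", 890)],
    [("frank@example.com", 12), ("maria@example.com", 77)],
    [("omar@example.com", 305), ("zoe@example.com", 99)],
]

def find_user_id(email):
    # Jumps 1-2: walk from the root to the correct leaf via the separators.
    leaf = leaves[bisect_right(separators, email)]
    # Jump 3: binary search inside the small, sorted leaf.
    keys = [key for key, _ in leaf]
    pos = bisect_left(keys, email)
    if pos < len(keys) and keys[pos] == email:
        return leaf[pos][1]  # the "pointer" used for the final row fetch
    return None

print(find_user_id("chris@example.com"))  # 221
```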

4. Clustered vs. Non-Clustered Indexes

This is where many developers get confused. There are two main ways a database stores these indexes:

Clustered Index (The Physical Order)

Think of this as the dictionary itself. The data is physically stored on the disk in the order of the index. This is almost always your Primary Key.

  • Rule: You can only have one clustered index per table (because you can't physically sort the same data in two different ways).

Non-Clustered Index (The "Phone Book")

This is a separate structure from the actual table. It contains a copy of the indexed column and a "pointer" (like a GPS coordinate) to where the rest of the row lives.

  • Rule: You can have many non-clustered indexes, but each one adds "weight" to your database.

5. The Cost: When Indexes Hurt Performance

If indexes are so magical, why not just index every column? Because every index is a trade-off. It’s faster to read but slower to write, and costs money in storage.

A. The Write Penalty

Every time you perform one of these operations on a column that is indexed, the database has to update the B-Tree structure for that index:

  1. INSERT: The database has to insert the new value and potentially re-balance the B-Tree to maintain its sorted structure.

  2. UPDATE: If you update an indexed column (e.g., changing a user's email), the database has to delete the old entry in the index and insert a new one—a costly operation.

  3. DELETE: The database must locate and remove the entry from the B-Tree.

If you have five indexes on a table, a single INSERT means the database must perform five index write operations in addition to writing the main record. On tables with very high write traffic (like a logging or telemetry table), too many indexes can severely degrade performance.
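You can feel the write penalty yourself with an in-memory SQLite database. Absolute timings vary by machine; the point is the relative gap between the two tables:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plain (a INT, b INT, c INT, d INT, e INT)")
conn.execute("CREATE TABLE indexed (a INT, b INT, c INT, d INT, e INT)")
# Five indexes on the second table: every insert must update all of them.
for col in "abcde":
    conn.execute(f"CREATE INDEX idx_{col} ON indexed ({col})")

rows = [(i, i, i, i, i) for i in range(50_000)]

start = time.perf_counter()
conn.executemany("INSERT INTO plain VALUES (?, ?, ?, ?, ?)", rows)
plain_time = time.perf_counter() - start

start = time.perf_counter()
conn.executemany("INSERT INTO indexed VALUES (?, ?, ?, ?, ?)", rows)
indexed_time = time.perf_counter() - start

print(f"no indexes:   {plain_time:.3f}s")
print(f"five indexes: {indexed_time:.3f}s")  # typically noticeably slower
```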

B. Storage and Memory Overhead

Indexes are separate data structures. They consume disk space and, critically, they consume memory.

Databases attempt to keep frequently accessed index nodes in RAM for speed. If your index size is larger than your available memory, the database constantly has to read those index nodes from the disk, negating some of the speed benefits and causing I/O bottlenecks. The larger your indexes, the less memory is available for caching your actual data.

6. Advanced Concepts: Indexing with Precision

To use indexes effectively, you must go beyond indexing individual columns. You need to understand your query patterns.

A. Composite Indexes (Order Matters)

A composite index uses multiple columns in a specific order. They are crucial for supporting complex WHERE and ORDER BY clauses.

Example: You frequently run the query:

SELECT * FROM orders
WHERE customer_id = 123 AND order_date > '2023-01-01'
ORDER BY order_date DESC;

You should create a composite index that matches the pattern: (customer_id, order_date).

Why order matters:

  • The database uses the first column (customer_id) to find a narrow slice of the data.

  • The database then uses the second column (order_date) to immediately satisfy the next condition and the ORDER BY clause without needing to sort the results later.

If you reverse the order to (order_date, customer_id), the index becomes useless for queries that filter only by customer_id: the database cannot skip past the leading order_date column. The rule of thumb: put the column you filter on with equality (usually the most selective one) first.
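You can verify the leading-column rule with SQLite's planner. A sketch using the orders table from the example above (the index name is made up, and the exact plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TEXT,
    amount REAL)""")
conn.execute("CREATE INDEX idx_cust_date ON orders (customer_id, order_date)")

# Filtering on the leading column can seek into the composite index...
plan_leading = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 123"
).fetchone()[3]
print(plan_leading)   # a SEARCH using idx_cust_date

# ...but filtering only on the second column cannot, and falls back to a scan.
plan_trailing = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE order_date > '2023-01-01'"
).fetchone()[3]
print(plan_trailing)  # a full SCAN of orders
```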

B. The Cardinality Trap

Cardinality refers to the number of unique values in a column relative to the total number of rows.

  • High Cardinality: A column like email or SSN (every value is unique). Indexes are extremely effective here.

  • Low Cardinality: A column like is_active (only two values: true/false) or country (a few dozen values).

The Trap: Indexing a low-cardinality column is often pointless. If you search for is_active = true and that covers 90% of your table, the database optimizer will often decide it is faster to just do a full table scan than to jump through the index B-Tree, because it has to fetch almost all the rows anyway.
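Before indexing a column, it is worth measuring its cardinality. A quick check, with illustrative data (900 active users out of 1,000):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(f"user{i}@example.com", 1 if i % 10 else 0) for i in range(1_000)],
)

# Distinct values relative to total rows: high ratio = good index candidate.
for col in ("email", "is_active"):
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {col}), COUNT(*) FROM users"
    ).fetchone()
    print(f"{col}: {distinct}/{total} distinct ({distinct / total:.1%})")
```

email comes out at 100% distinct (a great candidate), while is_active has only two distinct values, so an index on it would rarely pay off.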

Summary: Rules of Thumb for Indexing

| When to Index | When NOT to Index |
| --- | --- |
| Columns used in the WHERE clause. | Columns on small tables (e.g., < 10,000 rows). |
| Columns used in JOIN conditions. | Columns with very low cardinality (is_active, status). |
| Columns used in ORDER BY or GROUP BY. | Tables with extremely high write frequency. |
| The most restrictive column in a composite index (put it first). | Columns that are frequently updated. |

The key takeaway is that indexing is a crucial exercise in balancing read speed against write cost. Always look at your database query execution plans (using EXPLAIN ANALYZE in Postgres or MySQL) to confirm that the index is actually being used and benefiting your performance.
