DEV Community

OPEYEMI OLUWAGBEMIGA
OPEYEMI OLUWAGBEMIGA

Posted on

Slapping Secondary Indexes on Random Fields is Silently Killing Your Database.

Why do we add indexes to our SQL fields? To make the search faster, right?

But do you know it has a massive downside? Writes become slower, forcing developers to be strategic about which fields should be labelled as secondary indexes.

What is the magic behind indexes? Let's say we have a table of books called “book” with fields (id, title, author, pub_date, isbn).

SELECT * FROM book WHERE author=”C.S. Lewis”

This causes the database engine to search through the Heap (a physical unstructured file where all data in the database actually lives). Let's say we have a total of 1 million rows in the table and only 20 Lewis books. The database engine would scan through all the 1 million rows, even when it had gotten the total of 20 Lewis books, just to be sure it didn’t miss any.

This is a Full Table Scan with O(n) complexity. It is brutally slow.

THE FIX

When you add the field “author” as a secondary index

CREATE INDEX idx_author ON book(author);

The database engine creates a separate, highly organized B-Tree structure. It stores the Author names in alphabetical order, paired with a pointer (physical addresses) to where the full row actually lives.

So instead of scanning through all 1 million rows, the database traverses through the B-Tree in O(log n) time to get the 20 records and their pointers. It then uses the pointers to get the data it needs from the heap. That’s how secondary indexes make reads faster.

BUT WHY DO WRITES GET SLOWER?

Adding or updating data no longer means just changing the heap data, but also the additional B-Tree created by the secondary indexes. So if you blindly add 5 secondary indexes to a table, every single INSERT or UPDATE means writing to the Heap plus updating 5 separate B-Trees on the disk.

That is the hidden cost. As a backend developer, your job isn’t to index everything; instead, your job is to understand the tradeoff.

TIPS FOR SECONDARY INDEXES

Do not index fields that are constantly updated, like page visits and view counts. This is to reduce the overhead of the database’s write performance.

Only index fields that you actively use in your WHERE, JOIN, or ORDER BY clauses.

Top comments (0)