Musings about Database Indexes

#sql #healthydebate

And not "Indicies"; I just took the Triplebyte backend developer test last night, and this was on the assessment.

Never take a Triplebyte assessment late at night. Your concentration falls if you're not blasting loud music in your headphones and the prolonged drowsiness dulls your comprehension as smooth as an unsharpened №2 pencil.

Notwithstanding all that, I didn't do so badly. In fact, I'm just shy of the 50th percentile of backend devs. But my DB indexes skills need some reflexing.

A caveat: I admit I'm not the best at handling databases, and while I got my relational DB theory and basic types, rows/column/keys, and SQL nomenclature out of the way, indices are a bit advanced for me, so we'll try to journey through it in this post, and hopefully by then we all will get the gist of them.

Also, before we begin, I would like to go through the subject of why they are called database Indexes and not Indices. The topic is apparently so confusing that even Nasdaq wrote an article about it. Technically either is correct since they are both plural forms of the word "index," but I see pages universally saying indexes so we will stick with that throughout this article.

TL;DR: the word "indices" paired with any word ranks higher on search engines. It is generally more ubiquitous on the web but we Americans have gotten used to saying the word "indexes" so often that we wrote them all over technical documentation. Sigh.

So… what exactly is a DB index?

Now that the naming hoo-ha is out of the way, let's turn our attention to the Wikipedia article about the subject:

"A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records."

Woah. Let's break this down into detail. We have "improves the speed of data retrieval operations," which implies that indexes speed up queries. But how? And why not just improve table lookup speeds instead?

How does DB indexing work?

From what I gathered, an index stores an entire column into an internal structure with faster read times than the database table itself. You can store any column you want inside an index using the following SQL statement:

CREATE INDEX <index_name>
ON <table_name> (column1, column2, ...)

This is not universal syntax - the exact command might vary depending on the database management system used.

It also allows for multiple columns to be indexed simultaneously, potentially speeding up read times for values in those columns.

It's like a mini-table with just those columns but sorted on the first column column1. This sorting feature is the crucial part that makes the whole indexing system work. A DB already knows where a particular primary key is, so having the ability to find the rest of the filtered columns by another "key" allows the entire (filtered) row to be fetched rapidly.

You may not know this, but primary keys are already stored inside indexes without any additional coding on your side.

However, it is worth noting that while indexes are read faster than tables, they are slower at writing than tables. Hence, do not create indexes on tables frequently updated with new rows or new column values.

So now that we know when not to use indexes let's look at an analogy that demonstrates its practical use.

A Book Index

A brilliant example of an index is given by Stack Overflow user Sankarganesh Eswaran:

Classic example "Index in Books"
Consider a "Book" of 1000 pages, divided by 10 Chapters, each section with 100 pages.

Simple, huh?

Now, imagine you want to find a particular Chapter that contains a word "Alchemist". Without an index page, you have no other option than scanning through the entire book/Chapters. i.e: 1000 pages.

This analogy is known as "Full Table Scan" in database world.

But with an index page, you know where to go! And more, to lookup any particular Chapter that matters, you just need to look over the index page, again and again, every time. After finding the matching index you can efficiently jump to that chapter by skipping the rest.

But then, in addition to actual 1000 pages, you will need another ~10 pages to show the indices, so totally 1010 pages.

In the case of a database, the book index could be a table with the entry and page number as columns with some other metadata columns (for example, whether it is a child entry of another entry) and an ID primary key. But the entry and page number can be stored by themselves in an index. And since there will only be one unique entry in a book index, that can be used as the sorting column of the index as well.

Source: https://stackoverflow.com/questions/1108/how-does-database-indexing-work

Most of the time, when people read about a topic, some of it tends to stick in their heads, so at that next Triplebyte test in four months time (which you hopefully are not taking at night!), you'll at least have a cursory idea of what database indexes are.

Cover image by @Zozulinskyi via Twenty20

DEV Community

Musings about Database Indexes

So… what exactly is a DB index?

How does DB indexing work?

A Book Index

Top comments (0)