Explain indexing in databases Like I'm Five

Did you find this post useful? Show some love!

Hey there, we see you aren't signed in.

Please consider creating an account on dev.to. It literally takes a few seconds and we'd appreciate the support so much. ❤️

Hey, a few months back the relatable question was discussed. Here you go.

  1. Explain what is db indexing
  2. Explain indexing like I'm five

🙌 Thanks for chiming in and being helpful Chinmay.

We have a non-dupe-shaming culture around here and providing the helpful links is the best approach.

Of course others can feel free to add an answer in this thread if they so choose. There's no harm in carrying on the discussion.

It is highly likely you are already aware of some common indexing techniques, like alphabetic or numeric order.

The latter you might have come into contact with when using arrays or vectors in some programming language and the former in libraries or encyclopedias.

In both cases the indexing system allows for quick insertion as well as quick retrieval once you have identified what you want to retrieve or what the thing you're inserting looks like.

With a database you tend to have more data than is convenient to index with just numbers or alphabetic characters, so those are usually combined with structuring and factoring of the data involved.

It could be pages with tables and relations between them or tree structures or something like that, what kind of data one is handling determines what is a good indexing and structuring technique. A good database engine gives you some control over this so you can optimise for different use cases while developing your program, e.g. flexibility or raw computational performance or visualisation.

Usually one uses index layers that contain reductive interpretations of the stored data that speed up searches by providing shortcuts. It could be links between copies of beginnings of words in the data set, like 'ju', to every data point that starts with these characters, for example the entry about the book Jurassic Park (Michael Crichton) as well as the one regarding Jubilees, as in the Book of Jubilees in Ethiopian Orthodoxy.

Programming such an index is fairly straightforward, one would basically check for the first two or three characters at insertion and add a layer of links to the underlying data, and a more tricky and interesting example of something similar is the soundex algorithm, but there are many, many clever ways to do these kinds of things.

For further reading Knuth's classic The Art of Computer Programming as well as more recent stuff like Cormens Introduction to Algorithms are time well spent.

I thinks this covers the ELI5 level pretty well.

If you want to dig deeper into indexing and how it works with a relational database I would absolutely recommend Use the index luke. Even from the 3 minute test I learnt something.

I’m looking for a number in the phone book where the last name starts with G. Instead of turning every page starting with A to G, I open the phonebook in the middle, then I cut that in half, and half that again till I get to the record I am searching for. That’s essentially what indexes do in a DB.

That is a binary search!

Haha ‘half’ might have been a bad word to use, rather the emphasis should be on removing a subset of the data to eliminate potential possibilities. It’s easier to find a needle in a haystack if you know which section of the hay it isn’t in.

I would have used the library analogy like Idan here:

There is a library with many books, ordered by author. Let's say you want to find an Harry Potter. You go to one of the bookcases, look at one of the books, and it happens to be one of the Lord of the Rings books. Tolkien starts with "T", which is after "R" in the ABC, so the Harry Potters must be before them. So you go to some place before, look at some book, and it's a Wheel of Time book. This time "Jordan" comes before "Rowling", so you look at books after that one - but before the Tolkien you found earlier. By following that method you can quickly find your Harry Potter book.

But what if you were living under a rock, and don't know that the author of Harry Potter is J.K. Rowling? You only know the title, but the books are alphabetically ordered by author, not title, so that fast lookup method won't be of any help.

Luckily, you are not the only uncultured reader that visits that library, so the staff decided to help. They prepared a big list of all the books, and ordered it by the title. You can search the title alphabetically - just like you did with the actual books - find the author, and then search the actual books for it.

One day, the library got a new series - A Song of Ice and Fire by George R. R. Martin. The staff wanted to put it in it's place, right between "Martel" and "Marton" - but that would mean they have to move all the books after it - and that's a lot of work! Do they have to do it every time they get a new book?

So, the library decided to change it's system. From now on, each book has a fixed place - bookcase, shelf, and place in the shelf. New books are always added to the last shelf in the last bookcase - and if that bookcase is full the library orders a new bookcase. These locations are written in the list of books. Writing something in the middle of the list is easier than moving all the books, because you can write between the lines/words and once it gets too messy you can write it again - that's a lot of work, but you only need to do it after many in-place edits, so it's not that bad.

So now if you want to find a book you look it up at the list, read it's location, and go directly there. This is even better than the old system, because you only have to do the alphabetical search once. Searching alphabetically is faster than looking at the titles one by one, but it's still more work than just knowing the location.

But... what if you want to find a book by author? You could do it before, but you can't now because they are no longer ordered by author!

Well - who said we can only have one list? Just make another list, this time ordered by author. And you can make a list for everything you want to search by, or even for several thing (like author and then title). Just... don't overuse it, because you have to update each and every one of the lists every time you get a new book, so you don't want hundreds of them...

The books are the data, and the lists are indexes. You usually append to the data, but the indexes are structures that are easier to edit and keep sorted. When the database writes a new data entry, it also updates the indexes, making it easier to search for that entry later. You can define multiple indexes, each sorted by different fields.

Classic DEV Post from Feb 12

I just got a Raspberry Pi 3. What can I do with it?

Just got a Raspberry Pi 3 setup and it's my first one. If you own a Raspberry P...

Follow @michael to see more of their posts in your feed.