DEV Community

Yannick Loth
Yannick Loth

Posted on

Understanding database indexes - A simple analogy

TL;DR

Think of how a DB index works as how a dictionary works: how it allows you to quickly find any word's definition.

Introduction

When diving into the world of databases, one of the key concepts you'll encounter is the database index.
To make this idea more relatable, let's compare it to something children learn to use at about ages from 7 to 9 and which we all know well—a word definition dictionary.

By the end of this post, you'll see how this familiar tool can help you understand all types of database indexes.

The Dictionary Analogy

Think about a dictionary for a moment. When you need to find the definition of a word, you don't start reading from the first page, hoping to stumble upon it. Instead, you take advantage of the dictionary's organisation. Here’s how it works:

  1. Alphabetical Order: Words in a dictionary are sorted alphabetically. This sorting is crucial because it allows you to quickly jump to the section where the word should be.
  2. Index at the Top of the Page: Each page in the dictionary typically shows the first and last word on that page, giving you a quick reference to know if the word you’re looking for might be on that page.
  3. Quick Lookup: With this structure, you can find a word and its definition in seconds rather than minutes.

What If Words Were Not Sorted?

Linear search

Imagine if the words in the dictionary were not sorted alphabetically. You’d have to start at the beginning and check each word one by one until you find the one you’re looking for. This method is called a linear search.

  1. Start at the first word.
  2. Check if it’s the word you need.
  3. If not, move to the next word.
  4. Repeat until you find the word.

In the worst-case scenario, if the word you're searching for is the last word in the dictionary, you would have to check all 5,000 words. Therefore, the maximum number of operations in the case of the linear search is 5,000.

Other search strategies

With other search strategies, like choosing a random word and repeating until you have picked the right one, the number of operations may be asymptotically infinite.

If you decided to choose a random word from the list of words that you have not picked yet, the maximum number of operations would be 5000, but you would also have to maintain a list of the words you already picked. This would neither be efficient in time nor in space (memory).

Operations in a Sorted Dictionary

Let’s consider a dictionary with 5,000 words sorted alphabetically.

Because the words are sorted alphabetically, you can employ a binary search to find a word. Binary search is an efficient algorithm that repeatedly divides the search interval in half:

  1. Start by looking at the middle word of the dictionary.
  2. If the word you're looking for comes before the middle word, narrow your search to the first half of the dictionary.
  3. If the word comes after the middle word, narrow your search to the second half.
  4. Repeat this process until you find the word.

In a binary search, each step cuts the number of possible locations in half. Therefore, the maximum number of steps (or operations) needed is the logarithm of the number of words to the base 2, denoted as log2(5000)log⁡_{2}(5000) .

log2(5000)12.29 log⁡_{2}(5000) ≈ 12.29

This means you can find any word in approximately 13 operations at most.

Notes:

  • As we know the first letter of the word we search, we usually already approximate the position of the word when opening the dictionary: obviously, no need to start looking in the middle of the dictionary for a word starting with letter "c". This usually allows us to remove a few search steps, reducing further the maximum number of steps to find a word to less than 12 in the case of a dictionary with 5000 words.
  • As the number of words in the dictionary increases, the gains increase as well because of the logarithmic scale, which increases at a slower rate than the size of the dictionary (compare nn to log2(n)log⁡{2}(n) ). The gains provided by an index become greater as the number of entries grows. This is a key insight to understanding why queries on large tables (>> 100k rows) are still extremely fast despite the huge amount of data: log2(105)16.61log⁡{2}(10^5) ≈ 16.61 , log2(109)29.9log⁡_{2}(10^9) ≈ 29.9 .

Comparison

  • Sorted Dictionary (Binary Search): Approximately 13 operations.
  • Unsorted Dictionary (Linear Search): Up to 5,000 operations.

Now, let’s translate this process to the realm of databases.

Translating to Databases

What is a Database Index?

A database index works similarly to the sorted dictionary, making it faster to find specific data in a database. Here’s how the analogy fits:

  • Sorted Order: Just like the words in a dictionary, data in an index is sorted in a specific order. This might be alphabetical, numerical, or based on another attribute.
  • Index Entries: In a database, an index contains entries that are analogous to dictionary entries. Each entry in the index contains a key (like the word in a dictionary) and a reference to the actual data (like the definition in a dictionary).
  • Efficient Lookup: When you search for data, the database uses the index to quickly locate the information, much like you use the dictionary’s alphabetical order to find a word.

Using an index in a database is akin to using a binary search in a sorted dictionary—it's efficient and significantly reduces the number of operations required to find the desired data. Without an index, the database would need to perform a linear search, scanning each row until it finds the matching data, which is far less efficient.

Note:

  • This shows that a dictionary actually is an index. ### Types of Database Indexes

Let’s generalise this concept to different types of database indexes:

  1. B-Tree Indexes:
    • How It Works: Similar to the alphabetical sorting in a dictionary, B-Tree indexes keep data sorted in a balanced tree structure. This ensures that the database can quickly navigate through the data to find what you need.
    • Use Case: Ideal for range queries and when you need to sort data.
  2. Hash Indexes:
    • How It Works: Instead of sorting data, hash indexes use a hash function to distribute data evenly across a set of buckets. Think of it like using a special code to find the word directly rather than flipping through pages.
    • Use Case: Perfect for exact match queries, where you need to find a specific entry quickly.
  3. Bitmap Indexes:
    • How It Works: Bitmap indexes use bitmaps (arrays of bits) to represent the presence of a value or combination of values. It’s like having multiple dictionaries where each dictionary represents a specific attribute of the words.
    • Use Case: Useful in data warehousing for complex queries involving multiple attributes.
  4. Full-Text Indexes:
    • How It Works: These indexes are designed to handle full-text search capabilities. They index words and phrases, much like a detailed glossary or concordance.
    • Use Case: Great for applications requiring search within large text fields, like articles or books.

Why Are Indexes Important?

Just as you wouldn’t want to read a dictionary cover to cover to find a word, you don’t want your database to scan through every record to find a piece of data. Indexes improve the efficiency and performance of your database queries, ensuring that information can be retrieved quickly and efficiently.

Without indexes, there is no magic: the database is forced systematically read each entry one by one until it finds a hit, potentially the last entry. This not only takes more time, it also consumes more CPU time and requires more I/O operations.

Conclusion

A database index, much like a dictionary, is an essential tool for efficient information retrieval. Whether you're using a B-Tree index, a hash index, a bitmap index, or a full-text index, the underlying principle remains the same: organising data in a way that makes it easy to find. By understanding this concept through the familiar analogy of a dictionary, you can better appreciate how indexes optimise database performance and make data retrieval a breeze.

Top comments (0)