<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Cody Tseng</title>
    <description>The latest articles on DEV Community by Cody Tseng (@codytseng).</description>
    <link>https://dev.to/codytseng</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1323521%2F72ff26c1-2c23-4540-b801-1e699bdd2be3.jpeg</url>
      <title>DEV Community: Cody Tseng</title>
      <link>https://dev.to/codytseng</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codytseng"/>
    <language>en</language>
    <item>
      <title>Understanding How Databases Utilize Indexes</title>
      <dc:creator>Cody Tseng</dc:creator>
      <pubDate>Mon, 04 Mar 2024 04:46:42 +0000</pubDate>
      <link>https://dev.to/codytseng/understanding-how-databases-utilize-indexes-2ie9</link>
      <guid>https://dev.to/codytseng/understanding-how-databases-utilize-indexes-2ie9</guid>
      <description>&lt;p&gt;Stop guessing and start understanding how databases utilize indexes to design them more efficiently!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Translated by ChatGPT 3.5.&lt;br&gt;
This article primarily discusses the most common B+ tree indexes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Indexes act like a table of contents in a book, helping to locate desired content efficiently. Well-designed indexes can enhance search efficiency, while poorly designed ones can degrade it. Therefore, index design is crucial. To create efficient indexes, it’s essential to understand how databases utilize indexes to search for data.&lt;/p&gt;

&lt;h2&gt;
  
  
  B+ Tree
&lt;/h2&gt;

&lt;p&gt;When I was learning about database indexes, every book and article jumped straight into B+ trees, making me want to give up before even starting. However, most of the time, we don’t need to delve deeply into these structures.&lt;/p&gt;

&lt;p&gt;Here’s a simple B+ tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         +----+----+
         | 10 | 20 |
         +----+----+
       /      |      \
  +---+  +----+----+  +----+
  | 5 |  | 15 | 18 |  | 25 |
  +---+  +----+----+  +----+
  / | \    /  |  \    / | \
 1  6  9  11  16 19  21 26 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It may seem complex, but except for leaf nodes, other nodes exist to quickly locate leaf nodes. By focusing only on leaf nodes and ignoring other nodes, it becomes a simple ordered doubly-linked list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+---+---+---+----+----+----+----+----+----+
| 1 | 6 | 9 | 11 | 16 | 19 | 21 | 26 | 30 |
+---+---+---+----+----+----+----+----+----+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just remember these three characteristics of B+ trees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leaf nodes are ordered.&lt;/li&gt;
&lt;li&gt;Quick navigation to leaf nodes is possible.&lt;/li&gt;
&lt;li&gt;Leaf nodes are connected, allowing fast traversal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll represent indexes using linked lists from now on. If you’re interested in how B+ trees maintain order, insertion, deletion, etc., you can explore it on your own.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Databases Utilize Indexes
&lt;/h2&gt;

&lt;p&gt;Let’s assume the database has selected what it considers an appropriate index. Here’s a simplified process of how it uses indexes to search for data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Utilizing the characteristics of the index data structure, the database swiftly locates the first index node that meets the conditions. This node’s corresponding row may not necessarily meet the query conditions because the index’s columns may not include all the columns required by the query.&lt;/li&gt;
&lt;li&gt;Starting from the node located in step 1, the database performs a unidirectional traversal, using the values of the index nodes for preliminary screening. Since B+ tree leaf nodes are ordered and connected, unidirectional traversal is swift. It retrieves the corresponding row data from the disk through nodes that pass the preliminary screening.&lt;/li&gt;
&lt;li&gt;For the rows filtered in step 2, further filtering may occur. This step isn’t always necessary. If the index’s columns include all the columns required by the query conditions, no further filtering is needed. This is the ideal scenario.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, let’s explore some common query scenarios and how databases utilize indexes to search for data, hoping to provide you with some insights.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rapid Location
&lt;/h3&gt;

&lt;p&gt;Querying on the primary key &lt;code&gt;id&lt;/code&gt; is the simplest case. When we query &lt;code&gt;SELECT * FROM table WHERE id = 2&lt;/code&gt;, the database can swiftly find the rows that meet the condition through the index. It's highly efficient with no extra steps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (id)

                 |
                 v
+------++-----+-----+-----+-----+-----+
|  id  ||  1  |  2  |  3  |  4  |  5  |
+------++-----+-----+-----+-----+-----+
                 ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Unidirectional Traversal
&lt;/h3&gt;

&lt;p&gt;When we want to find the three youngest people aged 20 or above &lt;code&gt;SELECT * FROM table WHERE age &amp;gt;= 20 LIMIT 3&lt;/code&gt;, the database quickly locates the first node meeting the condition via the index, then proceeds with unidirectional traversal to find the first three rows meeting the condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (age)

                   |
                   v ------------&amp;gt;
+-------++------+------+------+------+------+
|  age  ||  18  |  20  |  21  |  22  |  23  |
+-------++------+------+------+------+------+
                   ✅     ✅      ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-column Index
&lt;/h3&gt;

&lt;p&gt;Multi-column indexes are similar to single-column indexes, except leaf nodes of multi-column indexes contain values of multiple columns, also in order. The sorting rule of nodes in multi-column indexes is similar to string sorting, comparing from left to right, considering the comparison result of the first differing column. For instance, with a multi-column index &lt;code&gt;(a, b, c)&lt;/code&gt;, the node &lt;code&gt;(1, 2, 3)&lt;/code&gt; is less than &lt;code&gt;(1, 2, 4)&lt;/code&gt; but greater than &lt;code&gt;(1, 1, 4)&lt;/code&gt;. When querying &lt;code&gt;SELECT * FROM table WHERE a = 1 AND b = 2 AND c = 3&lt;/code&gt;, similar to single-column indexes, the database can quickly find the nodes meeting the conditions via the multi-column index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (a, b, c)

                            |
                            v
+-----++-----+-----+-----+-----+-----+
|  a  ||  1  |  1  |  1  |  1  |  2  |
+-----++-----+-----+-----+-----+-----+
|  b  ||  1  |  1  |  2  |  2  |  5  |
+-----++-----+-----+-----+-----+-----+
|  c  ||  1  |  3  |  2  |  3  |  4  |
+-----++-----+-----+-----+-----+-----+
                            ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Queries fully utilizing &lt;code&gt;(a, b, c)&lt;/code&gt; indexes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SELECT * FROM table WHERE a = 1 AND b = 2 AND c = 3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SELECT * FROM table WHERE a = 1 AND b = 2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SELECT * FROM table WHERE a = 1&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Having a &lt;code&gt;(a, b, c)&lt;/code&gt; index is equivalent to having both &lt;code&gt;(a, b)&lt;/code&gt; and &lt;code&gt;(a)&lt;/code&gt; indexes simultaneously.&lt;/p&gt;

&lt;p&gt;The rule for databases to swiftly locate nodes meeting conditions using multi-column indexes is: match sequentially from left to right based on index columns; stop matching if a column doesn’t match or isn’t in the query conditions.&lt;/p&gt;

&lt;p&gt;For example, querying &lt;code&gt;SELECT * FROM table WHERE a = 1 AND c = 3&lt;/code&gt; with &lt;code&gt;(a, b, c)&lt;/code&gt; index, since it matches &lt;code&gt;a = 1&lt;/code&gt; from left to right and stops due to b not in the query conditions, it's equivalent to using &lt;code&gt;(a)&lt;/code&gt; index in locating the first node meeting the conditions. The database must then traverse all rows with &lt;code&gt;a = 1&lt;/code&gt; for filtering.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (a, b, c)

          |
          v ----------------&amp;gt;
+-----++-----+-----+-----+-----+-----+
|  a  ||  1  |  1  |  1  |  1  |  2  |
+-----++-----+-----+-----+-----+-----+
|  b  ||  1  |  1  |  2  |  2  |  5  |
+-----++-----+-----+-----+-----+-----+
|  c  ||  1  |  3  |  2  |  3  |  4  |
+-----++-----+-----+-----+-----+-----+
          ❌    ✅    ❌    ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 However, during unidirectional traversal and filtering, &lt;code&gt;(a, b, c)&lt;/code&gt; index is faster than &lt;code&gt;(a)&lt;/code&gt; index because &lt;code&gt;(a, b, c)&lt;/code&gt; index nodes contain &lt;code&gt;c&lt;/code&gt; values, eliminating the need to read &lt;code&gt;c&lt;/code&gt; values from disk. Reading from memory is much faster than random disk reads.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Designing a new multi-column index &lt;code&gt;(a, c)&lt;/code&gt; for this query can efficiently locate rows meeting conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (a, c)

                      |
                      v ----&amp;gt;
+-----++-----+-----+-----+-----+-----+
|  a  ||  1  |  1  |  1  |  1  |  2  |
+-----++-----+-----+-----+-----+-----+
|  c  ||  1  |  2  |  3  |  3  |  4  |
+-----++-----+-----+-----+-----+-----+
                      ✅    ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Order of Multi-column Indexes
&lt;/h3&gt;

&lt;p&gt;The order of columns in multi-column indexes is crucial. For instance, in a grade table with id, class, score columns, having a multi-column index &lt;code&gt;(score, class)&lt;/code&gt;, when querying &lt;code&gt;SELECT * FROM table WHERE class = 2 AND score &amp;gt;= 60&lt;/code&gt;, the index seemingly helps the query. However, its assistance is limited. After swiftly locating the first node meeting the condition, the database needs to traverse all nodes with scores greater than 60 to find all rows meeting the condition. This may include many nodes not meeting the condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (score, class)

                            |
                            v -------------------&amp;gt;
+---------++------+------+------+------+------+------+
|  score  ||  45  |  52  |  60  |  64  |  75  |  95  |
+---------++------+------+------+------+------+------+
|  class  ||  1   |  2   |  2   |  1   |  1   |  2   |
+---------++------+------+------+------+------+------+
                            ✅     ❌      ❌     ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, using &lt;code&gt;(class, score)&lt;/code&gt; index efficiently locates all rows meeting the condition. There are no nodes not meeting the condition during unidirectional traversal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (class, score)

                                          |
                                          v -----&amp;gt;
+---------++------+------+------+------+------+------+
|  class  ||  1   |  1   |  1   |  2   |  2   |  2   |
+---------++------+------+------+------+------+------+
|  score  ||  45  |  64  |  75  |  52  |  60  |  95  |
+---------++------+------+------+------+------+------+
                                          ✅     ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s not only about designing indexes that databases can adopt but also designing indexes that efficiently utilize actual query requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Utilizing Ordered Indexes
&lt;/h3&gt;

&lt;p&gt;Since indexes themselves are ordered, you can design indexes tailored to query requirements, eliminating the need for additional sorting. For example, in the grade table scenario, querying &lt;code&gt;SELECT * FROM table WHERE class = 2 ORDER BY score ASC&lt;/code&gt; uses an index &lt;code&gt;(class, score)&lt;/code&gt;. The unidirectional traversal through index nodes is ordered, eliminating the need for additional result sorting. If the ordered nature of indexes isn't utilized, even if you need to query a single row, the database has to retrieve all rows meeting the condition and then sort them, which is highly inefficient.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (class, score)

                                   |
                                   v ------------&amp;gt;
+---------++------+------+------+------+------+------+
|  class  ||  1   |  1   |  1   |  2   |  2   |  2   |
+---------++------+------+------+------+------+------+
|  score  ||  45  |  64  |  75  |  52  |  60  |  95  |
+---------++------+------+------+------+------+------+
                                   ✅     ✅     ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As leaf nodes of indexes form bidirectional linked lists, &lt;code&gt;SELECT * FROM table WHERE class = 2 ORDER BY score ASC&lt;/code&gt; can be efficiently queried.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index on (class, score)

                                                 |
                                    &amp;lt;----------- v
+---------++------+------+------+------+------+------+
|  class  ||  1   |  1   |  1   |  2   |  2   |  2   |
+---------++------+------+------+------+------+------+
|  score  ||  45  |  64  |  75  |  52  |  60  |  95  |
+---------++------+------+------+------+------+------+
                                   ✅     ✅     ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When designing indexes, strive to arrange adjacent leaf nodes for query result indexes to reduce the number of non-matching nodes during unidirectional traversal.&lt;/li&gt;
&lt;li&gt;Include columns from filtering conditions in indexes to minimize disk reads during filtering.&lt;/li&gt;
&lt;li&gt;If sorting is required, utilize the ordered nature of indexes to avoid additional sorting operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Future explorations could include how database query optimizers select indexes, utilizing indexes in &lt;code&gt;join&lt;/code&gt;, &lt;code&gt;group by&lt;/code&gt;, &lt;code&gt;subqueries&lt;/code&gt;, etc., and common pitfalls with indexes.&lt;/p&gt;

</description>
      <category>database</category>
      <category>sql</category>
      <category>programming</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
