<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ashraf Shaik</title>
    <description>The latest articles on DEV Community by Ashraf Shaik (@ashraf_shaik_ee7ab42b0eb3).</description>
    <link>https://dev.to/ashraf_shaik_ee7ab42b0eb3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3743268%2F51a68642-5a10-4628-9070-b365db071d7a.png</url>
      <title>DEV Community: Ashraf Shaik</title>
      <link>https://dev.to/ashraf_shaik_ee7ab42b0eb3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ashraf_shaik_ee7ab42b0eb3"/>
    <language>en</language>
    <item>
      <title>How B-Tree Indexes Work in SQL Databases: An Intuitive Guide</title>
      <dc:creator>Ashraf Shaik</dc:creator>
      <pubDate>Tue, 03 Feb 2026 18:51:59 +0000</pubDate>
      <link>https://dev.to/ashraf_shaik_ee7ab42b0eb3/how-b-tree-indexes-work-in-sql-databases-an-intuitive-guide-3e71</link>
      <guid>https://dev.to/ashraf_shaik_ee7ab42b0eb3/how-b-tree-indexes-work-in-sql-databases-an-intuitive-guide-3e71</guid>
      <description>&lt;h2&gt;
  
  
  Why Indexes Exist (The Real Problem)
&lt;/h2&gt;

&lt;p&gt;Relational databases are optimized to store large volumes of data reliably, but they are not necessarily optimized to search them efficiently by default. When you ask a database to find a specific row without an index, it often performs a &lt;strong&gt;Full Table Scan&lt;/strong&gt;, looking at every row one by one. This works at a small scale but becomes painfully slow as data grows.&lt;/p&gt;

&lt;p&gt;Indexes exist to avoid this brute-force approach. Instead of checking every row, an index allows the database to quickly narrow down where the matching data resides.&lt;/p&gt;

&lt;p&gt;B-Tree indexes are the default choice in most SQL databases (PostgreSQL, MySQL/InnoDB, SQL Server, Oracle) because they keep data sorted while supporting inserts and updates in a controlled, predictable way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The intuition:&lt;/strong&gt; even with 10 million rows, a B-Tree index typically needs only a handful of comparison steps to find a single value. Locating a row is a matter of a few targeted &lt;em&gt;jumps&lt;/em&gt; rather than a scan through millions of entries. Writes do carry overhead—every insert must maintain this structure—but that cost grows slowly, allowing B-Trees to scale exceptionally well.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Problem a B-Tree Solves
&lt;/h2&gt;

&lt;p&gt;A naive sorted array allows for binary search, but inserts are expensive because data must be shifted to make room (&lt;strong&gt;O(n)&lt;/strong&gt;). A linked list allows cheap inserts, but search is slow (&lt;strong&gt;O(n)&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;A B-Tree is a data structure designed specifically for storage systems that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps data sorted for range queries and ordering&lt;/li&gt;
&lt;li&gt;Allows logarithmic search (&lt;strong&gt;O(log n)&lt;/strong&gt;) complexity&lt;/li&gt;
&lt;li&gt;Minimizes disk I/O by being &lt;em&gt;shallow&lt;/em&gt; and &lt;em&gt;wide&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Supports efficient writes through localized node splits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key constraint: &lt;strong&gt;databases read data in pages, not individual rows.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Pages: How Databases Actually Store Data
&lt;/h2&gt;

&lt;p&gt;Before talking about B-Trees, it helps to understand one fundamental idea: databases work with &lt;strong&gt;pages&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A page is a fixed-size block of data (commonly &lt;strong&gt;8 KB&lt;/strong&gt;). When the database needs anything—a row, an index entry, or a pointer—it reads the entire page containing it.&lt;/p&gt;

&lt;p&gt;This applies to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Table data (the actual rows)&lt;/li&gt;
&lt;li&gt;Index data (B-Tree nodes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The performance goal is simple: &lt;strong&gt;do as much useful work as possible per page read&lt;/strong&gt;. B-Tree nodes are designed to fit neatly inside one page, allowing the database to make massive progress with every single disk hit.&lt;/p&gt;

&lt;p&gt;This page-based model is why B-Trees look very different from textbook binary trees.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7w54vy4fhalxsp7pcac5.png" alt="page" width="800" height="800"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  B-Tree Structure (High Level)
&lt;/h2&gt;

&lt;p&gt;A B-Tree consists of three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Root node&lt;/strong&gt;: the starting point of every search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal nodes&lt;/strong&gt;: the &lt;em&gt;signposts&lt;/em&gt; used for navigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leaf nodes&lt;/strong&gt;: the bottom layer where the actual data (or pointers) live&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sorted keys&lt;/strong&gt;: keys inside a node are always kept in order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High fanout&lt;/strong&gt;: each node contains many keys (often hundreds), not just two&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced depth&lt;/strong&gt;: all leaf nodes are at the exact same depth, guaranteeing predictable performance regardless of which key you are seeking&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Internal vs. Leaf Nodes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Internal nodes&lt;/strong&gt; contain keys and pointers to child pages. They are used purely for navigation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leaf nodes&lt;/strong&gt; contain keys and pointers to the actual rows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Implementation details differ:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;MySQL (InnoDB)&lt;/strong&gt;, leaf nodes store the full row (&lt;em&gt;clustered index&lt;/em&gt;).&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;PostgreSQL&lt;/strong&gt;, leaf nodes store row pointers (heap TIDs).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The link:&lt;/strong&gt; leaf nodes are usually linked together (left-to-right), allowing the database to perform range scans without going back up to the root.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwjvebofpklqsufvj5ol.png" alt="B-tree storing of data" width="800" height="533"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  How a Lookup Works
&lt;/h2&gt;

&lt;p&gt;Consider the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'a@b.com'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Start at the root page&lt;/li&gt;
&lt;li&gt;Compare the key with the keys in the node to find the correct range&lt;/li&gt;
&lt;li&gt;Follow the child pointer to the next page&lt;/li&gt;
&lt;li&gt;Repeat until a leaf node is reached&lt;/li&gt;
&lt;li&gt;Fetch the row pointer or the row itself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step is a page read, not a row-by-row operation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why B-Tree Height Stays Small
&lt;/h2&gt;

&lt;p&gt;Because each node holds many keys, the tree grows &lt;strong&gt;wide&lt;/strong&gt;, not tall.&lt;/p&gt;

&lt;p&gt;Typical numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page size: &lt;strong&gt;8 KB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Key + pointer: ~&lt;strong&gt;40 bytes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Keys per node (fanout): ~&lt;strong&gt;200&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A tree with height &lt;strong&gt;3&lt;/strong&gt; can support:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;200³ ≈ 8,000,000 entries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why B-Tree lookups remain extremely fast even at massive scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Inserts and Node Splits
&lt;/h2&gt;

&lt;p&gt;When you insert a new row, the database must update the B-Tree:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The key is placed in the correct leaf node&lt;/li&gt;
&lt;li&gt;If the node has space, the insert is trivial&lt;/li&gt;
&lt;li&gt;If the node is full, it splits into two nodes&lt;/li&gt;
&lt;li&gt;The middle key is promoted to the parent node&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Splits can propagate upward to the root, but they are rare in practice. This balancing act is the &lt;em&gt;tax&lt;/em&gt; paid to keep the tree efficient for future reads.&lt;/p&gt;




&lt;h2&gt;
  
  
  Range Queries and Ordering
&lt;/h2&gt;

&lt;p&gt;B-Trees keep keys sorted, which enables efficient range scans:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-31'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After locating the first key (&lt;code&gt;'2025-01-01'&lt;/code&gt;) using the standard lookup, the database simply walks the linked leaf nodes sequentially until it hits the end of the range.&lt;/p&gt;

&lt;p&gt;This is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;B-Trees natively support &lt;code&gt;ORDER BY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;B-Trees support prefix scans (e.g. &lt;code&gt;LIKE 'abc%'&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Hash indexes cannot do either of these&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Composite Indexes
&lt;/h2&gt;

&lt;p&gt;When you create an index on multiple columns, such as &lt;code&gt;(user_id, created_at)&lt;/code&gt;, the B-Tree sorts by &lt;code&gt;user_id&lt;/code&gt; first, and then by &lt;code&gt;created_at&lt;/code&gt; for tied values.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficient for:&lt;/strong&gt; &lt;code&gt;user_id = ?&lt;/code&gt; or &lt;code&gt;user_id = ? AND created_at &amp;gt; ?&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inefficient for:&lt;/strong&gt; &lt;code&gt;created_at = ?&lt;/code&gt; alone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like a phone book sorted by &lt;em&gt;(Last Name, First Name)&lt;/em&gt;. It’s easy to find all &lt;em&gt;Smiths&lt;/em&gt;, but impossible to find everyone named &lt;em&gt;John&lt;/em&gt; without reading the whole book.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Misconceptions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Indexes speed up all queries"&lt;/strong&gt;: False. They speed up reads but add overhead to every &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"More indexes are always better"&lt;/strong&gt;: False. Every index increases write amplification and takes up disk space.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"B-Trees are only for equality"&lt;/strong&gt;: False. Their primary strength is handling ranges and sorted output.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When a B-Tree Is the Wrong Choice
&lt;/h2&gt;

&lt;p&gt;Avoid B-Tree indexes when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Low cardinality&lt;/strong&gt;: If 90% of rows have &lt;code&gt;is_active = true&lt;/code&gt;, the database will often prefer a full table scan because reading the index plus the table is more expensive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High-frequency equality only&lt;/strong&gt;: If you only ever do exact matches on unique keys, a hash index may be faster (though less flexible).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only / time-series data&lt;/strong&gt;: Consider BRIN (Block Range Indexes) for massive tables where data is naturally ordered by time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-text search&lt;/strong&gt;: Use GIN or specialized inverted indexes for searching words inside long text.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Practical Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;B-Trees are page-optimized, prioritizing fewer disk hits over CPU cycles&lt;/li&gt;
&lt;li&gt;Tree height matters more than total row count&lt;/li&gt;
&lt;li&gt;Composite index order is critical for query performance&lt;/li&gt;
&lt;li&gt;Understand your read vs. write ratio before adding new indexes&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
