DEV Community

Cover image for Mastering B-Trees and B+Trees in C: A Complete Guide from Scratch
Farhad Rahimi Klie
Farhad Rahimi Klie

Posted on

Mastering B-Trees and B+Trees in C: A Complete Guide from Scratch

When it comes to efficient data storage and retrieval, B-Trees and B+Trees are the backbone of databases, file systems, and indexing engines. In this article, we will explore these tree structures in full detail, including all functionalities, C implementation, and practical usage. No shortcuts, no skipped steps—everything explained from the ground up.


📌 What is a B-Tree?

A B-Tree is a self-balancing search tree in which nodes can have multiple keys and children. It maintains sorted data and allows logarithmic time search, insertion, and deletion.

Key properties of a B-Tree of minimum degree T:

  1. Every node contains at most 2*T - 1 keys.
  2. Every non-leaf node has at most 2*T children.
  3. Every node (except root) has at least T - 1 keys.
  4. All leaves appear at the same level.
  5. Keys in a node are always sorted in ascending order.

B-Trees are widely used in databases like MySQL, PostgreSQL, and filesystems like NTFS and FAT.


📌 What is a B+Tree?

A B+Tree is an enhanced variant of a B-Tree. The main differences are:

  1. Internal nodes store only keys, not actual data (pointers to children only).
  2. Leaf nodes contain all data and are linked as a linked list for sequential access.
  3. Search is faster for range queries because leaves are linked.

Key properties:

  • Maintains sorted order of keys.
  • Efficient for both point lookups and range queries.
  • Widely used in database indexing and key-value storage systems.

🛠️ B-Tree Implementation in C

Here’s a step-by-step C implementation of a B-Tree with all functionalities.

1️⃣ Node Structure

#include <stdio.h>
#include <stdlib.h>

#define T 2  // Minimum degree

typedef struct BTreeNode {
    int keys[2*T - 1];            // Max keys = 2*T - 1
    struct BTreeNode *children[2*T]; // Max children = 2*T
    int n;                        // Number of keys
    int leaf;                     // 1 if leaf, 0 otherwise
} BTreeNode;
Enter fullscreen mode Exit fullscreen mode

2️⃣ Creating a Node

BTreeNode* createNode(int leaf) {
    BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
    node->leaf = leaf;
    node->n = 0;
    for(int i = 0; i < 2*T; i++)
        node->children[i] = NULL;
    return node;
}
Enter fullscreen mode Exit fullscreen mode

3️⃣ Traversal (In-Order)

void traverse(BTreeNode* root) {
    if (root != NULL) {
        for (int i = 0; i < root->n; i++) {
            if (!root->leaf)
                traverse(root->children[i]);
            printf("%d ", root->keys[i]);
        }
        if (!root->leaf)
            traverse(root->children[root->n]);
    }
}
Enter fullscreen mode Exit fullscreen mode

4️⃣ Searching for a Key

BTreeNode* search(BTreeNode* root, int key) {
    int i = 0;
    while (i < root->n && key > root->keys[i])
        i++;

    if (i < root->n && key == root->keys[i])
        return root;

    if (root->leaf)
        return NULL;

    return search(root->children[i], key);
}
Enter fullscreen mode Exit fullscreen mode

5️⃣ Insertion

Step 1: Split a full child node

void splitChild(BTreeNode* parent, int i, BTreeNode* fullChild) {
    BTreeNode* newChild = createNode(fullChild->leaf);
    newChild->n = T - 1;

    // Copy last T-1 keys to newChild
    for (int j = 0; j < T-1; j++)
        newChild->keys[j] = fullChild->keys[j + T];

    // Copy child pointers if not leaf
    if (!fullChild->leaf)
        for (int j = 0; j < T; j++)
            newChild->children[j] = fullChild->children[j + T];

    fullChild->n = T - 1;

    // Shift parent's children
    for (int j = parent->n; j >= i+1; j--)
        parent->children[j+1] = parent->children[j];

    parent->children[i+1] = newChild;

    // Shift parent's keys
    for (int j = parent->n-1; j >= i; j--)
        parent->keys[j+1] = parent->keys[j];

    parent->keys[i] = fullChild->keys[T-1];
    parent->n++;
}
Enter fullscreen mode Exit fullscreen mode

Step 2: Insert non-full

void insertNonFull(BTreeNode* node, int key) {
    int i = node->n - 1;

    if (node->leaf) {
        while (i >= 0 && node->keys[i] > key) {
            node->keys[i+1] = node->keys[i];
            i--;
        }
        node->keys[i+1] = key;
        node->n++;
    } else {
        while (i >= 0 && node->keys[i] > key)
            i--;

        if (node->children[i+1]->n == 2*T - 1) {
            splitChild(node, i+1, node->children[i+1]);
            if (key > node->keys[i+1])
                i++;
        }
        insertNonFull(node->children[i+1], key);
    }
}
Enter fullscreen mode Exit fullscreen mode

Step 3: Insert wrapper

BTreeNode* insert(BTreeNode* root, int key) {
    if (root->n == 2*T - 1) {
        BTreeNode* newRoot = createNode(0);
        newRoot->children[0] = root;
        splitChild(newRoot, 0, root);
        insertNonFull(newRoot, key);
        return newRoot;
    } else {
        insertNonFull(root, key);
        return root;
    }
}
Enter fullscreen mode Exit fullscreen mode

6️⃣ B+Tree Differences

To implement a B+Tree in C:

  1. Internal nodes store only keys, not actual data.
  2. Leaf nodes store actual data and point to the next leaf.
  3. Traversal of leaves allows fast sequential access.

Leaf Node Structure Example:

typedef struct BPlusNode {
    int keys[2*T - 1];
    int data[2*T - 1];      // store actual values
    struct BPlusNode *children[2*T];
    struct BPlusNode *next; // link to next leaf
    int n;
    int leaf;
} BPlusNode;
Enter fullscreen mode Exit fullscreen mode

The insertion, split, and search logic are similar, but all data moves to leaves, and internal nodes serve only as guides.


⚡ Practical Applications

  • Database Indexing: MySQL, PostgreSQL, Oracle.
  • File Systems: NTFS, FAT, and ext4.
  • Key-Value Stores: RocksDB, LevelDB.
  • Range Queries: B+Trees are preferred for fast sequential access.

🔑 Key Takeaways

  1. B-Trees: Balanced multi-way search trees; internal and leaf nodes store data.
  2. B+Trees: Leaf-linked trees; all data stored in leaves; ideal for sequential access.
  3. Time Complexity: Search, Insert, Delete = O(log n).
  4. C Implementation: Stepwise insertion, split, search, and traversal explained.
  5. Real-World Impact: Foundation of modern database and filesystem performance.

💻 Next Steps

  • Implement deletion in B-Trees and B+Trees.
  • Add range query support for B+Trees.
  • Optimize memory usage for large-scale datasets.

With this, you now have a complete understanding and code reference for B-Trees and B+Trees in C. Whether you are building a database, file system, or an indexing engine, this knowledge is foundational.

Top comments (0)