Farhad Rahimi Klie

Posted on Apr 1

Mastering B-Trees and B+Trees in C: A Complete Guide from Scratch

#btree #database #datastructures #bplustree

When it comes to efficient data storage and retrieval, B-Trees and B+Trees are the backbone of databases, file systems, and indexing engines. In this article, we will explore these tree structures in full detail, including all functionalities, C implementation, and practical usage. No shortcuts, no skipped steps—everything explained from the ground up.

📌 What is a B-Tree?

A B-Tree is a self-balancing search tree in which nodes can have multiple keys and children. It maintains sorted data and allows logarithmic time search, insertion, and deletion.

Key properties of a B-Tree of minimum degree T:

Every node contains at most 2*T - 1 keys.
Every non-leaf node has at most 2*T children.
Every node (except root) has at least T - 1 keys.
All leaves appear at the same level.
Keys in a node are always sorted in ascending order.

B-Trees are widely used in databases like MySQL, PostgreSQL, and filesystems like NTFS and FAT.

📌 What is a B+Tree?

A B+Tree is an enhanced variant of a B-Tree. The main differences are:

Internal nodes store only keys, not actual data (pointers to children only).
Leaf nodes contain all data and are linked as a linked list for sequential access.
Search is faster for range queries because leaves are linked.

Key properties:

Maintains sorted order of keys.
Efficient for both point lookups and range queries.
Widely used in database indexing and key-value storage systems.

🛠️ B-Tree Implementation in C

Here’s a step-by-step C implementation of a B-Tree with all functionalities.

1️⃣ Node Structure

#include <stdio.h>
#include <stdlib.h>

#define T 2  // Minimum degree

typedef struct BTreeNode {
    int keys[2*T - 1];            // Max keys = 2*T - 1
    struct BTreeNode *children[2*T]; // Max children = 2*T
    int n;                        // Number of keys
    int leaf;                     // 1 if leaf, 0 otherwise
} BTreeNode;

2️⃣ Creating a Node

BTreeNode* createNode(int leaf) {
    BTreeNode* node = (BTreeNode*)malloc(sizeof(BTreeNode));
    node->leaf = leaf;
    node->n = 0;
    for(int i = 0; i < 2*T; i++)
        node->children[i] = NULL;
    return node;
}

3️⃣ Traversal (In-Order)

void traverse(BTreeNode* root) {
    if (root != NULL) {
        for (int i = 0; i < root->n; i++) {
            if (!root->leaf)
                traverse(root->children[i]);
            printf("%d ", root->keys[i]);
        }
        if (!root->leaf)
            traverse(root->children[root->n]);
    }
}

4️⃣ Searching for a Key

BTreeNode* search(BTreeNode* root, int key) {
    int i = 0;
    while (i < root->n && key > root->keys[i])
        i++;

    if (i < root->n && key == root->keys[i])
        return root;

    if (root->leaf)
        return NULL;

    return search(root->children[i], key);
}

5️⃣ Insertion

Step 1: Split a full child node

void splitChild(BTreeNode* parent, int i, BTreeNode* fullChild) {
    BTreeNode* newChild = createNode(fullChild->leaf);
    newChild->n = T - 1;

    // Copy last T-1 keys to newChild
    for (int j = 0; j < T-1; j++)
        newChild->keys[j] = fullChild->keys[j + T];

    // Copy child pointers if not leaf
    if (!fullChild->leaf)
        for (int j = 0; j < T; j++)
            newChild->children[j] = fullChild->children[j + T];

    fullChild->n = T - 1;

    // Shift parent's children
    for (int j = parent->n; j >= i+1; j--)
        parent->children[j+1] = parent->children[j];

    parent->children[i+1] = newChild;

    // Shift parent's keys
    for (int j = parent->n-1; j >= i; j--)
        parent->keys[j+1] = parent->keys[j];

    parent->keys[i] = fullChild->keys[T-1];
    parent->n++;
}

Step 2: Insert non-full

void insertNonFull(BTreeNode* node, int key) {
    int i = node->n - 1;

    if (node->leaf) {
        while (i >= 0 && node->keys[i] > key) {
            node->keys[i+1] = node->keys[i];
            i--;
        }
        node->keys[i+1] = key;
        node->n++;
    } else {
        while (i >= 0 && node->keys[i] > key)
            i--;

        if (node->children[i+1]->n == 2*T - 1) {
            splitChild(node, i+1, node->children[i+1]);
            if (key > node->keys[i+1])
                i++;
        }
        insertNonFull(node->children[i+1], key);
    }
}

Step 3: Insert wrapper

BTreeNode* insert(BTreeNode* root, int key) {
    if (root->n == 2*T - 1) {
        BTreeNode* newRoot = createNode(0);
        newRoot->children[0] = root;
        splitChild(newRoot, 0, root);
        insertNonFull(newRoot, key);
        return newRoot;
    } else {
        insertNonFull(root, key);
        return root;
    }
}

6️⃣ B+Tree Differences

To implement a B+Tree in C:

Internal nodes store only keys, not actual data.
Leaf nodes store actual data and point to the next leaf.
Traversal of leaves allows fast sequential access.

Leaf Node Structure Example:

typedef struct BPlusNode {
    int keys[2*T - 1];
    int data[2*T - 1];      // store actual values
    struct BPlusNode *children[2*T];
    struct BPlusNode *next; // link to next leaf
    int n;
    int leaf;
} BPlusNode;

The insertion, split, and search logic are similar, but all data moves to leaves, and internal nodes serve only as guides.

⚡ Practical Applications

Database Indexing: MySQL, PostgreSQL, Oracle.
File Systems: NTFS, FAT, and ext4.
Key-Value Stores: RocksDB, LevelDB.
Range Queries: B+Trees are preferred for fast sequential access.

🔑 Key Takeaways

B-Trees: Balanced multi-way search trees; internal and leaf nodes store data.
B+Trees: Leaf-linked trees; all data stored in leaves; ideal for sequential access.
Time Complexity: Search, Insert, Delete = O(log n).
C Implementation: Stepwise insertion, split, search, and traversal explained.
Real-World Impact: Foundation of modern database and filesystem performance.

💻 Next Steps

Implement deletion in B-Trees and B+Trees.
Add range query support for B+Trees.
Optimize memory usage for large-scale datasets.

With this, you now have a complete understanding and code reference for B-Trees and B+Trees in C. Whether you are building a database, file system, or an indexing engine, this knowledge is foundational.

DEV Community