Farhad Rahimi Klie

Posted on Jun 4

Data Structures Used in Database Development: The Foundations Behind Every Database Engine

#devops #development #algorithms #coding

When people think about databases, they usually think about SQL, queries, tables, indexes, and transactions.

But underneath all of those features lies something even more fundamental:

Data Structures.

Every database engine—from PostgreSQL and MySQL to MongoDB and SQLite—is essentially a collection of carefully chosen data structures working together.

If you want to understand how databases really work—or build your own database engine from scratch—you need to understand the data structures that make modern databases possible.

In this article, we'll explore the most important data structures used in database development, why they exist, and where they are used.

Why Databases Need Specialized Data Structures

Imagine a database containing:

1 million rows
100 million rows
1 billion rows

A simple linear search through all records would be far too slow.

Databases must solve several critical problems:

Fast lookups
Efficient inserts
Efficient updates
Fast range queries
Minimal disk I/O
Efficient memory usage
Concurrent access by many users

Different data structures solve different parts of these problems.

There is no single "perfect" data structure.

Instead, database engines combine multiple structures together.

1. Arrays

What is an Array?

An array is a contiguous block of memory containing elements of the same type.

int numbers[5] = {10,20,30,40,50};

Arrays provide:

O(1) random access
Cache-friendly memory layout
Simple implementation

Where Databases Use Arrays

Arrays appear everywhere inside database engines:

Buffer Pools

Databases load disk pages into memory.

Buffer Pool

[Page 1]
[Page 2]
[Page 3]
[Page 4]

The buffer pool is often managed using arrays.

Query Execution

Intermediate query results frequently use arrays.

Example:

SELECT * FROM users;

Rows returned by the executor may temporarily be stored in array-like structures.

Columnar Databases

Column stores often keep values in arrays:

Age Column

20
25
30
40
50

This layout enables efficient analytics.

2. Linked Lists

What is a Linked List?

A linked list stores elements connected through pointers.

[Node] -> [Node] -> [Node]

Each node contains:

struct Node {
    int value;
    struct Node* next;
};

Where Databases Use Linked Lists

Buffer Management

Many databases maintain free pages using linked lists.

Free Page List

Page 5 -> Page 8 -> Page 10

Transaction Management

Active transactions are often tracked through linked structures.

Memory Allocators

Database memory managers frequently organize free blocks using linked lists.

3. Hash Tables

What is a Hash Table?

A hash table maps:

Key -> Value

through a hash function.

Example:

"Ali" -> 102
"Sara" -> 211

Why Databases Love Hash Tables

Hash tables provide nearly:

O(1)

lookup time.

This makes them ideal for exact-match searches.

Where Hash Tables Are Used

Query Execution

Hash Join:

SELECT *
FROM orders
JOIN customers
ON orders.customer_id = customers.id;

Database builds a hash table for one side of the join.

Query Cache

Many engines use hash tables for:

Query -> Result

mapping.

Metadata Lookup

Finding:

tables
indexes
columns
users

is often done through hash structures.

Limitation

Hash tables cannot efficiently perform:

WHERE age BETWEEN 20 AND 30

Range queries require ordered structures.

That's why databases also use trees.

4. Binary Search Trees (BST)

A BST stores values in sorted order.

        50
       /  \
     25    75

Properties:

Left < Root < Right

Why Databases Rarely Use Plain BSTs

A normal BST can become:

Performance degrades to:

O(n)

Therefore databases prefer balanced trees.

5. AVL Trees

AVL Trees automatically balance themselves.

      30
     /  \
   20   40

Height remains controlled.

Database Usage

AVL trees are sometimes used in:

in-memory indexes
metadata structures
cache management

However, they are less common for large disk-based indexes.

6. Red-Black Trees

Red-Black Trees are balanced binary search trees.

Operations:

Search  O(log n)
Insert  O(log n)
Delete  O(log n)

Where Used

Many database internals use Red-Black Trees for:

lock management
transaction tracking
memory management
scheduling structures

7. B-Trees

This is where databases become interesting.

B-Trees are arguably the most important data structure in database history.

The Problem

Disk access is expensive.

Reading one node per disk access would be too slow.

The Solution

Store many keys in one node.

|10|20|30|40|

Each node contains multiple keys.

This dramatically reduces tree height.

Example

Instead of:

Height = 20

you might get:

Height = 3

for millions of records.

Why Databases Use B-Trees

Advantages:

Fast lookup
Fast insertion
Fast deletion
Excellent range queries
Minimal disk reads

Used In

Traditional database indexes:

CREATE INDEX idx_users_name
ON users(name);

Often becomes a B-Tree index internally.

8. B+ Trees

Most modern databases actually use B+ Trees.

Not plain B-Trees.

Structure

Internal nodes:

Only keys

Leaf nodes:

Actual records

Leaf Nodes Are Linked

[Leaf1] <-> [Leaf2] <-> [Leaf3]

This is extremely important.

Why?

Range queries become very fast.

Example:

SELECT *
FROM users
WHERE age BETWEEN 20 AND 40;

Database finds the first record and then scans linked leaf pages.

Used By

MySQL InnoDB
SQLite
Many relational databases

B+ Trees are arguably the king of database indexing.

9. LSM Trees (Log-Structured Merge Trees)

Modern databases face a different challenge:

Massive write workloads.

Problem

Updating a B+ Tree repeatedly causes random disk writes.

Random writes are expensive.

Solution

LSM Trees.

Writes first go to memory.

MemTable

Later they are flushed to disk.

SSTables

Background processes merge files.

Benefits

Excellent write performance.

Ideal for:

Logging systems
Analytics systems
Distributed databases

Used By

10. Skip Lists

A Skip List is a probabilistic alternative to balanced trees.

Level 3 ------
Level 2 ------
Level 1 ------
Level 0 ------

Multiple layers accelerate searching.

Advantages

Simpler implementation
Good performance
Easy concurrent operations

Used By

Redis sorted sets
Some in-memory databases

11. Tries (Prefix Trees)

A Trie stores strings character by character.

Example:

car
cat
can

Shared prefixes are reused.

Database Usage

Useful for:

Full-text search
Autocomplete
Dictionary lookups
Query suggestions

Example

c
|
a
|
+--r
+--t
+--n

12. Heaps (Priority Queues)

A Heap allows quick access to:

Smallest
or
Largest

element.

Database Usage

Query Optimization

Database planners choose execution strategies using priority structures.

Top-K Queries

ORDER BY score DESC
LIMIT 10;

Heaps help efficiently maintain the top results.

13. Bloom Filters

A Bloom Filter is a probabilistic structure.

It answers:

"Possibly exists"
or
"Definitely does not exist"

Why Useful

Avoids unnecessary disk reads.

Used By

Cassandra
RocksDB
BigTable-inspired systems

Benefit

A tiny memory footprint can save huge amounts of I/O.

14. Graph Structures

Graph databases use nodes and edges.

Alice ---- Friend ---- Bob

Used By

Perfect For

Social networks
Recommendation systems
Fraud detection
Knowledge graphs

Importance Ranking for Database Engine Developers

If your goal is to build a database engine from scratch, focus on these first:

Data Structure	Importance
Arrays	⭐⭐⭐⭐⭐
Linked Lists	⭐⭐⭐⭐
Hash Tables	⭐⭐⭐⭐⭐
Binary Search Trees	⭐⭐⭐
AVL Trees	⭐⭐⭐
Red-Black Trees	⭐⭐⭐⭐
B-Trees	⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
B+ Trees	⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
LSM Trees	⭐⭐⭐⭐⭐⭐⭐
Skip Lists	⭐⭐⭐⭐⭐
Tries	⭐⭐⭐⭐
Heaps	⭐⭐⭐
Bloom Filters	⭐⭐⭐⭐⭐
Graph Structures	Depends on DB type

Final Thoughts

Database engines are not magical systems. At their core, they are sophisticated combinations of data structures designed to optimize storage, retrieval, indexing, caching, concurrency, and disk access.

If you want to become a database engineer, your learning path should look roughly like this:

Arrays
Linked Lists
Stacks & Queues
Hash Tables
Trees
AVL Trees
Red-Black Trees
B-Trees
B+ Trees
Heaps
Tries
Bloom Filters
Skip Lists
LSM Trees

Master these structures, and you'll begin to see database engines not as black boxes, but as collections of elegant algorithms and data structures working together to manage data at scale.