DEV Community

Cover image for Data Structures Used in Database Development: The Foundations Behind Every Database Engine
Farhad Rahimi Klie
Farhad Rahimi Klie

Posted on

Data Structures Used in Database Development: The Foundations Behind Every Database Engine

When people think about databases, they usually think about SQL, queries, tables, indexes, and transactions.

But underneath all of those features lies something even more fundamental:

Data Structures.

Every database engine—from PostgreSQL and MySQL to MongoDB and SQLite—is essentially a collection of carefully chosen data structures working together.

If you want to understand how databases really work—or build your own database engine from scratch—you need to understand the data structures that make modern databases possible.

In this article, we'll explore the most important data structures used in database development, why they exist, and where they are used.


Why Databases Need Specialized Data Structures

Imagine a database containing:

  • 1 million rows
  • 100 million rows
  • 1 billion rows

A simple linear search through all records would be far too slow.

Databases must solve several critical problems:

  • Fast lookups
  • Efficient inserts
  • Efficient updates
  • Fast range queries
  • Minimal disk I/O
  • Efficient memory usage
  • Concurrent access by many users

Different data structures solve different parts of these problems.

There is no single "perfect" data structure.

Instead, database engines combine multiple structures together.


1. Arrays

What is an Array?

An array is a contiguous block of memory containing elements of the same type.

int numbers[5] = {10,20,30,40,50};
Enter fullscreen mode Exit fullscreen mode

Arrays provide:

  • O(1) random access
  • Cache-friendly memory layout
  • Simple implementation

Where Databases Use Arrays

Arrays appear everywhere inside database engines:

Buffer Pools

Databases load disk pages into memory.

Buffer Pool

[Page 1]
[Page 2]
[Page 3]
[Page 4]
Enter fullscreen mode Exit fullscreen mode

The buffer pool is often managed using arrays.


Query Execution

Intermediate query results frequently use arrays.

Example:

SELECT * FROM users;
Enter fullscreen mode Exit fullscreen mode

Rows returned by the executor may temporarily be stored in array-like structures.


Columnar Databases

Column stores often keep values in arrays:

Age Column

20
25
30
40
50
Enter fullscreen mode Exit fullscreen mode

This layout enables efficient analytics.


2. Linked Lists

What is a Linked List?

A linked list stores elements connected through pointers.

[Node] -> [Node] -> [Node]
Enter fullscreen mode Exit fullscreen mode

Each node contains:

struct Node {
    int value;
    struct Node* next;
};
Enter fullscreen mode Exit fullscreen mode

Where Databases Use Linked Lists

Buffer Management

Many databases maintain free pages using linked lists.

Free Page List

Page 5 -> Page 8 -> Page 10
Enter fullscreen mode Exit fullscreen mode

Transaction Management

Active transactions are often tracked through linked structures.


Memory Allocators

Database memory managers frequently organize free blocks using linked lists.


3. Hash Tables

What is a Hash Table?

A hash table maps:

Key -> Value
Enter fullscreen mode Exit fullscreen mode

through a hash function.

Example:

"Ali" -> 102
"Sara" -> 211
Enter fullscreen mode Exit fullscreen mode

Why Databases Love Hash Tables

Hash tables provide nearly:

O(1)
Enter fullscreen mode Exit fullscreen mode

lookup time.

This makes them ideal for exact-match searches.


Where Hash Tables Are Used

Query Execution

Hash Join:

SELECT *
FROM orders
JOIN customers
ON orders.customer_id = customers.id;
Enter fullscreen mode Exit fullscreen mode

Database builds a hash table for one side of the join.


Query Cache

Many engines use hash tables for:

Query -> Result
Enter fullscreen mode Exit fullscreen mode

mapping.


Metadata Lookup

Finding:

  • tables
  • indexes
  • columns
  • users

is often done through hash structures.


Limitation

Hash tables cannot efficiently perform:

WHERE age BETWEEN 20 AND 30
Enter fullscreen mode Exit fullscreen mode

Range queries require ordered structures.

That's why databases also use trees.


4. Binary Search Trees (BST)

A BST stores values in sorted order.

        50
       /  \
     25    75
Enter fullscreen mode Exit fullscreen mode

Properties:

Left < Root < Right
Enter fullscreen mode Exit fullscreen mode

Why Databases Rarely Use Plain BSTs

A normal BST can become:

1
 \
  2
   \
    3
     \
      4
Enter fullscreen mode Exit fullscreen mode

Performance degrades to:

O(n)
Enter fullscreen mode Exit fullscreen mode

Therefore databases prefer balanced trees.


5. AVL Trees

AVL Trees automatically balance themselves.

      30
     /  \
   20   40
Enter fullscreen mode Exit fullscreen mode

Height remains controlled.


Database Usage

AVL trees are sometimes used in:

  • in-memory indexes
  • metadata structures
  • cache management

However, they are less common for large disk-based indexes.


6. Red-Black Trees

Red-Black Trees are balanced binary search trees.

Operations:

Search  O(log n)
Insert  O(log n)
Delete  O(log n)
Enter fullscreen mode Exit fullscreen mode

Where Used

Many database internals use Red-Black Trees for:

  • lock management
  • transaction tracking
  • memory management
  • scheduling structures

7. B-Trees

This is where databases become interesting.

B-Trees are arguably the most important data structure in database history.


The Problem

Disk access is expensive.

Reading one node per disk access would be too slow.


The Solution

Store many keys in one node.

|10|20|30|40|
Enter fullscreen mode Exit fullscreen mode

Each node contains multiple keys.

This dramatically reduces tree height.


Example

Instead of:

Height = 20
Enter fullscreen mode Exit fullscreen mode

you might get:

Height = 3
Enter fullscreen mode Exit fullscreen mode

for millions of records.


Why Databases Use B-Trees

Advantages:

  • Fast lookup
  • Fast insertion
  • Fast deletion
  • Excellent range queries
  • Minimal disk reads

Used In

Traditional database indexes:

CREATE INDEX idx_users_name
ON users(name);
Enter fullscreen mode Exit fullscreen mode

Often becomes a B-Tree index internally.


8. B+ Trees

Most modern databases actually use B+ Trees.

Not plain B-Trees.


Structure

Internal nodes:

Only keys
Enter fullscreen mode Exit fullscreen mode

Leaf nodes:

Actual records
Enter fullscreen mode Exit fullscreen mode

Leaf Nodes Are Linked

[Leaf1] <-> [Leaf2] <-> [Leaf3]
Enter fullscreen mode Exit fullscreen mode

This is extremely important.


Why?

Range queries become very fast.

Example:

SELECT *
FROM users
WHERE age BETWEEN 20 AND 40;
Enter fullscreen mode Exit fullscreen mode

Database finds the first record and then scans linked leaf pages.


Used By

B+ Trees are arguably the king of database indexing.


9. LSM Trees (Log-Structured Merge Trees)

Modern databases face a different challenge:

Massive write workloads.


Problem

Updating a B+ Tree repeatedly causes random disk writes.

Random writes are expensive.


Solution

LSM Trees.

Writes first go to memory.

MemTable
Enter fullscreen mode Exit fullscreen mode

Later they are flushed to disk.

SSTables
Enter fullscreen mode Exit fullscreen mode

Background processes merge files.


Benefits

Excellent write performance.

Ideal for:

  • Logging systems
  • Analytics systems
  • Distributed databases

Used By


10. Skip Lists

A Skip List is a probabilistic alternative to balanced trees.

Level 3 ------
Level 2 ------
Level 1 ------
Level 0 ------
Enter fullscreen mode Exit fullscreen mode

Multiple layers accelerate searching.


Advantages

  • Simpler implementation
  • Good performance
  • Easy concurrent operations

Used By

  • Redis sorted sets
  • Some in-memory databases

11. Tries (Prefix Trees)

A Trie stores strings character by character.

Example:

car
cat
can
Enter fullscreen mode Exit fullscreen mode

Shared prefixes are reused.


Database Usage

Useful for:

  • Full-text search
  • Autocomplete
  • Dictionary lookups
  • Query suggestions

Example

c
|
a
|
+--r
+--t
+--n
Enter fullscreen mode Exit fullscreen mode

12. Heaps (Priority Queues)

A Heap allows quick access to:

Smallest
or
Largest
Enter fullscreen mode Exit fullscreen mode

element.


Database Usage

Query Optimization

Database planners choose execution strategies using priority structures.

Top-K Queries

ORDER BY score DESC
LIMIT 10;
Enter fullscreen mode Exit fullscreen mode

Heaps help efficiently maintain the top results.


13. Bloom Filters

A Bloom Filter is a probabilistic structure.

It answers:

"Possibly exists"
or
"Definitely does not exist"
Enter fullscreen mode Exit fullscreen mode

Why Useful

Avoids unnecessary disk reads.


Used By

  • Cassandra
  • RocksDB
  • BigTable-inspired systems

Benefit

A tiny memory footprint can save huge amounts of I/O.


14. Graph Structures

Graph databases use nodes and edges.

Alice ---- Friend ---- Bob
Enter fullscreen mode Exit fullscreen mode

Used By


Perfect For

  • Social networks
  • Recommendation systems
  • Fraud detection
  • Knowledge graphs

Importance Ranking for Database Engine Developers

If your goal is to build a database engine from scratch, focus on these first:

Data Structure Importance
Arrays ⭐⭐⭐⭐⭐
Linked Lists ⭐⭐⭐⭐
Hash Tables ⭐⭐⭐⭐⭐
Binary Search Trees ⭐⭐⭐
AVL Trees ⭐⭐⭐
Red-Black Trees ⭐⭐⭐⭐
B-Trees ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
B+ Trees ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐
LSM Trees ⭐⭐⭐⭐⭐⭐⭐
Skip Lists ⭐⭐⭐⭐⭐
Tries ⭐⭐⭐⭐
Heaps ⭐⭐⭐
Bloom Filters ⭐⭐⭐⭐⭐
Graph Structures Depends on DB type

Final Thoughts

Database engines are not magical systems. At their core, they are sophisticated combinations of data structures designed to optimize storage, retrieval, indexing, caching, concurrency, and disk access.

If you want to become a database engineer, your learning path should look roughly like this:

  1. Arrays
  2. Linked Lists
  3. Stacks & Queues
  4. Hash Tables
  5. Trees
  6. AVL Trees
  7. Red-Black Trees
  8. B-Trees
  9. B+ Trees
  10. Heaps
  11. Tries
  12. Bloom Filters
  13. Skip Lists
  14. LSM Trees

Master these structures, and you'll begin to see database engines not as black boxes, but as collections of elegant algorithms and data structures working together to manage data at scale.

Top comments (0)