
Some indexing data structures

When working with databases, we often think about queries, schemas, and APIs. But underneath all of that lies a more fundamental concern:
How does a database find data efficiently?

Indexes are the data structures that answer this question. They allow databases to locate records quickly without scanning entire files on disk.

Designing indexes for disk-based storage is very different from designing in-memory data structures. Disk access is slow, memory is limited, crashes must be handled safely, and data is constantly being updated. These constraints have led to specialized indexing structures that look very different from the data structures we use in application code.

In this article, we’ll explore several core index data structures used in modern storage engines:

  • Hash indexes, the simplest form of key-based indexing

  • Sorted String Tables (SSTables), which store data in sorted order on disk

  • Log-Structured Merge Trees (LSM-Trees), which organize and merge SSTables efficiently at scale

  • B-Trees, the classic indexing structure behind many databases

Rather than focusing on formal definitions, the goal is to build an intuitive understanding of how these structures work, why they exist, and what trade-offs they introduce. By the end, you should be able to reason about why different databases choose different index designs.

Let’s begin with the simplest building block: the hash index.

Hash Indexes

When we start exploring database indexes, the simplest and most intuitive structure is the hash index. Hash indexes are based on the same concept as in-memory hash maps or dictionaries in programming languages: a key maps directly to a value.

In a database, the goal is the same: quickly find the location of a record on disk using its key.


How a Hash Index Works

Imagine a database that only appends data to a log file on disk. Each record consists of a key-value pair. To find data efficiently, the database maintains an in-memory hash map that stores the mapping:

```
Key → Byte offset in the log file
```

Whenever a new record is written:

  1. Append it to the log file.

  2. Update the in-memory hash map to point to the new offset.

When reading a value:

  1. Look up the key in the hash map.

  2. Seek to the offset in the log file.

  3. Read the value.

Hash Index Concept – key to offset mapping
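
To make this concrete, here is a minimal sketch in Python, assuming newline-delimited `key,value` records; the class name and file format are illustrative, not any particular engine's API:

```python
import os

class HashIndexStore:
    """Append-only log file plus an in-memory hash index (illustrative sketch)."""

    def __init__(self, path):
        self.index = {}                # key -> byte offset in the log file
        self.log = open(path, "ab+")   # append-only: old records are never touched

    def put(self, key, value):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()       # remember where this record starts
        self.log.write(f"{key},{value}\n".encode())
        self.log.flush()
        self.index[key] = offset       # point the key at the newest record

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)          # jump straight to the record on disk
        line = self.log.readline().decode()
        return line.rstrip("\n").partition(",")[2]
```

Every `put` is one append plus one map update; every `get` is one hash lookup plus one disk seek, which is why exact-key reads are so fast.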


Segment Files and Compaction

If the database only appended to a single log file, it would grow indefinitely. To solve this:

  • Logs are split into segments (files of fixed size).

  • Old segments are immutable.

  • A background compaction process merges multiple segments, keeping only the latest value for each key.

This ensures:

  • Disk space is reclaimed.

  • Reads remain efficient.

  • Old data is cleaned up without affecting ongoing writes.

Handling Deletes (Tombstones)

To delete a key:

  • A special record called a tombstone is appended to the log.

  • During compaction, the tombstone ensures that all previous values for that key are discarded.

This keeps deletes consistent without needing in-place updates.
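
A sketch of both ideas together, reusing the `key,value` line format from above; the `TOMBSTONE` sentinel is an illustrative choice, real engines typically encode deletion in the record format itself:

```python
TOMBSTONE = "__tombstone__"        # assumed sentinel value marking a delete

def delete(store, key):
    store.put(key, TOMBSTONE)      # a delete is just another appended record

def compact(segment_paths, output_path):
    """Merge segments into one, keeping only the latest value per key and
    dropping keys whose latest record is a tombstone."""
    latest = {}
    for path in segment_paths:     # oldest first, so newer segments win
        with open(path) as f:
            for line in f:
                key, _, value = line.rstrip("\n").partition(",")
                latest[key] = value
    with open(output_path, "w") as out:
        for key, value in latest.items():
            if value != TOMBSTONE:     # the tombstone erases all prior values
                out.write(f"{key},{value}\n")
```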


Crash Safety and Recovery

Hash indexes are append-only, which simplifies crash recovery:

  • Partially written records are detected using checksums.

  • In-memory hash maps are rebuilt on restart, either by scanning the segments or by loading a snapshot saved on disk.

This ensures the database can recover quickly even after a crash.
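
The rebuild is a single sequential scan per segment, sketched below with the same assumed record format:

```python
def rebuild_index(segment_path):
    """Rebuild the in-memory hash map by scanning a segment front to back."""
    index = {}
    offset = 0
    with open(segment_path, "rb") as f:
        for line in f:
            key = line.decode().partition(",")[0]
            index[key] = offset    # later records overwrite earlier ones
            offset += len(line)    # advance by the record's length in bytes
    return index
```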


Concurrency

Hash indexes are easy to make thread-safe:

  • Usually, one writer thread appends to the log.

  • Multiple readers can read segments concurrently.

  • Immutable segments eliminate the need for complex locking.

Advantages of Hash Indexes

  • Fast exact key lookups (O(1) in memory).

  • Simple design and high write throughput.

  • Easy crash recovery due to append-only files.


Limitations

  • Memory-bound: all keys must fit in RAM.

  • No ordering: cannot efficiently perform range queries.

  • Not ideal for very large datasets, since the in-memory map grows with every distinct key.

These limitations motivate Sorted String Tables (SSTables), which preserve key order on disk and form the foundation for LSM-Trees, which we’ll discuss next.


SSTables (Sorted String Tables)

While hash indexes are fast for exact key lookups, they have some limitations:

  • They must fit entirely in memory.

  • They do not preserve key order, so range queries are inefficient.

  • On-disk hash maps are hard to maintain efficiently.

SSTables solve these problems by storing key-value pairs in sorted order on disk.


What is an SSTable?

An SSTable (Sorted String Table) is a file that contains a sequence of key-value pairs:

  • Sorted by key.

  • Each key appears only once per SSTable.

  • Immutable after being written to disk.

This design allows efficient merges and range queries, and reduces the need for a large in-memory index.

SSTable file layout


Constructing an SSTable

Incoming writes are unordered, so we maintain a memtable in memory:

  1. Memtable = in-memory balanced tree (e.g., red-black tree) storing key-value pairs.

  2. When the memtable exceeds a threshold (a few MB), write it to disk as an SSTable.

  3. While writing the SSTable, a new memtable handles incoming writes.
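
A minimal sketch of steps 1 and 2 above: Python's standard library has no red-black tree, so this version buffers writes in a dict and sorts at flush time; what matters is that the resulting file is written in key order:

```python
class Memtable:
    """In-memory write buffer; a real engine would use a balanced tree so
    that iteration in key order is cheap at flush time."""

    def __init__(self, flush_threshold=4 * 1024 * 1024):   # a few MB
        self.entries = {}
        self.size_bytes = 0
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.entries[key] = value
        self.size_bytes += len(key) + len(value)

    def is_full(self):
        return self.size_bytes >= self.flush_threshold

    def flush(self, sstable_path):
        """Write all entries to disk in sorted key order, producing an SSTable."""
        with open(sstable_path, "w") as f:
            for key in sorted(self.entries):
                f.write(f"{key},{self.entries[key]}\n")
```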


Reading from SSTables

To find a key:

  1. Check the memtable first.

  2. Check the most recent SSTable, then the next-most-recent, etc.

  3. Because SSTables are sorted, we can use sparse in-memory indexes:

  • Only store offsets for some keys.

  • Scan a few KBs in the file to find the exact key.
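
Sketching that lookup, assuming a sparse index of every Nth key (including the file's first key) with its byte offset; `bisect` finds the closest indexed key at or before the target, and the forward scan from there is short:

```python
import bisect

def sstable_get(path, key, index_keys, index_offsets):
    """Find `key` in a sorted SSTable file using a sparse in-memory index."""
    i = bisect.bisect_right(index_keys, key) - 1   # last indexed key <= target
    if i < 0:
        return None                    # key sorts before the whole file
    with open(path, "rb") as f:
        f.seek(index_offsets[i])       # jump close to the key...
        for line in f:                 # ...then scan forward a few KB at most
            k, _, v = line.decode().rstrip("\n").partition(",")
            if k == key:
                return v
            if k > key:                # sorted order: the key is absent
                return None
    return None
```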


Merging and Compaction

SSTables are immutable, so updates and deletes generate new SSTables:

  • Old SSTables are merged and compacted in the background.

  • During merge:

    • Keep only the most recent value for each key.
    • Discard obsolete or deleted entries.
  • Result = fewer files, sequential writes, and efficient storage.
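
Because every input file is already sorted, this is the same k-way merge used in mergesort. A sketch with `heapq.merge`, assuming `paths` is ordered oldest-to-newest so that the newest value wins on duplicate keys:

```python
import heapq
from contextlib import ExitStack

def merge_sstables(paths, output_path):
    """Merge sorted SSTables into one, keeping the newest value per key."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in paths]
        # Tag each line with its file's age rank so duplicates sort oldest-first.
        runs = [((line.partition(",")[0], age, line) for line in f)
                for age, f in enumerate(files)]
        out = stack.enter_context(open(output_path, "w"))
        prev_key, newest_line = None, None
        for key, _, line in heapq.merge(*runs):
            if prev_key is not None and key != prev_key:
                out.write(newest_line)         # emit the winner for prev_key
            prev_key, newest_line = key, line  # newest duplicate arrives last
        if newest_line is not None:
            out.write(newest_line)
```

Tombstone filtering (as in the hash-index compaction sketch earlier) would slot in just before each write.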


Advantages of SSTables

  • Preserve sorted order, enabling range queries.

  • Immutable, which simplifies concurrency and crash recovery.

  • Can store datasets larger than memory efficiently.

  • Background compaction keeps disk usage optimal.


Limitations

  • Reads may need to check multiple SSTables to find a key.

  • Writes generate temporary files and require compaction.

These challenges are addressed by LSM-Trees, which organize multiple SSTables into a hierarchy for high write throughput and efficient reads.



LSM-Trees (Log-Structured Merge-Trees)

While SSTables solve the problems of hash indexes by keeping keys sorted and enabling range queries, reading a key may still require checking multiple SSTables.

Log-Structured Merge-Trees (LSM-Trees) organize SSTables into a hierarchical structure to optimize both writes and reads.


What is an LSM-Tree?

An LSM-Tree is essentially a cascade of SSTables:

  • Memtable (in-memory balanced tree) receives incoming writes.

  • When the memtable fills up, it is flushed to disk as an SSTable.

  • Older SSTables are gradually merged and compacted into larger SSTables at higher levels.

This creates multiple levels of SSTables, where:

  • Level 0 = most recent SSTables

  • Higher levels = older, merged SSTables


How Writes Work

  1. Write comes in → added to the memtable.

  2. Write is appended to a log on disk (for crash recovery).

  3. When memtable exceeds threshold → flush to a new SSTable at Level 0.

  4. Background compaction merges SSTables into higher levels.

Key Points:

  • Writes are mostly sequential, maximizing disk throughput.

  • Old SSTables are never modified; only merged into new SSTables.


How Reads Work

  1. Search memtable first (fast, in-memory).

  2. Check Level 0 SSTables, then Level 1, and so on.

  3. To reduce disk reads, LSM-Trees use Bloom filters:

  • A memory-efficient probabilistic structure that can confirm a key does not exist (false positives are possible, false negatives are not).

  • Avoids reading SSTables unnecessarily.
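
A tiny Bloom filter sketch; the bit count and the hashing scheme here are arbitrary illustrative choices:

```python
import hashlib

class BloomFilter:
    """If might_contain() returns False, the key is definitely absent;
    if True, it *might* be present, so the SSTable must still be checked."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):   # k independent bit positions
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

Each SSTable gets its own filter, built when the file is written; a read consults the filter first and skips the file entirely on a definite miss.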


Compaction Strategies

Compaction ensures that:

  • Duplicate keys are merged, keeping the most recent value.

  • Deleted keys (tombstones) are purged.

  • Disk usage stays manageable.

Popular strategies:

  1. Size-tiered compaction: Merge smaller SSTables into larger ones.

  2. Leveled compaction: SSTables are organized in levels; each level contains non-overlapping key ranges.


Advantages of LSM-Trees

  • Extremely high write throughput.

  • Can handle datasets larger than memory.

  • Efficient range queries due to sorted SSTables.

  • Crash recovery is simple (append-only log + immutable SSTables).


Limitations

  • Reads may need to check multiple levels → slightly slower than in-memory hash indexes.

  • Compaction consumes CPU and I/O resources, but it can be scheduled in the background.


B-Trees

While LSM-Trees and SSTables are optimized for write-heavy workloads, B-Trees are the most common indexing structure used in relational databases and many key-value stores. They are optimized for balanced reads and writes with efficient range queries.


What is a B-Tree?

A B-Tree is a self-balancing tree where:

  • Nodes (pages) store keys and pointers to child nodes.

  • All leaf nodes are at the same depth.

  • Nodes are designed to match the size of disk pages (e.g., 4 KB) for efficient I/O.

Key Properties:

  • Keys in each node are sorted.

  • Each node has a branching factor (number of children).

  • Tree height is kept low, so reads involve few disk accesses.

B-Tree structure


How Reads Work

  1. Start at the root node.

  2. Compare the key with the keys in the node.

  3. Follow the pointer to the child node covering the key range.

  4. Repeat until reaching a leaf node, which contains the key or a pointer to its value.

  • Lookup complexity = O(log n), typically just a few disk page reads.
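
A simplified sketch of that descent; the node layout here (separator keys in internal nodes, values only in leaves, as in the common B+-tree variant) is one illustrative choice:

```python
import bisect

class Node:
    """Simplified B-Tree node: sorted keys, plus either children (internal
    nodes) or values (leaf nodes)."""
    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children       # None marks a leaf
        self.values = values

def search(node, key):
    """Descend from the root, following the child whose range covers the key."""
    while node.children is not None:              # internal node
        i = bisect.bisect_right(node.keys, key)   # which child covers the key?
        node = node.children[i]
    if key in node.keys:                          # leaf: exact-match check
        return node.values[node.keys.index(key)]
    return None
```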

How Writes Work

  1. Locate the leaf node for the key.

  2. Insert the key and value.

  3. If the node exceeds its capacity:

  • Split the node into two.

  • Update the parent node with the new key and pointer.

  • Repeat recursively if necessary.

  • Pages are updated in place, unlike LSM-Trees, which append to files.
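
Continuing the `Node` sketch from the reads section, a leaf split looks roughly like this (the recursive case of an over-full parent is omitted):

```python
def split_leaf(leaf, parent, max_keys):
    """Split an over-full leaf in two and promote a separator key to the parent."""
    if len(leaf.keys) <= max_keys:
        return
    mid = len(leaf.keys) // 2
    right = Node(keys=leaf.keys[mid:], values=leaf.values[mid:])
    leaf.keys, leaf.values = leaf.keys[:mid], leaf.values[:mid]
    separator = right.keys[0]                   # smallest key of the new node
    i = bisect.bisect_right(parent.keys, separator)
    parent.keys.insert(i, separator)            # route future lookups correctly
    parent.children.insert(i + 1, right)        # link the new right sibling
```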

Crash Safety and Concurrency

  • Writes in B-Trees may require multiple pages to be updated.

  • To avoid corruption, databases often use a Write-Ahead Log (WAL):

    • Append changes to the log first.
    • Apply changes to the tree.
    • On crash, replay the WAL to restore consistency.
  • Concurrency is handled with latches (lightweight locks) to prevent inconsistent reads during updates.
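
A sketch of the WAL discipline described above, with an assumed JSON line format and a hypothetical `apply_page` callback standing in for the real page writer:

```python
import json

def wal_append(wal_file, page_id, new_contents):
    """Step 1: make the intended change durable before touching any page."""
    wal_file.write(json.dumps({"page": page_id, "data": new_contents}) + "\n")
    wal_file.flush()                   # the log entry must reach disk first

def wal_replay(wal_path, apply_page):
    """After a crash, re-apply every logged change to restore consistency."""
    with open(wal_path) as f:
        for line in f:
            entry = json.loads(line)
            apply_page(entry["page"], entry["data"])   # re-applying is safe
```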


Advantages of B-Trees

  • Efficient point lookups and range queries.

  • Low tree height → few disk reads.

  • Well-understood, widely implemented in relational databases.

  • Can handle in-place updates, avoiding temporary file creation.


Limitations

  • Random page writes are slower than the sequential writes that LSM-Trees rely on.

  • Complex concurrency control is needed for multiple writers.

  • Maintaining sorted order in-place can lead to fragmentation over time.

Comparison of Index Data Structures

| Feature / Structure | Hash Index | SSTable | LSM-Tree | B-Tree |
| --- | --- | --- | --- | --- |
| Key order | Unordered | Sorted | Sorted per SSTable | Sorted |
| Best use case | Exact key lookup, small number of keys in memory | Read-heavy workloads with mostly immutable data | Write-heavy workloads, large datasets | Balanced read/write, range queries |
| Read efficiency | Very fast (single hash lookup) | Moderate (sparse index + scan) | Moderate to high (memtable + multiple SSTables, optimized with Bloom filters) | High (O(log n), few page reads) |
| Write efficiency | Very high (append + update hash map) | Moderate (requires creating new SSTable) | Very high (sequential writes to memtable/SSTables) | Moderate (in-place updates, node splits) |
| Memory usage | High (hash map in memory) | Low (sparse in-memory index) | Low (memtable + Bloom filters) | Moderate (depends on tree nodes in memory) |
| Crash recovery | Snapshot of hash map + append-only log | Memtable + log for recent writes | Memtable + log; SSTables immutable | Write-ahead log (WAL) or copy-on-write |
| Range queries | Poor | Good | Good | Excellent |
| Concurrency | Easy (single writer) | Easy (append-only SSTables) | Easy (append-only + background compaction) | Complex (requires locks/latches) |
| Disk layout | Segments of key-value pairs | Sorted immutable files | Hierarchy of SSTables with levels | Tree of fixed-size pages |

Conclusion

Indexing is the backbone of efficient data retrieval in databases and storage engines. Choosing the right index depends on your workload:

  • Hash indexes are perfect for fast, exact key lookups with a small set of keys in memory.

  • SSTables provide sorted storage with efficient range queries, laying the groundwork for LSM-Trees.

  • LSM-Trees excel in write-heavy workloads and can scale to datasets much larger than memory while maintaining efficient reads.

  • B-Trees remain the go-to for relational databases, offering balanced read/write performance, excellent range queries, and strong support for concurrency.

Understanding the trade-offs of each structure helps developers and architects design systems tailored to their data access patterns.
