DEV Community

Cover image for Building a Toy SSTable Storage Engine in Python
Slavik
Slavik

Posted on

Building a Toy SSTable Storage Engine in Python

Have you ever wondered how modern databases like LevelDB, RocksDB, or Cassandra store and retrieve massive amounts of data efficiently? The secret sauce is often a data structure called the Log-Structured Merge-Tree (LSM-Tree) and its core component, the Sorted String Table (SSTable).

In this post, we’ll build a toy, educational SSTable-based storage engine in Python, inspired by Martin Kleppmann’s Designing Data-Intensive Applications. We’ll start simple and gradually add complexity, so you can follow along even if you’re new to storage internals!


What Are LSM-Trees and SSTables?

LSM-Tree stands for Log-Structured Merge-Tree. It’s a data structure designed to make writing data to disk very fast and efficient.

An SSTable is a file format for storing large, sorted key-value pairs on disk. The key properties are:

  • Sorted: All keys are stored in order, making range queries and binary search possible.
  • Immutable: Once written, SSTables are never modified. New data is written to new files.
  • Efficient: By combining in-memory and on-disk structures, SSTables enable fast writes and reasonably fast reads.

SSTables are the building blocks behind many modern, high-performance databases like LevelDB, RocksDB, and Cassandra.

How Does It Work?

  1. Writes go to memory first: When you add or update data, it’s first stored in a fast, in-memory structure (called a memtable).
  2. Flush to disk as SSTable: When the memtable gets full, all its data is written to disk as a new SSTable file. These files are never changed after being written.
  3. Reads check memory and disk: When you read data, the system first checks the memtable, then searches through the SSTables on disk.

How Is This Different from Other Approaches?

  • Traditional databases (like those using B-Trees) update data in place on disk. This means lots of small, random writes, which can be slow on hard drives and even SSDs.
  • LSM-Trees always write new data in large, sequential chunks, which is much faster for disks.
  • SSTables are immutable, so there’s no need to lock files for writing, and old data can be cleaned up later in the background (a process called compaction).

Why Use LSM-Trees and SSTables?

  • Fast writes: Great for applications that need to handle lots of inserts and updates quickly (like logs, metrics, or time series data).
  • Efficient storage: Sequential disk writes are much faster and less likely to wear out SSDs.
  • Scalability: LSM-Trees can handle huge amounts of data by merging and compacting SSTables in the background.

Project Structure

Here’s what we’ll build (and you can find the full code on GitHub):

  • memtable.py: An in-memory, sorted key-value store.
  • sstable_writer.py: Writes sorted key-value pairs to disk as an SSTable, with a sparse index and Bloom filter.
  • sstable_reader.py: Reads from SSTables using the index and Bloom filter.
  • sstable.py: Orchestrates the LSM-Tree logic, combining memtable and SSTables.
  • simple_bloom_filter.py: A simple Bloom filter for fast negative lookups.
  • sstable_server.py: A UNIX socket server exposing set/get operations.
  • main.py: A CLI client to interact with the server.
  • stress_test.py: A script to stress test the system.

Step 1: The Memtable – Fast In-Memory Writes

When you write data, it first lands in the memtable—a sorted, in-memory structure. In our Python version, we use a sorted list and the bisect module for efficient lookups and inserts.

# memtable.py
class Memtable:
    def __init__(self):
        self._data = []

    def set(self, key, value):
        # Insert or update, keeping the list sorted
        ...

    def get(self, key):
        # Binary search for fast lookup
        ...
Enter fullscreen mode Exit fullscreen mode

When the memtable gets too big, we flush it to disk as a new SSTable.


Step 2: Writing SSTables – Persistence and Order

Flushing the memtable means writing all its sorted key-value pairs to a file. But how do we make reads efficient?

  • Sparse Index: Every Nth key and its file offset are written to an index file. This lets us quickly jump to the right part of the SSTable.
  • Bloom Filter: A probabilistic data structure that tells us if a key is definitely not in the file, saving unnecessary disk reads.
# sstable_writer.py
class SSTableWriter:
    def write(self, sorted_kv_pairs):
        # Write data lines and build sparse index
        # Serialize and store the Bloom filter
        ...
Enter fullscreen mode Exit fullscreen mode

Step 3: Reading SSTables – Fast Lookups

When you want to read a key:

  1. Check the memtable (fastest).
  2. Check the Bloom filter for each SSTable (quickly skip files that don’t have the key).
  3. Use the sparse index to jump to the right spot in the SSTable file and scan for the key.
# sstable_reader.py
class SSTableReader:
    def get(self, key):
        # Use Bloom filter and sparse index to minimize disk I/O
        ...
Enter fullscreen mode Exit fullscreen mode

Step 4: The LSM-Tree – Orchestrating Everything

The SSTable class manages the memtable, SSTable files, and the index cache. It handles:

  • Set: Write to memtable, flush to SSTable when full.
  • Get: Check memtable, then SSTables from newest to oldest.
# sstable.py
class SSTable:
    def set(self, key, value):
        ...
    def get(self, key):
        ...
    def flush(self):
        ...
Enter fullscreen mode Exit fullscreen mode

Step 5: Server and CLI – Putting It All Together

We expose our storage engine via a simple UNIX socket server (sstable_server.py). You can interact with it using the CLI (main.py):

python -m sstable_server   # Start the server
python -m main set mykey 123
python -m main get mykey
Enter fullscreen mode Exit fullscreen mode

Step 6: Stress Testing

How does it perform? The stress_test.py script:

  • Starts the server
  • Inserts 1000 random key-value pairs
  • Reads them all back and prints the sum and average
python stress_test.py
Enter fullscreen mode Exit fullscreen mode

Why Does This Matter?

  • Write-Optimized: LSM-Trees and SSTables are designed for fast, sequential writes—perfect for write-heavy workloads.
  • Efficient Reads: Sparse indexes and Bloom filters keep reads fast, even as data grows.
  • Real-World Use: These ideas power LevelDB, RocksDB, Cassandra, and more.

Next Steps

This project is a toy—but it’s a great way to learn! You can extend it by adding:

  • Compaction (merging old SSTables)
  • Range queries
  • Deletion markers (tombstones)
  • Compression

Conclusion

Building your own SSTable-based storage engine is a fantastic way to understand the internals of modern databases. By starting simple and adding complexity, you’ll gain intuition for how real-world systems handle massive data efficiently.

Check out the full code on GitHub and try it yourself!

Top comments (1)

Collapse
 
commonstory profile image
beard

This was super helpful for understanding the core ideas behind SSTables and LSM-trees, thanks for breaking it down so clearly!