
Piyush Gupta

Posted on • Originally published at Medium

How Databases Actually Work: From Log Files to LSM Trees (Part 1) 🚀

We often treat databases like PostgreSQL, MySQL, or MongoDB as magic black boxes. We send a query, and data comes back. But what is actually happening on the disk?

If you've ever wondered why some databases are "fast for writes" while others are "fast for reads," the answer lies in the Storage Engine.

In this post, we're going to build a database from scratch and evolve it into a production-grade LSM Tree.

Table Of Contents

  1. The World's Simplest Database
  2. Adding an Index (The Hash Map)
  3. Solving the Space Crisis: Compaction
  4. The Power of SSTables and LSM Trees

1. The World's Simplest Database

The simplest way to store data is to just append it to a text file. No complex schemas, just raw speed.

# Simple Key-Value Store in Bash
db_set() {
  echo "$1,$2" >> database.db
}

db_get() {
  # We use 'tail -n 1' to get the most recent update for that key
  grep "^$1," database.db | sed "s/^$1,//" | tail -n 1
}

The Verdict:

  • Writes: O(1). Extremely fast; you just append to the end of the file.
  • Reads: O(n). Terrible; to find one key, you have to scan the entire file from start to finish.

2. Adding an Index (The Hash Map)

To fix the O(n) read problem, we use a Hash Index. Think of this as a "Table of Contents" kept in your server's RAM.

# Conceptual in-memory index: key -> byte offset in the data file
index = {
  "user_123": 0,
  "user_456": 64,
  "user_789": 128
}

def get_data(key):
    offset = index[key]      # O(1) lookup in RAM
    file.seek(offset)        # jump straight to the record on disk
    return file.readline()   # read just that one line

This is how Bitcask (the default storage engine for Riak) works. It's incredibly fast, but there's a catch: all your keys must fit in RAM. With billions of keys, the index alone can exhaust memory long before the data files fill the disk.
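To make this concrete, here is a minimal, runnable sketch of a Bitcask-style store: every write appends to a log file, and an in-memory dict maps each key to the byte offset of its latest record, so reads are a single seek. The class and file names are illustrative, not Bitcask's actual API.

```python
import os

class HashIndexStore:
    """Append-only log with an in-memory hash index (Bitcask-style sketch)."""

    def __init__(self, path="database.db"):
        self.path = path
        self.index = {}                        # key -> byte offset of latest record
        open(self.path, "ab").close()          # make sure the log file exists
        self.size = os.path.getsize(self.path) # next append position

    def set(self, key, value):
        record = f"{key},{value}\n".encode()
        with open(self.path, "ab") as f:
            f.write(record)                    # O(1): append to the end of the log
        self.index[key] = self.size            # remember where this record starts
        self.size += len(record)

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                     # O(1) jump, no full scan
            line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]
```

Note that an overwrite just appends a new record and moves the index pointer; the stale record stays on disk, which is exactly the space problem the next section tackles.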


3. Solving the Space Crisis: Compaction

Since we only append to our log, the file grows forever even if we're just updating the same key. Databases solve this via Compaction.

We break the log into segments. Once a segment reaches a certain size, we close it and start a new one. A background process then merges these segments, throwing away old, overwritten values.
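A minimal sketch of that merge step, assuming each segment is simply a list of "key,value" lines and segments are ordered oldest to newest:

```python
def compact(segments):
    """Merge log segments, keeping only the newest value for each key.

    `segments` is ordered oldest -> newest, so later writes overwrite
    earlier ones as we replay the logs in chronological order.
    """
    latest = {}
    for segment in segments:
        for line in segment:
            key, value = line.split(",", 1)
            latest[key] = value              # newer segments win
    return [f"{k},{v}" for k, v in latest.items()]

segments = [
    ["user_1,alice", "user_2,bob"],          # old segment
    ["user_1,alicia"],                       # newer segment overwrites user_1
]
compacted = compact(segments)                # user_1 now maps to "alicia"
```

The compacted output is typically written out as a fresh segment, after which the old segments can be deleted.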


4. The Power of SSTables and LSM Trees

What if we store our data files sorted by key? This is a Sorted String Table (SSTable). Sorting allows us to merge segments efficiently (like Merge Sort) and perform range queries.
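Because every SSTable is already sorted, merging several of them is a streaming k-way merge, just like the merge step of Merge Sort. A sketch using Python's `heapq.merge`, assuming each table is a sorted list of (key, value) pairs and tables are passed oldest to newest:

```python
import heapq

def merge_sstables(*tables):
    """Merge sorted (key, value) runs; on duplicate keys, the newest wins.

    Each entry is tagged with -i so that, among equal keys, the entry
    from the newest table (largest i) sorts first in the merged stream.
    """
    tagged = [
        [(key, -i, value) for key, value in table]
        for i, table in enumerate(tables)
    ]
    merged, last_key = [], object()          # sentinel never equals a real key
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:                  # keep only the first (newest) entry
            merged.append((key, value))
            last_key = key
    return merged

old = [("user_1", "alice"), ("user_2", "bob")]
new = [("user_1", "alicia")]
merge_sstables(old, new)                     # user_1 resolves to "alicia"
```

The key property: each input is read front to back exactly once, so the merge runs in sorted order with only one entry per table in memory at a time.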

The Modern Architecture:

  • Memtable: All writes go to a balanced tree in memory (AVL or Red-Black Tree).
  • SSTable: When the Memtable gets too big, we flush it to disk as a sorted file.
  • WAL (Write-Ahead Log): Before writing to the Memtable, we append the operation to a "crash-recovery" log on disk.
class StorageEngine:
    def write(self, key, value):
        # 1. Append to WAL (for crash recovery)
        self.wal.append(key, value)

        # 2. Add to Memtable
        self.memtable.add(key, value)

        if self.memtable.is_full():
            self.memtable.flush_to_sstable_on_disk()

LSM Trees (Log-Structured Merge Trees) are used by Cassandra, RocksDB, and LevelDB. They are the kings of write-heavy workloads!
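The read path mirrors this layout: check the memtable first, then fall back to the on-disk SSTables from newest to oldest, since the first hit is by definition the most recent value. A toy sketch with dicts standing in for the real structures (names are illustrative, not any specific engine's API):

```python
class LSMRead:
    """Toy LSM read path: memtable first, then SSTables newest -> oldest."""

    def __init__(self):
        self.memtable = {}   # in-memory writes (stand-in for a balanced tree)
        self.sstables = []   # flushed memtables, ordered oldest -> newest

    def read(self, key):
        if key in self.memtable:                 # 1. newest data lives here
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # 2. newest flush first
            if key in sstable:
                return sstable[key]
        return None                              # key not present anywhere
```

This is also why LSM reads can degrade as SSTables pile up, and why real engines add Bloom filters and background compaction to keep the number of tables checked per read small.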


💬 Let's Discuss!

Have you ever run into a situation where your database writes were lagging? Did you realize your storage engine might be the bottleneck?

Question for the comments: If you were building a high-frequency trading app with millions of updates per second, would you choose an LSM-based store or a traditional RDBMS? Why?
