DEV Community

Erich

Building a database to understand databases

Databases always felt like a black box to me. You call INSERT, data goes in. You call SELECT, data comes back out. Something crashes, and somehow your data is still there. I wanted to know how all of that actually works.

InterchangeDB is a database I'm building from scratch in Rust to learn how each subsystem works by implementing it myself. The project plan is heavily inspired by CMU BusTub, mini-lsm, and ToyDB. The internals are interchangeable: components can be swapped in and out, so I can compare them directly by running the same stress tests, on the same data, against different combinations of components.

Right now there are two storage engines, a B+Tree and an LSM-Tree, behind a generic trait.

The B+Tree sits on top of a buffer pool manager that handles reading and writing pages to disk. The buffer pool has six cache eviction policies (FIFO, Clock, LRU, LRU-K, 2Q, and ARC) that can be hot-swapped at runtime. I benchmarked all six across five different workload patterns and the results were not what I expected. More on that soon.
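Runtime hot-swapping usually means the buffer pool talks to its eviction policy through a trait object rather than a generic parameter. The sketch below illustrates that idea and is not InterchangeDB's code: the `EvictionPolicy` trait, the `Replacer` wrapper, and the choice to carry the old policy's frame ordering into the new policy on a swap are all assumptions. Only FIFO and LRU are shown, and both use linear `VecDeque` scans to keep the example short.

```rust
use std::collections::VecDeque;

type FrameId = usize;

/// Hypothetical interface shared by every eviction policy.
trait EvictionPolicy {
    /// Note that `frame` was just accessed.
    fn record_access(&mut self, frame: FrameId);
    /// Pick a victim and stop tracking it, if any frame is tracked.
    fn evict(&mut self) -> Option<FrameId>;
    /// Tracked frames in this policy's eviction order (next victim first).
    fn tracked(&self) -> Vec<FrameId>;
}

/// FIFO: evict in order of first admission; re-accesses do not reorder.
#[derive(Default)]
struct Fifo { queue: VecDeque<FrameId> }

impl EvictionPolicy for Fifo {
    fn record_access(&mut self, frame: FrameId) {
        if !self.queue.contains(&frame) {
            self.queue.push_back(frame);
        }
    }
    fn evict(&mut self) -> Option<FrameId> { self.queue.pop_front() }
    fn tracked(&self) -> Vec<FrameId> { self.queue.iter().copied().collect() }
}

/// LRU: every access moves the frame to the most-recently-used end.
#[derive(Default)]
struct Lru { order: VecDeque<FrameId> }

impl EvictionPolicy for Lru {
    fn record_access(&mut self, frame: FrameId) {
        if let Some(pos) = self.order.iter().position(|&f| f == frame) {
            self.order.remove(pos);
        }
        self.order.push_back(frame);
    }
    fn evict(&mut self) -> Option<FrameId> { self.order.pop_front() }
    fn tracked(&self) -> Vec<FrameId> { self.order.iter().copied().collect() }
}

/// Holds the active policy behind a trait object so it can change at runtime.
struct Replacer { policy: Box<dyn EvictionPolicy> }

impl Replacer {
    fn new(policy: Box<dyn EvictionPolicy>) -> Self { Replacer { policy } }
    fn record_access(&mut self, frame: FrameId) { self.policy.record_access(frame); }
    fn evict(&mut self) -> Option<FrameId> { self.policy.evict() }

    /// Hot-swap: seed the new policy with the currently tracked frames
    /// (in the old policy's order), then replace the old policy.
    fn swap_policy(&mut self, mut next: Box<dyn EvictionPolicy>) {
        for frame in self.policy.tracked() {
            next.record_access(frame);
        }
        self.policy = next;
    }
}

fn main() {
    let mut r = Replacer::new(Box::new(Fifo::default()));
    for f in [1, 2, 3] {
        r.record_access(f);
    }
    r.record_access(1); // FIFO ignores the re-access
    r.swap_policy(Box::new(Lru::default())); // order carried over: 1, 2, 3
    r.record_access(1); // under LRU, frame 1 becomes most recent
    assert_eq!(r.evict(), Some(2));
}
```

Seeding the new policy from the old one is only one option; a simpler design would start the new policy empty and let it learn from subsequent accesses.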

In the LSM-Tree, writes go to an in-memory memtable first and are then flushed to sorted string tables (SSTables) on disk. Bloom filters cut down unnecessary reads by letting a lookup skip SSTables that cannot contain the requested key. I ran head-to-head benchmarks between the two engines on identical workloads. The write performance gap was orders of magnitude larger than I anticipated, and the read gap was surprisingly small. More on that soon too.
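To make the write and read paths concrete, here is a toy LSM-Tree. It is a sketch under simplifying assumptions, not the project's implementation: `MiniLsm`, `SsTable`, and `Bloom` are hypothetical names, the "SSTables" live in memory as sorted vectors, there are no deletes (tombstones), compaction, or WAL, and the Bloom filter derives its hash functions by seeding the standard library's `DefaultHasher`. A read checks the memtable, then the SSTables from newest to oldest, skipping any table whose Bloom filter says the key is definitely absent.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// A tiny Bloom filter with `k` hash functions made by seeding one hasher.
struct Bloom { bits: Vec<bool>, k: u64 }

impl Bloom {
    fn new(num_bits: usize, k: u64) -> Self { Bloom { bits: vec![false; num_bits], k } }
    fn index(&self, key: &str, seed: u64) -> usize {
        let mut h = DefaultHasher::new();
        seed.hash(&mut h);
        key.hash(&mut h);
        (h.finish() % self.bits.len() as u64) as usize
    }
    fn insert(&mut self, key: &str) {
        for seed in 0..self.k {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }
    /// `false` means "definitely absent"; `true` means "maybe present".
    fn may_contain(&self, key: &str) -> bool {
        (0..self.k).all(|seed| self.bits[self.index(key, seed)])
    }
}

/// An immutable sorted run (in memory here; a real SSTable is a disk file).
struct SsTable { entries: Vec<(String, String)>, bloom: Bloom }

impl SsTable {
    fn get(&self, key: &str) -> Option<&str> {
        if !self.bloom.may_contain(key) {
            return None; // skip the (simulated) disk read entirely
        }
        self.entries
            .binary_search_by(|(k, _)| k.as_str().cmp(key))
            .ok()
            .map(|i| self.entries[i].1.as_str())
    }
}

struct MiniLsm {
    memtable: BTreeMap<String, String>,
    sstables: Vec<SsTable>, // oldest first
    flush_threshold: usize,
}

impl MiniLsm {
    fn new(flush_threshold: usize) -> Self {
        MiniLsm { memtable: BTreeMap::new(), sstables: Vec::new(), flush_threshold }
    }
    fn put(&mut self, key: &str, value: &str) {
        self.memtable.insert(key.to_string(), value.to_string());
        if self.memtable.len() >= self.flush_threshold {
            self.flush();
        }
    }
    /// Drain the memtable (already sorted, since it is a BTreeMap) into a new SSTable.
    fn flush(&mut self) {
        let entries: Vec<(String, String)> =
            std::mem::take(&mut self.memtable).into_iter().collect();
        // ~10 bits per key and 3 hashes: arbitrary illustrative parameters.
        let mut bloom = Bloom::new((entries.len() * 10).max(8), 3);
        for (k, _) in &entries {
            bloom.insert(k);
        }
        self.sstables.push(SsTable { entries, bloom });
    }
    /// Newest data wins: memtable first, then SSTables from newest to oldest.
    fn get(&self, key: &str) -> Option<String> {
        if let Some(v) = self.memtable.get(key) {
            return Some(v.clone());
        }
        self.sstables.iter().rev().find_map(|t| t.get(key).map(str::to_string))
    }
}

fn main() {
    let mut db = MiniLsm::new(2);
    db.put("a", "1");
    db.put("b", "2"); // memtable reaches the threshold and is flushed
    db.put("a", "3"); // newer value for "a" lives in the memtable
    assert_eq!(db.get("a").as_deref(), Some("3"));
    assert_eq!(db.get("b").as_deref(), Some("2"));
    assert_eq!(db.get("zzz"), None);
}
```

Note that a flush only appends a new sorted run; nothing already written is modified in place.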

Both engines are swappable at compile time through Rust generics. Same test suite, same benchmarks, same data, different engine underneath.
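A minimal sketch of what compile-time swapping through generics can look like. The `StorageEngine` trait, its two methods, and the map-backed `OrderedEngine` and `HashEngine` stand-ins (playing the roles of the B+Tree and the LSM-Tree) are hypothetical; a real engine trait would likely also cover scans, deletes, and transactions. The point is that `Database<E>` and `run_suite::<E>()` are monomorphized once per engine, so one test body runs against each engine with static dispatch.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical engine interface.
trait StorageEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// Stand-in for an ordered engine such as a B+Tree.
#[derive(Default)]
struct OrderedEngine(BTreeMap<Vec<u8>, Vec<u8>>);

/// Stand-in for a second engine; here just a hash map.
#[derive(Default)]
struct HashEngine(HashMap<Vec<u8>, Vec<u8>>);

impl StorageEngine for OrderedEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) { self.0.insert(key, value); }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> { self.0.get(key).cloned() }
}

impl StorageEngine for HashEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) { self.0.insert(key, value); }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> { self.0.get(key).cloned() }
}

/// The database is generic over its engine; each choice of `E` is
/// compiled into its own specialized code (no dynamic dispatch).
struct Database<E: StorageEngine> { engine: E }

impl<E: StorageEngine + Default> Database<E> {
    fn new() -> Self { Database { engine: E::default() } }
    fn insert(&mut self, key: &str, value: &str) {
        self.engine.put(key.as_bytes().to_vec(), value.as_bytes().to_vec());
    }
    fn select(&self, key: &str) -> Option<String> {
        self.engine
            .get(key.as_bytes())
            .map(|v| String::from_utf8(v).expect("values are stored as UTF-8"))
    }
}

/// One test body, instantiated once per engine.
fn run_suite<E: StorageEngine + Default>() {
    let mut db = Database::<E>::new();
    db.insert("k1", "v1");
    db.insert("k1", "v2"); // overwrite
    assert_eq!(db.select("k1").as_deref(), Some("v2"));
    assert_eq!(db.select("missing"), None);
}

fn main() {
    run_suite::<OrderedEngine>();
    run_suite::<HashEngine>();
}
```

Generics fix the engine at compile time, which suits engines with very different internals; trait objects are the usual tool when something must change at runtime.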

Underneath the engines there's a write-ahead log with checkpointing and crash recovery, plus a lock manager with deadlock detection and strict two-phase locking. Today the database provides ACID guarantees under single-version concurrency control.
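Deadlock detection is commonly done with a waits-for graph: each blocked transaction gets an edge to the transaction holding the lock it wants, and a cycle means deadlock. The sketch below assumes that approach. The names (`WaitsFor`, `find_deadlock`) are hypothetical, as is the victim policy of aborting the youngest transaction (largest id) on the detected cycle; the post doesn't say how InterchangeDB's lock manager builds its graph or picks a victim.

```rust
use std::collections::{HashMap, HashSet};

type TxnId = u64;

/// Waits-for graph: an edge `waiter -> holder` means `waiter` is blocked
/// on a lock that `holder` currently holds.
#[derive(Default)]
struct WaitsFor { edges: HashMap<TxnId, Vec<TxnId>> }

impl WaitsFor {
    fn add_edge(&mut self, waiter: TxnId, holder: TxnId) {
        self.edges.entry(waiter).or_default().push(holder);
    }

    /// Returns a victim if the graph has a cycle (a deadlock).
    /// Assumed policy: abort the youngest (largest id) txn on the cycle.
    fn find_deadlock(&self) -> Option<TxnId> {
        let mut visited = HashSet::new();
        let mut starts: Vec<TxnId> = self.edges.keys().copied().collect();
        starts.sort(); // deterministic exploration order
        for start in starts {
            let mut path = Vec::new();
            if let Some(victim) = self.dfs(start, &mut path, &mut visited) {
                return Some(victim);
            }
        }
        None
    }

    fn dfs(
        &self,
        node: TxnId,
        path: &mut Vec<TxnId>,
        visited: &mut HashSet<TxnId>,
    ) -> Option<TxnId> {
        if let Some(pos) = path.iter().position(|&t| t == node) {
            // `node` is already on the current DFS path: path[pos..] is a cycle.
            return path[pos..].iter().copied().max();
        }
        if !visited.insert(node) {
            return None; // fully explored earlier; no cycle reachable from it
        }
        path.push(node);
        for &next in self.edges.get(&node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if let Some(victim) = self.dfs(next, path, visited) {
                return Some(victim);
            }
        }
        path.pop();
        None
    }
}

fn main() {
    let mut g = WaitsFor::default();
    g.add_edge(1, 2); // T1 waits for T2
    g.add_edge(2, 3); // T2 waits for T3
    assert_eq!(g.find_deadlock(), None);
    g.add_edge(3, 1); // T3 waits for T1: cycle 1 -> 2 -> 3 -> 1
    assert_eq!(g.find_deadlock(), Some(3));
}
```

Under strict two-phase locking, aborting the victim releases all of its locks at once, which breaks the cycle and lets the remaining waiters proceed.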

The next step is MVCC, so that readers never block writers. After that comes garbage collection for old versions, and then a verification phase built around crash-recovery and concurrency stress tests. The end goal is a working database where I know the ins and outs of every subsystem, which components real databases use, and why.
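The core MVCC idea, snapshot reads over a chain of timestamped versions, fits in a small sketch. Everything here is an illustrative assumption rather than the planned design: a single logical clock, commits applied one at a time with no write-write conflict detection, and versions stored oldest-first in a `Vec` inside a hypothetical `MvccStore`. Each version is visible to snapshots in `[begin, end)`, so a reader holding an old snapshot keeps seeing the old value while a writer installs a new one, and `gc` drops versions that no snapshot at or after the watermark can see.

```rust
use std::collections::HashMap;

type Ts = u64;

/// One version of a value, visible to snapshots in [begin, end).
#[derive(Debug)]
struct Version { begin: Ts, end: Ts, value: String }

#[derive(Default)]
struct MvccStore {
    chains: HashMap<String, Vec<Version>>, // oldest version first
    clock: Ts,
}

impl MvccStore {
    /// The latest commit timestamp; a reader uses it as its snapshot.
    fn begin_snapshot(&self) -> Ts { self.clock }

    /// A committed write: close the previous version and append a new one.
    /// (Assumes commits are serialized; no write-write conflict check.)
    fn commit_write(&mut self, key: &str, value: &str) -> Ts {
        self.clock += 1;
        let ts = self.clock;
        let chain = self.chains.entry(key.to_string()).or_default();
        if let Some(last) = chain.last_mut() {
            last.end = ts;
        }
        chain.push(Version { begin: ts, end: Ts::MAX, value: value.to_string() });
        ts
    }

    /// Snapshot read: takes no locks, so it never blocks writers.
    fn read(&self, key: &str, snapshot: Ts) -> Option<&str> {
        self.chains
            .get(key)?
            .iter()
            .find(|v| v.begin <= snapshot && snapshot < v.end)
            .map(|v| v.value.as_str())
    }

    /// Drop versions invisible to every snapshot >= `watermark`
    /// (the oldest snapshot still held by an active transaction).
    fn gc(&mut self, watermark: Ts) {
        for chain in self.chains.values_mut() {
            chain.retain(|v| v.end > watermark);
        }
    }
}

fn main() {
    let mut db = MvccStore::default();
    db.commit_write("x", "v1");
    let old_reader = db.begin_snapshot(); // this snapshot sees "v1"
    db.commit_write("x", "v2"); // the writer proceeds while the reader is open
    assert_eq!(db.read("x", old_reader), Some("v1"));
    assert_eq!(db.read("x", db.begin_snapshot()), Some("v2"));
    let now = db.begin_snapshot(); // pretend the old reader has finished
    db.gc(now);
    assert_eq!(db.chains["x"].len(), 1); // "v1" was collected
}
```

Tracking which snapshots are still active, and therefore the correct watermark, is left out here; it is the part that makes garbage collection subtle in practice.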

Check out the project here: InterchangeDB


I'm currently looking for roles in databases, data infrastructure, and search. If your team is building in this space, I'd love to talk.
