<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: geek-electro</title>
    <description>The latest articles on DEV Community by geek-electro (@electrogeek).</description>
    <link>https://dev.to/electrogeek</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F855472%2F9df87a43-a20b-4d72-bea1-1ce7ea3a80a8.jpeg</url>
      <title>DEV Community: geek-electro</title>
      <link>https://dev.to/electrogeek</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/electrogeek"/>
    <language>en</language>
    <item>
      <title>I built a storage engine from scratch. Here’s everything I learned.</title>
      <dc:creator>geek-electro</dc:creator>
      <pubDate>Wed, 20 May 2026 05:57:13 +0000</pubDate>
      <link>https://dev.to/electrogeek/i-built-a-storage-engine-from-scratch-heres-everything-i-learned-3n9b</link>
      <guid>https://dev.to/electrogeek/i-built-a-storage-engine-from-scratch-heres-everything-i-learned-3n9b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcb3a1b79jdq2aiok63a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcb3a1b79jdq2aiok63a.png" alt=" " width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  I built a storage engine from scratch. Here's everything I learned.
&lt;/h1&gt;

&lt;p&gt;Not a wrapper. Not a library call. A real, working storage engine — written in C++, exposed over gRPC, running inside Docker. This is the story of how I built it and why.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem I was trying to solve
&lt;/h2&gt;

&lt;p&gt;I needed to store structured data in a very specific shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;document&lt;/strong&gt; (HTML, text, anything)&lt;/li&gt;
&lt;li&gt;A chain of &lt;strong&gt;inputs&lt;/strong&gt; attached to that document&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;expected output&lt;/strong&gt; for each input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simple enough, right? But every database I looked at felt wrong.&lt;/p&gt;

&lt;p&gt;A relational database wanted me to define a schema upfront and write JOIN queries just to read a document with its inputs. A key-value store was too flat — I'd have to model the relationships myself. A document database handled the top level fine but got awkward when I needed an ordered, linked chain of sub-documents.&lt;/p&gt;

&lt;p&gt;So I built exactly what I needed. Nothing more.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Stratum?
&lt;/h2&gt;

&lt;p&gt;Stratum is a &lt;strong&gt;log-structured, hierarchical storage engine&lt;/strong&gt; with O(1) lookup, built in C++ and exposed over gRPC so any language can talk to it.&lt;/p&gt;

&lt;p&gt;Here's the data model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document ──→ Input node 1 ──→ Input node 2 ──→ Input node 3
                  │                 │                 │
                  ▼                 ▼                 ▼
             Output node 1    Output node 2    Output node 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every document lives at a known byte offset on disk. An in-memory hash map holds &lt;code&gt;document_id → byte_offset&lt;/code&gt;. Every read is one hash map lookup (O(1) on average) plus one disk seek. No secondary indexes to maintain, no query planner, no surprises.&lt;/p&gt;
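
&lt;p&gt;To make the read path concrete, here is a minimal Python sketch of the same idea (the engine itself is C++). The record layout, a 4-byte length prefix followed by the payload, and the names &lt;code&gt;append_record&lt;/code&gt; and &lt;code&gt;read_record&lt;/code&gt; are assumptions for illustration, not Stratum's actual on-disk format or API.&lt;/p&gt;

```python
import io
import struct

# Hypothetical layout: 4-byte big-endian length, then the payload bytes.
HEADER = struct.Struct(">I")


def append_record(log, index, doc_id, payload):
    """Append one record at the end of the log and remember where it starts."""
    offset = log.seek(0, io.SEEK_END)
    log.write(HEADER.pack(len(payload)))
    log.write(payload)
    index[doc_id] = offset
    return offset


def read_record(log, index, doc_id):
    """One dict lookup, one seek, one read."""
    offset = index[doc_id]
    log.seek(offset)
    (length,) = HEADER.unpack(log.read(HEADER.size))
    return log.read(length)


log = io.BytesIO()   # stands in for a segment file opened in binary mode
index = {}           # document_id -> byte_offset
append_record(log, index, "doc-1", b"first document")
append_record(log, index, "doc-2", b"second document")
print(read_record(log, index, "doc-1"))
```

&lt;p&gt;The dict plays the role of the in-memory hash map, and the cost of a read does not depend on how many records sit in the log.&lt;/p&gt;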




&lt;h2&gt;
  
  
  How it actually works under the hood
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Log-structured writes
&lt;/h3&gt;

&lt;p&gt;Every write — insert, update, delete — is an &lt;strong&gt;append&lt;/strong&gt; to the end of the active segment file. Nothing is ever mutated in place.&lt;/p&gt;

&lt;p&gt;When you update a document, the old version stays on disk. The new version is appended. The in-memory index is updated to point at the new offset. The old version becomes garbage.&lt;/p&gt;

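&lt;p&gt;This means every write is a single sequential append, so its cost does not depend on how much data is already stored. A crash mid-write can only leave a torn record at the tail of the log; records written earlier are never overwritten, so a later write cannot corrupt them.&lt;/p&gt;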
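
&lt;p&gt;The append-and-repoint behaviour described above can be sketched in a few lines of Python. This is a toy model that keeps the "log" in a list; the class name &lt;code&gt;AppendOnlyLog&lt;/code&gt; and its methods are hypothetical and only illustrate the technique.&lt;/p&gt;

```python
class AppendOnlyLog:
    """Toy in-memory log: every operation appends, nothing is overwritten."""

    def __init__(self):
        self.records = []   # stands in for the bytes of the active segment
        self.index = {}     # record id -> position of the live version

    def put(self, record_id, value):
        self.records.append((record_id, value))
        self.index[record_id] = len(self.records) - 1

    def delete(self, record_id):
        # A delete is also an append: a tombstone (value None) is written
        # and the id is dropped from the index.
        self.records.append((record_id, None))
        self.index.pop(record_id, None)

    def get(self, record_id):
        position = self.index.get(record_id)
        if position is None:
            return None
        return self.records[position][1]

    def garbage_count(self):
        """Entries in the log that no index entry points at any more."""
        return len(self.records) - len(self.index)


store = AppendOnlyLog()
store.put("doc-1", "v1")
store.put("doc-1", "v2")      # update: v1 stays in the log as garbage
store.put("doc-2", "hello")
store.delete("doc-2")         # tombstone appended, doc-2 removed from index
print(store.get("doc-1"), store.get("doc-2"), store.garbage_count())
```

&lt;p&gt;After these four operations the log holds four entries but only one is live. The other three are the garbage that compaction exists to reclaim.&lt;/p&gt;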

&lt;h3&gt;
  
  
  Segment rotation
&lt;/h3&gt;

&lt;p&gt;When the active segment grows beyond a configured threshold, it gets sealed — renamed to &lt;code&gt;seg_NNN.seg&lt;/code&gt; — and a fresh &lt;code&gt;active.seg&lt;/code&gt; is opened. You end up with a stack of segments on disk, which is where the name comes from. Like geological strata.&lt;/p&gt;
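
&lt;p&gt;A rough Python sketch of that rotation step follows. The function name &lt;code&gt;rotate_if_needed&lt;/code&gt;, the three-digit zero-padded numbering, and counting existing files to pick the next number are assumptions; the real engine may choose names differently.&lt;/p&gt;

```python
import os
import tempfile


def rotate_if_needed(directory, threshold_bytes):
    """Seal active.seg as seg_NNN.seg once it grows beyond the threshold."""
    active = os.path.join(directory, "active.seg")
    if not os.path.getsize(active) > threshold_bytes:
        return None  # still within the threshold, keep appending
    sealed_count = sum(
        1 for name in os.listdir(directory)
        if name.startswith("seg_") and name.endswith(".seg")
    )
    sealed = os.path.join(directory, "seg_%03d.seg" % (sealed_count + 1))
    os.rename(active, sealed)
    open(active, "wb").close()  # fresh, empty active segment
    return sealed


with tempfile.TemporaryDirectory() as directory:
    with open(os.path.join(directory, "active.seg"), "wb") as f:
        f.write(b"x" * 2048)
    rotate_if_needed(directory, threshold_bytes=1024)
    print(sorted(os.listdir(directory)))
```

&lt;p&gt;Sealed segments are never written to again, which is what lets a compactor read them without coordinating with the writer.&lt;/p&gt;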

&lt;h3&gt;
  
  
  Background compaction
&lt;/h3&gt;

&lt;p&gt;A background C++ thread wakes periodically and checks total segment size. When it crosses a threshold, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scans all sealed segments&lt;/li&gt;
&lt;li&gt;For every record ID, keeps only the version with the highest timestamp&lt;/li&gt;
&lt;li&gt;Discards tombstoned (deleted) records entirely&lt;/li&gt;
&lt;li&gt;Writes a single &lt;code&gt;merged.seg&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Atomically replaces the old segments&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After compaction, disk usage is proportional to live data rather than to total write history. Merging segments and dropping superseded versions is the same basic strategy used by Bitcask and by LSM-tree databases like RocksDB, just much simpler.&lt;/p&gt;
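
&lt;p&gt;The merge rule in the steps above (newest timestamp wins, tombstones are dropped) can be sketched in Python as follows. The &lt;code&gt;Record&lt;/code&gt; shape and the use of &lt;code&gt;None&lt;/code&gt; as a tombstone marker are assumptions made for the example, and the segments are plain lists rather than files.&lt;/p&gt;

```python
from collections import namedtuple

# Hypothetical record shape; value None marks a tombstone (a delete).
Record = namedtuple("Record", ["record_id", "timestamp", "value"])


def compact(segments):
    """Merge sealed segments, keeping only the newest live version per id."""
    newest = {}
    for segment in segments:
        for record in segment:
            current = newest.get(record.record_id)
            if current is None or record.timestamp > current.timestamp:
                newest[record.record_id] = record
    # Drop ids whose newest version is a tombstone.
    return [r for r in newest.values() if r.value is not None]


seg_001 = [Record("a", 1, "a-v1"), Record("b", 2, "b-v1")]
seg_002 = [Record("a", 3, "a-v2"), Record("b", 4, None), Record("c", 5, "c-v1")]
merged = compact([seg_001, seg_002])
print(merged)
```

&lt;p&gt;Five input records shrink to two: the newest version of &lt;code&gt;a&lt;/code&gt; and the only version of &lt;code&gt;c&lt;/code&gt;. Record &lt;code&gt;b&lt;/code&gt; disappears because its newest version is a tombstone.&lt;/p&gt;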

&lt;h3&gt;
  
  
  Thread safety
&lt;/h3&gt;

&lt;p&gt;The in-memory index is guarded by a &lt;code&gt;std::shared_mutex&lt;/code&gt;. Reads take it in shared mode, so any number of readers can run concurrently. Writes take it in exclusive mode only to update the index. The disk append itself is serialized by a per-segment mutex.&lt;/p&gt;
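
&lt;p&gt;Python's standard library has no direct equivalent of &lt;code&gt;std::shared_mutex&lt;/code&gt;, so the sketch below builds a small readers-writer lock from &lt;code&gt;threading.Condition&lt;/code&gt; to show the same locking pattern. &lt;code&gt;ReadWriteLock&lt;/code&gt;, &lt;code&gt;lookup&lt;/code&gt;, and &lt;code&gt;repoint&lt;/code&gt; are hypothetical names, and this simple version makes no fairness guarantees for writers.&lt;/p&gt;

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock: many readers or one writer at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


index = {}
index_lock = ReadWriteLock()


def lookup(doc_id):
    index_lock.acquire_read()        # shared: other readers may hold it too
    try:
        return index.get(doc_id)
    finally:
        index_lock.release_read()


def repoint(doc_id, offset):
    index_lock.acquire_write()       # exclusive: held only for the dict update
    try:
        index[doc_id] = offset
    finally:
        index_lock.release_write()


repoint("doc-1", 128)
print(lookup("doc-1"))
```

&lt;p&gt;The point of the pattern is that the exclusive section is tiny. The slow part of a write, the disk append, happens outside it.&lt;/p&gt;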




&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Python/Go/Node app
        │
        │  gRPC (auto-generated client from a single .proto file)
        ▼
  Stratum server (C++, always running)
        │
        ├── In-memory hash index  ← O(1) lookups
        ├── Segment manager       ← append-only log files
        └── Compactor thread      ← background GC
        │
        ▼
  Disk (log-structured .seg files)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gRPC part is what makes this useful beyond a single project. You write the &lt;code&gt;.proto&lt;/code&gt; file once, run &lt;code&gt;protoc&lt;/code&gt; with the plugin for your language, and get a client in Python, Go, Java, Node, Rust, or any other language with a gRPC code generator. The C++ engine doesn't care who's calling it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The data types problem
&lt;/h2&gt;

&lt;p&gt;One thing I underestimated early on: input and output nodes can hold &lt;em&gt;anything&lt;/em&gt;. An integer. A string. A list of integers. A list of strings. A map.&lt;/p&gt;

&lt;p&gt;I ended up using &lt;code&gt;std::variant&lt;/code&gt; in C++ to represent this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ValueVariant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
    &lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the wire (gRPC), this becomes a &lt;code&gt;oneof&lt;/code&gt; in the proto definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;Value&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;oneof&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;int64&lt;/span&gt;      &lt;span class="na"&gt;int_val&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;double&lt;/span&gt;     &lt;span class="na"&gt;double_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt;     &lt;span class="na"&gt;str_val&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;Int64List&lt;/span&gt;  &lt;span class="na"&gt;vec_int&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;StringList&lt;/span&gt; &lt;span class="na"&gt;vec_str&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;StringMap&lt;/span&gt;  &lt;span class="na"&gt;map_val&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Python client wraps this so you never think about it — you just pass a Python int, list, or dict and the client figures out the right proto type.&lt;/p&gt;
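
&lt;p&gt;One way such a wrapper could pick the right field is a chain of type checks. The sketch below is an assumption about how that dispatch might look, not the actual client code; it only returns the field name from the &lt;code&gt;Value&lt;/code&gt; message above and does not build a protobuf object. &lt;code&gt;oneof_field_for&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def oneof_field_for(value):
    """Pick the Value.kind field name that matches a Python value."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("bool has no matching field in Value")
    if isinstance(value, int):
        return "int_val"
    if isinstance(value, float):
        return "double_val"
    if isinstance(value, str):
        return "str_val"
    if isinstance(value, dict):
        return "map_val"
    if isinstance(value, list):
        if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
            return "vec_int"   # note: an empty list also lands here
        if all(isinstance(v, str) for v in value):
            return "vec_str"
    raise TypeError("unsupported value type: %r" % type(value))


print(oneof_field_for([2, 7, 11, 15]))
print(oneof_field_for("some string"))
print(oneof_field_for({"key": "value"}))
```

&lt;p&gt;Mixed lists such as &lt;code&gt;[1, "a"]&lt;/code&gt; raise a &lt;code&gt;TypeError&lt;/code&gt; here, since the &lt;code&gt;oneof&lt;/code&gt; has no field that could hold them.&lt;/p&gt;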




&lt;h2&gt;
  
  
  What calling it from Python looks like
&lt;/h2&gt;

&lt;p&gt;Once the Docker container is running, this is the entire client-side API. The method names say "problem" and "test case" where this post says document and input node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;lse_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LseClient&lt;/span&gt;

&lt;span class="n"&gt;lse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LseClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost:50051&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create a document
&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_problem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;h1&amp;gt;My document&amp;lt;/h1&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tutorial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add input nodes — any data type
&lt;/span&gt;&lt;span class="n"&gt;node1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_test_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;node2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_test_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;some string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;node3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_test_case&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Attach output nodes
&lt;/span&gt;&lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_expected_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Read everything back
&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_problem_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_test_cases&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;out&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;lse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_expected_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No hand-written HTTP requests, no JSON parsing, no URL construction. Just function calls. gRPC still runs over HTTP/2 underneath, but the generated stub handles serialization and connection management, and reports failed calls as errors.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running it
&lt;/h2&gt;

&lt;p&gt;The engine ships as a Docker container. Starting it is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The server starts on port 50051. Data is written to a named Docker volume so it survives container restarts and rebuilds. To stop: &lt;code&gt;docker-compose down&lt;/code&gt;. Data stays. To wipe everything: &lt;code&gt;docker-compose down -v&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I learned building this
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Append-only is underrated.&lt;/strong&gt; The moment I stopped trying to update records in place, the entire concurrency problem got simpler. Readers and writers can't conflict on the same byte offset because writers never touch old offsets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Separate the index from the data.&lt;/strong&gt; The in-memory hash index is tiny — just &lt;code&gt;id → offset&lt;/code&gt;. It loads from disk on startup in milliseconds. The actual data — potentially gigabytes of blobs — never touches RAM until you ask for it. This is the core insight behind Bitcask.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. gRPC over a custom protocol every time.&lt;/strong&gt; I briefly considered rolling a custom TCP protocol. The moment I saw what a &lt;code&gt;.proto&lt;/code&gt; file + &lt;code&gt;protoc&lt;/code&gt; gives you (a typed, versioned, language-agnostic API with generated clients for every language gRPC supports), there was no reason to do anything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The compactor is the hardest part.&lt;/strong&gt; Not because the algorithm is complex, but because it has to be correct under concurrent access. Getting the rename-then-reload sequence right — so readers never see a half-compacted state — took more iteration than anything else in the codebase.&lt;/p&gt;
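
&lt;p&gt;A common way to get this kind of swap right is to write the merged file under a temporary name, rename it into place, and only then repoint the index under a lock. The Python sketch below shows that sequence under stated assumptions: &lt;code&gt;publish_merged&lt;/code&gt;, the &lt;code&gt;.tmp&lt;/code&gt; suffix, and the &lt;code&gt;state&lt;/code&gt; dict are hypothetical, and Stratum's real sequence may differ.&lt;/p&gt;

```python
import os
import tempfile
import threading

index_lock = threading.Lock()
state = {"index": {}, "path": None}   # what readers consult


def publish_merged(directory, merged_bytes, new_index):
    """Write the merged segment fully, then switch readers over in one step."""
    tmp_path = os.path.join(directory, "merged.seg.tmp")
    final_path = os.path.join(directory, "merged.seg")
    with open(tmp_path, "wb") as f:
        f.write(merged_bytes)
        f.flush()
        os.fsync(f.fileno())           # make sure the bytes reached the disk
    os.replace(tmp_path, final_path)   # atomic rename on POSIX filesystems
    with index_lock:
        # Readers see either the old index or the new one, never a mix.
        state["index"] = new_index
        state["path"] = final_path
    return final_path


with tempfile.TemporaryDirectory() as directory:
    publish_merged(directory, b"merged records", {"doc-1": 0})
    print(os.listdir(directory), state["index"])
```

&lt;p&gt;Old segments would be deleted only after the index swap, so a reader holding an old offset never finds its file gone mid-read.&lt;/p&gt;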

&lt;p&gt;&lt;strong&gt;5. CRC on every record.&lt;/strong&gt; I added CRC-32 checksums on every record header from day one. It caught two bugs during development that would have been nearly impossible to find otherwise. Storage engines fail silently at the byte level. Checksums surface those failures immediately.&lt;/p&gt;
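
&lt;p&gt;For a feel of how little code a per-record checksum takes, here is a Python sketch using &lt;code&gt;zlib.crc32&lt;/code&gt;. The header layout (CRC-32 of the payload followed by the payload length) and the function names are assumptions; the post does not spell out Stratum's exact header format.&lt;/p&gt;

```python
import struct
import zlib

# Hypothetical header: CRC-32 of the payload, then payload length.
HEADER = struct.Struct(">II")


def encode_record(payload):
    return HEADER.pack(zlib.crc32(payload), len(payload)) + payload


def decode_record(blob):
    crc, length = HEADER.unpack(blob[:HEADER.size])
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch: record is corrupt or truncated")
    return payload


record = encode_record(b"hello stratum")
print(decode_record(record))

# Flip one byte in the payload to simulate on-disk corruption.
damaged = record[:-1] + bytes([record[-1] ^ 0xFF])
try:
    decode_record(damaged)
except ValueError as error:
    print("detected:", error)
```

&lt;p&gt;The same check also catches a torn write at the tail of a segment, because a truncated payload fails the length comparison.&lt;/p&gt;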




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Stratum is open source. You can use it for anything that fits the document → inputs → outputs model. I'm actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A write-ahead log (WAL) for crash recovery&lt;/li&gt;
&lt;li&gt;Snapshot / backup support&lt;/li&gt;
&lt;li&gt;Benchmarks against SQLite for this specific access pattern&lt;/li&gt;
&lt;li&gt;A Rust client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you build something with it, I'd genuinely love to know.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/electro-geek/Stratum" rel="noopener noreferrer"&gt;https://github.com/electro-geek/Stratum&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;.proto&lt;/code&gt; file (the full API contract): &lt;code&gt;proto/lse.proto&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Quick start: &lt;code&gt;docker pull electrogeek/stratum-server:latest&lt;/code&gt;, then &lt;code&gt;pip install grpcio grpcio-tools&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>systemdesign</category>
      <category>backenddevelopment</category>
      <category>database</category>
    </item>
  </channel>
</rss>
