<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Loïc Baumann</title>
    <description>The latest articles on DEV Community by Loïc Baumann (@nockawa).</description>
    <link>https://dev.to/nockawa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3848296%2F0d77dd3c-3f8e-4b0a-b82b-bf2ef9946225.png</url>
      <title>DEV Community: Loïc Baumann</title>
      <link>https://dev.to/nockawa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nockawa"/>
    <language>en</language>
    <item>
      <title>Building a Page Cache That Doesn't Count: Epoch-Based Memory Management</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Fri, 26 Jun 2026 15:23:52 +0000</pubDate>
      <link>https://dev.to/nockawa/building-a-page-cache-that-doesnt-count-epoch-based-memory-management-364i</link>
      <guid>https://dev.to/nockawa/building-a-page-cache-that-doesnt-count-epoch-based-memory-management-364i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/what-game-engines-know-about-data/" rel="noopener noreferrer"&gt;What Game Engines Know About Data That Databases Forgot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/microsecond-latency-managed-language/" rel="noopener noreferrer"&gt;Microsecond Latency in a Managed Language&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/deadlock-free-by-construction/" rel="noopener noreferrer"&gt;Deadlock-Free by Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/three-durability-modes-one-wal/" rel="noopener noreferrer"&gt;Three Durability Modes, One WAL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building a Page Cache That Doesn't Count&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;MVCC at Microsecond Scale &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; &amp;nbsp;•&amp;nbsp; 📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A transaction that reads a thousand pages should pay for reading a thousand pages. Mine used to pay for two thousand atomic operations on top — one to pin each page so the cache wouldn't yank it out from under a live pointer, one to unpin it afterward — before a single byte of useful work happened.&lt;/p&gt;

&lt;p&gt;That bookkeeping was pure tax. Worse, it was &lt;em&gt;contended&lt;/em&gt; tax. The page cache in Typhon today touches those same thousand pages and pays for exactly two operations, flat, regardless of the count. This post is about how the counter went away — and the one corner where I had to bring a smaller one back.&lt;/p&gt;

&lt;h2&gt;
  
  
  📍 Where we are
&lt;/h2&gt;

&lt;p&gt;A quick recap for anyone joining mid-series. Typhon stores everything — components, indexes, the lot — in 8 KB pages. Those pages are cached in a single &lt;code&gt;byte[]&lt;/code&gt; that is pinned with a &lt;code&gt;GCHandle&lt;/code&gt; so the GC can never move it. Pinning is what makes the rest of the engine possible: once the backing array is fixed in memory, the hot paths hand out &lt;strong&gt;raw pointers and &lt;code&gt;ref T&lt;/code&gt; interior references straight into the cache slot&lt;/strong&gt;. No copy, no marshaling, no managed wrapper. A read of a component is a pointer add and a dereference.&lt;/p&gt;

&lt;p&gt;That zero-copy guarantee is also the whole problem. The cache is finite — 256 pages by default — so when it fills, a clock-sweep eviction picks a victim slot and reuses it for a different file page. If a reader is still holding a &lt;code&gt;ref Position&lt;/code&gt; into that slot when the sweep recycles it, the reader is now looking at someone else's data. A use-after-free, in a managed language, with no exception to tell you it happened — just a silently wrong number.&lt;/p&gt;

&lt;p&gt;So the cache needs a lifetime rule: &lt;strong&gt;a slot must not be evicted while anyone still holds a pointer into it.&lt;/strong&gt; The interesting part is not &lt;em&gt;that&lt;/em&gt; rule. Every cache has it. The interesting part is what you pay to enforce it.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧾 The obvious answer, and its bill
&lt;/h2&gt;

&lt;p&gt;The textbook enforcement is reference counting. Every page carries a counter. You take a reference (&lt;code&gt;++&lt;/code&gt;) before you read, you drop it (&lt;code&gt;--&lt;/code&gt;) when you're done, and eviction skips any page whose count is above zero. Simple, correct, and the first thing I shipped — Typhon's early page cache had exactly this, a per-page &lt;code&gt;ConcurrentSharedCounter&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It has four costs, and they compound:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It's 2N.&lt;/strong&gt; A transaction that touches N pages does N increments and N decrements. The work scales with the data, which is the opposite of what you want from bookkeeping.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The decrement is mandatory and paired.&lt;/strong&gt; Every acquire needs its matching release on &lt;em&gt;every&lt;/em&gt; exit path, including exceptions. Miss one and the page is pinned forever — a slow leak that strands cache slots until the engine wedges. This is the half that generates bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The counter is shared, mutable, cross-core state.&lt;/strong&gt; That &lt;code&gt;++&lt;/code&gt;/&lt;code&gt;--&lt;/code&gt; is an &lt;a href="https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked" rel="noopener noreferrer"&gt;&lt;code&gt;Interlocked&lt;/code&gt;&lt;/a&gt; operation — an atomic read-modify-write. When two worker threads touch the same hot page — a B+Tree root, say — its counter's cache line ping-pongs between their cores. The atomic instruction isn't the expensive part; the coherency traffic is. A contended counter can cost an order of magnitude more than an uncontended one, and the hottest pages are exactly the ones every thread wants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It clutters everything it touches.&lt;/strong&gt; A mandatory paired release means scaffolding at every call site — hundreds of &lt;code&gt;using&lt;/code&gt; blocks whose entire reason to exist is a &lt;code&gt;Dispose&lt;/code&gt; that decrements a counter. And the moment that overhead shows up in a profile, you start bending the code to dodge it: paths that &lt;em&gt;steal&lt;/em&gt; an already-held reference instead of taking a fresh one, so an increment here cancels a decrement there. Each of those tricks is correct only under assumptions the next reader can't see from the call site. The counter didn't just cost cycles — it spread through the codebase and made the whole engine harder to reason about. That cost is invisible in a microbenchmark and very visible six months later.&lt;/li&gt;
&lt;/ol&gt;
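
&lt;p&gt;To make that bill concrete, here is a minimal sketch of the per-page pin/unpin pattern described above. It is not Typhon's old code: &lt;code&gt;RefCountedPage&lt;/code&gt;, &lt;code&gt;PagePin&lt;/code&gt;, and &lt;code&gt;RefCountDemo.Checksum&lt;/code&gt; are hypothetical names invented for illustration, and a plain &lt;code&gt;int&lt;/code&gt; stands in for the &lt;code&gt;ConcurrentSharedCounter&lt;/code&gt;, whose implementation this post does not show.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

// Hypothetical per-page state: one shared counter per cached page.
sealed class RefCountedPage
{
    public int PinCount;                              // shared, mutable, cross-core
    public readonly byte[] Data = new byte[8192];     // one 8 KB page

    // Eviction skips any page whose count is above zero.
    public bool CanEvict =&amp;gt; Volatile.Read(ref PinCount) == 0;
}

// Hypothetical pin scope: the increment on entry forces a paired decrement.
readonly struct PagePin : IDisposable
{
    private readonly RefCountedPage _page;

    public PagePin(RefCountedPage page)
    {
        _page = page;
        Interlocked.Increment(ref page.PinCount);     // atomic #1
    }

    public void Dispose()
    {
        Interlocked.Decrement(ref _page.PinCount);    // atomic #2, on every exit path
    }
}

static class RefCountDemo
{
    // Touching N pages costs 2N atomics, plus one using scope per page.
    public static long Checksum(RefCountedPage[] pages)
    {
        long sum = 0;
        foreach (var page in pages)
        {
            using var pin = new PagePin(page);        // released at the end of each iteration
            sum += page.Data[0];
        }
        return sum;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every call site that touches a page needs its own scope like the one in &lt;code&gt;Checksum&lt;/code&gt;, which is the clutter and the 2N cost in one picture.&lt;/p&gt;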

&lt;p&gt;Reference counting answers a question the cache never actually asked: &lt;em&gt;"exactly how many people are looking at this page right now?"&lt;/em&gt; Eviction doesn't need that number. It needs a cheaper, weaker fact: &lt;em&gt;"is it safe to reclaim this slot yet?"&lt;/em&gt; Those are different questions, and the second one has a much cheaper answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⏱️ The reframe: stop counting, start timing
&lt;/h2&gt;

&lt;p&gt;The technique is &lt;strong&gt;epoch-based reclamation&lt;/strong&gt; (EBR) — the same family of idea behind &lt;a href="https://en.wikipedia.org/wiki/Read-copy-update" rel="noopener noreferrer"&gt;RCU&lt;/a&gt; in the Linux kernel. Instead of tracking who holds what, you track &lt;em&gt;time&lt;/em&gt;, coarsely, and you only reclaim memory once enough time has passed that nobody could still be holding the old version.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;The intuition.&lt;/strong&gt; Think of a museum that wants to renovate a gallery — but only once every visitor who might still be wandering through it has gone home. The expensive way is to tag each visitor and constantly track which gallery they're standing in; that's reference counting. The cheap way: stamp every visitor at the entrance with the hour they arrived, and have each gallery note the last hour anyone walked through it. Now the renovation crew checks a single number — the arrival time of the &lt;em&gt;earliest&lt;/em&gt; visitor still in the building. Any gallery whose last visit predates that time is guaranteed empty of anyone who could still be inside, so the crew can move in. No per-visitor ledger, just one question: who's the oldest person still here? Epoch-based reclamation &lt;em&gt;is&lt;/em&gt; that museum — the visitor's arrival stamp is the epoch a thread pins on entry, the gallery's last-walked hour is the page's stamp, and "the earliest visitor still inside" is the threshold eviction compares against.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;"Time" here is a single global counter — the &lt;strong&gt;epoch&lt;/strong&gt; — that ticks forward. The protocol has three moves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enter.&lt;/strong&gt; When a thread starts a unit of work, it publishes the current epoch into a slot of its own: &lt;em&gt;"I am active, and I started at epoch E."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stamp.&lt;/strong&gt; Every page the thread touches gets tagged with the current epoch — fire-and-forget, no matching release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exit.&lt;/strong&gt; When the thread finishes, it clears its slot and ticks the global epoch forward by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eviction then asks one question of a page: &lt;em&gt;is its stamp older than the oldest still-active thread?&lt;/em&gt; If every active thread started after this page was last touched, no one can be holding a pointer into it, and the slot is free to recycle. That comparison is the entire safety check.&lt;/p&gt;
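
&lt;p&gt;The three moves and the eviction question fit in a few lines. The sketch below is a single-threaded toy model, not Typhon's implementation: &lt;code&gt;ToyEpochs&lt;/code&gt;, its dictionary-based registry, and the choice to use the current global epoch as the threshold when nobody is active are all assumptions made for illustration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;
using System.Linq;

// Toy, single-threaded model of the protocol. All names are hypothetical.
sealed class ToyEpochs
{
    private long _globalEpoch = 1;
    private readonly Dictionary&amp;lt;string, long&amp;gt; _active = new();   // worker to pinned epoch
    private readonly Dictionary&amp;lt;int, long&amp;gt; _stamps = new();      // page to last stamp

    // Enter: publish "I am active, and I started at epoch E".
    public void Enter(string worker) =&amp;gt; _active[worker] = _globalEpoch;

    // Stamp: tag the page with the current epoch, never moving it backward.
    public void Stamp(int page)
    {
        _stamps.TryGetValue(page, out long existing);   // 0 if never stamped
        _stamps[page] = Math.Max(existing, _globalEpoch);
    }

    // Exit: clear the slot and tick the global epoch forward by one.
    public void Exit(string worker)
    {
        _active.Remove(worker);
        _globalEpoch++;
    }

    // Assumption: with nobody active, the threshold is the current global epoch.
    public long MinActiveEpoch =&amp;gt; _active.Count == 0 ? _globalEpoch : _active.Values.Min();

    // Eviction's single question: is the stamp older than the oldest active worker?
    public bool CanEvict(int page) =&amp;gt; _stamps[page] &amp;lt; MinActiveEpoch;
}

static class ToyDemo
{
    static void Main()
    {
        var epochs = new ToyEpochs();
        epochs.Enter("A");                        // A pins epoch 1
        epochs.Stamp(7);                          // page 7 is stamped 1
        Console.WriteLine(epochs.CanEvict(7));    // False: stamp 1 is not older than A's pin
        epochs.Exit("A");                         // global epoch becomes 2
        Console.WriteLine(epochs.CanEvict(7));    // True: stamp 1 is older than 2
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;Stamp&lt;/code&gt; has no counterpart to call later. The only thing that makes page 7 reclaimable is worker A leaving.&lt;/p&gt;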

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/page-cache-epochs.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fpage-cache-epochs.png" alt="Side-by-side comparison: per-page reference counting requires 2N contended atomic operations on shared per-page counters that bounce between CPU cores; epoch-based protection requires one thread-local pin on entry plus one global epoch advance on exit — two operations total, flat, regardless of how many pages the transaction touches, with page stamps that are fire-and-forget and need no matching release" width="800" height="407"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The asymmetry is the whole win. The expensive, bug-prone, contended half of reference counting — the mandatory paired &lt;em&gt;release&lt;/em&gt; on every page — disappears entirely. Releasing is now a single global tick that retires &lt;em&gt;all&lt;/em&gt; of a transaction's pages at once. What's left on the acquire side is a stamp that is uncontended and usually a no-op.&lt;/p&gt;
&lt;h2&gt;
  
  
  ⚙️ How it actually works
&lt;/h2&gt;

&lt;p&gt;Three types carry the design (&lt;a href="https://github.com/nockawa/Typhon/tree/main/src/Typhon.Engine/Foundation/Concurrency/internals" rel="noopener noreferrer"&gt;&lt;code&gt;src/Typhon.Engine/Foundation/Concurrency/internals/&lt;/code&gt;&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;EpochManager&lt;/code&gt; — one per engine. Owns the global epoch counter and the thread registry.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EpochGuard&lt;/code&gt; — a &lt;code&gt;ref struct&lt;/code&gt; RAII scope. Enter on construction, advance the epoch on the outermost exit. Being a &lt;code&gt;ref struct&lt;/code&gt;, it can't escape to the heap or get boxed; it has to live on the stack and be disposed.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EpochThreadRegistry&lt;/code&gt; — a fixed array of per-thread slots holding each thread's pinned epoch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A transaction wraps its whole lifetime in one guard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Managed by Transaction — one scope per unit of work&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EpochGuard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epochManager&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Every page access inside this scope is protected, no per-page bookkeeping&lt;/span&gt;
&lt;span class="n"&gt;pmmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RequestPageEpoch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filePageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GlobalEpoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;out&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;memPageIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pmmf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetMemPageAddress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memPageIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ... read/write through raw pointers, hold ref T across many pages ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Entering&lt;/strong&gt; publishes the epoch into the thread's own slot. The hot path is plain stores to a slot that only the owning thread writes, with no atomic and no contention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// EpochThreadRegistry.PinCurrentThread — outermost scope pins, nesting is free&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_slots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Depth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;_slots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;Depth&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_slots&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;PinnedEpoch&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// publish: "active, started at this epoch"&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stamping&lt;/strong&gt; a page is an atomic-max — never let a page's stamp go backward — that skips the atomic write entirely in the common case where the page is already current for this epoch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// PagedMMF.RequestPageEpoch — tag the page, never moving the stamp backward&lt;/span&gt;
&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;do&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AccessEpoch&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;currentEpoch&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                  &lt;span class="c1"&gt;// already protected this epoch — no write at all&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interlocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareExchange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AccessEpoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;currentEpoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Interlocked.CompareExchange&lt;/code&gt; is a &lt;strong&gt;compare-and-swap&lt;/strong&gt; (CAS): it stores the new epoch &lt;em&gt;only if&lt;/em&gt; the field still holds the value we just read, so two threads stamping the same page can't clobber each other. The first touch of a page in an epoch does one such CAS; every subsequent touch breaks out at the &lt;code&gt;currentEpoch &amp;lt;= existing&lt;/code&gt; guard — a plain read and a compare, no atomic at all. And — this is the point — there is no release. Nothing to pair, nothing to forget, nothing to run in a &lt;code&gt;finally&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exiting&lt;/strong&gt; the outermost scope is where the single shared atomic lives — the global epoch tick:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// EpochManager.ExitScope — outermost exit retires the whole transaction's pages&lt;/span&gt;
&lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;ExitScope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;expectedDepth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UnpinCurrentThread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expectedDepth&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;newEpoch&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Interlocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_globalEpoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;TyphonEvent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;EmitConcurrencyEpochAdvance&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;newEpoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;strong&gt;eviction&lt;/strong&gt; is the comparison the whole scheme exists to make cheap. &lt;code&gt;MinActiveEpoch&lt;/code&gt; is the smallest epoch any thread is still pinned to; a page survives if its stamp is at least that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// PagedMMF.TryAcquire — the eviction guard, minus the dirty/in-flight checks&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AccessEpoch&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;minActiveEpoch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// a transaction old enough to hold a pointer is still live — skip&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Walk the invariant once and it clicks. A thread that pinned epoch E contributes E to &lt;code&gt;MinActiveEpoch&lt;/code&gt;, so &lt;code&gt;MinActiveEpoch ≤ E&lt;/code&gt;. Any page it stamped was stamped at the global epoch &lt;em&gt;at access time&lt;/em&gt;, which is ≥ E. So while that thread is live, every page it ever touched satisfies &lt;code&gt;AccessEpoch ≥ E ≥ MinActiveEpoch&lt;/code&gt; and cannot be evicted. The moment it exits — and no older thread remains — &lt;code&gt;MinActiveEpoch&lt;/code&gt; climbs past those stamps and they all become reclaimable together. This is a &lt;strong&gt;grace period&lt;/strong&gt;, exactly as in RCU: reclamation waits until every reader that could have seen the old state has gone away.&lt;/p&gt;

&lt;h2&gt;
  
  
  📊 The numbers
&lt;/h2&gt;

&lt;p&gt;Measured with BenchmarkDotNet on a Ryzen 9 7950X (Zen 4), .NET 10, RELEASE:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Allocations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;EpochGuard&lt;/code&gt; enter + exit (incl. global tick)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three-level nested enter/exit&lt;/td&gt;
&lt;td&gt;8.8 ns&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;MinActiveEpoch&lt;/code&gt; (eviction check, no threads pinned)&lt;/td&gt;
&lt;td&gt;0.8 ns&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;MinActiveEpoch&lt;/code&gt; while a thread is pinned&lt;/td&gt;
&lt;td&gt;6.6 ns&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page-cache hit (&lt;code&gt;RequestPageEpoch&lt;/code&gt;, slot resident)&lt;/td&gt;
&lt;td&gt;6.5 ns&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The number that matters is the first one: &lt;strong&gt;3.5 ns for a transaction's entire eviction-protection obligation, and it does not move when the page count goes up.&lt;/strong&gt; A transaction touching one page and a transaction touching ten thousand pay the same 3.5 ns of protection overhead. The per-page stamps on top are uncontended and frequently free; the contended, paired, O(N) release is gone.&lt;/p&gt;

&lt;p&gt;Put that against the model it replaced: a thousand-page transaction was a thousand &lt;code&gt;Interlocked&lt;/code&gt; increments and a thousand decrements, on a thousand separate counters, several of them hot enough to bounce across cores on every touch. Trading 2,000 contended atomics for one global tick is the kind of win you only get by answering an easier question.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚖️ Trade-offs, and the one place pure epochs weren't enough
&lt;/h2&gt;

&lt;p&gt;Epoch protection is &lt;em&gt;conservative&lt;/em&gt; — that's the source of both its speed and its cost. It doesn't know which pages a transaction is done with; it only knows the transaction is still running. So a transaction pins &lt;strong&gt;every page it has ever touched&lt;/strong&gt; for its entire lifetime, whether it still needs them or not.&lt;/p&gt;

&lt;p&gt;For short transactions that's exactly right and basically free. For a long one it bites: if a single transaction's working set grows past the cache size, every page it has stamped is epoch-protected, so eviction has no legal victim. The cache can't make room, and &lt;code&gt;AllocateMemoryPage&lt;/code&gt; runs into a backpressure wall and eventually times out. Reference counting wouldn't have this problem — it could evict a page the long transaction had touched and released. Epochs can't, because there is no "released" until the scope exits. That working-set-fits-in-cache constraint is a real, documented limit of the design, not a bug I haven't gotten to.&lt;/p&gt;

&lt;p&gt;The mitigation is to let a long-lived thread &lt;strong&gt;refresh&lt;/strong&gt; its epoch mid-flight — advance its own pin without unpinning — so pages it touched earlier age out and become evictable while it keeps running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// EpochManager.RefreshScope — stay continuously pinned, but move forward in time&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;newEpoch&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Interlocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_globalEpoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;_registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;RefreshPinnedEpoch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newEpoch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// no unpinned window; MinActiveEpoch never gaps&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is where the pure model sprang a leak. The whole premise is that a page is safe as long as its &lt;em&gt;epoch&lt;/em&gt; is live. But a caller can hold a raw &lt;code&gt;ref T&lt;/code&gt; into a page across a refresh — and a refresh deliberately ages that page out, marking it evictable while the pointer is still on the stack. The coarse epoch clock can't see a pointer that outlives the epoch that created it.&lt;/p&gt;

&lt;p&gt;So pure EBR wasn't quite enough, and I had to bring back a small, targeted counter: &lt;code&gt;SlotRefCount&lt;/code&gt;. It tracks the much narrower fact of "a live accessor slot is pointing at this page right now," and it gates eviction alongside the epoch check. The full guard is the conjunction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Evict only when the page is clean and nobody can be looking, neither through a live epoch nor through a raw pointer&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DirtyCounter&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AccessEpoch&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;MinActiveEpoch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SlotRefCount&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It would be tidier to say I deleted reference counting outright. The honest version is that I moved it: epochs handle the common case — thousands of pages, retired in one tick — and a tiny refcount covers the narrow case epochs structurally can't see, the raw pointer that escapes its epoch. The 2N tax is gone from the hot path; what remains is a counter that almost never moves.&lt;/p&gt;
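
&lt;p&gt;A minimal sketch of how the three conditions might combine, and of why the refcount matters after a refresh. &lt;code&gt;PageInfoSketch&lt;/code&gt;, &lt;code&gt;CanEvict&lt;/code&gt;, and &lt;code&gt;RefreshDemo&lt;/code&gt; are hypothetical names, and the field types are guesses. Only the three field names and the shape of the condition come from the guard above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

// Hypothetical per-slot metadata; field names follow the guard above, types are assumed.
sealed class PageInfoSketch
{
    public int DirtyCounter;     // pending writes not yet flushed
    public long AccessEpoch;     // last epoch stamp
    public int SlotRefCount;     // live accessors holding a raw pointer

    public bool CanEvict(long minActiveEpoch)
    {
        return Volatile.Read(ref DirtyCounter) == 0
            &amp;amp;&amp;amp; Volatile.Read(ref AccessEpoch) &amp;lt; minActiveEpoch
            &amp;amp;&amp;amp; Volatile.Read(ref SlotRefCount) == 0;
    }
}

static class RefreshDemo
{
    static void Main()
    {
        var page = new PageInfoSketch { AccessEpoch = 5 };

        // A long transaction pinned at epoch 5 holds a pointer into the page.
        Interlocked.Increment(ref page.SlotRefCount);
        Console.WriteLine(page.CanEvict(minActiveEpoch: 5));   // False: the epoch is still live

        // The transaction refreshes to epoch 9; the epoch test alone would now pass.
        Console.WriteLine(page.CanEvict(minActiveEpoch: 9));   // False: the refcount holds the slot

        // The accessor lets go of the pointer.
        Interlocked.Decrement(ref page.SlotRefCount);
        Console.WriteLine(page.CanEvict(minActiveEpoch: 9));   // True
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The middle line of output is the case pure epochs get wrong: the stamp has aged out, yet a pointer is still live.&lt;/p&gt;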

&lt;h2&gt;
  
  
  🧰 Two details worth stealing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cache-line-pad the per-thread slots.&lt;/strong&gt; The registry's whole reason to exist is that threads pin and unpin without fighting each other. Pack two threads' epoch fields into one cache line and you've reintroduced exactly the false-sharing the design set out to kill. Each slot is its own cache line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// EpochThreadRegistry — one cache line per thread, no false sharing on pin/unpin&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;StructLayout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LayoutKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Explicit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;PaddedEpochSlot&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;FieldOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;PinnedEpoch&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// hot: written every enter/exit&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;FieldOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;  &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;  &lt;span class="n"&gt;Depth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// hot: nesting depth&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;FieldOffset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;  &lt;span class="n"&gt;SlotState&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// warm: CAS on claim/free&lt;/span&gt;
    &lt;span class="c1"&gt;// bytes 16–63: padding to fill the line&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A pinned epoch is a liability if its thread dies.&lt;/strong&gt; A thread that pins an epoch and then crashes or exits without unpinning would hold &lt;code&gt;MinActiveEpoch&lt;/code&gt; down forever — freezing eviction across the whole cache. Two safeguards catch it: each slot is owned by a &lt;code&gt;CriticalFinalizerObject&lt;/code&gt; that releases it when the thread is collected, and the &lt;code&gt;MinActiveEpoch&lt;/code&gt; scan does a &lt;code&gt;Thread.IsAlive&lt;/code&gt; liveness check, reclaiming any slot whose owner has died. The grace period is robust to threads that never come back.&lt;/p&gt;
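
&lt;p&gt;As a rough illustration of the second safeguard, here is a minimal sketch of a minimum-epoch scan that reclaims slots whose owner thread has exited. It is not Typhon's code: &lt;code&gt;EpochSlotSketch&lt;/code&gt;, &lt;code&gt;EpochScanSketch&lt;/code&gt;, the &lt;code&gt;NotPinned&lt;/code&gt; sentinel and the &lt;code&gt;Owner&lt;/code&gt; field are names assumed for the example, and it ignores the race between reclaiming a slot and another thread claiming it, which a real registry would have to close with a CAS on the slot state.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

// Hypothetical sketch: none of these names come from Typhon's source.
internal sealed class EpochSlotSketch
{
    public const long NotPinned = long.MaxValue; // assumed sentinel: slot pins nothing

    public long PinnedEpoch = NotPinned;
    public Thread Owner;                          // null while the slot is free
}

internal static class EpochScanSketch
{
    // Returns the smallest epoch still pinned by a live thread,
    // releasing any slot whose owner thread has exited.
    public static long MinActiveEpoch(EpochSlotSketch[] slots, long globalEpoch)
    {
        long min = globalEpoch;
        foreach (EpochSlotSketch slot in slots)
        {
            Thread owner = slot.Owner;
            if (owner == null) continue;          // free slot, nothing pinned

            if (!owner.IsAlive)
            {
                // Owner died without unpinning: drop its pin so it cannot
                // hold the minimum down forever.
                Volatile.Write(ref slot.PinnedEpoch, EpochSlotSketch.NotPinned);
                slot.Owner = null;
                continue;
            }

            min = Math.Min(min, Volatile.Read(ref slot.PinnedEpoch));
        }
        return min;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One plausible benefit of doing the check inside the scan is that recovery needs no extra thread or timer: whoever computes the minimum also cleans up after dead owners.&lt;/p&gt;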

&lt;h2&gt;
  
  
  🎯 The takeaway
&lt;/h2&gt;

&lt;p&gt;Reference counting is the reflex answer to "don't free this while someone's using it," and for a lot of code it's the right one. But it answers a harder question than most callers ask. When you don't need &lt;em&gt;who&lt;/em&gt; or &lt;em&gt;how many&lt;/em&gt; — only &lt;em&gt;is it safe yet&lt;/em&gt; — epoch-based reclamation turns O(N) contended bookkeeping into O(1): publish a number on the way in, tick a number on the way out, and let a grace period retire everything at once. The cost is that protection is coarse, so it fits workloads whose units of work are bounded — which, for a database built for game ticks and simulation frames, they are by construction.&lt;/p&gt;

&lt;p&gt;That's the pattern worth keeping even if you never touch Typhon: before you reach for a counter, check whether you actually need the count, or just the guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⏭️ What's next
&lt;/h2&gt;

&lt;p&gt;Post #7: &lt;strong&gt;MVCC at Microsecond Scale: Snapshot Isolation Without Cloning Rows.&lt;/strong&gt; Those epoch-stamped pages are where revision chains live — every update appends a new version instead of overwriting, readers see a frozen snapshot, and old revisions get retired by the same kind of grace-period reasoning you just read. Next post pulls that thread.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Follow the &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; for source and benchmarks, or subscribe via &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>concurrency</category>
      <category>database</category>
    </item>
    <item>
      <title>A Database You Can See</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Mon, 08 Jun 2026 14:45:25 +0000</pubDate>
      <link>https://dev.to/nockawa/a-database-you-can-see-2n8g</link>
      <guid>https://dev.to/nockawa/a-database-you-can-see-2n8g</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: The Typhon Workbench&lt;/strong&gt; — the tools that make the engine usable&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Database You Can See&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;You Can't Optimize What You Can't See &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Querying by Hand &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A lighter, hands-on companion to the engine deep-dive series, &lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;&lt;em&gt;A Database That Thinks Like a Game Engine&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; &amp;nbsp;•&amp;nbsp; 📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I spent a long time making Typhon fast. Sub-microsecond commits, MVCC snapshot isolation, cache-line-aware storage: the kind of numbers that make a systems programmer lean in. Then I opened a &lt;code&gt;.typhon&lt;/code&gt; file to debug something, and realized I was staring at a black box. I had built an engine I couldn't &lt;em&gt;see into&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's the quiet trap of infrastructure software: the better the engine, the more invisible it is. A database that does its job disappears — right up until the moment you need to know &lt;em&gt;what&lt;/em&gt; your schema actually looks like in memory, &lt;em&gt;which&lt;/em&gt; systems touch a component, or &lt;em&gt;whether&lt;/em&gt; a query does what you think. At that moment, raw speed is worth nothing without a way to look inside.&lt;/p&gt;

&lt;p&gt;So this post starts a new, lighter track in the series — about the &lt;strong&gt;Typhon Workbench&lt;/strong&gt;, the tool that makes the engine usable. The thesis is simple and, I think, under-appreciated:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎯 Great technology is not enough. You need tools that let people make the most of it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Concretely, the Workbench is the tool you keep open to understand a Typhon system (its data, its schema, and its behavior), whether that system is a file on disk, a captured trace, or a live engine you're attached to. It's built for both &lt;strong&gt;developers&lt;/strong&gt; and &lt;strong&gt;ops&lt;/strong&gt;. Developers use it to understand structure and chase performance: how an archetype is laid out, what's actually stored on an entity, where the time goes.&lt;/p&gt;

&lt;p&gt;Ops and reliability engineers use it to watch a live system — tick rate, jitter, overload, queue depth — and to freeze the feed or capture a window when something spikes. Either way, it answers what code alone can't: &lt;em&gt;What does my schema really look like? What's in this archetype right now? Which systems touch this component? Why did that tick stall? Is the engine healthy?&lt;/em&gt; — the things you'd otherwise chase with &lt;code&gt;Console.WriteLine&lt;/code&gt;, log scraping, and a lot of guessing. This first post stays on the data-and-schema side; later in the track we reach the profiling and live views.&lt;/p&gt;

&lt;p&gt;A quick note on what it's built from, for the curious: the Workbench is a small full-stack app that runs entirely on your machine. The backend is &lt;strong&gt;ASP.NET Core&lt;/strong&gt; (Kestrel) speaking to the engine; the frontend is &lt;strong&gt;React 19 + TypeScript&lt;/strong&gt; built with &lt;strong&gt;Vite&lt;/strong&gt;, styled with &lt;strong&gt;Tailwind CSS&lt;/strong&gt; and &lt;strong&gt;shadcn/ui&lt;/strong&gt; (Radix primitives), using &lt;strong&gt;dockview&lt;/strong&gt; for the draggable panel layout, &lt;strong&gt;Zustand&lt;/strong&gt; and &lt;strong&gt;TanStack Query&lt;/strong&gt; for state, and &lt;strong&gt;cmdk&lt;/strong&gt; behind the command palette. Nothing leaves localhost.&lt;/p&gt;

&lt;h2&gt;
  
  
  See it first
&lt;/h2&gt;

&lt;p&gt;Before the words, here's the two-minute tour:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=yIFqwPJKOlA" rel="noopener noreferrer"&gt;▶ Watch "Typhon Workbench intro" on YouTube&lt;/a&gt; — if the embed doesn't load in your viewer&lt;/p&gt;

&lt;h2&gt;
  
  
  DataGrip meets a flight recorder
&lt;/h2&gt;

&lt;p&gt;The Workbench is a local developer tool. You point it at a Typhon database — a &lt;code&gt;.typhon&lt;/code&gt; file on disk — and it opens a window into everything inside: the schema, the data, and the way both sit in memory and on disk. It can also attach to a live engine or replay a captured trace, but this post is about the simplest, most common case: you have a database file, and you want to understand it.&lt;/p&gt;

&lt;p&gt;If you've used JetBrains DataGrip, DBeaver, or MongoDB Compass, the shape is familiar — a navigator down one side, an object inspector, a data grid, drill-downs. That's deliberate. I didn't want to invent a new mental model for "exploring a database"; the last thirty years of database tooling already converged on one that works, and the Workbench borrows it wholesale. What it adds is the part that's specific to Typhon: it speaks entities, archetypes, and components natively, and it can show you things a row-store tool never could — cache-line layouts, on-disk fragmentation, the cost of a query before you run it. (The other half of the name — the flight recorder, for profiling a running engine — is a story for a later post in this track. Here we stay with the database.)&lt;/p&gt;

&lt;p&gt;There's no install ceremony — open it, point it at a file, and it's out of your way in seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-overview.png" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-overview.png" alt="The Typhon Workbench with a database open — docked multi-panel layout" width="800" height="449"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From a file to a model you can walk
&lt;/h2&gt;

&lt;p&gt;Open a &lt;code&gt;.typhon&lt;/code&gt; file and you don't get a hex dump or a list of opaque page numbers. You get a model you can walk, top to bottom.&lt;/p&gt;

&lt;p&gt;It starts with the &lt;strong&gt;Schema Explorer&lt;/strong&gt; — a tree of your &lt;em&gt;archetypes&lt;/em&gt;. As the engine series describes, Typhon stores data the way a game engine does: an archetype is a set of entities that share the same component composition, which makes it the rough equivalent of a table. The Schema Explorer lists them all, with the numbers that matter at a glance — how many entities each holds, which components it carries, how full its storage is. It's fuzzy-searchable, so on a schema with a hundred archetypes you type three letters and you're there.&lt;/p&gt;

&lt;p&gt;From an archetype you drill into a &lt;strong&gt;Component&lt;/strong&gt;, and this is where it stops looking like a generic table browser and starts looking like something built for &lt;em&gt;this&lt;/em&gt; engine. A component isn't just a column; it has tabs for its fields, the archetypes that use it, the systems that read and write it, its storage mode. (One of those tabs — the memory layout — gets a whole post to itself next time. It earns it.)&lt;/p&gt;

&lt;p&gt;Three things from the video are worth calling out, because they're the difference between a tool you tolerate and one you live in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Density is a setting.&lt;/strong&gt; I stare at this thing all day, so it ships compact by default — more rows, less chrome — but you can loosen it when you'd rather have room to breathe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dark and light both exist and both work.&lt;/strong&gt; Not a switch bolted on at the end; the whole UI is built on theme variables, so nothing breaks when you flip it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The command palette is the spine.&lt;/strong&gt; One shortcut gives you a single search box that reaches everything — every view, and every object in your database. Type a component's name and jump straight to it; prefix your search to scope it (one prefix runs actions, another finds an object in the current session, another jumps to a moment in a trace). Anything you can reach with the mouse, you can reach from the keyboard. That isn't a power-user garnish — it's how discoverability works. Someone who doesn't know where a feature lives can &lt;em&gt;find&lt;/em&gt; it by typing what they want.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-command-palette.png" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-command-palette.png" alt="The Workbench command palette open, showing prefix-scoped navigation to views and objects" width="800" height="105"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-schema-explorer.png" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-schema-explorer.png" alt="Schema Explorer showing the archetype tree with entity and component counts" width="465" height="422"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  One click, every view
&lt;/h2&gt;

&lt;p&gt;Here's the part that took the most work and is the easiest to miss: in the Workbench, &lt;strong&gt;selection is global.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Select a component — anywhere, in any panel — and the whole tool reorients around it. The inspector shows its details. The archetypes that use it light up. The systems that read it and the systems that write it appear together, so you can see the blast radius of a change before you make it: &lt;em&gt;if I touch this field, here's everything that cares.&lt;/em&gt; You made one gesture; several panels answered.&lt;/p&gt;

&lt;p&gt;That sounds obvious until you remember how it usually goes. In most toolchains every view is an island. You find an ID in one window, copy it, paste it into another, lose your place, open a third. The friction is so normal you stop noticing it — you just accept that "investigating" means juggling. The Workbench's bet is that you shouldn't have to: one selection, shared across every panel, reversible with a back button that always takes you home.&lt;/p&gt;

&lt;p&gt;The honest part: none of the individual panels were the hard problem. Most of them existed already. The hard problem was the wiring — a single shared notion of "what is selected" that every panel both listens to and can drive. That's the unglamorous work that turns a folder full of capable views into something that feels like one product. It's invisible when it works, which is exactly why it's worth pointing at.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-selection-bus.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-selection-bus.png" alt="One selection radiating to every related panel: inspector, used-in archetypes, and reader/writer systems all react to a single component selection" width="800" height="444"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-selection-linked.png" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-selection-linked.png" alt="Selecting a component lights up its linked panels — used-in archetypes and reader/writer systems" width="799" height="335"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real data, decoded
&lt;/h2&gt;

&lt;p&gt;Schema is half the story. The other half is &lt;em&gt;what's actually in there.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Data Browser&lt;/strong&gt; pages through the real entities in an archetype and shows their component values decoded — not raw bytes, not a blob you have to interpret, but &lt;code&gt;Position { X = 124.5, Y = 88.1 }&lt;/code&gt;, &lt;code&gt;Health = 98&lt;/code&gt;, the actual values your systems are reading. Pick the columns you care about, page through, click a single entity to inspect it on its own. And it's strictly read-only: looking never changes what's there, so you can explore a production capture without a second thought.&lt;/p&gt;

&lt;p&gt;It closes the loop the rest of the tool sets up. You walked the schema from archetype to component; now you see the data sitting in that shape. From a row you can select a component value and — because selection is global — bounce straight back to the schema side to ask "wait, how is this field actually laid out?" The investigation flows in both directions, which is the whole point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/wb1-data-browser.png" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fwb1-data-browser.png" alt="The Data Browser showing real entity rows with decoded component values and a column picker" width="800" height="623"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson: tools need design too
&lt;/h2&gt;

&lt;p&gt;Let me end on the thing this whole post is really about.&lt;/p&gt;

&lt;p&gt;Typhon's engine solved a genuinely hard problem: ACID transactions at microsecond latency, on a data model borrowed from game engines, with none of the usual compromises. That took two years and most of my stubbornness. But an engine is a capability, not an experience. The moment a real developer sits down — to debug a schema, to understand why something is slow, to check what's actually in the database — that capability is only as good as their ability to &lt;em&gt;reach&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;The Workbench is the other half of that work, and it needed its own kind of design discipline — none of it about raw performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One action, not three.&lt;/strong&gt; Any move you'd want to make is a single gesture from where you already are.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No dead buttons.&lt;/strong&gt; A control that renders is a control that works. Nothing that looks clickable does nothing — broken affordances erode trust faster than missing features ever do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selection is global and reversible.&lt;/strong&gt; One click drives everything; the back button always brings you home.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed without orientation is just a faster way to get lost.&lt;/strong&gt; Fast &lt;em&gt;and&lt;/em&gt; oriented, or it doesn't count.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the under-appreciated half of building good technology: the tools aren't a cherry on top, they're the path the value travels to reach a human being. You can have the fastest engine in the world, and if nobody can see inside it, you've built a very elegant black box. I'd rather build one you can see.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Next in the Workbench track: &lt;strong&gt;You Can't Optimize What You Can't See&lt;/strong&gt; — where the abstract gets physical. The component layout grid shows you cache lines, field padding, and alignment as something you can actually look at; the File Map draws your entire database on disk, fragmentation and all. Meanwhile the engine track continues its deep dives.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Follow the &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; for source and benchmarks, or subscribe via &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>database</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Three Durability Modes, One WAL: Configurable Guarantees for Different Workloads</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Wed, 20 May 2026 11:21:06 +0000</pubDate>
      <link>https://dev.to/nockawa/three-durability-modes-one-wal-configurable-guarantees-for-different-workloads-21pd</link>
      <guid>https://dev.to/nockawa/three-durability-modes-one-wal-configurable-guarantees-for-different-workloads-21pd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/what-game-engines-know-about-data/" rel="noopener noreferrer"&gt;What Game Engines Know About Data That Databases Forgot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/microsecond-latency-managed-language/" rel="noopener noreferrer"&gt;Microsecond Latency in a Managed Language&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/deadlock-free-by-construction/" rel="noopener noreferrer"&gt;Deadlock-Free by Construction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three Durability Modes, One WAL&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;MVCC at Microsecond Scale &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; &amp;nbsp;•&amp;nbsp; 📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Most databases pick one durability strategy at boot time. Typhon picks one &lt;strong&gt;per commit&lt;/strong&gt;, and the surprising part isn't the user-facing API. It's that all three modes share the same WAL writer thread, the same ring buffer, and the same I/O path (the WAL is the Write-Ahead Log: every commit appends a record to it before it counts as durable). The only thing that differs is whether the producer waits.&lt;/p&gt;

&lt;p&gt;Ten thousand NPC position updates per simulation tick at ~1-2µs each, all &lt;code&gt;Deferred&lt;/code&gt;. One legendary item drop on the same transaction code path, escalated to &lt;code&gt;Immediate&lt;/code&gt;, paying ~15-85µs for a guaranteed FUA (Force Unit Access — don't ack the write until it's on stable media) write to disk. Same engine, same WAL, one extra argument at &lt;code&gt;Commit()&lt;/code&gt;. This post is about why that's possible and what it cost to keep it that way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The other classical knob
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/blog/deadlock-free-by-construction/" rel="noopener noreferrer"&gt;Post #4&lt;/a&gt; covered Typhon's first big architectural bet — eliminating deadlocks at the design level instead of detecting them at runtime. This one is about the &lt;strong&gt;other&lt;/strong&gt; classical database knob: durability. And like deadlocks, the interesting decision is upstream of the implementation.&lt;/p&gt;

&lt;p&gt;Three workloads sit on the same engine in a real game server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simulation tick:&lt;/strong&gt; physics, AI, position updates. Hundreds to thousands of writes per frame. Losing the last tick on crash is fine; the simulation will recompute it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Player actions:&lt;/strong&gt; combat events, item pickups, dialogue state. A sub-5ms data-loss window is acceptable. Throughput matters more than per-commit durability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Money and rare events:&lt;/strong&gt; currency transfers, legendary drops, account creation. &lt;strong&gt;Zero tolerance for loss.&lt;/strong&gt; Players will dispute every missing transaction.&lt;/li&gt;
&lt;/ul&gt;

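&lt;p&gt;A single global durability setting forces all three to the most conservative option, and at ~15-85µs per FUA write that is a fight you've already lost: a thread committing 1,000 tick writes one at a time would spend 15-85ms waiting on disk, against a 60Hz tick budget of ~16.7ms.&lt;/p&gt;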

&lt;h2&gt;
  
  
  The three modes
&lt;/h2&gt;

&lt;p&gt;The decision lives on the &lt;a href="https://github.com/nockawa/Typhon/blob/main/src/Typhon.Engine/Transactions/public/UnitOfWork.cs" rel="noopener noreferrer"&gt;Unit of Work (UoW)&lt;/a&gt; at creation time, with a per-transaction override for escalation only. A UoW sits one level above a transaction: it groups one or more transactions and owns the durability contract they share. Transactions still commit atomic state changes; the UoW decides when — and whether — those commits reach disk. The user-facing enum is exactly what it looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;DurabilityMode&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;WAL records buffered. Durable only after explicit Flush()/FlushAsync().&lt;/span&gt;
    &lt;span class="c1"&gt;/// Commit latency: ~1-2µs. Data-at-risk: until Flush().&amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;Deferred&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;WAL writer auto-flushes every N ms (default 5ms).&lt;/span&gt;
    &lt;span class="c1"&gt;/// Commit latency: ~1-2µs. Data-at-risk: ≤ GroupCommitInterval.&amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;GroupCommit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;FUA on every tx.Commit(). Blocks until WAL record is on stable media.&lt;/span&gt;
    &lt;span class="c1"&gt;/// Commit latency: ~15-85µs. Data-at-risk: zero.&amp;lt;/summary&amp;gt;&lt;/span&gt;
    &lt;span class="n"&gt;Immediate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;DurabilityOverride&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Default&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Use the UoW's DurabilityMode&lt;/span&gt;
    &lt;span class="n"&gt;Immediate&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Force FUA for this specific commit (escalation only)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The override can only escalate. A &lt;code&gt;Deferred&lt;/code&gt; UoW can promote one transaction to &lt;code&gt;Immediate&lt;/code&gt;; an &lt;code&gt;Immediate&lt;/code&gt; UoW cannot weaken anything. This is a deliberate constraint, not an oversight: &lt;code&gt;DurabilityOverride&lt;/code&gt; has no value weaker than the UoW's mode, so the API shape rules out accidentally making a transaction &lt;em&gt;less&lt;/em&gt; durable than the UoW's contract.&lt;/p&gt;
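
&lt;p&gt;The escalation-only rule fits in a few lines. The sketch below repeats the two enums so it compiles on its own; &lt;code&gt;DurabilitySketch.Resolve&lt;/code&gt; is a hypothetical helper written for this illustration, not a method from Typhon's API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Enums repeated from the listing above so the sketch is self-contained.
public enum DurabilityMode : byte { Deferred = 0, GroupCommit = 1, Immediate = 2 }
public enum DurabilityOverride : byte { Default = 0, Immediate = 1 }

public static class DurabilitySketch
{
    // Hypothetical helper: combines the UoW's mode with a per-transaction
    // override. The override can raise the guarantee to Immediate or leave
    // the UoW's mode untouched; it has no way to lower it.
    public static DurabilityMode Resolve(DurabilityMode uowMode, DurabilityOverride txOverride)
    {
        return txOverride == DurabilityOverride.Immediate
            ? DurabilityMode.Immediate
            : uowMode;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Resolve(DurabilityMode.Deferred, DurabilityOverride.Immediate)&lt;/code&gt; yields &lt;code&gt;Immediate&lt;/code&gt;, while &lt;code&gt;Resolve(DurabilityMode.Immediate, DurabilityOverride.Default)&lt;/code&gt; stays &lt;code&gt;Immediate&lt;/code&gt;: no combination comes out weaker than the UoW's mode.&lt;/p&gt;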

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Commit latency&lt;/th&gt;
&lt;th&gt;Data-at-risk window&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Deferred&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~1-2µs&lt;/td&gt;
&lt;td&gt;Until explicit &lt;code&gt;Flush()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Game ticks, batch imports, simulation steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;GroupCommit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~1-2µs amortized&lt;/td&gt;
&lt;td&gt;≤ 5ms (configurable)&lt;/td&gt;
&lt;td&gt;General server load, request handlers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Immediate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~15-85µs&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Trades, account writes, legendary drops&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  One writer thread, three signaling patterns
&lt;/h2&gt;

&lt;p&gt;Here's the part that surprised me when I came back to the design six months in: I did not need three I/O paths. Or three threads. Or three buffers. The shared infrastructure looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/three-durability-modes.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Fthree-durability-modes.png" alt="Three durability modes converging on one WAL Writer — Deferred, GroupCommit, and Immediate producers all publish to the same MPSC ring buffer; only Immediate also signals the writer and waits for DurableLsn to advance past its LSN; the GroupCommit timer wakes the writer on a 5ms ceiling; one writer thread, one segment file, one FUA write per drained batch" width="800" height="812"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same picture broken into phases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Producer thread (&lt;code&gt;tx.Commit()&lt;/code&gt;)&lt;/th&gt;
&lt;th&gt;What this means for you&lt;/th&gt;
&lt;th&gt;WAL Writer thread (single, dedicated)&lt;/th&gt;
&lt;th&gt;What this means for you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;TryClaim()&lt;/code&gt; — CAS (Compare-And-Swap) slot allocation&lt;/td&gt;
&lt;td&gt;Your commit atomically reserves a slot in the WAL ring buffer. No lock contention with other transactions claiming slots in parallel.&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(idle, or finishing a previous drain)&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;The writer thread runs independently. Your producer never waits on it to claim a slot.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;Publish()&lt;/code&gt; — release-store the frame header&lt;/td&gt;
&lt;td&gt;The record is now visible to the writer. Your commit has an LSN (Log Sequence Number) — its position in the durability timeline.&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;TryDrain()&lt;/code&gt; — contiguous batch of published frames&lt;/td&gt;
&lt;td&gt;The writer harvests every published slot in one pass. This is the structural reason &lt;code&gt;GroupCommit&lt;/code&gt; is amortized: N producers, one drain, one FUA cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;
&lt;em&gt;Mode-specific:&lt;/em&gt; return now, or &lt;code&gt;WaitForDurable(lsn)&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Deferred / GroupCommit: &lt;code&gt;Commit()&lt;/code&gt; returns in ~1-2µs and durability lands asynchronously. Immediate: &lt;code&gt;Commit()&lt;/code&gt; returns only once your LSN is on disk (~15-85µs).&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;WriteAligned()&lt;/code&gt; → FUA write → &lt;code&gt;Interlocked.Exchange(DurableLsn)&lt;/code&gt; → &lt;code&gt;_durabilityEvent.Set()&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One physical FUA write per batch (~15-85µs, paid once). &lt;code&gt;DurableLsn&lt;/code&gt; advances; any Immediate waiter whose LSN ≤ &lt;code&gt;DurableLsn&lt;/code&gt; wakes and returns from &lt;code&gt;tx.Commit()&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What changes between the modes is &lt;strong&gt;only the producer side&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deferred&lt;/strong&gt; — Publish and return. No signal, no wait. The writer may be asleep; it will wake on the next group-commit timer or explicit &lt;code&gt;Flush()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GroupCommit&lt;/strong&gt; — Publish and return. The writer is already in a &lt;code&gt;WaitForData(GroupCommitIntervalMs)&lt;/code&gt; loop; the next tick of that timer (≤5ms) drains the batch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate&lt;/strong&gt; — Publish, signal the writer, then &lt;code&gt;WaitForDurable(lsn)&lt;/code&gt; until &lt;code&gt;DurableLsn&lt;/code&gt; advances past the producer's LSN.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is where the elegance lives. It's not a separate code path — it's the same path, with one extra fast-path check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;WaitForDurable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;WaitContext&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Fast path: already durable, returns inline.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Interlocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_durableLsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;lsn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nf"&gt;WaitForDurableSlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the WAL writer has already drained past your LSN by the time you call this — say, because someone else's &lt;code&gt;Immediate&lt;/code&gt; commit just batched yours along with it — you pay one atomic read and a return. No event wait, no syscall, no context switch. &lt;strong&gt;Immediate mode is GroupCommit's batching benefit, available to the one transaction that needs it now.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the teaching moment I want this post to leave you with: per-transaction durability is not three implementations. It's one implementation with three producer-side policies, and the FUA cost is a property of the I/O path, not the API surface.&lt;/p&gt;
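
&lt;p&gt;To make "one implementation, three policies" concrete, here is a minimal sketch of a commit path where the mode only decides what happens after publishing. Everything in it is an assumption made for illustration: the &lt;code&gt;DurabilityMode&lt;/code&gt; enum, the &lt;code&gt;MiniWal&lt;/code&gt; class, and the use of &lt;code&gt;Monitor&lt;/code&gt; in place of the ring buffer and &lt;code&gt;_durabilityEvent&lt;/code&gt;. It is not Typhon's code, and it has no writer thread of its own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Threading;

public enum DurabilityMode { Deferred, GroupCommit, Immediate }

// Hypothetical stand-in for the WAL, not Typhon's code: LSNs come from a counter
// and "durable" simply means the caller of MarkDurable said so.
public sealed class MiniWal
{
    private readonly object _gate = new object();
    private long _nextLsn;
    private long _durableLsn;

    public long DurableLsn =&amp;gt; Interlocked.Read(ref _durableLsn);

    // Producer side: the publish step is shared, only the tail differs per mode.
    public long Commit(DurabilityMode mode)
    {
        long lsn = Interlocked.Increment(ref _nextLsn);   // stands in for TryClaim + Publish
        if (mode == DurabilityMode.Immediate)
        {
            // A real engine would also signal the writer here.
            WaitForDurable(lsn);
        }
        return lsn;                                       // Deferred and GroupCommit return at once
    }

    public void WaitForDurable(long lsn)
    {
        if (Interlocked.Read(ref _durableLsn) &amp;gt;= lsn) return;   // fast path: already durable
        lock (_gate)
        {
            while (Interlocked.Read(ref _durableLsn) &amp;lt; lsn) Monitor.Wait(_gate);
        }
    }

    // Writer side: call after a (simulated) FUA write covering everything up to lsn.
    public void MarkDurable(long lsn)
    {
        lock (_gate)
        {
            if (lsn &amp;gt; _durableLsn) Interlocked.Exchange(ref _durableLsn, lsn);
            Monitor.PulseAll(_gate);                      // wake every waiter; each re-checks its own LSN
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the check and the wait happen under the same lock the writer pulses under, a waiter cannot miss a wake-up in this sketch. One &lt;code&gt;MarkDurable&lt;/code&gt; call releases every waiter whose LSN it covers, which is the batching effect described above.&lt;/p&gt;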

&lt;h2&gt;
  
  
  Per-UoW, not per-engine — why
&lt;/h2&gt;

&lt;p&gt;The API-shape decision is recorded in its own ADR (&lt;a href="https://en.wikipedia.org/wiki/Architectural_decision" rel="noopener noreferrer"&gt;Architecture Decision Record&lt;/a&gt;). I considered four alternatives before landing on per-UoW:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;th&gt;Why I rejected it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Per-database (boot-time)&lt;/td&gt;
&lt;td&gt;Too coarse. Game ticks and trades on the same DB need different modes within the same process.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-transaction&lt;/td&gt;
&lt;td&gt;Can't batch — &lt;code&gt;GroupCommit&lt;/code&gt; is &lt;em&gt;inherently&lt;/em&gt; multi-transaction. The UoW is the natural batching boundary.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two modes (Sync / Async)&lt;/td&gt;
&lt;td&gt;Misses &lt;code&gt;GroupCommit&lt;/code&gt;'s sweet spot. The whole point is amortized FUA, which a binary doesn't capture.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caller-managed flush only&lt;/td&gt;
&lt;td&gt;Error-prone. Developers forget to flush. &lt;code&gt;GroupCommit&lt;/code&gt; automates the common case correctly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest version: I tried "per-database" first because it was easiest to wire up, and immediately hit the simulation-vs-trade problem on the first benchmark. Two production game-server workloads on the same engine, one wanting ~1µs commits and one wanting zero data loss. The mode has to follow the workload, not the storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numbers that matter
&lt;/h2&gt;

&lt;p&gt;The latency table above is the headline. The throughput table is where it gets interesting:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Single-thread durable tx/s&lt;/th&gt;
&lt;th&gt;Multi-thread durable tx/s&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Deferred&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;N/A (batch-durable)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;GroupCommit&lt;/code&gt; (5ms interval)&lt;/td&gt;
&lt;td&gt;~200K+ amortized&lt;/td&gt;
&lt;td&gt;Millions (shared flush)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Immediate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~12K-65K&lt;/td&gt;
&lt;td&gt;~12K-65K (FUA-limited)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Single-thread &lt;code&gt;Immediate&lt;/code&gt; is capped by the NVMe FUA round-trip; no software trick escapes that. Multi-thread &lt;code&gt;Immediate&lt;/code&gt; does &lt;strong&gt;not&lt;/strong&gt; scale linearly past one thread either, because every commit races the same writer through the same I/O path. &lt;code&gt;GroupCommit&lt;/code&gt;, on the other hand, scales nearly linearly with thread count, because the FUA cost is paid once per drain cycle no matter how many producers contributed to the batch.&lt;/p&gt;
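
&lt;p&gt;The &lt;code&gt;Immediate&lt;/code&gt; row can be sanity-checked with back-of-envelope arithmetic. The sketch below is a ceiling model only, assuming commits are strictly serialized behind one FUA write and ignoring all software overhead; the &lt;code&gt;FuaMath&lt;/code&gt; helper is a made-up name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;public static class FuaMath
{
    // Upper bound on durable commits per second when one FUA write of
    // fuaMicros microseconds makes batchSize commits durable at once.
    public static double MaxDurablePerSecond(double fuaMicros, int batchSize = 1)
    {
        return batchSize * 1_000_000.0 / fuaMicros;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the 15-85µs FUA range quoted earlier, &lt;code&gt;MaxDurablePerSecond(85)&lt;/code&gt; is about 11,765 and &lt;code&gt;MaxDurablePerSecond(15)&lt;/code&gt; is about 66,667, in line with the ~12K-65K row. Raising &lt;code&gt;batchSize&lt;/code&gt; multiplies the ceiling without touching the per-write cost, which is the whole argument for batching.&lt;/p&gt;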

&lt;p&gt;That's not a flaw in &lt;code&gt;Immediate&lt;/code&gt;. It's the physics of the storage device. The point of having three modes is that you only pay that cost where you actually need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I got wrong
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The first group-commit timer was 1ms.&lt;/strong&gt; Under low write load the WAL was doing constant small FUA writes — worst-case for SSD wear and tail latency. Tuning to 5ms with a "wake on N records OR T milliseconds" trigger fixed it: the writer sleeps on &lt;code&gt;WaitForData(intervalMs)&lt;/code&gt; and gets pulled out early when a producer signals (Immediate commits, explicit &lt;code&gt;Flush()&lt;/code&gt;, or back-pressure). Idle periods cost nothing; busy periods batch naturally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first override design allowed downgrade.&lt;/strong&gt; &lt;code&gt;tx.Commit(DurabilityOverride.Deferred)&lt;/code&gt; from an &lt;code&gt;Immediate&lt;/code&gt; UoW. The use case was "this single read-mostly transaction doesn't really need FUA." The use case was wrong: the UoW's contract is the durability &lt;em&gt;floor&lt;/em&gt;, not the default. Downgrading a single commit means the application has accidentally created a hole in a contract it thinks it has. Now overrides can only escalate, and the type system enforces it.&lt;/p&gt;
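
&lt;p&gt;The "floor, not default" rule can be sketched as a clamp over an ordered enum. This is a runtime check, which is a weaker technique than the type-level enforcement mentioned above, and the names (&lt;code&gt;DurabilityMode&lt;/code&gt;, &lt;code&gt;Durability.Effective&lt;/code&gt;) are assumptions rather than Typhon's API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Ordered from weakest to strongest guarantee (illustrative, not Typhon's enum).
public enum DurabilityMode { Deferred = 0, GroupCommit = 1, Immediate = 2 }

public static class Durability
{
    // The UoW mode is a floor: a per-commit request can raise it, never lower it.
    public static DurabilityMode Effective(DurabilityMode uowFloor, DurabilityMode? requested)
    {
        if (requested == null) return uowFloor;
        return requested.Value &amp;gt; uowFloor ? requested.Value : uowFloor;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under this rule a &lt;code&gt;Deferred&lt;/code&gt; request inside an &lt;code&gt;Immediate&lt;/code&gt; UoW still resolves to &lt;code&gt;Immediate&lt;/code&gt;, while an &lt;code&gt;Immediate&lt;/code&gt; request inside a &lt;code&gt;Deferred&lt;/code&gt; UoW escalates that single commit.&lt;/p&gt;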

&lt;p&gt;&lt;strong&gt;Deferred mode is a contract, not a latency number.&lt;/strong&gt; Early users assumed &lt;code&gt;Commit()&lt;/code&gt; always meant "on disk." It doesn't. Deferred mode says: &lt;em&gt;your data is not durable until you call &lt;code&gt;Flush()&lt;/code&gt; or close cleanly&lt;/em&gt;. For game servers that's fine; their tick loop already has clear boundaries. But the documentation now leads with the contract, not the µs number. The latency is a consequence; the contract is what you signed up for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Post #6 in the series: &lt;strong&gt;Building a Page Cache That Doesn't Count: Epoch-Based Memory Management&lt;/strong&gt;. The durability story above assumes the commit path is fast, but that's only half the story — the &lt;em&gt;read&lt;/em&gt; path has to be just as cheap, and the trick there is replacing per-page reference counting with epoch-based protection: two atomic operations per transaction instead of two per page. The mechanism is elegant enough that it deserves its own post.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Follow the &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; for source and benchmarks, or subscribe via &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>database</category>
      <category>durability</category>
    </item>
    <item>
      <title>Deadlock-Free by Construction: How Typhon Eliminates Deadlocks Instead of Detecting Them</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Mon, 27 Apr 2026 17:48:24 +0000</pubDate>
      <link>https://dev.to/nockawa/deadlock-free-by-construction-how-typhon-eliminates-deadlocks-instead-of-detecting-them-4057</link>
      <guid>https://dev.to/nockawa/deadlock-free-by-construction-how-typhon-eliminates-deadlocks-instead-of-detecting-them-4057</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/what-game-engines-know-about-data/" rel="noopener noreferrer"&gt;What Game Engines Know About Data That Databases Forgot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/microsecond-latency-managed-language/" rel="noopener noreferrer"&gt;Microsecond Latency in a Managed Language&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deadlock-Free by Construction&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;MVCC at Microsecond Scale &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; &amp;nbsp;•&amp;nbsp; 📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Deadlocks are usually treated as a runtime problem. We treat them as a design bug.&lt;/p&gt;

&lt;p&gt;That sounds like a slogan. It isn't. It's the actual reasoning behind three architectural decisions that, taken together, make a lock-dependency cycle impossible in Typhon — not unlikely, not rare, &lt;em&gt;impossible&lt;/em&gt;. The engine ships without a deadlock detector. From the project's own concurrency overview:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Deadlock detection is explicitly not implemented — it would add overhead for a scenario that cannot occur in the current architecture.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is the &lt;em&gt;how&lt;/em&gt;: how three structural decisions remove three classes of edges from the lock-dependency graph, and how that elimination cascades into "no cycle is possible." But it's also the &lt;em&gt;why&lt;/em&gt;: why the constraint was set at project inception, before any code existed to deadlock, and what it cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The upfront bet
&lt;/h2&gt;

&lt;p&gt;I didn't compare deadlock detection schemes before starting Typhon. I'd seen them in production on engines I had worked on, and the pattern was always the same: a separate background scanner, a wait-for graph, victim selection heuristics, transaction abort and retry. A lot of code. A lot of edge cases. None of it bulletproof for the user, who still sees occasional one-second pauses or unexplained transaction failures under load.&lt;/p&gt;

&lt;p&gt;So I made an upfront call, recorded as &lt;strong&gt;ADR-003&lt;/strong&gt;, the project's first concurrency decision, dated &lt;em&gt;2024-01 (project inception)&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Optimistic locking: No locks during execution; conflict detection only at commit.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(ADRs — Architecture Decision Records — are short documents capturing one design choice with its context and rationale. They're a paper trail for &lt;em&gt;why&lt;/em&gt; a thing was built a particular way, not just &lt;em&gt;what&lt;/em&gt; it does. Typhon has 49 of them so far. They live in the project's internal documentation, not in the public repo.)&lt;/p&gt;

&lt;p&gt;That's the bet. &lt;strong&gt;No locks across data, whatever the architectural cost.&lt;/strong&gt; Not because I had proof prevention would be faster — I didn't run those benchmarks — but because the implementation cost of detection is real, the result is never bulletproof, and trading an architectural cost up front for never paying a runtime cost later is the trade I wanted.&lt;/p&gt;

&lt;p&gt;The three "pillars" that follow aren't a survey of alternatives I considered. They're what the architecture had to become once the constraint was set. MVCC was the obvious starting point. Optimistic Lock Coupling for indexes followed because traditional B+Tree latch coupling violated the constraint at the index level. The "no cross-table latching" rule emerged because anything else reintroduced the cycles I was trying to eliminate.&lt;/p&gt;

&lt;p&gt;It's constraint-driven design, not survey-driven. And it's why this post claims a property — deadlock-free by construction — instead of a benchmark.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a deadlock actually is
&lt;/h2&gt;

&lt;p&gt;Briefly, because the rest of the post needs the picture.&lt;/p&gt;

&lt;p&gt;Two transactions, T1 and T2. T1 holds a lock on row A and asks for a lock on row B. T2 holds B and asks for A. Neither can proceed. Each is waiting for the other; the wait will never end. That's a cycle in the &lt;strong&gt;lock-dependency graph&lt;/strong&gt; — the directed graph whose nodes are transactions and whose edges are "is waiting for." A deadlock is a cycle in that graph. Detection-based databases scan for cycles and break them by aborting one transaction. Prevention-based databases make cycles impossible to form.&lt;/p&gt;
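
&lt;p&gt;For readers who want the graph picture in code, here is a small sketch of what a detector looks for. It assumes the simplest case, where each blocked transaction waits on exactly one other, and the &lt;code&gt;WaitForGraph&lt;/code&gt; type is invented for this example. It illustrates the detection approach Typhon avoids, not anything Typhon ships.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Collections.Generic;

public static class WaitForGraph
{
    // waitsFor[t] is the transaction that t is blocked on (absent if t is running).
    // With one outgoing edge per node, a deadlock is any walk that revisits a node.
    public static bool HasDeadlock(IReadOnlyDictionary&amp;lt;string, string&amp;gt; waitsFor)
    {
        var cleared = new HashSet&amp;lt;string&amp;gt;();           // nodes known not to reach a cycle
        foreach (var start in waitsFor.Keys)
        {
            var path = new HashSet&amp;lt;string&amp;gt;();
            string current = start;
            while (current != null &amp;amp;&amp;amp; !cleared.Contains(current))
            {
                if (!path.Add(current)) return true;   // came back to a node on this walk
                current = waitsFor.TryGetValue(current, out var next) ? next : null;
            }
            cleared.UnionWith(path);                   // this walk ended without a cycle
        }
        return false;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feeding it the T1/T2 example (T1 waits for T2, T2 waits for T1) reports a deadlock; removing either edge makes the same call return false.&lt;/p&gt;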

&lt;p&gt;The three sections that follow each remove one class of edges from that graph. With every class removed, no cycle is possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 1: MVCC eliminates inter-transaction data locks
&lt;/h2&gt;

&lt;p&gt;The textbook deadlock — T1 locks row A, T2 locks row B, both want the other — requires &lt;em&gt;row-level locking between transactions&lt;/em&gt;. Typhon doesn't do that.&lt;/p&gt;

&lt;p&gt;Reads are &lt;strong&gt;snapshot-consistent&lt;/strong&gt;: every transaction's view is frozen at the global tick value in effect when it began. A reader sees a stable view of the database for its entire lifetime. It never asks for a lock, because there's nothing to lock against: the snapshot is already immutable.&lt;/p&gt;

&lt;p&gt;Writes don't lock existing rows either. They create &lt;strong&gt;new revisions&lt;/strong&gt;, with the previous revision left intact for any transaction whose snapshot still references it. Two writers updating the same component don't fight over a lock; they each append a new revision to the chain. Conflict detection happens at &lt;em&gt;commit time&lt;/em&gt;, as a single CAS operation: when the writer tries to install its new revision as the current one, the engine checks that the version it built on is still current. If not, the writer aborts and retries.&lt;/p&gt;

&lt;p&gt;This removes the entire edge class of "data locks held across transactions." There are no row locks, no read locks, no write locks on data. The wait-for graph at the transaction level has no edges to form a cycle from.&lt;/p&gt;

&lt;p&gt;This isn't free. Two writers updating the same component will conflict at commit, and one of them will retry. For game-server workloads where most components are written by exactly one system, conflicts are rare. For general OLTP workloads with high write contention, the trade shifts: fewer deadlocks, more aborts. Different curve.&lt;/p&gt;
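
&lt;p&gt;The commit-time check can be pictured as a compare-and-swap on the head of a revision chain. The sketch below is a simplified model under stated assumptions: one integer per component, and hypothetical &lt;code&gt;Revision&lt;/code&gt; and &lt;code&gt;ComponentSlot&lt;/code&gt; types. Typhon's real revision storage is not shown in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

// Hypothetical revision chain for one component; names are illustrative.
public sealed class Revision
{
    public readonly int Value;
    public readonly Revision Previous;   // older revisions stay reachable for older snapshots

    public Revision(int value, Revision previous) { Value = value; Previous = previous; }
}

public sealed class ComponentSlot
{
    private Revision _head = new Revision(0, null);

    public Revision Current =&amp;gt; Volatile.Read(ref _head);

    // Install a new revision only if 'basis' is still the head. No lock is ever held.
    public bool TryCommit(Revision basis, int newValue)
    {
        var next = new Revision(newValue, basis);
        return ReferenceEquals(Interlocked.CompareExchange(ref _head, next, basis), basis);
    }

    // Abort-and-retry: after a lost race, recompute against the refreshed head.
    public int Update(Func&amp;lt;int, int&amp;gt; change)
    {
        while (true)
        {
            Revision basis = Current;
            int value = change(basis.Value);
            if (TryCommit(basis, value)) return value;
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A writer holding a stale &lt;code&gt;basis&lt;/code&gt; gets &lt;code&gt;false&lt;/code&gt; from &lt;code&gt;TryCommit&lt;/code&gt; and loops; nobody waits on anybody, so there is no edge to add to the wait-for graph.&lt;/p&gt;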

&lt;h2&gt;
  
  
  Pillar 2: Optimistic Lock Coupling for index structures
&lt;/h2&gt;

&lt;p&gt;Even without row locks, an index structure (B+Tree, R-Tree) is shared mutable state. Traditional databases serialize access through &lt;strong&gt;latch coupling&lt;/strong&gt;: a reader holds a latch on the parent node while acquiring one on the child, releases the parent, advances. It's a chain of overlapping latches walking down the tree.&lt;/p&gt;

&lt;p&gt;That pattern can deadlock. Reader R has the parent latched and wants the child; concurrent writer W has the child latched and walks back up to fix the parent. Two threads, two index latches, mutual wait.&lt;/p&gt;

&lt;p&gt;Typhon uses &lt;strong&gt;Optimistic Lock Coupling&lt;/strong&gt; (&lt;a href="http://sites.computer.org/debull/A19mar/p73.pdf" rel="noopener noreferrer"&gt;Leis et al., 2019&lt;/a&gt;) instead. Readers don't latch at all. Each B+Tree node carries a 32-bit version word. The reader reads the version, traverses, then re-reads the version at the end. If it changed, the node may have been mutated mid-traversal, so the reader restarts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From Typhon.Engine/Data/Index/OlcLatch.cs&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;ReadVersion&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="m"&gt;0b11&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// locked (bit 0) or obsolete (bit 1) -&amp;gt; restart&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="nf"&gt;TryWriteLock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="m"&gt;0b1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Interlocked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CompareExchange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="m"&gt;0b1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;WriteUnlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="m"&gt;0b10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// version++, keep obsolete, clear lock&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bit 0 is the write-lock flag, bit 1 is the obsolete flag, and bits 2–31 are a monotonic version counter. &lt;code&gt;ReadVersion&lt;/code&gt; returns 0 if the node is locked or obsolete, and the caller treats that as "restart." &lt;code&gt;TryWriteLock&lt;/code&gt; is a single CAS. &lt;code&gt;WriteUnlock&lt;/code&gt; bumps the version and clears the lock bit in one store, so a reader never observes one without the other.&lt;/p&gt;

&lt;p&gt;Writers latch only the modified nodes, and they acquire from root to leaf, in strict order. No reader ever blocks a writer. No writer ever holds a parent latch while waiting on a child. The same pattern is reused by the spatial R-Tree — same &lt;code&gt;OlcLatch&lt;/code&gt;, same protocol — so this single mechanism covers both index families.&lt;/p&gt;
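
&lt;p&gt;The reader side of the protocol (read version, read data, re-read version) is easiest to see on a toy node. The sketch below assumes a node holding a single &lt;code&gt;long&lt;/code&gt; and reuses the bit layout described above. &lt;code&gt;OlcCell&lt;/code&gt; is a hypothetical type, and unlike &lt;code&gt;ReadVersion&lt;/code&gt; it signals "restart" through a &lt;code&gt;bool&lt;/code&gt; rather than a 0 return value.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Threading;

// Hypothetical single-value node: bit 0 = locked, bit 1 = obsolete, bits 2-31 = counter.
public sealed class OlcCell
{
    private int _version;
    private long _payload;

    // One optimistic attempt. False means "restart": the node was locked,
    // obsolete, or changed while the payload was being read.
    public bool TryRead(out long value)
    {
        int before = Volatile.Read(ref _version);
        value = Volatile.Read(ref _payload);
        return (before &amp;amp; 0b11) == 0 &amp;amp;&amp;amp; Volatile.Read(ref _version) == before;
    }

    // Lock with one CAS, mutate, then publish counter+1 with the lock bit cleared.
    public bool TryWrite(long value)
    {
        int v = Volatile.Read(ref _version);
        if ((v &amp;amp; 0b11) != 0) return false;               // locked or obsolete
        if (Interlocked.CompareExchange(ref _version, v | 0b1, v) != v) return false;
        Volatile.Write(ref _payload, value);
        Volatile.Write(ref _version, ((v &amp;gt;&amp;gt; 2) + 1) &amp;lt;&amp;lt; 2);
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A caller simply loops on &lt;code&gt;TryRead&lt;/code&gt; until it returns true. The reader never writes to the node, so it cannot hold anything a writer would have to wait for.&lt;/p&gt;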

&lt;p&gt;This removes the edge class of "index-level latch cycles."&lt;/p&gt;

&lt;h2&gt;
  
  
  Pillar 3: No cross-table latch holding
&lt;/h2&gt;

&lt;p&gt;Two edge classes are gone. The third is the most boring and the most important: &lt;strong&gt;at any given moment, a thread never holds a latch in more than one table.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each &lt;code&gt;ComponentTable&lt;/code&gt; in Typhon has independent indexes, independent revision chains, independent page allocations. A transaction's commit path processes one table at a time. When the commit moves from table A to table B, all of A's latches are released first.&lt;/p&gt;

&lt;p&gt;The only resource that &lt;em&gt;would&lt;/em&gt; be shared across tables is the &lt;strong&gt;page cache&lt;/strong&gt;. Latches there could form cycles across the entire engine. So the page cache doesn't use latches. That refactor is recorded as &lt;strong&gt;ADR-033&lt;/strong&gt;, dated &lt;em&gt;2026-02-12&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Replace per-page reference counting with epoch-based protection. Each transaction enters an epoch scope that pins the current global epoch; pages accessed within the scope are stamped with that epoch; eviction defers any page whose epoch is still active.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The previous approach was reference counting: every page access incremented a counter, every release decremented it. A transaction touching 100 pages paid for 200 atomic operations, and atomics aren't free: each one can stall the CPU pipeline while it waits for cache-line coherence. Epochs collapse that into two operations regardless of how many pages the transaction touches: one to enter the scope, one to exit.&lt;/p&gt;

&lt;p&gt;But the deadlock-freedom payoff isn't the cost reduction. It's that the page cache never holds a lock anyone else could wait on. No latch, no waiter queue, no edge in the lock-dependency graph at all.&lt;/p&gt;

&lt;p&gt;This removes the last edge class — cross-structure cycles. With all three classes gone, there is no graph in which a cycle can form.&lt;/p&gt;

&lt;p&gt;This pillar is the one I worry about most, and the one most likely to break in the future. It's enforced by &lt;strong&gt;convention&lt;/strong&gt;, not by the type system. Future features — cross-table indexes, parallel query execution holding read latches across multiple tables, foreign-key constraints — would each require extending the lock-hierarchy discipline. The concurrency overview explicitly lists those scenarios as known risks. I'll have to introduce explicit lock ordering when I get there.&lt;/p&gt;
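
&lt;p&gt;One way a convention like this could be backed by a runtime check is a per-thread guard that refuses to enter a second table while latches from a first are still held. The sketch below is purely illustrative: &lt;code&gt;TableLatchGuard&lt;/code&gt; is a made-up helper, tables are represented as plain objects, and nothing here is claimed to exist in Typhon.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical debug-time guard for "a thread holds latches in at most one table".
public static class TableLatchGuard
{
    [ThreadStatic] private static object _owner;   // table whose latches this thread holds
    [ThreadStatic] private static int _depth;      // how many latches it holds there

    public static void Enter(object table)
    {
        if (_depth &amp;gt; 0 &amp;amp;&amp;amp; !ReferenceEquals(_owner, table))
            throw new InvalidOperationException("Latch already held in another table on this thread.");
        _owner = table;
        _depth++;
    }

    public static void Exit(object table)
    {
        if (_depth == 0 || !ReferenceEquals(_owner, table))
            throw new InvalidOperationException("Unbalanced latch release.");
        if (--_depth == 0) _owner = null;          // all latches released: free to move to another table
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A guard like this only catches violations on code paths that actually run under test, so it complements review rather than replacing a type-level guarantee.&lt;/p&gt;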

&lt;h2&gt;
  
  
  What the bet costs
&lt;/h2&gt;

&lt;p&gt;Prevention isn't free; it just shifts the cost.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What's eliminated&lt;/th&gt;
&lt;th&gt;What remains&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deadlocks (cycles in the lock graph)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Aborts&lt;/strong&gt; at commit — local retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detection runtime overhead&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;OLC restarts&lt;/strong&gt; under index contention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait-for-graph data structures&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Livelock&lt;/strong&gt; under heavy contention (different problem)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A writer that loses the commit-time CAS doesn't trigger a global abort — it retries from where it was, against a refreshed baseline. An OLC reader that sees a version change doesn't block a writer — it restarts the traversal. These are &lt;em&gt;local&lt;/em&gt; costs. A Postgres deadlock victim aborts the entire transaction; a Typhon OLC restart is one tree traversal.&lt;/p&gt;

&lt;p&gt;Livelock — repeated retries that never converge — is a different beast. It can't deadlock, but it can starve. Typhon's &lt;code&gt;AdaptiveWaiter&lt;/code&gt; handles this with a spin-then-yield progression: 65,536 tight spin iterations first (most contention resolves there), then exponentially halving spin counts interleaved with &lt;code&gt;Thread.Sleep(100µs)&lt;/code&gt;. The 100µs sleep is below the OS scheduler quantum, so wake latency stays sub-millisecond. It bounds livelock probability without trading away the latency targets.&lt;/p&gt;
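
&lt;p&gt;The spin-then-yield shape can be sketched as follows. This is not &lt;code&gt;AdaptiveWaiter&lt;/code&gt; itself: the &lt;code&gt;SpinThenPause&lt;/code&gt; name, the bounded &lt;code&gt;maxRounds&lt;/code&gt; parameter, and the injected &lt;code&gt;pause&lt;/code&gt; delegate are assumptions. The pause is passed in because the post doesn't show how Typhon implements its 100µs sleep.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

public static class SpinThenPause
{
    // Returns true as soon as condition() holds, false if it never did within maxRounds.
    public static bool Wait(Func&amp;lt;bool&amp;gt; condition, Action pause, int initialSpins = 65_536, int maxRounds = 16)
    {
        int spins = initialSpins;
        for (int round = 0; round &amp;lt; maxRounds; round++)
        {
            for (int i = 0; i &amp;lt; spins; i++)
            {
                if (condition()) return true;
                Thread.SpinWait(1);            // brief busy-wait between checks
            }
            pause();                           // give up the core between spin rounds
            spins = Math.Max(1, spins / 2);    // halve the spin budget each round
        }
        return condition();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Halving the spin budget means a contended waiter spends less and less CPU per round, which gives competing threads a progressively larger window to finish.&lt;/p&gt;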

&lt;p&gt;So: deadlocks gone, aborts and restarts kept, livelock bounded by a spin policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What others do
&lt;/h2&gt;

&lt;p&gt;I didn't survey these in depth before committing to prevention — the upfront "no locks" constraint was made on principle. But for the reader's context, here's the landscape Typhon sidesteps.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Cost model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wait-for graph, triggered after &lt;code&gt;deadlock_timeout&lt;/code&gt; (1s default)&lt;/td&gt;
&lt;td&gt;Detection deferred to ≥1s lock wait; cycle scan is expensive but rare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MySQL InnoDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wait-for graph + victim selection (the smallest tx by row modifications is rolled back)&lt;/td&gt;
&lt;td&gt;Detection can be disabled on high-concurrency systems in favor of &lt;code&gt;innodb_lock_wait_timeout&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CockroachDB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-node in-memory lock tables + Raft-replicated write intents&lt;/td&gt;
&lt;td&gt;Detection is near-instantaneous; cost shifted to lock-table maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typhon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prevention by structure (three pillars above)&lt;/td&gt;
&lt;td&gt;No detection runtime cost; cost shifted to OLC restarts and commit-time aborts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are all sound engineering choices for their workloads. Postgres' deferred detection is rare-event optimization. InnoDB's "roll back the smaller transaction" is a pragmatic heuristic for the OLTP shapes it's tuned for. CockroachDB's instantaneous detection genuinely solves the latency problem detection has elsewhere. None of these are wrong. They're all answering the same question: &lt;em&gt;given that we accept locks, how do we manage their cycles?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Typhon answers a different question: &lt;em&gt;given that we don't accept locks, what does the rest of the architecture have to look like?&lt;/em&gt; That's why the comparison isn't "Typhon is faster" — it's "Typhon paid the cost in a different layer." Each row above describes &lt;em&gt;where the cost lives&lt;/em&gt;, not who's faster.&lt;/p&gt;

&lt;p&gt;A footnote: TigerBeetle reaches the same end via a different upfront constraint — single-writer serializable execution. No concurrent transactions, no deadlocks. Different category, same conclusion: detection is the wrong layer to solve this.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd flag for a reviewer
&lt;/h2&gt;

&lt;p&gt;Three honest acknowledgments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pillar 3 is enforced by convention, not by the type system.&lt;/strong&gt; The compiler won't catch a future PR that holds latches across two &lt;code&gt;ComponentTable&lt;/code&gt;s. The discipline lives in code review and architectural awareness, not in mechanically-checked invariants. To compensate, I've set up a list of explicit design rules that Claude Code enforces during design, development, and code review. Pillar 3's "no cross-table latching" invariant is on that list; any code that would violate it gets flagged before it reaches the diff.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OLC restart cost is bounded but not zero.&lt;/strong&gt; Under heavy write contention on a hot B+Tree leaf, optimistic readers can restart a few times before getting a clean version. The restart is one traversal, not a transaction abort, but it's not free either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "deadlock-free" claim assumes the current feature set.&lt;/strong&gt; Cross-table indexes, parallel queries holding read latches across tables, and foreign-key constraints are all listed as future scenarios that would require extending the discipline. The structural argument holds for what ships today; future features will need to maintain it consciously.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The next post drills into Pillar 1 — how Typhon's MVCC works without cloning rows. The big trick is &lt;strong&gt;per-component revision chains&lt;/strong&gt; instead of per-row tuple versioning: an entity with eight components that updates one creates a single new revision, not eight. The visibility check is a single comparison against the transaction's snapshot tick. And the EnabledBits exception dictionary pattern — zero-overhead fast path, dictionary slow path — is the prettiest piece of code in the engine.&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>concurrency</category>
      <category>database</category>
    </item>
    <item>
      <title>Microsecond Latency in a Managed Language: The Performance Philosophy Behind Typhon</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:58:24 +0000</pubDate>
      <link>https://dev.to/nockawa/microsecond-latency-in-a-managed-language-the-performance-philosophy-behind-typhon-1ob8</link>
      <guid>https://dev.to/nockawa/microsecond-latency-in-a-managed-language-the-performance-philosophy-behind-typhon-1ob8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/what-game-engines-know-about-data/" rel="noopener noreferrer"&gt;What Game Engines Know About Data That Databases Forgot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsecond Latency in a Managed Language&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Deadlock-Free by Construction &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; &amp;nbsp;•&amp;nbsp; 📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first two posts in this series covered the &lt;em&gt;why&lt;/em&gt; and the &lt;em&gt;what&lt;/em&gt;. Why C# for a database engine. What happens when you combine ECS storage with database guarantees.&lt;/p&gt;

&lt;p&gt;This post is the &lt;em&gt;how&lt;/em&gt;. Specifically: the five design principles that guide every performance decision in Typhon. Not a bag of tricks, but a philosophy. Individual optimizations come and go as the engine evolves, but these principles are stable. They're what let a managed language deliver microsecond-scale transaction latency.&lt;/p&gt;

&lt;p&gt;When your tick budget is 16 milliseconds and you have 100,000 entities to process, every nanosecond of per-entity cost matters. And most of that cost comes from decisions made at design time, not runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 1: Control Memory Layout
&lt;/h2&gt;

&lt;p&gt;Performance starts at the struct definition, not the algorithm. If your data layout causes cache misses, no algorithm can save you.&lt;/p&gt;

&lt;p&gt;The most dramatic example: Typhon recently moved from per-entity hash-table lookups to cluster-based Structure of Arrays (SoA) storage. Same data, same queries, different memory layout. Measured on a Ryzen 9 7950X:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;ns / entity&lt;/th&gt;
&lt;th&gt;vs baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard EntityAccessor&lt;/td&gt;
&lt;td&gt;139 ns&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ArchetypeAccessor (cached)&lt;/td&gt;
&lt;td&gt;94 ns&lt;/td&gt;
&lt;td&gt;1.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster iteration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;55x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a 55x improvement from changing memory layout alone. The reason: clusters pack N entities (8 to 64, auto-computed per archetype) in contiguous SoA memory. All positions together, all health values together. Every cache line the CPU loads is 100% useful data. For 100K entities, the working set dropped from scattered L3/DRAM access to ~2.5 MB that fits entirely in L2 cache — and L2 is 3x faster than L3 on Zen 4.&lt;/p&gt;

&lt;p&gt;The cluster size isn't a magic constant. An auto-tuning algorithm evaluates every N from 8 to 64 and picks the one that maximizes entities per 8 KB page for a given archetype's component schema. Non-power-of-2 sizes often pack better: N=14 can yield 28 entities per page vs N=16 yielding only 16. The capacity is derived from the data, not from convention.&lt;/p&gt;
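
&lt;p&gt;As a rough sketch of that search (not Typhon's actual code), assume a cluster of N entities costs a fixed header plus N times the per-entity byte size, and that a cluster never straddles a page. &lt;code&gt;ClusterSizing&lt;/code&gt;, &lt;code&gt;EntitiesPerPage&lt;/code&gt;, &lt;code&gt;BestClusterSize&lt;/code&gt;, &lt;code&gt;bytesPerEntity&lt;/code&gt;, and &lt;code&gt;clusterHeaderBytes&lt;/code&gt; are hypothetical names used only for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class ClusterSizing
{
    public const int PageSize = 8192; // 8 KB page, as in the post

    // Entities that fit in one page when each cluster holds n entities.
    // Assumption: a cluster is a fixed header plus n * bytesPerEntity,
    // and a cluster never straddles a page boundary.
    public static int EntitiesPerPage(int n, int bytesPerEntity, int clusterHeaderBytes)
    {
        int clusterBytes = clusterHeaderBytes + n * bytesPerEntity;
        int clustersPerPage = PageSize / clusterBytes; // integer division floors
        return clustersPerPage * n;
    }

    // Try every n in [8, 64] and keep the one with the most entities per page.
    // Only a strictly better count replaces the best, so ties go to the smaller n.
    public static int BestClusterSize(int bytesPerEntity, int clusterHeaderBytes)
    {
        int bestN = 8, bestCount = -1;
        for (int n = 8; n &amp;lt;= 64; n++)
        {
            int count = EntitiesPerPage(n, bytesPerEntity, clusterHeaderBytes);
            if (count &amp;gt; bestCount) { bestCount = count; bestN = n; }
        }
        return bestN;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a hypothetical 280-byte entity and no header, &lt;code&gt;EntitiesPerPage(14, 280, 0)&lt;/code&gt; is 28 (two 3,920-byte clusters per page) while &lt;code&gt;EntitiesPerPage(16, 280, 0)&lt;/code&gt; is 16 (only one 4,480-byte cluster fits). That is the N=14 versus N=16 effect in miniature.&lt;/p&gt;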

&lt;p&gt;&lt;strong&gt;False sharing&lt;/strong&gt; is the other side of layout control. When multiple threads write to adjacent fields, the CPU bounces the shared cache line between cores — a 40-60 cycle penalty per bounce. Typhon wraps mutable per-thread state in 64-byte padded structs. The WAL commit buffer goes further: explicit padding fields isolating the producer's &lt;code&gt;_tailPosition&lt;/code&gt; and the consumer's &lt;code&gt;_drainPosition&lt;/code&gt; onto separate cache lines. Seven unused &lt;code&gt;long&lt;/code&gt; fields between them, suppressed with &lt;code&gt;#pragma warning&lt;/code&gt;, because the correct layout matters more than the linter's opinion.&lt;/p&gt;
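
&lt;p&gt;A minimal sketch of the padding idea. The WAL buffer described above uses seven explicit &lt;code&gt;long&lt;/code&gt; padding fields; this version swaps in explicit field offsets, which produces the same 64-byte spacing. &lt;code&gt;PaddedLong&lt;/code&gt;, &lt;code&gt;RingCursors&lt;/code&gt;, and &lt;code&gt;LayoutCheck&lt;/code&gt; are illustrative names, not Typhon types:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// One 8-byte counter that reserves a full 64-byte cache line for itself.
[StructLayout(LayoutKind.Explicit, Size = 64)]
public struct PaddedLong
{
    [FieldOffset(0)] public long Value;
}

// Producer and consumer cursors placed exactly 64 bytes apart, so the two
// 8-byte fields can never land on the same 64-byte cache line.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct RingCursors
{
    [FieldOffset(0)]  public long TailPosition;   // written by the producer
    [FieldOffset(64)] public long DrainPosition;  // written by the consumer
}

public static class LayoutCheck
{
    // Report the managed sizes so the layout can be asserted in a unit test.
    public static int SizeOfPaddedLong() =&amp;gt; Unsafe.SizeOf&amp;lt;PaddedLong&amp;gt;();
    public static int SizeOfRingCursors() =&amp;gt; Unsafe.SizeOf&amp;lt;RingCursors&amp;gt;();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The sketch only separates the two cursors from each other. Whatever sits immediately before or after the struct in memory can still share a line with it unless the containing allocation is itself cache-line aligned.&lt;/p&gt;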

&lt;p&gt;The same hardware awareness drives B+Tree node sizing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;StructLayout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LayoutKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Pack&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Index32Chunk&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 256 bytes — fills four cache lines. Adjacent Line Prefetcher (ALP) on&lt;/span&gt;
    &lt;span class="c1"&gt;// Zen 4+/recent Intel automatically fetches paired 64-byte lines within&lt;/span&gt;
    &lt;span class="c1"&gt;// 128-byte regions, so two ALP triggers cover the full node.&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;29&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Control&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;OlcVersion&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// bit 0 = locked, bit 1 = obsolete, bits 2-31 = version&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;PrevChunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;NextChunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;LeftValue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;HighKey&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// B-link upper bound&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Capacity&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// 29 × 4 = 116 bytes&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Capacity&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;    &lt;span class="c1"&gt;// 29 × 4 = 116 bytes&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This struct is exactly 256 bytes because of the CPU's prefetcher. The Adjacent Line Prefetcher on modern x86 fetches paired 64-byte lines within 128-byte aligned regions — so two ALP triggers cover the full node. A 256-byte node costs effectively the same as a 128-byte node in terms of memory access, but holds nearly twice the keys.&lt;/p&gt;

&lt;p&gt;The capacity of 29 keys isn't a round number because it isn't derived from the algorithm. It's derived from the hardware: 256 bytes of budget minus 24 bytes of header, divided across Keys and Values arrays. Typhon has three B+Tree variants — 16-bit, 32-bit, and 64-bit keys — and all three hit exactly 256 bytes with different capacities (38, 29, and 19 keys respectively). Post #1 mentioned 128-byte nodes. We've since moved to 256 bytes after measuring ALP behavior on Zen 4 — capacity went up, lookup latency stayed flat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 2: Eliminate Allocations on Hot Paths
&lt;/h2&gt;

&lt;p&gt;In .NET, every allocation is a future GC event. On hot paths, the cost isn't the allocation itself (~5 ns) — it's the Gen0/Gen1 collection later that pauses unrelated threads. The discipline is simple: allocate nothing in steady state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ref struct&lt;/code&gt; is the primary weapon. A &lt;code&gt;ref struct&lt;/code&gt; lives on the stack, dies when the scope ends, and the GC never knows it existed. Post #1 showed &lt;code&gt;EntityRef&lt;/code&gt; (96 bytes, inline component cache). But ref structs are a systematic discipline in Typhon, not a one-off optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;OlcLatch&lt;/code&gt;&lt;/strong&gt;: wraps a single &lt;code&gt;ref int&lt;/code&gt;, the B+Tree node's version field. The entire optimistic lock coupling protocol (read version, validate, try-write-lock) lives in a struct that's basically a typed pointer. Created millions of times per second during tree traversal, at zero GC cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;EpochGuard&lt;/code&gt;&lt;/strong&gt;: &lt;a href="https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization" rel="noopener noreferrer"&gt;RAII&lt;/a&gt; scope for epoch-based page protection. Enter and exit in 3.3 ns. Because it's a &lt;code&gt;ref struct&lt;/code&gt;, it can't be boxed, captured in a closure, or passed to async code — exactly the constraints you want for a scope guard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;WalClaim&lt;/code&gt;&lt;/strong&gt;: a Write-Ahead Log buffer claim containing a &lt;code&gt;Span&amp;lt;byte&amp;gt;&lt;/code&gt; that points directly into native WAL memory. Can't escape to the heap by construction: a struct with a &lt;code&gt;Span&amp;lt;byte&amp;gt;&lt;/code&gt; field only compiles if it is declared as a &lt;code&gt;ref struct&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PointInTimeAccessor&lt;/code&gt;&lt;/strong&gt;: a reusable snapshot attached to parallel workers. One per worker, stored in a flat array indexed by worker ID. Zero per-entity dictionary overhead — no &lt;code&gt;Dictionary&amp;lt;EntityId, T&amp;gt;&lt;/code&gt; on the hot path.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For short-lived buffers, &lt;code&gt;stackalloc&lt;/code&gt; with a threshold pattern: stack-allocate when the array is small (under 64 elements), fall back to the heap otherwise. Most arrays stay small, so they never touch the allocator.&lt;/p&gt;
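
&lt;p&gt;The threshold pattern might look like the sketch below. &lt;code&gt;SmallBuffers&lt;/code&gt;, &lt;code&gt;SumOfSquares&lt;/code&gt;, and &lt;code&gt;StackThreshold&lt;/code&gt; are made-up names, and the workload is only a stand-in for whatever needs the scratch space:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class SmallBuffers
{
    private const int StackThreshold = 64; // elements; small enough to be safe on the stack

    // Sums the squares of the inputs using a scratch buffer that lives on the
    // stack for small inputs and on the heap only when the input is large.
    public static long SumOfSquares(ReadOnlySpan&amp;lt;int&amp;gt; values)
    {
        Span&amp;lt;long&amp;gt; scratch = values.Length &amp;lt;= StackThreshold
            ? stackalloc long[StackThreshold]   // fixed-size stack buffer, invisible to the GC
            : new long[values.Length];          // rare slow path: heap allocation
        scratch = scratch.Slice(0, values.Length);

        for (int i = 0; i &amp;lt; values.Length; i++)
            scratch[i] = (long)values[i] * values[i];

        long total = 0;
        foreach (long v in scratch) total += v;
        return total;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Stack-allocating a constant size rather than &lt;code&gt;values.Length&lt;/code&gt; keeps the stack frame size predictable; the slice then trims the buffer to what is actually needed.&lt;/p&gt;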

&lt;p&gt;For larger long-lived buffers, the Pinned Object Heap: &lt;code&gt;GC.AllocateArray&amp;lt;byte&amp;gt;(capacity, pinned: true)&lt;/code&gt;. Zero-initialized on allocation, never moved by the GC, stable pointer for direct access. Typhon's HashMap uses this for its entire entry array.&lt;/p&gt;

&lt;p&gt;For medium reusable buffers, &lt;code&gt;ArrayPool&amp;lt;T&amp;gt;.Shared&lt;/code&gt;. FPI compression rents 9 KB buffers, returns them in a &lt;code&gt;finally&lt;/code&gt; block. Query execution rents stream arrays sized for the common case (8 slots), doubles if needed.&lt;/p&gt;
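
&lt;p&gt;The rent-and-return shape is short enough to show in full. This is a generic sketch of the pattern rather than the FPI compression code; &lt;code&gt;PooledScratch&lt;/code&gt; and &lt;code&gt;ChecksumViaPool&lt;/code&gt; are hypothetical names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Buffers;

public static class PooledScratch
{
    // Copies 'source' through a rented scratch buffer and returns a byte sum.
    // The buffer goes back to the pool in 'finally', even if the work throws.
    public static int ChecksumViaPool(ReadOnlySpan&amp;lt;byte&amp;gt; source)
    {
        byte[] rented = ArrayPool&amp;lt;byte&amp;gt;.Shared.Rent(source.Length);
        try
        {
            // Rent may return a larger array than requested, so slice to the real length.
            Span&amp;lt;byte&amp;gt; scratch = rented.AsSpan(0, source.Length);
            source.CopyTo(scratch);

            int sum = 0;
            foreach (byte b in scratch) sum += b;
            return sum;
        }
        finally
        {
            ArrayPool&amp;lt;byte&amp;gt;.Shared.Return(rented);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two details matter in practice: never use &lt;code&gt;rented.Length&lt;/code&gt; as the logical size, and never touch the array after returning it, because the pool may already have handed it to another caller.&lt;/p&gt;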

&lt;p&gt;Four strategies — ref struct for scoped access, stackalloc for small temporaries, POH for large long-lived buffers, ArrayPool for medium reusable buffers. The result: zero hot-path allocations in steady state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 3: Reduce Memory Indirections
&lt;/h2&gt;

&lt;p&gt;Every pointer chase is a potential cache miss. An L3 hit costs ~100 cycles, a DRAM miss costs ~200+. The goal: minimize the number of hops from "I want this data" to "here's the data."&lt;/p&gt;

&lt;p&gt;Post #1 showed the flagship example — the &lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;SIMD chunk accessor&lt;/a&gt; with its 3-tier lookup (MRU check, Vector256 search, clock-hand eviction). Each tier reduces indirection compared to the next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Epoch-based page protection&lt;/strong&gt; eliminates another class of indirection. The traditional approach: atomic increment on page access, atomic decrement on release. For N page accesses in a transaction, that's 2N atomic operations — each one a potential cache-line bounce. Typhon uses epoch-based protection instead: one stamp when entering a transaction scope, one clear when exiting. Pages accessed within an active epoch can't be evicted. Cost: 2 operations per transaction, regardless of how many pages are touched.&lt;/p&gt;
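
&lt;p&gt;To make the "one stamp in, one clear out" idea concrete, here is a deliberately simplified sketch. It is not Typhon's implementation: &lt;code&gt;EpochTable&lt;/code&gt;, &lt;code&gt;EpochScope&lt;/code&gt;, &lt;code&gt;Retire&lt;/code&gt;, and &lt;code&gt;CanReclaim&lt;/code&gt; are invented names, worker slots are assumed to be assigned up front, and the per-worker stamps are left unpadded for brevity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

public sealed class EpochTable
{
    private const long Idle = long.MaxValue;   // slot value while a worker is outside any scope
    private readonly long[] _stamps;           // one stamp per worker
    private long _globalEpoch = 1;

    public EpochTable(int workerCount)
    {
        _stamps = new long[workerCount];
        Array.Fill(_stamps, Idle);
    }

    // One stamp on entry: publish the epoch this worker started reading in.
    public EpochScope Enter(int worker)
    {
        Interlocked.Exchange(ref _stamps[worker], Volatile.Read(ref _globalEpoch));
        return new EpochScope(this, worker);
    }

    // One clear on exit.
    internal void Exit(int worker) =&amp;gt; Volatile.Write(ref _stamps[worker], Idle);

    // Called by the evictor after unlinking a page: returns the epoch the page
    // was retired in and moves the global epoch forward.
    public long Retire() =&amp;gt; Interlocked.Increment(ref _globalEpoch) - 1;

    // A retired page may be reused once no active worker entered at or before its epoch.
    public bool CanReclaim(long retiredEpoch)
    {
        for (int i = 0; i &amp;lt; _stamps.Length; i++)
            if (Volatile.Read(ref _stamps[i]) &amp;lt;= retiredEpoch) return false;
        return true;
    }
}

// ref struct scope guard: usable with 'using', cannot be boxed or captured.
public ref struct EpochScope
{
    private readonly EpochTable _table;
    private readonly int _worker;
    internal EpochScope(EpochTable table, int worker) { _table = table; _worker = worker; }
    public void Dispose() =&amp;gt; _table.Exit(_worker);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A reader wraps its work in &lt;code&gt;using var scope = table.Enter(workerId);&lt;/code&gt; and touches as many pages as it likes with no further bookkeeping. The cost moves to the evictor, which has to scan the stamps before reusing a page, and that is the right side to pay on because eviction is far rarer than page access.&lt;/p&gt;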

&lt;p&gt;&lt;strong&gt;Zone maps&lt;/strong&gt; eliminate entire clusters of indirection. Each indexed field maintains per-cluster min/max bounds. A range query like &lt;code&gt;WHERE Level &amp;gt;= 50&lt;/code&gt; checks two integers per cluster — if the cluster's maximum is below 50, skip every entity in it without loading a single component byte. The impact at different selectivities, measured on 100K entities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Selectivity&lt;/th&gt;
&lt;th&gt;Without zone maps&lt;/th&gt;
&lt;th&gt;With zone maps&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;1.3 ms&lt;/td&gt;
&lt;td&gt;10x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.65 ms&lt;/td&gt;
&lt;td&gt;21x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.16 ms&lt;/td&gt;
&lt;td&gt;84x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.05 ms&lt;/td&gt;
&lt;td&gt;268x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The float ordering trick makes this work for non-integer types: an IEEE 754 sign-flip converts floats to a representation where integer comparison order equals numeric order, enabling the same two-comparison interval overlap check regardless of field type.&lt;/p&gt;
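
&lt;p&gt;Both pieces fit in a few lines. The sketch below assumes each cluster stores its min/max as order-preserving &lt;code&gt;uint&lt;/code&gt; keys; &lt;code&gt;ZoneMaps&lt;/code&gt;, &lt;code&gt;ToOrderedKey&lt;/code&gt;, and &lt;code&gt;MayContain&lt;/code&gt; are illustrative names, and NaN handling is left out:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class ZoneMaps
{
    // Maps a float to a uint whose unsigned order matches the float's numeric order.
    // Negative values: flip every bit. Non-negative values: flip only the sign bit.
    // NaN is not handled specially in this sketch.
    public static uint ToOrderedKey(float value)
    {
        uint bits = unchecked((uint)BitConverter.SingleToInt32Bits(value));
        return (bits &amp;amp; 0x8000_0000u) != 0 ? ~bits : bits ^ 0x8000_0000u;
    }

    // Interval overlap in two comparisons: can a cluster whose keys span
    // [clusterMin, clusterMax] hold anything inside [queryMin, queryMax]?
    public static bool MayContain(uint clusterMin, uint clusterMax, uint queryMin, uint queryMax)
        =&amp;gt; clusterMin &amp;lt;= queryMax &amp;amp;&amp;amp; queryMin &amp;lt;= clusterMax;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For &lt;code&gt;WHERE Level &amp;gt;= 50&lt;/code&gt; the query range is [50, max]. A cluster with bounds [10, 40] fails the overlap test and is skipped whole; a cluster with bounds [45, 60] passes and gets scanned. A float field runs through &lt;code&gt;ToOrderedKey&lt;/code&gt; first and then uses exactly the same check.&lt;/p&gt;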

&lt;p&gt;At the other end of the scale, division elimination saves cycles on every single chunk lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Field: precomputed at segment creation&lt;/span&gt;
&lt;span class="c1"&gt;// Replaces expensive division (~20-80 cycles) with multiply+shift (~3-4 cycles)&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;ulong&lt;/span&gt; &lt;span class="n"&gt;_divMagic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Constructor: compute magic multiplier once&lt;/span&gt;
&lt;span class="n"&gt;_divMagic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0x1_0000_0000U&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;_otherChunkCount&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;_otherChunkCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Hot path: every chunk lookup uses this instead of idiv&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;adjusted&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_divMagic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;adjusted&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_otherChunkCount&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integer division (&lt;code&gt;idiv&lt;/code&gt; on x64) is notoriously slow: 20 to 80 cycles depending on operand size. The magic multiplier replaces it with a multiply and a shift: 3-4 cycles. The precomputation happens once when a segment is created; the benefit repeats on every one of the millions of chunk lookups that follow. Six lines of math, and the division gets roughly 5x to 20x cheaper on a hot path. This is a classic systems programming trick that most managed-language developers have never needed, but when your per-entity budget is 2.5 nanoseconds, you need it.&lt;/p&gt;
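
&lt;p&gt;The multiplier is only exact over a bounded input range, so it is worth checking against plain division for the divisors and indices you expect. The helper below is a standalone sketch with hypothetical names (&lt;code&gt;MagicDivide&lt;/code&gt;, &lt;code&gt;ComputeMagic&lt;/code&gt;, &lt;code&gt;AgreesWithDivision&lt;/code&gt;) and assumes a non-zero divisor:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class MagicDivide
{
    // ceil(2^32 / divisor), computed once per divisor. The divisor must be non-zero.
    public static ulong ComputeMagic(uint divisor)
        =&amp;gt; (0x1_0000_0000UL + divisor - 1) / divisor;

    // floor(value / divisor) via multiply + shift. Exact whenever
    // value * (magic * divisor - 2^32) is below 2^32; value below 2^32 / divisor
    // is always enough. It is not exact for every 32-bit input.
    public static uint Divide(uint value, ulong magic)
        =&amp;gt; (uint)((value * magic) &amp;gt;&amp;gt; 32);

    // Returns true if the fast path agrees with '/' for every value in [0, maxValue].
    public static bool AgreesWithDivision(uint divisor, uint maxValue)
    {
        ulong magic = ComputeMagic(divisor);
        for (ulong v = 0; v &amp;lt;= maxValue; v++)
        {
            uint value = (uint)v;
            if (Divide(value, magic) != value / divisor) return false;
        }
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;AgreesWithDivision&lt;/code&gt; for each chunk count in a unit test, up to the largest index a segment can hold, turns the range assumption into something checked rather than assumed.&lt;/p&gt;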

&lt;h2&gt;
  
  
  Principle 4: Let the JIT Help
&lt;/h2&gt;

&lt;p&gt;The JIT compiler is your optimization partner, not your enemy. Write code in patterns it can optimize, and it does work for you that you'd have to do manually in C or Rust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained generics&lt;/strong&gt; give you monomorphization. When you write &lt;code&gt;where TMask : struct, IArchetypeMask&amp;lt;TMask&amp;gt;&lt;/code&gt;, the JIT generates a separate native code path for each concrete type. &lt;code&gt;ArchetypeMask256&lt;/code&gt; (four &lt;code&gt;ulong&lt;/code&gt; fields, bitwise operations) gets fully inlined — no vtable, no virtual dispatch. This is the same optimization Rust gets from generics, but opt-in through the &lt;code&gt;struct&lt;/code&gt; constraint.&lt;/p&gt;
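
&lt;p&gt;A small, self-contained illustration of the pattern. The interface and mask here are stand-ins (&lt;code&gt;IArchetypeMaskSketch&lt;/code&gt;, &lt;code&gt;Mask128&lt;/code&gt;, &lt;code&gt;MaskQueries&lt;/code&gt;), not Typhon's &lt;code&gt;IArchetypeMask&lt;/code&gt; or &lt;code&gt;ArchetypeMask256&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public interface IArchetypeMaskSketch&amp;lt;TSelf&amp;gt; where TSelf : struct, IArchetypeMaskSketch&amp;lt;TSelf&amp;gt;
{
    bool ContainsAll(in TSelf required);
}

// Hypothetical 128-bit mask: two ulong words, pure bitwise logic.
public readonly struct Mask128 : IArchetypeMaskSketch&amp;lt;Mask128&amp;gt;
{
    private readonly ulong _lo, _hi;
    public Mask128(ulong lo, ulong hi) { _lo = lo; _hi = hi; }

    public bool ContainsAll(in Mask128 required)
        =&amp;gt; (_lo &amp;amp; required._lo) == required._lo &amp;amp;&amp;amp; (_hi &amp;amp; required._hi) == required._hi;
}

public static class MaskQueries
{
    // TMask is a value type, so the JIT compiles a dedicated copy of this method
    // per concrete mask and can call (and inline) ContainsAll directly.
    public static int CountMatching&amp;lt;TMask&amp;gt;(ReadOnlySpan&amp;lt;TMask&amp;gt; masks, in TMask required)
        where TMask : struct, IArchetypeMaskSketch&amp;lt;TMask&amp;gt;
    {
        int count = 0;
        foreach (ref readonly TMask mask in masks)
            if (mask.ContainsAll(in required)) count++;
        return count;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same method written against the interface type directly (&lt;code&gt;IArchetypeMaskSketch&amp;lt;Mask128&amp;gt;[]&lt;/code&gt;) would box every mask and dispatch through the interface, which is exactly what the &lt;code&gt;struct&lt;/code&gt; constraint avoids.&lt;/p&gt;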

&lt;p&gt;&lt;strong&gt;&lt;code&gt;sealed&lt;/code&gt;&lt;/strong&gt; enables devirtualization. &lt;code&gt;DirtyBitmap&lt;/code&gt; and &lt;code&gt;ArchetypeClusterInfo&lt;/code&gt; are both on hot paths and both sealed. The JIT knows no subclass can exist, so it converts virtual calls to direct calls and can inline them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;[MethodImpl(MethodImplOptions.AggressiveInlining)]&lt;/code&gt;&lt;/strong&gt; removes call overhead on micro-operations when the JIT honors the hint. B+Tree binary search, transaction state validation, every lock acquire/release: the overhead of a method call (save registers, set up stack frame, restore) is 2-5 ns. On a path called millions of times, that compounds.&lt;/p&gt;

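&lt;p&gt;&lt;strong&gt;SoA layout enables vectorization.&lt;/strong&gt; When a cluster is fully occupied (all N slots in use), the iteration loop becomes a simple sequential walk over contiguous SoA arrays with no branches. That is the shape SIMD needs: on AVX2, one instruction processes 8 floats. RyuJIT does little loop auto-vectorization on its own, so in .NET the wide path usually has to be spelled out with &lt;code&gt;Vector&amp;lt;T&amp;gt;&lt;/code&gt; or &lt;code&gt;Vector256&amp;lt;T&amp;gt;&lt;/code&gt;, which the JIT then compiles straight to SIMD instructions. The SoA layout isn't just about cache locality; it's about giving the code a pattern that vectorizes cleanly.&lt;/p&gt;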
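
&lt;p&gt;Whatever the JIT manages on its own for a given loop, contiguous SoA columns also make explicit SIMD straightforward. The sketch below uses &lt;code&gt;System.Numerics.Vector&amp;lt;float&amp;gt;&lt;/code&gt; on two hypothetical columns; &lt;code&gt;SoaKernels&lt;/code&gt; and &lt;code&gt;Integrate&lt;/code&gt; are illustrative names, not Typhon APIs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Numerics;

public static class SoaKernels
{
    // positions[i] += velocities[i] * dt over two contiguous SoA columns.
    public static void Integrate(Span&amp;lt;float&amp;gt; positions, ReadOnlySpan&amp;lt;float&amp;gt; velocities, float dt)
    {
        if (positions.Length != velocities.Length)
            throw new ArgumentException("Columns must have the same length.");

        int i = 0;
        int width = Vector&amp;lt;float&amp;gt;.Count;       // 8 lanes when AVX2 is available
        var vdt = new Vector&amp;lt;float&amp;gt;(dt);       // broadcast dt to every lane

        // Vector body: one load, multiply, add, and store per 'width' entities.
        for (; i &amp;lt;= positions.Length - width; i += width)
        {
            var p = new Vector&amp;lt;float&amp;gt;(positions.Slice(i));
            var v = new Vector&amp;lt;float&amp;gt;(velocities.Slice(i));
            (p + v * vdt).CopyTo(positions.Slice(i));
        }

        // Scalar tail for the last few entities.
        for (; i &amp;lt; positions.Length; i++)
            positions[i] += velocities[i] * dt;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With an Array-of-Structures layout the same loop would need a gather or per-field shuffles to fill each vector, which is where most of the SIMD benefit gets lost.&lt;/p&gt;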

&lt;p&gt;But the most surprising JIT trick is dead-code elimination through &lt;code&gt;static readonly&lt;/code&gt; fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// TelemetryConfig.cs — field declarations&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// static readonly fields allow the JIT to eliminate disabled telemetry code paths&lt;/span&gt;
&lt;span class="c1"&gt;/// entirely. When a readonly field is false, the JIT treats guarded blocks as dead&lt;/span&gt;
&lt;span class="c1"&gt;/// code and removes them completely in Tier 1 compilation.&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;Enabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;EcsEnabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;EcsActive&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// Combined: Enabled &amp;amp;&amp;amp; EcsEnabled&lt;/span&gt;

&lt;span class="c1"&gt;// Static constructor — computed once at startup&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="nf"&gt;TelemetryConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Typhon:Telemetry"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;EcsEnabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecsSection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;EcsActive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Enabled&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;EcsEnabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// EcsQuery.cs — usage on hot path&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TelemetryConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EcsActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;activity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TyphonActivitySource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartActivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ECS.Query.Execute"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;SetTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TyphonSpanAttributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EcsArchetype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TArchetype&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;EcsActive&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;, the JIT doesn't just short-circuit the branch — it &lt;strong&gt;eliminates the entire &lt;code&gt;if&lt;/code&gt; block&lt;/strong&gt; from the generated native code. No branch instruction, no condition check, zero cost. The &lt;code&gt;static readonly&lt;/code&gt; field, initialized in a static constructor, is treated as a constant after Tier 1 JIT compilation. The dead branch and everything inside it vanish.&lt;/p&gt;

&lt;p&gt;This gives you zero-cost observability. Full OpenTelemetry tracing when enabled; literally nothing — not even a branch — when disabled. Most C# developers don't know the JIT does this. It's worth structuring your telemetry and feature flags around this pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 5: Design for the Hardware
&lt;/h2&gt;

&lt;p&gt;The CPU manual is a requirements document. Cache-line size, SIMD register width, TLB coverage, memory bandwidth — these aren't abstract numbers. They drive struct sizing, batch sizes, and allocation strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache-line size (64 bytes on x86, 128 bytes on Apple Silicon)&lt;/strong&gt; drives &lt;code&gt;CacheLinePaddedInt&lt;/code&gt; sizing, B+Tree node alignment, and SoA array alignment. The ViewDeltaRingBuffer aligns each sub-buffer to 64-byte boundaries so that the hardware prefetcher doesn't waste bandwidth loading adjacent unrelated data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SIMD width&lt;/strong&gt; determines batch sizes. Typhon's &lt;code&gt;SimdPredicateEvaluator&lt;/code&gt; uses three-tier CPU dispatch for filtering entities by field values: AVX-512 processes 16 integer comparisons per instruction, AVX2 processes 8, with a scalar fallback for older hardware. The AVX-512 path uses a workaround — .NET doesn't expose 512-bit gather intrinsics, so it performs two 256-bit AVX2 gathers and combines them into a &lt;code&gt;Vector512&lt;/code&gt; for the comparison step. The JIT emits a native &lt;code&gt;vpcmpd&lt;/code&gt; instruction for the 16-wide comparison. On Zen 4 (which double-pumps 512-bit operations), throughput matches two AVX2 iterations but with half the loop overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software prefetch&lt;/strong&gt; hides memory latency where it matters most. During HashMap resize, speculative prefetch computes the &lt;em&gt;future&lt;/em&gt; entry's position in the resized table and issues &lt;code&gt;Sse.Prefetch0&lt;/code&gt; to start loading that cache line while the current entry is being processed. The JIT translates this to a &lt;code&gt;prefetcht0&lt;/code&gt; instruction — essentially free to issue, and it hides 100+ cycles of latency per entry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BMI2 instructions&lt;/strong&gt; accelerate spatial indexing. Morton key encoding (Z-order curves) uses &lt;code&gt;Bmi2.ParallelBitDeposit&lt;/code&gt; to interleave X/Y coordinates in ~1 cycle. The scalar fallback costs ~10 cycles. Morton ordering places spatially adjacent grid cells at nearby array indices, improving cache locality during neighbor queries.&lt;/p&gt;
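
&lt;p&gt;A sketch of 2D Morton encoding with a hardware path and a scalar fallback, assuming 16-bit cell coordinates. &lt;code&gt;Bmi2.ParallelBitDeposit&lt;/code&gt; is the real .NET intrinsic; &lt;code&gt;Morton&lt;/code&gt;, &lt;code&gt;Encode&lt;/code&gt;, &lt;code&gt;EncodeScalar&lt;/code&gt;, and &lt;code&gt;Spread&lt;/code&gt; are names made up for this example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.Intrinsics.X86;

public static class Morton
{
    // Interleaves the bits of x (even positions) and y (odd positions).
    public static uint Encode(ushort x, ushort y)
    {
        if (Bmi2.IsSupported)
        {
            // PDEP scatters the low bits of the value into the set bits of the mask.
            return Bmi2.ParallelBitDeposit(x, 0x5555_5555u)
                 | Bmi2.ParallelBitDeposit(y, 0xAAAA_AAAAu);
        }
        return EncodeScalar(x, y);
    }

    // Portable fallback using the classic shift-and-mask bit spreading.
    public static uint EncodeScalar(ushort x, ushort y)
        =&amp;gt; Spread(x) | (Spread(y) &amp;lt;&amp;lt; 1);

    // Spreads the low 16 bits of v so that bit i moves to bit 2*i.
    private static uint Spread(uint v)
    {
        v &amp;amp;= 0x0000_FFFFu;
        v = (v | (v &amp;lt;&amp;lt; 8)) &amp;amp; 0x00FF_00FFu;
        v = (v | (v &amp;lt;&amp;lt; 4)) &amp;amp; 0x0F0F_0F0Fu;
        v = (v | (v &amp;lt;&amp;lt; 2)) &amp;amp; 0x3333_3333u;
        v = (v | (v &amp;lt;&amp;lt; 1)) &amp;amp; 0x5555_5555u;
        return v;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;Bmi2.IsSupported&lt;/code&gt; is a JIT-time constant, the branch that doesn't apply to the current CPU is removed from the generated code, so the dispatch itself costs nothing at runtime.&lt;/p&gt;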

&lt;p&gt;&lt;strong&gt;TLB coverage&lt;/strong&gt; constrains working set design. Without 2 MB huge pages, x86 L2 TLB covers only 8-12 MB. Every access beyond that risks a 15-20 ns page walk penalty on top of the data access itself. Typhon's cluster storage keeps 100K entities in ~2.5 MB — comfortably within L2 TLB coverage even without huge pages. For larger datasets, the page cache's 8 KB pages and sequential access patterns keep the hardware prefetcher effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory bandwidth (~50 GB/s on Zen 4)&lt;/strong&gt; is the ceiling for bulk scans. If your SoA component scan isn't approaching this number, something is leaving performance on the table — unnecessary indirection, poor alignment, or branches that defeat the prefetcher.&lt;/p&gt;

&lt;p&gt;All measurements in this post were taken on an AMD Ryzen 9 7950X with .NET 10, BenchmarkDotNet, release configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Individual principles are nice. What matters is how they compound. Here's what the engine actually delivers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster iteration (per entity)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CRUD lifecycle (spawn, read, update, destroy, commit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.95 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transaction create-read-commit (100 entities)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.6 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree point lookup (10K entries)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;191 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component read (1 MVCC version)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;703 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component read (50 MVCC versions)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;720 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uncontended RW lock acquire&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page cache hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk accessor MRU hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.1 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Epoch enter/exit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cascade delete 10K entities&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.6 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The version invariance number deserves a callout: reading a component with 50 MVCC revisions costs the same as reading one with a single revision. 703 ns vs 720 ns — within measurement noise. The revision chain design works.&lt;/p&gt;

&lt;p&gt;These principles also scale to parallel execution:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workers&lt;/th&gt;
&lt;th&gt;Tick time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;Efficiency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;~37 ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~18 ms&lt;/td&gt;
&lt;td&gt;2.1x&lt;/td&gt;
&lt;td&gt;104%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;3.8x&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;~5.3 ms&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;89% parallel efficiency on 8 workers. The 16-worker result (6.7x, 42% efficiency) hits the L3 cache / CCD boundary on the 7950X — a hardware wall, not a software one.&lt;/p&gt;
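
&lt;p&gt;For readers who want to check the table, the efficiency column is speedup divided by worker count. A tiny sketch of that arithmetic, using a hypothetical &lt;code&gt;Scaling&lt;/code&gt; helper that is not part of Typhon:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;// Hypothetical helper for illustration only.
public static class Scaling
{
    // Speedup relative to the single-worker tick time.
    public static double Speedup(double singleWorkerMs, double parallelMs) =&amp;gt; singleWorkerMs / parallelMs;

    // Fraction of ideal linear scaling that was actually achieved.
    public static double Efficiency(double speedup, int workers) =&amp;gt; speedup / workers;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Scaling.Efficiency(7.1, 8)&lt;/code&gt; comes out at roughly 0.89 and &lt;code&gt;Scaling.Efficiency(6.7, 16)&lt;/code&gt; at roughly 0.42, which matches the 8-worker and 16-worker figures quoted here.&lt;/p&gt;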

&lt;p&gt;To put these numbers in perspective, here's the concurrency cost hierarchy that drives Typhon's design decisions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0: Thread-local&lt;/td&gt;
&lt;td&gt;~2 ns&lt;/td&gt;
&lt;td&gt;TLS counter, local variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1: Uncontended atomic&lt;/td&gt;
&lt;td&gt;5-10 ns&lt;/td&gt;
&lt;td&gt;AccessControl read latch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2: Contended atomic&lt;/td&gt;
&lt;td&gt;20-140 ns&lt;/td&gt;
&lt;td&gt;Multiple writers, same lock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3: System call&lt;/td&gt;
&lt;td&gt;500-1000 ns&lt;/td&gt;
&lt;td&gt;Timestamp via syscall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4: Context switch&lt;/td&gt;
&lt;td&gt;~10,000 ns&lt;/td&gt;
&lt;td&gt;Blocking lock, futex wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5: Oversubscription&lt;/td&gt;
&lt;td&gt;100,000+ ns&lt;/td&gt;
&lt;td&gt;More threads than cores&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each level is roughly 10x more expensive than the previous one. Typhon's &lt;code&gt;AdaptiveWaiter&lt;/code&gt; (spin → yield → sleep progression) keeps most contention at Level 2, avoiding the 100x jump to Level 4. The cache-line padding from Principle 1 keeps parallel workers from pushing each other's Level 1 operations up to Level 2 through false sharing. Every design decision maps to staying as low in this hierarchy as possible.&lt;/p&gt;
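
&lt;p&gt;To make the spin → yield → sleep idea concrete, here is a minimal sketch of a backoff waiter guarding a simple flag. &lt;code&gt;BackoffWaiter&lt;/code&gt;, &lt;code&gt;SpinFlag&lt;/code&gt; and the thresholds are assumptions made for illustration; this is not the actual &lt;code&gt;AdaptiveWaiter&lt;/code&gt; implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Threading;

// Hypothetical stand-in for illustration, not Typhon's AdaptiveWaiter.
// The thresholds below are arbitrary assumptions.
public struct BackoffWaiter
{
    private const int SpinLimit = 16;   // attempts spent busy-spinning
    private const int YieldLimit = 32;  // attempts (cumulative) before sleeping
    private int _attempts;

    public int Attempts =&amp;gt; _attempts;

    // Call once after each failed acquisition attempt.
    public void Wait()
    {
        if (_attempts &amp;lt; SpinLimit)
            Thread.SpinWait(1 &amp;lt;&amp;lt; Math.Min(_attempts, 6)); // stay on-core, no kernel call
        else if (_attempts &amp;lt; YieldLimit)
            Thread.Yield();   // give up the rest of the time slice
        else
            Thread.Sleep(1);  // block: this is the expensive context-switch level
        _attempts++;
    }
}

public static class SpinFlag
{
    // Acquires a 0/1 flag with compare-and-swap, backing off between failures.
    public static void Acquire(ref int flag)
    {
        var waiter = new BackoffWaiter();
        while (Interlocked.CompareExchange(ref flag, 1, 0) != 0)
            waiter.Wait();
    }

    public static void Release(ref int flag) =&amp;gt; Volatile.Write(ref flag, 0);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point of the progression is that a short wait never leaves user mode. Only a waiter that has already burned through its spin and yield budget pays for a sleep.&lt;/p&gt;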

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Unsafe is unsafe.&lt;/strong&gt; These techniques require &lt;code&gt;unsafe&lt;/code&gt; code — pointer arithmetic, raw memory access, manual layout control. One bug can corrupt the page cache. Roslyn analyzers catch some classes of errors at compile time, but not all. The safety net has holes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity budget.&lt;/strong&gt; Magic multipliers, SIMD evaluators, epoch-based protection, zone maps — each one is simple in isolation. The combination creates a codebase that demands systems-level understanding to navigate. There's no shortcut around understanding the hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not all of this transfers.&lt;/strong&gt; Most .NET applications don't need microsecond latency. Using &lt;code&gt;CacheLinePaddedInt&lt;/code&gt; in a web API is premature optimization. These techniques are for when you've measured, profiled, and confirmed that memory access patterns are your bottleneck — not before.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The next post dives into concurrency: "Deadlock-Free by Construction: How Typhon Eliminates Deadlocks Instead of Detecting Them." Most databases treat deadlocks as a runtime problem — detect the cycle, abort a transaction, retry. Typhon makes deadlocks structurally impossible through a three-pillar mathematical argument. No detection, no timeouts, no retries.&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>performance</category>
      <category>database</category>
    </item>
    <item>
      <title>What Game Engines Know About Data That Databases Forgot</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 05 Apr 2026 22:20:39 +0000</pubDate>
      <link>https://dev.to/nockawa/what-game-engines-know-about-data-that-databases-forgot-10m2</link>
      <guid>https://dev.to/nockawa/what-game-engines-know-about-data-that-databases-forgot-10m2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What Game Engines Know About Data That Databases Forgot&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Microsecond Latency in a Managed Language &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Game servers sit at an uncomfortable intersection. They need the raw throughput of a game engine — tens of thousands of entities updated every tick. But they also need what databases provide: transactions that don't corrupt state, queries that don't scan everything, and durability that survives crashes.&lt;/p&gt;

&lt;p&gt;Today, game server teams pick one side and hack around the other. An &lt;a href="https://en.wikipedia.org/wiki/Entity_component_system" rel="noopener noreferrer"&gt;Entity-Component-System&lt;/a&gt; framework for speed, with manual serialization to a database for persistence. Or a database for safety, with an impedance mismatch every time they touch game state.&lt;/p&gt;

&lt;p&gt;Typhon draws from both traditions. It's a database engine that stores data the way game engines do — and provides the guarantees that game servers need. Here's why those two worlds aren't as far apart as they look.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Fields, One Problem
&lt;/h2&gt;

&lt;p&gt;ECS architecture evolved in game engines. Relational databases evolved in enterprise software. They never talked to each other. But look at what they built:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ECS Concept&lt;/th&gt;
&lt;th&gt;Database Concept&lt;/th&gt;
&lt;th&gt;Shared Principle&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Archetype&lt;/td&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;Homogeneous, fixed-schema storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;Typed, blittable, bulk-iterable data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;td&gt;Row&lt;/td&gt;
&lt;td&gt;Identity with dynamic composition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System&lt;/td&gt;
&lt;td&gt;Query&lt;/td&gt;
&lt;td&gt;Process all records matching a signature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frame Budget (16ms)&lt;/td&gt;
&lt;td&gt;Latency SLA&lt;/td&gt;
&lt;td&gt;Hard real-time deadline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An ECS "archetype" is a table. A "component" is a column. A "system" is a query. The vocabulary is different, the underlying structure is the same. Two fields, separated by decades and industry boundaries, converged on structurally identical solutions because they were solving the same fundamental problem: managing structured data under performance constraints.&lt;/p&gt;

&lt;p&gt;This convergence is why a synthesis is possible at all. It's not an accident — it's driven by the same physics. Data must be laid out for the CPU cache. Access patterns must be predictable. Latency budgets are real.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned From Game Engines
&lt;/h2&gt;

&lt;p&gt;ECS taught the database world something important about how data should be stored. Three lessons Typhon draws directly from game engine architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache locality by default.&lt;/strong&gt; In a traditional row store, reading all player positions means loading entire rows — names, inventories, health, everything. Most of those bytes are wasted. In ECS, components are stored per type: all positions contiguous, all health values contiguous. Reading 10,000 positions is a linear memory scan where every byte is useful.&lt;/p&gt;

&lt;p&gt;This matters more than most developers realize. An L1 cache hit costs roughly 1 nanosecond. A DRAM miss costs 60-70 ns — a &lt;strong&gt;65x penalty&lt;/strong&gt;. When your database layout forces cache misses, no amount of algorithmic cleverness can save you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-ecs-vs-rowstore.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-ecs-vs-rowstore.png" alt="Storage layout comparison — traditional row store vs Typhon's component store" width="800" height="1135"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero-copy is the default, not the optimization.&lt;/strong&gt; In a traditional database, reading a record means deserializing from a storage page into a language-level object. In ECS, a component is already in memory in its final layout — you just hand back a pointer. Typhon preserves this: components are blittable &lt;code&gt;unmanaged&lt;/code&gt; structs read directly from pinned memory pages. No serialization, no managed heap allocation, no GC involvement.&lt;/p&gt;
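
&lt;p&gt;A minimal sketch of what "hand back a pointer" can look like in safe-ish C#: reinterpreting a page's bytes as an array of components and returning a &lt;code&gt;ref&lt;/code&gt; into it. The &lt;code&gt;Position&lt;/code&gt; struct and &lt;code&gt;ZeroCopy&lt;/code&gt; helper are hypothetical, and a plain &lt;code&gt;Span&amp;lt;byte&amp;gt;&lt;/code&gt; stands in for a pinned page; Typhon's real accessors are not shown in this post.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.InteropServices;

// A blittable component: fixed size (12 bytes), no references.
[StructLayout(LayoutKind.Sequential)]
public struct Position
{
    public float X, Y, Z;
}

// Hypothetical helper for illustration; not Typhon's API.
public static class ZeroCopy
{
    // Returns a reference into the page bytes themselves; nothing is copied or allocated.
    public static ref Position At(Span&amp;lt;byte&amp;gt; page, int slot)
        =&amp;gt; ref MemoryMarshal.Cast&amp;lt;byte, Position&amp;gt;(page)[slot];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the caller gets a reference rather than a copy, writing &lt;code&gt;ZeroCopy.At(page, 2).X = 1.5f&lt;/code&gt; changes the four bytes at offset 24 of the page directly. No object is materialized on the managed heap at any point.&lt;/p&gt;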

&lt;p&gt;&lt;strong&gt;Entity as pure identity.&lt;/strong&gt; In ECS, an entity is just an ID — a 64-bit number with no inherent structure. All data lives externally in component tables. This is the opposite of ORM thinking where the object &lt;em&gt;is&lt;/em&gt; the entity. Typhon inherits this: &lt;code&gt;EntityId&lt;/code&gt; is a lightweight value type, all state lives in typed component storage. This separation is what makes the rest of the architecture possible — per-component versioning, per-component storage modes, independent indexes per component type.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We Learned From Databases
&lt;/h2&gt;

&lt;p&gt;Traditional databases solved problems that ECS never had to face. Five capabilities Typhon draws from database architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACID transactions with per-component MVCC.&lt;/strong&gt; Game engines typically have no isolation. Two systems modifying the same entity in the same tick is a race condition — and in a single-process game, you control the execution order so you can manage it. On a game server with concurrent player sessions, you can't.&lt;/p&gt;

&lt;p&gt;Databases solved this decades ago with MVCC: snapshot isolation where readers never block writers, with conflict detection at commit time. Typhon brings this in — but with a twist. Traditional databases version entire rows. Typhon versions each component independently. An entity's &lt;code&gt;PositionComponent&lt;/code&gt; and &lt;code&gt;InventoryComponent&lt;/code&gt; each maintain their own revision chain: a circular buffer of 12-byte revision entries, each stamped with a 48-bit transaction sequence number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified: finding the visible revision for a snapshot&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;WalkRevisions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsolationFlag&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TSN&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;myTransactionTSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Skip uncommitted revisions from other transactions&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TSN&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;snapshotTSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Most recent revision visible to our snapshot&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means a transaction reading a player's position sees a consistent, frozen point-in-time view across &lt;em&gt;all&lt;/em&gt; component types simultaneously, without locking any of them. Writers never block readers. And because revisions are per-component rather than per-entity, updating a player's position doesn't create a new version of their inventory. Less data copied, less garbage to collect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexed selective access.&lt;/strong&gt; This is the big one. ECS systems iterate &lt;em&gt;everything&lt;/em&gt; matching a component signature every tick. That works brilliantly for particle simulations where every particle needs updating. But game servers often need only a small fraction of their entities on any given tick:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Total Entities&lt;/th&gt;
&lt;th&gt;Processed Per Tick&lt;/th&gt;
&lt;th&gt;Useful Work&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Battle royale (per-client relevancy)&lt;/td&gt;
&lt;td&gt;50,000 actors&lt;/td&gt;
&lt;td&gt;500–2,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1–4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMO area of interest&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;200–1,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.2–1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physics (awake bodies only)&lt;/td&gt;
&lt;td&gt;All rigidbodies&lt;/td&gt;
&lt;td&gt;Awake subset&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5–20%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you're processing 1–4% of your entities, scanning everything is doing 25–100x more work than necessary. ECS frameworks recognized this — Unity DOTS added enableable components, Flecs added &lt;code&gt;group_by&lt;/code&gt;, Unreal MassEntity added LOD tiers. These are all clever workarounds for the same underlying issue: ECS was designed for bulk iteration, not selective access.&lt;/p&gt;

&lt;p&gt;Databases solved this with indexes. B+Trees for value-based lookups, spatial trees for area-of-interest queries, selectivity estimation to decide when to scan versus when to seek. Typhon brings these into the component storage model — not as bolted-on workarounds, but as first-class citizens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spatial partitioning.&lt;/strong&gt; For spatial access patterns specifically — the #1 selective access need in game servers — Typhon integrates a two-layer spatial index directly into the component storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1: Sparse hash map&lt;/strong&gt; — maps coarse grid cells to entity counts. O(1) rejection of empty regions before the tree is even touched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2: Page-backed R-Tree&lt;/strong&gt; — AABB, radius, ray, frustum, and kNN queries. Same OLC-latched, SOA node architecture as the B+Trees.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both layers run inside the same transactional model as everything else. No external spatial hash bolted on alongside your ECS. No cache locality destroyed by chasing pointers into a separate data structure.&lt;/p&gt;
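
&lt;p&gt;The Layer 1 idea can be sketched as a dictionary from coarse cell to entity count, consulted before any tree traversal. Everything in this sketch (the &lt;code&gt;CoarseGrid&lt;/code&gt; name, the 2D cells, the key packing) is an assumption for illustration and says nothing about how Typhon lays this out internally.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;

// Hypothetical sketch of a coarse occupancy grid; not Typhon's implementation.
public sealed class CoarseGrid
{
    private readonly Dictionary&amp;lt;long, int&amp;gt; _counts = new();
    private readonly float _cellSize;

    public CoarseGrid(float cellSize) =&amp;gt; _cellSize = cellSize;

    private int Cell(float v) =&amp;gt; (int)MathF.Floor(v / _cellSize);

    // Packs two 32-bit cell coordinates into one 64-bit dictionary key.
    private static long Key(int cx, int cy) =&amp;gt; ((long)cx &amp;lt;&amp;lt; 32) | (uint)cy;

    public void Add(float x, float y)
    {
        long k = Key(Cell(x), Cell(y));
        _counts.TryGetValue(k, out int n);
        _counts[k] = n + 1;
    }

    public void Remove(float x, float y)
    {
        long k = Key(Cell(x), Cell(y));
        if (!_counts.TryGetValue(k, out int n)) return;
        if (n &amp;lt;= 1) _counts.Remove(k); else _counts[k] = n - 1;
    }

    // True when every cell overlapped by the box is empty, so the tree can be skipped.
    public bool IsEmpty(float minX, float minY, float maxX, float maxY)
    {
        for (int cx = Cell(minX); cx &amp;lt;= Cell(maxX); cx++)
            for (int cy = Cell(minY); cy &amp;lt;= Cell(maxY); cy++)
                if (_counts.ContainsKey(Key(cx, cy))) return false;
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each cell check is a single hash lookup, so a query box that covers only a handful of coarse cells is rejected in near-constant time when the region is empty.&lt;/p&gt;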

&lt;p&gt;&lt;strong&gt;Durability.&lt;/strong&gt; A game client can afford to lose state on crash — reload the level. A game server cannot. Player inventories, economy state, progression data — all must survive process restarts and crashes. WAL-based crash recovery, checkpointing, configurable fsync — these are database fundamentals that game servers need but ECS frameworks never provided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query planning.&lt;/strong&gt; When you have both indexes and sequential storage, someone needs to decide which access path to use. Databases have decades of work on cost-based query optimization — selectivity estimation, histogram statistics, index selection. Typhon brings a query planner into the ECS world: given a predicate on a component field, it automatically chooses full scan or B+Tree seek based on estimated selectivity.&lt;/p&gt;
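
&lt;p&gt;A deliberately simplified sketch of that decision, assuming a uniform value distribution and a fixed cutoff. The &lt;code&gt;Planner&lt;/code&gt; type, the uniform estimate, and the 10% threshold are illustrative assumptions; the post does not describe Typhon's actual cost model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;public enum AccessPath { FullScan, IndexSeek }

// Hypothetical cost model for illustration; not Typhon's planner.
public static class Planner
{
    // Assumed cutoff: below this fraction of matching rows, seeking beats scanning.
    public const double SeekThreshold = 0.10;

    // Estimates the fraction of rows matching "field &amp;gt;= value",
    // assuming values are spread uniformly between min and max.
    public static double EstimateSelectivity(double min, double max, double value)
    {
        if (max &amp;lt;= min || value &amp;lt;= min) return 1.0;
        if (value &amp;gt; max) return 0.0;
        return (max - value) / (max - min);
    }

    public static AccessPath Choose(double selectivity)
        =&amp;gt; selectivity &amp;lt;= SeekThreshold ? AccessPath.IndexSeek : AccessPath.FullScan;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions, a predicate that keeps about half the rows is answered with a sequential scan, while one that keeps about 5% goes through the index.&lt;/p&gt;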

&lt;h2&gt;
  
  
  Purpose-Built for Game Servers
&lt;/h2&gt;

&lt;p&gt;Typhon doesn't glue ECS and database concepts together with duct tape. It synthesizes them into a single model designed for game server workloads.&lt;/p&gt;

&lt;p&gt;A component in Typhon is simultaneously an ECS component and a database schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Component&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;PlayerComponent&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;String64&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;                    &lt;span class="c1"&gt;// B+Tree for fast lookups&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;AccountId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;Experience&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Blittable, unmanaged, fixed-size, stored contiguously per type — that's the ECS side. Typed fields with automatic B+Tree indexes on marked fields — that's the database side. One declaration, both worlds.&lt;/p&gt;

&lt;p&gt;The query API makes the synthesis concrete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;topPlayers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Player&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ExecuteOrdered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ECS-style typed component access. Database-style predicate filtering with automatic index selection. Inside a transaction with snapshot isolation. The query planner chooses scan vs B+Tree based on selectivity — the developer doesn't have to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-query-flow.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-query-flow.png" alt="How a typed query flows through Typhon — from lambda expression to archetype mask filtering, selectivity estimation, and component reads" width="782" height="1348"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And because game servers have different durability needs for different operations, Typhon lets you choose per unit of work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Position ticks: game-engine speed, batched durability&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tickUow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateUnitOfWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DurabilityMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deferred&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Legendary item drop: database safety, immediate fsync&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;lootUow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateUnitOfWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DurabilityMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Immediate&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same engine, same API. &lt;code&gt;Deferred&lt;/code&gt; mode gives game-engine-class commit latency for position updates that can be re-simulated on crash. &lt;code&gt;Immediate&lt;/code&gt; mode gives database-class guarantees for a transaction that grants a rare item worth real money. The game server decides per operation — not globally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storage Modes: Not All Data Is Equal
&lt;/h3&gt;

&lt;p&gt;A game server doesn't treat all data the same. Player positions change 60 times per second and can be re-simulated on crash. Inventory mutations are rare but must never be lost. AI runtime state — current targets, threat scores, pathfinding waypoints — is recomputed every tick and worthless after a restart.&lt;/p&gt;

&lt;p&gt;Traditional databases treat all data identically. Traditional ECS keeps everything in memory with no durability distinction. Typhon lets you choose per component type:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;MVCC History&lt;/th&gt;
&lt;th&gt;Persisted&lt;/th&gt;
&lt;th&gt;Change Tracking&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Versioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full revision chains&lt;/td&gt;
&lt;td&gt;Yes (WAL + checkpoint)&lt;/td&gt;
&lt;td&gt;Via MVCC&lt;/td&gt;
&lt;td&gt;Inventory, economy, progression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SingleVersion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current state only&lt;/td&gt;
&lt;td&gt;Yes (WAL + checkpoint)&lt;/td&gt;
&lt;td&gt;DirtyBitmap&lt;/td&gt;
&lt;td&gt;Positions, health, frequently-updated state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transient&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current state only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;DirtyBitmap&lt;/td&gt;
&lt;td&gt;AI blackboard, threat scores, pathfinding scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SingleVersion components skip the revision chain overhead entirely — no circular buffer, no per-write allocation. They track changes through a DirtyBitmap instead: one bit per entity, flipped on write, scanned on tick fence. This is how game engines track what changed, and it's the right model for data that updates every tick.&lt;/p&gt;
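
&lt;p&gt;The mechanism is small enough to sketch in full. The class below is a hypothetical, single-threaded illustration of "one bit per entity, flipped on write, scanned on tick fence"; it is not Typhon's &lt;code&gt;DirtyBitmap&lt;/code&gt;, and a concurrent version would need an atomic OR in &lt;code&gt;Mark&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Collections.Generic;
using System.Numerics;

// Hypothetical sketch: one bit per entity slot, not Typhon's actual DirtyBitmap.
public sealed class DirtyBits
{
    private readonly ulong[] _words;

    public DirtyBits(int capacity) =&amp;gt; _words = new ulong[(capacity + 63) / 64];

    // Called on every write to the entity stored at this slot.
    public void Mark(int slot) =&amp;gt; _words[slot &amp;gt;&amp;gt; 6] |= 1UL &amp;lt;&amp;lt; (slot &amp;amp; 63);

    // Called at the tick fence: returns dirty slots in ascending order and clears them.
    public List&amp;lt;int&amp;gt; Drain()
    {
        var dirty = new List&amp;lt;int&amp;gt;();
        for (int w = 0; w &amp;lt; _words.Length; w++)
        {
            ulong bits = _words[w];
            _words[w] = 0;
            while (bits != 0)
            {
                int bit = BitOperations.TrailingZeroCount(bits);
                dirty.Add((w &amp;lt;&amp;lt; 6) + bit);
                bits &amp;amp;= bits - 1; // clear the lowest set bit
            }
        }
        return dirty;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Marking the same slot twice costs nothing extra, and the scan skips 64 clean entities per word comparison, which is why this model suits data that is written far more often than it is inspected.&lt;/p&gt;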

&lt;p&gt;Versioned components get full MVCC with snapshot isolation — readers see consistent historical state, writers don't block readers, conflicts are detected at commit time. This is how databases protect critical data, and it's the right model for things that must never be corrupted.&lt;/p&gt;

&lt;p&gt;Transient components never touch disk at all — no WAL, no checkpoint, no recovery. Pure in-memory storage with the same query and indexing API as everything else. AI blackboard data that's recomputed every tick has no business paying persistence overhead.&lt;/p&gt;

&lt;p&gt;The same engine, the same transaction API, but the storage layer does exactly what each component type needs. This is what "purpose-built for game servers" means in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Views: The Bridge Between ECS Systems and Database Queries
&lt;/h3&gt;

&lt;p&gt;In ECS, a "system" runs every tick, processing all matching entities. In a database, a "materialized view" maintains a cached result set and refreshes it incrementally. Typhon's Views are both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ItemData&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rarity&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToView&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Game loop&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;running&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateQuickTransaction&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Refresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Microsecond incremental refresh&lt;/span&gt;

    &lt;span class="c1"&gt;// React to changes — like an ECS system, but only for what changed&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetDelta&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Added&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="nf"&gt;SpawnVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;DespawnVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Modified&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;UpdateVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ClearDelta&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The initial &lt;code&gt;ToView()&lt;/code&gt; runs a full query. After that, &lt;code&gt;Refresh()&lt;/code&gt; drains a lock-free ring buffer of changes pushed by the commit path, so only entities whose indexed fields actually changed are re-evaluated. If 100,000 entities match your view but only 12 changed since the last refresh, you do 12 evaluations, not 100,000.&lt;/p&gt;

&lt;p&gt;This is the iterate-everything problem solved from the database side: don't re-scan, track deltas.&lt;/p&gt;
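
&lt;p&gt;To make the delta idea concrete, here is a minimal sketch of a view that re-evaluates only the entities reported as changed. Everything in it is an assumption made for illustration: &lt;code&gt;DeltaView&lt;/code&gt;, &lt;code&gt;NotifyChanged&lt;/code&gt; and the &lt;code&gt;ConcurrentQueue&lt;/code&gt; standing in for the ring buffer are hypothetical names, not Typhon's API, and modified-entity tracking is left out for brevity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical illustration only; these names are not Typhon's API.
public sealed class DeltaView
{
    private readonly Func&amp;lt;int, bool&amp;gt; _matches;              // the view's predicate
    private readonly HashSet&amp;lt;int&amp;gt; _members = new();         // entities currently in the view
    private readonly ConcurrentQueue&amp;lt;int&amp;gt; _changed = new(); // stand-in for the ring buffer

    public List&amp;lt;int&amp;gt; Added { get; } = new();
    public List&amp;lt;int&amp;gt; Removed { get; } = new();

    public DeltaView(Func&amp;lt;int, bool&amp;gt; matches) { _matches = matches; }

    // The commit path would call this for each entity whose indexed field changed.
    public void NotifyChanged(int entityId) { _changed.Enqueue(entityId); }

    // Drains the queue and re-evaluates only those entities.
    // Returns the number of predicate evaluations performed.
    public int Refresh()
    {
        int evaluated = 0;
        while (_changed.TryDequeue(out int id))
        {
            evaluated++;
            bool matchesNow = _matches(id);
            if (matchesNow)
            {
                if (_members.Add(id)) Added.Add(id);        // newly entered the view
            }
            else if (_members.Remove(id))
            {
                Removed.Add(id);                            // just left the view
            }
        }
        return evaluated;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch the cost of &lt;code&gt;Refresh()&lt;/code&gt; depends on the number of notifications, not on the size of the view: two queued changes mean two predicate evaluations, however many entities already match.&lt;/p&gt;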

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;Specializing for game servers means giving things up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blittable components only.&lt;/strong&gt; No &lt;code&gt;string&lt;/code&gt;, no object references, no variable-length arrays inside components. Text uses fixed-size types like &lt;code&gt;String64&lt;/code&gt;. This is the price of zero-copy reads and cache-friendly storage — and it's a constraint game developers are already familiar with from ECS frameworks.&lt;/p&gt;
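
&lt;p&gt;As an illustration of what a blittable text field involves, here is a hedged sketch of a fixed-size string. &lt;code&gt;Name64&lt;/code&gt; is a hypothetical type and its layout (zero-terminated UTF-8 in 64 inline bytes) is an assumption; Typhon's real &lt;code&gt;String64&lt;/code&gt; may be built differently. The code needs unsafe blocks enabled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Text;

// Hypothetical fixed-size text type. Assumed layout: UTF-8, zero-terminated, 64 inline bytes.
public unsafe struct Name64
{
    private fixed byte _bytes[64];   // stored inline: no reference, no heap object

    public static Name64 From(string text)
    {
        if (Encoding.UTF8.GetByteCount(text) &amp;gt; 63)
            throw new ArgumentException("Text must fit in 63 UTF-8 bytes.", nameof(text));

        Name64 result = default;     // all 64 bytes start at zero
        Encoding.UTF8.GetBytes(text.AsSpan(), new Span&amp;lt;byte&amp;gt;(result._bytes, 64));
        return result;
    }

    public override string ToString()
    {
        fixed (byte* p = _bytes)
        {
            // The 63-byte limit guarantees at least one zero byte, so IndexOf never returns -1.
            int length = new ReadOnlySpan&amp;lt;byte&amp;gt;(p, 64).IndexOf((byte)0);
            return Encoding.UTF8.GetString(p, length);
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the struct holds no references, it satisfies the &lt;code&gt;unmanaged&lt;/code&gt; constraint and occupies exactly 64 bytes wherever it is embedded.&lt;/p&gt;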

&lt;p&gt;&lt;strong&gt;Entity-centric relationships, not SQL JOINs.&lt;/strong&gt; Typhon supports navigation links, 1:N and N:M relationships — but they follow entity references, closer to a graph database than a traditional SQL one. This matches how game servers naturally think about data (an entity &lt;em&gt;has&lt;/em&gt; components, a guild &lt;em&gt;contains&lt;/em&gt; members), but if your mental model is &lt;code&gt;SELECT ... FROM a JOIN b ON a.x = b.y&lt;/code&gt;, it's a different paradigm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema in code, not SQL.&lt;/strong&gt; Components are C# structs with attributes, not DDL statements. Natural for game developers, unfamiliar territory for database administrators. If your team thinks in SQL, this is a paradigm shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll go deeper into the performance philosophy that makes all of this actually fast — data-oriented design, cache-line awareness, and zero-allocation hot paths. The principles that let a managed language hit microsecond-latency transactions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to follow along, the best way is to star &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;the repo&lt;/a&gt; or subscribe to the &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS feed&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>database</category>
      <category>ecs</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>Why I'm Building a Database Engine in C#</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:02:55 +0000</pubDate>
      <link>https://dev.to/nockawa/why-im-building-a-database-engine-in-c-1np0</link>
      <guid>https://dev.to/nockawa/why-im-building-a-database-engine-in-c-1np0</guid>
      <description>&lt;p&gt;When I tell people I'm building an ACID database engine in C#, the first reaction is always the same: &lt;em&gt;"But what about GC pauses?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's a fair question. Nobody builds high-performance database engines in .NET. The assumption is that you need C, C++, or Rust for this class of software — that managed languages are fundamentally disqualified from the microsecond-latency club.&lt;/p&gt;

&lt;p&gt;After 30 years of building real-time 3D engines and systems software, I chose C# anyway. The project is called &lt;strong&gt;Typhon&lt;/strong&gt;: an embedded ACID database engine targeting 1–2 microsecond transaction commits. And the reasons behind that choice might change how you think about what C# can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Case Against C# (Let's Steel-Man It)
&lt;/h2&gt;

&lt;p&gt;Before I make my case, let me honestly lay out every argument against choosing C# for this. These are real concerns, not strawmen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GC is non-deterministic.&lt;/strong&gt; It can pause all your threads whenever it wants. For a database engine that promises microsecond latency, a 10ms Gen2 collection is catastrophic — that's 10,000x your latency budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You don't control memory layout.&lt;/strong&gt; The managed heap decides where objects live. The GC can move them around during compaction. You can't guarantee that your B+Tree nodes sit on cache-line boundaries, or that your page cache buffer won't get relocated mid-transaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIT warmup is real.&lt;/strong&gt; The first call to any method pays the compilation cost. In a database engine, the first transaction after startup shouldn't be 100x slower than the steady state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual dispatch and bounds checking add overhead.&lt;/strong&gt; Every array access has a hidden bounds check. Every interface call goes through a vtable. In a hot loop processing millions of entities, these nanoseconds compound.&lt;/p&gt;

&lt;p&gt;These are all legitimate problems. I won't pretend they aren't. But here's what most people miss: &lt;strong&gt;modern C# has answers for every single one of them.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Most People Don't Know About C#
&lt;/h2&gt;

&lt;p&gt;The C# that most developers know — classes, garbage collection, LINQ — is only half the language. There's a whole other side that the .NET runtime team has been quietly building for a decade, and it looks nothing like what you'd expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;unsafe&lt;/code&gt; gives you C-level control.&lt;/strong&gt; Raw pointers, pointer arithmetic, &lt;code&gt;stackalloc&lt;/code&gt; for stack buffers, &lt;code&gt;fixed&lt;/code&gt;-size arrays — the JIT generates the same &lt;code&gt;mov&lt;/code&gt;/&lt;code&gt;cmp&lt;/code&gt;/&lt;code&gt;jne&lt;/code&gt; instructions you'd get from C. Not "close to C." The same instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;GCHandle.Alloc(Pinned)&lt;/code&gt; makes the GC irrelevant where it matters.&lt;/strong&gt; You can pin byte arrays so the GC never moves them. Typhon's entire page cache is pinned memory — the GC doesn't touch it, doesn't scan it, doesn't move it. It's just raw bytes at a fixed address, exactly like &lt;code&gt;malloc&lt;/code&gt; in C.&lt;/p&gt;
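
&lt;p&gt;For readers who have not used pinning, a minimal sketch follows. The full call is &lt;code&gt;GCHandle.Alloc(array, GCHandleType.Pinned)&lt;/code&gt;; the &lt;code&gt;PinnedPages&lt;/code&gt; class around it is hypothetical and far simpler than a real page cache.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.InteropServices;

// Hypothetical sketch of a pinned page buffer; not Typhon's actual page cache.
public sealed unsafe class PinnedPages : IDisposable
{
    public const int PageSize = 8192;     // 8 KB pages

    private readonly byte[] _buffer;
    private readonly byte* _base;
    private GCHandle _handle;

    public PinnedPages(int pageCount)
    {
        _buffer = new byte[pageCount * PageSize];
        _handle = GCHandle.Alloc(_buffer, GCHandleType.Pinned);   // the GC will not move _buffer
        _base = (byte*)_handle.AddrOfPinnedObject().ToPointer();  // stable until Free()
    }

    // Plain pointer arithmetic; no bounds check, so the caller must pass a valid index.
    public byte* GetPage(int index)
    {
        return _base + (long)index * PageSize;
    }

    public void Dispose()
    {
        if (_handle.IsAllocated) _handle.Free();                  // unpin so the array can be collected
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The address returned by &lt;code&gt;GetPage&lt;/code&gt; stays valid until &lt;code&gt;Dispose()&lt;/code&gt; frees the handle, which is what makes raw pointers into the buffer safe to hold across GC cycles.&lt;/p&gt;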

&lt;p&gt;&lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/ref-struct" rel="noopener noreferrer"&gt;&lt;code&gt;ref struct&lt;/code&gt;&lt;/a&gt; eliminates heap allocations on hot paths.&lt;/strong&gt; A &lt;code&gt;ref struct&lt;/code&gt; can never escape to the heap. It lives on the stack, dies when the scope ends, and the GC never knows it existed. Typhon's entity accessor (&lt;code&gt;EntityRef&lt;/code&gt;) is a 96-byte &lt;code&gt;ref struct&lt;/code&gt; — zero allocation, zero GC pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained generics give you true monomorphization.&lt;/strong&gt; When you write &lt;code&gt;where T : unmanaged&lt;/code&gt;, every &lt;code&gt;T&lt;/code&gt; is a value type, so the JIT generates a separate native code path for each type argument. &lt;code&gt;sizeof(T)&lt;/code&gt; becomes a constant. Dead branches get eliminated. It's the same optimization Rust gets from generics: not runtime dispatch, but per-type specialization, performed at JIT time rather than ahead of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware intrinsics are first-class.&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/dotnet/api/system.runtime.intrinsics" rel="noopener noreferrer"&gt;&lt;code&gt;System.Runtime.Intrinsics&lt;/code&gt;&lt;/a&gt; gives you &lt;code&gt;Vector256&lt;/code&gt; and &lt;code&gt;Sse42.Crc32&lt;/code&gt;, and &lt;code&gt;System.Numerics&lt;/code&gt; adds &lt;code&gt;BitOperations.TrailingZeroCount&lt;/code&gt;. These map to the same SIMD and bit-manipulation instructions available in C/C++, with the same performance, plus runtime feature detection so you can fall back gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;[StructLayout(Explicit)]&lt;/code&gt; gives you exact memory layout.&lt;/strong&gt; Field offsets, padding, size — you control every byte. Cache-line alignment, false-sharing prevention, bit-packing — it's all there.&lt;/p&gt;
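
&lt;p&gt;A short sketch of explicit layout used against false sharing. The &lt;code&gt;PaddedCounters&lt;/code&gt; struct is hypothetical, and the 64-byte cache line is an assumption that holds on most current x64 CPUs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.InteropServices;

// Hypothetical example: two counters written by different threads,
// each placed on its own (assumed 64-byte) cache line.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedCounters
{
    [FieldOffset(0)]  public long Reads;    // bytes 0..7 of the first cache line
    [FieldOffset(64)] public long Writes;   // bytes 0..7 of the second cache line
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Explicit layout fixes the offsets inside the struct, not the address of the struct itself. For the two fields to land on separate cache lines in practice, the instance also has to start on a 64-byte boundary, for example in memory obtained from &lt;code&gt;NativeMemory.AlignedAlloc&lt;/code&gt;.&lt;/p&gt;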

&lt;p&gt;This isn't "C# trying to be C." It's C# providing a genuine systems programming layer on top of a best-in-class managed ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Typhon Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-blog-architecture.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-blog-architecture.png" alt="Typhon Engine architecture — five layers from API to Concurrency, with components discussed in this post highlighted with ★" width="800" height="1418"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Theory is nice. Now let's look at real code.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hardware-accelerated WAL checksums
&lt;/h3&gt;

&lt;p&gt;Every page written to the Write-Ahead Log needs a CRC32C checksum. Here's what that looks like in C# — calling CPU instructions by name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="nf"&gt;ComputePartial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReadOnlySpan&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ArmCrc32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Arm64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeArm64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSoftware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReadOnlySpan&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;ulong&lt;/span&gt; &lt;span class="n"&gt;crc64&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;MemoryMarshal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;aligned&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;aligned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;crc64&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadUnaligned&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;ulong&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
        &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc32&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;crc64&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;crc32&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;++;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Sse42.X64.Crc32()&lt;/code&gt; compiles to a single x86 &lt;code&gt;crc32&lt;/code&gt; instruction. The runtime detects the CPU capabilities, the JIT eliminates the dead branches, and what executes is the same code a C programmer would write — but with automatic fallback on platforms without SSE4.2. Result: &lt;strong&gt;~1.3 µs per 8 KB page&lt;/strong&gt;.&lt;/p&gt;
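
&lt;p&gt;The fallback branch, &lt;code&gt;ComputeSoftware&lt;/code&gt;, is not shown above. A minimal bitwise sketch follows, assuming it shares the contract of the hardware paths (running CRC in, updated CRC out, no initial or final XOR) and uses the CRC32C (Castagnoli) polynomial that the x86 &lt;code&gt;crc32&lt;/code&gt; instruction implements. The class name &lt;code&gt;Crc32CSoftware&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical software fallback; the real ComputeSoftware is not shown in the post.
public static class Crc32CSoftware
{
    private const uint ReflectedPolynomial = 0x82F63B78u;   // CRC32C (Castagnoli), bit-reflected

    // Processes one bit at a time, lowest bit first.
    public static uint ComputeSoftware(uint crc, ReadOnlySpan&amp;lt;byte&amp;gt; data)
    {
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit &amp;lt; 8; bit++)
            {
                // Shift right; if the bit shifted out was 1, fold in the polynomial.
                crc = (crc &amp;amp; 1) != 0
                    ? (crc &amp;gt;&amp;gt; 1) ^ ReflectedPolynomial
                    : crc &amp;gt;&amp;gt; 1;
            }
        }
        return crc;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Starting from &lt;code&gt;0xFFFFFFFF&lt;/code&gt; and inverting the result, the ASCII bytes of &lt;code&gt;"123456789"&lt;/code&gt; give &lt;code&gt;0xE3069283&lt;/code&gt;, the standard CRC-32C check value. A production fallback would normally use a lookup table instead of eight shifts per byte.&lt;/p&gt;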

&lt;h3&gt;
  
  
  The SIMD chunk accessor
&lt;/h3&gt;

&lt;p&gt;This is Typhon's page cache hot path — a 16-slot cache that finds your data in one of three tiers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// === ULTRA FAST PATH: MRU check ===&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mru&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mruSlot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_pageIndices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mru&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;headerOffset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;_rootHeaderOffset&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_otherHeaderOffset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;*)&lt;/span&gt;&lt;span class="n"&gt;_baseAddresses&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mru&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;headerOffset&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_stride&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// === FAST PATH: SIMD search through all 16 cached slots ===&lt;/span&gt;
&lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_pageIndices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;v0&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mask0&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ExtractMostSignificantBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask0&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BitOperations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TrailingZeroCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;GetFromSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirty&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mask1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ExtractMostSignificantBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask1&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;BitOperations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TrailingZeroCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;GetFromSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirty&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;_pageIndices&lt;/code&gt; array is a &lt;code&gt;fixed int[16]&lt;/code&gt; — 64 bytes, one cache line, packed for SIMD. One &lt;code&gt;Vector256.Equals&lt;/code&gt; compares 8 page indices in a single instruction. The MRU fast path handles the common case (repeated access to the same page) with a single branch — branch predictor friendly, near-zero cost.&lt;/p&gt;
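
&lt;p&gt;How the comparison mask turns into a slot number can be sketched in safe code. &lt;code&gt;SlotSearch.FindSlot&lt;/code&gt; is a hypothetical helper, not Typhon's accessor; it assumes .NET 7 or later for the cross-platform &lt;code&gt;Vector256&lt;/code&gt; helpers and simply returns -1 when no slot matches.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Numerics;
using System.Runtime.Intrinsics;

// Hypothetical helper illustrating the mask-to-slot step.
public static class SlotSearch
{
    // Returns the slot (0..15) whose entry equals pageIndex, or -1 if none does.
    public static int FindSlot(ReadOnlySpan&amp;lt;int&amp;gt; pageIndices, int pageIndex)
    {
        if (pageIndices.Length != 16) throw new ArgumentException("Expected 16 slots.");

        Vector256&amp;lt;int&amp;gt; target = Vector256.Create(pageIndex);    // pageIndex in all 8 lanes
        for (int half = 0; half != 16; half += 8)
        {
            Vector256&amp;lt;int&amp;gt; lanes = Vector256.Create(pageIndices.Slice(half, 8));

            // Equal lanes become all ones; taking each lane's top bit yields an 8-bit mask.
            uint mask = Vector256.Equals(lanes, target).ExtractMostSignificantBits();

            // The lowest set bit is the first matching lane.
            if (mask != 0) return half + BitOperations.TrailingZeroCount(mask);
        }
        return -1;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With slots holding page indices 100 through 115, a search for 109 matches lane 1 of the second vector. The mask is &lt;code&gt;0b10&lt;/code&gt;, &lt;code&gt;TrailingZeroCount&lt;/code&gt; returns 1, and the slot is 8 + 1 = 9.&lt;/p&gt;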

&lt;h3&gt;
  
  
  Zero-copy entity reads
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;EntityRef&lt;/code&gt; is a &lt;code&gt;ref struct&lt;/code&gt; — stack-only, 96 bytes, with an inline fixed array caching component locations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;EntityRef&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;EntityId&lt;/span&gt; &lt;span class="n"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ArchetypeMetadata&lt;/span&gt; &lt;span class="n"&gt;_archetype&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ArchetypeEngineState&lt;/span&gt; &lt;span class="n"&gt;_engineState&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;Transaction&lt;/span&gt; &lt;span class="n"&gt;_tx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="kt"&gt;ushort&lt;/span&gt; &lt;span class="n"&gt;_enabledBits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;_writable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;_locations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// inline component chunk IDs&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MethodImpl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MethodImplOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AggressiveInlining&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;Comp&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;comp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;unmanaged&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;byte&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_archetype&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_componentTypeId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;chunkId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_locations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_engineState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SlotToComponentTable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadEcsComponentData&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunkId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Read&amp;lt;T&amp;gt;&lt;/code&gt; call goes from method call → slot lookup → chunk ID → page cache → pointer arithmetic → &lt;code&gt;ref readonly T&lt;/code&gt; pointing directly into a pinned memory page. Zero copies. Zero allocations. Zero GC involvement. The &lt;code&gt;where T : unmanaged&lt;/code&gt; constraint means the JIT knows the exact layout — it compiles to pointer arithmetic, nothing more.&lt;/p&gt;
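
&lt;p&gt;The zero-copy step can be shown in isolation. This sketch uses a plain byte span in place of a pinned page, and both the &lt;code&gt;Position&lt;/code&gt; component and the &lt;code&gt;ZeroCopy.ReadAt&lt;/code&gt; helper are hypothetical. It relies on &lt;code&gt;MemoryMarshal.AsRef&lt;/code&gt;, which reinterprets bytes in place.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.InteropServices;

// Hypothetical blittable component.
public struct Position
{
    public float X;
    public float Y;
    public float Z;
}

public static class ZeroCopy
{
    // Returns a read-only reference to the T stored at `offset` inside `page`.
    // Nothing is copied: the reference points at the page's own bytes.
    // AsRef throws if fewer than sizeof(T) bytes remain after `offset`.
    public static ref readonly T ReadAt&amp;lt;T&amp;gt;(ReadOnlySpan&amp;lt;byte&amp;gt; page, int offset) where T : unmanaged
    {
        return ref MemoryMarshal.AsRef&amp;lt;T&amp;gt;(page.Slice(offset));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you take a reference with &lt;code&gt;ReadAt&amp;lt;Position&amp;gt;(page, 16)&lt;/code&gt; and then write a float into the page at byte 16, the new value is visible through &lt;code&gt;X&lt;/code&gt; on the reference you already hold, which confirms that no copy was made.&lt;/p&gt;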

&lt;h3&gt;
  
  
  JIT-specialized hash functions
&lt;/h3&gt;

&lt;p&gt;Even the hash functions exploit the JIT. Since &lt;code&gt;sizeof(TKey)&lt;/code&gt; is a compile-time constant for constrained generics, the dead branches vanish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MethodImpl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MethodImplOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AggressiveInlining&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;ComputeHash&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;TKey&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;unmanaged&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;FastHash32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;XxHash32_8Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;XxHash32_Bytes&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;*)&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AsPointer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you call &lt;code&gt;ComputeHash&amp;lt;int&amp;gt;(42)&lt;/code&gt;, the JIT generates &lt;em&gt;just&lt;/em&gt; the 4-byte path. The other two branches are completely eliminated. This is real monomorphization, not runtime dispatch.&lt;/p&gt;
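
&lt;p&gt;To try the same size-dispatch pattern outside Typhon, here is a minimal, self-contained sketch. It uses &lt;code&gt;Unsafe.SizeOf&amp;lt;T&amp;gt;()&lt;/code&gt; in place of &lt;code&gt;sizeof&lt;/code&gt; so it compiles without an &lt;code&gt;unsafe&lt;/code&gt; context. &lt;code&gt;SizeDispatch.Describe&lt;/code&gt; is a hypothetical stand-in that returns a label instead of calling Typhon's real hash functions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.CompilerServices;

Console.WriteLine(SizeDispatch.Describe(42));          // prints "4-byte path"
Console.WriteLine(SizeDispatch.Describe(42L));         // prints "8-byte path"
Console.WriteLine(SizeDispatch.Describe(Guid.Empty));  // prints "generic path (16 bytes)"

// Hypothetical helper for illustration; not part of Typhon.
static class SizeDispatch
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static string Describe&amp;lt;TKey&amp;gt;(TKey key) where TKey : unmanaged
    {
        // For a value-type TKey the size is fixed per instantiation,
        // so the JIT can drop the branches that cannot be taken.
        if (Unsafe.SizeOf&amp;lt;TKey&amp;gt;() == 4) return "4-byte path";
        if (Unsafe.SizeOf&amp;lt;TKey&amp;gt;() == 8) return "8-byte path";
        return "generic path (" + Unsafe.SizeOf&amp;lt;TKey&amp;gt;() + " bytes)";
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The sketch only shows which path each instantiation selects. Confirming that the other branches are really gone requires looking at the generated assembly, for example with BenchmarkDotNet's disassembly diagnoser.&lt;/p&gt;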

&lt;h2&gt;
  
  
  The Productivity Argument
&lt;/h2&gt;

&lt;p&gt;A database engine is more than its hot path. Around the core engine sits a large shell of infrastructure: configuration management, structured logging, telemetry, dependency injection, testing, benchmarking.&lt;/p&gt;

&lt;p&gt;In C or Rust, you'd build much of this yourself or stitch together crates/libraries with varying quality. In .NET, this is production-grade and free: &lt;code&gt;ILogger&lt;/code&gt; and &lt;a href="https://opentelemetry.io/docs/languages/net/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; for observability, &lt;a href="https://github.com/dotnet/BenchmarkDotNet" rel="noopener noreferrer"&gt;BenchmarkDotNet&lt;/a&gt; for rigorous micro-benchmarks, NUnit for testing, &lt;code&gt;IConfiguration&lt;/code&gt; for settings. All well-documented, all interoperable, all maintained by Microsoft or battle-tested OSS communities.&lt;/p&gt;

&lt;p&gt;For a solo developer building a database engine, this is a genuine competitive advantage. I spend my time on concurrency primitives and page cache eviction, not on reinventing a logging framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's the Memory Layout, Not the Language
&lt;/h2&gt;

&lt;p&gt;Here's the insight that years of real-time 3D engines taught me: &lt;strong&gt;the bottleneck in a database engine is memory access patterns, not instruction throughput.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A cache miss to DRAM on a Ryzen 9 7950X costs 61–73 nanoseconds. That's ~250 CPU cycles doing &lt;em&gt;nothing&lt;/em&gt;, waiting for data. A CAS operation hitting L1 costs 1.4 nanoseconds. The ratio is roughly &lt;strong&gt;50:1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No amount of "zero-cost abstractions" in your language can save you if your data structures cause cache misses. Conversely, if your data layout is cache-friendly — contiguous, aligned, predictable access patterns — the language barely matters. C# with &lt;code&gt;unsafe&lt;/code&gt; generates identical machine code to C on hot paths. The JIT is that good.&lt;/p&gt;

&lt;p&gt;What matters is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache-line awareness&lt;/strong&gt;: Typhon's B+Tree nodes are 128 bytes — two cache lines. The stride prefetcher on Zen4 covers the second line automatically. This alone cut insert latency by &lt;strong&gt;53%&lt;/strong&gt; and lookup latency by &lt;strong&gt;30%&lt;/strong&gt; versus 64-byte nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data-oriented design&lt;/strong&gt;: Structure of Arrays over Array of Structures. SIMD-friendly layouts. Blittable types only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimizing indirections&lt;/strong&gt;: Every pointer chase is a potential cache miss. The SIMD chunk accessor's MRU hit avoids the chase entirely.&lt;/li&gt;
&lt;/ul&gt;
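
&lt;p&gt;As a rough illustration of the first two points, the sketch below pads a node struct to exactly 128 bytes and stores entity fields as a Structure of Arrays. &lt;code&gt;Node128&lt;/code&gt; and &lt;code&gt;EntitiesSoA&lt;/code&gt; are hypothetical types with made-up fields, not Typhon's actual layouts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// The padded node occupies exactly two 64-byte cache lines.
Console.WriteLine(Unsafe.SizeOf&amp;lt;Node128&amp;gt;());   // prints 128

// Structure of Arrays: summing one field streams through that field's array only.
var entities = new EntitiesSoA(1000);
Array.Fill(entities.Health, 10);
long total = 0;
foreach (int h in entities.Health) total += h;
Console.WriteLine(total);                      // prints 10000

// Hypothetical node layout; Size pads the struct up to 128 bytes.
[StructLayout(LayoutKind.Sequential, Size = 128)]
struct Node128
{
    public int KeyCount;   // number of keys in use
    public int NextLeaf;   // sibling link for sequential scans
    // The remaining bytes up to 128 would hold keys and values.
}

// Hypothetical SoA container: one contiguous array per field.
sealed class EntitiesSoA
{
    public readonly float[] X;
    public readonly float[] Y;
    public readonly int[] Health;

    public EntitiesSoA(int count)
    {
        X = new float[count];
        Y = new float[count];
        Health = new int[count];
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Fixing the size does not by itself align the node to a cache-line boundary. The backing memory has to be aligned separately, for instance with &lt;code&gt;NativeMemory.AlignedAlloc&lt;/code&gt;.&lt;/p&gt;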

&lt;p&gt;The language you write in matters far less than the memory layout you design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;All measurements on a Ryzen 9 7950X, .NET 10.0, BenchmarkDotNet, release configuration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRUD lifecycle MVCC (spawn, read, update, destroy, commit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.2 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;830K ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90 reads/10 updates workload (100 ops per tx, MVCC)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4.5M entity-ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree lookup (hit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;267 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.7M ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree sequential scan (per key)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.1 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;479M keys/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uncontended lock acquire&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.8 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128M ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page cache hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Context: an uncontended CAS on Zen4 costs 1.4 ns. A DRAM round-trip costs 61–73 ns. Typhon's lock acquire (7.8 ns) costs about as much as 5–6 CAS operations (7.8 / 1.4 ≈ 5.6), which is tight considering it handles shared/exclusive arbitration with waiter tracking. The 267 ns B+Tree lookup implies 6–7 memory accesses at roughly 38–45 ns each, which matches a tree traversal through L2/L3 cache.&lt;/p&gt;
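
&lt;p&gt;The throughput column and the ratios above follow from the latencies by simple arithmetic. The sketch below reproduces them, assuming a single thread running each operation back to back. &lt;code&gt;OpsPerSecond&lt;/code&gt; is a helper introduced here for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Globalization;

// Use '.' as the decimal separator regardless of the machine's locale.
CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;

// Figures quoted in the table and text above.
const double casNs = 1.4;
const double lockAcquireNs = 7.8;
const double btreeLookupNs = 267.0;
const double crudLifecycleNs = 1200.0;  // 1.2 µs
const double mixedTxNs = 22000.0;       // 22 µs for 100 operations

// Single-threaded throughput if the operation runs back to back.
static double OpsPerSecond(double latencyNs) =&amp;gt; 1e9 / latencyNs;

Console.WriteLine($"CRUD lifecycle: {OpsPerSecond(crudLifecycleNs) / 1e3:F0}K ops/sec");         // 833K
Console.WriteLine($"Mixed workload: {100 * OpsPerSecond(mixedTxNs) / 1e6:F1}M entity-ops/sec");  // 4.5M
Console.WriteLine($"Lock acquire:   {OpsPerSecond(lockAcquireNs) / 1e6:F0}M ops/sec");           // 128M
Console.WriteLine($"Lock vs CAS:    {lockAcquireNs / casNs:F1}x");                               // 5.6x
Console.WriteLine($"B+Tree access:  {btreeLookupNs / 7:F1} to {btreeLookupNs / 6:F1} ns each");  // 38.1 to 44.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Small differences from the table, such as 833K here versus 830K there, come down to rounding of the published latencies.&lt;/p&gt;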

&lt;p&gt;These are early alpha numbers. There's room to improve. But they validate the core thesis: &lt;strong&gt;C# is not the bottleneck.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;No choice is without cost. Here's what I'd tell someone considering the same path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory safety is on you.&lt;/strong&gt; In &lt;code&gt;unsafe&lt;/code&gt; blocks, you can corrupt memory, dereference bad pointers, and overflow buffers, and the compiler won't save you. Where the extra cost is acceptable, &lt;a href="https://learn.microsoft.com/en-us/dotnet/api/system.span-1" rel="noopener noreferrer"&gt;&lt;code&gt;Span&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/a&gt; is a slightly slower, bounds-checked alternative that stays memory-safe.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GC hasn't been a problem, but it could be.&lt;/strong&gt; Because the page cache is pinned and the hot paths use &lt;code&gt;ref struct&lt;/code&gt; types, Gen2 collections are rare and cheap. But I won't pretend this is guaranteed. A workload that allocates heavily in managed code between transactions could still see pauses. The answer is discipline: &lt;strong&gt;don't allocate on hot paths&lt;/strong&gt;. The language lets you write allocation-free code; it just doesn't force you to.&lt;/p&gt;
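
&lt;p&gt;One way to keep that discipline honest is to measure it. The sketch below allocates a pinned buffer once, reads and writes pages through &lt;code&gt;Span&amp;lt;byte&amp;gt;&lt;/code&gt; views, and reports how many managed bytes the loop allocated. The page size and page count are illustrative assumptions, and this is not how Typhon's page cache is actually implemented:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Illustrative sizes; the real engine's page size is not given here.
const int pageSize = 8192;
const int pageCount = 16;

// Pinned allocation: the GC will not move this buffer,
// so references into it stay valid for its whole lifetime.
byte[] cache = GC.AllocateUninitializedArray&amp;lt;byte&amp;gt;(pageSize * pageCount, pinned: true);

long before = GC.GetAllocatedBytesForCurrentThread();
long sum = 0;
for (int page = 0; page &amp;lt; pageCount; page++)
{
    // A Span is a view over the existing buffer, not a copy.
    Span&amp;lt;byte&amp;gt; view = cache.AsSpan(page * pageSize, pageSize);
    view[0] = (byte)page;
    sum += view[0];
}
long allocated = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"sum={sum}, managed bytes allocated in loop={allocated}");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A non-zero allocation count from a loop like this is an early warning that something on the hot path is boxing, capturing a closure, or creating temporary objects.&lt;/p&gt;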

&lt;p&gt;&lt;strong&gt;"But Rust would give you compile-time safety."&lt;/strong&gt; True — the borrow checker catches ownership and lifetime bugs that &lt;code&gt;unsafe&lt;/code&gt; C# can't. But C# has a trick Rust doesn't: &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/tutorials/how-to-write-csharp-analyzer-code-fix" rel="noopener noreferrer"&gt;Roslyn analyzers&lt;/a&gt;&lt;/strong&gt;. I wrote a custom analyzer suite (TYPHON001–007) that enforces domain-specific safety rules as compiler errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[NoCopy]&lt;/code&gt; attribute + analyzer: performance-critical structs like &lt;code&gt;ChunkAccessor&lt;/code&gt; &lt;strong&gt;cannot be passed by value&lt;/strong&gt; — the compiler errors if you forget &lt;code&gt;ref&lt;/code&gt;. This is the same guarantee Rust's borrow checker gives for move semantics, but scoped to the types that actually matter.&lt;/li&gt;
&lt;li&gt;Ownership tracking: if you create a &lt;code&gt;ChunkAccessor&lt;/code&gt; or &lt;code&gt;Transaction&lt;/code&gt; and don't dispose it, that's a &lt;strong&gt;compiler error&lt;/strong&gt;, not a runtime leak. The analyzer tracks ownership transfers through assignments, returns, and &lt;code&gt;ref&lt;/code&gt;/&lt;code&gt;out&lt;/code&gt; parameters. Marking a method with &lt;code&gt;[return: TransfersOwnership]&lt;/code&gt; tells the analyzer that ownership of the returned value passes to the caller.&lt;/li&gt;
&lt;li&gt;Disposal completeness: if your type holds a critical disposable field and your &lt;code&gt;Dispose()&lt;/code&gt; method misses it or has an early return that skips it — compiler error.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is a compile-time error in Typhon — TYPHON001&lt;/span&gt;
&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChunkAccessor&lt;/span&gt; &lt;span class="n"&gt;accessor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// ✗ Error: must be passed by ref&lt;/span&gt;

&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ChunkAccessor&lt;/span&gt; &lt;span class="n"&gt;accessor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// ✓ OK&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't get Rust's safety for free in C#. But you can &lt;strong&gt;build the exact subset you need&lt;/strong&gt; as compiler errors, tailored to your domain. And unlike Rust's borrow checker, these rules carry domain context in the diagnostics: "causes page cache deadlock" is more actionable than "value moved here."&lt;/p&gt;

&lt;p&gt;Rust's ecosystem for the surrounding infrastructure (logging, DI, configuration, testing) is also less mature than .NET's, and as a solo developer, my velocity matters. I chose the language where I ship faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIT warmup is real but manageable.&lt;/strong&gt; The first few transactions after cold start are slower. For an embedded engine (no separate server process), this is acceptable, since the host application typically has its own warmup. For a server database, you'd want to tune tiered compilation or precompile with AOT.&lt;/p&gt;
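
&lt;p&gt;If warmup on a specific hot method matters, one option is to compile it ahead of the first real call. The sketch below is an assumption about how that could be done, not something Typhon is known to do. &lt;code&gt;HotPath.Sum&lt;/code&gt; is a hypothetical placeholder for a real hot method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Look up the hot method and ask the runtime to JIT it before first use.
MethodInfo hot = typeof(HotPath).GetMethod(nameof(HotPath.Sum))
    ?? throw new InvalidOperationException("method not found");
RuntimeHelpers.PrepareMethod(hot.MethodHandle);

Console.WriteLine(HotPath.Sum(new[] { 1, 2, 3 }));  // prints 6

// Hypothetical hot path standing in for real engine code.
static class HotPath
{
    // Requests fully optimized code for this method instead of
    // starting in the quick, lightly optimized tier.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static int Sum(int[] values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off is that a method marked this way skips the profiling that tiered compilation would otherwise use to guide its final optimization, so it is worth benchmarking before applying it broadly.&lt;/p&gt;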

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll explain why an ACID database engine borrows its storage architecture from game engines — specifically the Entity-Component-System pattern. Game engines and databases are solving the same fundamental problem: managing structured data with extreme performance constraints. They just evolved completely different solutions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to follow along, the best way is to star &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;the repo&lt;/a&gt; or subscribe to the &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS feed&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Post #1 in a series about building a database engine in C#. Next up: "What Game Engines Know About Data That Databases Forgot".&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>database</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
