<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Loïc Baumann</title>
    <description>The latest articles on DEV Community by Loïc Baumann (@nockawa).</description>
    <link>https://dev.to/nockawa</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3848296%2F0d77dd3c-3f8e-4b0a-b82b-bf2ef9946225.png</url>
      <title>DEV Community: Loïc Baumann</title>
      <link>https://dev.to/nockawa</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nockawa"/>
    <language>en</language>
    <item>
      <title>Microsecond Latency in a Managed Language: The Performance Philosophy Behind Typhon</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:58:24 +0000</pubDate>
      <link>https://dev.to/nockawa/microsecond-latency-in-a-managed-language-the-performance-philosophy-behind-typhon-1ob8</link>
      <guid>https://dev.to/nockawa/microsecond-latency-in-a-managed-language-the-performance-philosophy-behind-typhon-1ob8</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡 Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/what-game-engines-know-about-data/" rel="noopener noreferrer"&gt;What Game Engines Know About Data That Databases Forgot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsecond Latency in a Managed Language&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Deadlock-Free by Construction &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.githubassets.com%2Fimages%2Ficons%2Femoji%2Foctocat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.githubassets.com%2Fimages%2Ficons%2Femoji%2Foctocat.png" alt="Octocat" height="64" width="64"&gt;&lt;/a&gt; &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;  •  📬 &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;Subscribe via RSS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first two posts in this series covered the &lt;em&gt;why&lt;/em&gt; and the &lt;em&gt;what&lt;/em&gt;. Why C# for a database engine. What happens when you combine ECS storage with database guarantees.&lt;/p&gt;

&lt;p&gt;This post is the &lt;em&gt;how&lt;/em&gt;. Specifically: the five design principles that guide every performance decision in Typhon. Not a bag of tricks, but a philosophy. Individual optimizations come and go as the engine evolves, but these principles are stable. They're what let a managed language deliver microsecond-scale transactions and sub-microsecond component reads.&lt;/p&gt;

&lt;p&gt;When your tick budget is 16 milliseconds and you have 100,000 entities to process, you have 160 nanoseconds per entity for everything the tick does, so every nanosecond of per-entity cost matters. And most of that cost comes from decisions made at design time, not runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 1: Control Memory Layout
&lt;/h2&gt;

&lt;p&gt;Performance starts at the struct definition, not the algorithm. If your data layout causes cache misses, no algorithm can save you.&lt;/p&gt;

&lt;p&gt;The most dramatic example: Typhon recently moved from per-entity hash-table lookups to cluster-based Structure of Arrays (SoA) storage. Same data, same queries, different memory layout. Measured on a Ryzen 9 7950X:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;ns / entity&lt;/th&gt;
&lt;th&gt;vs baseline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard EntityAccessor&lt;/td&gt;
&lt;td&gt;139 ns&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ArchetypeAccessor (cached)&lt;/td&gt;
&lt;td&gt;94 ns&lt;/td&gt;
&lt;td&gt;1.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cluster iteration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;55x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a 55x improvement from changing memory layout alone. The reason: clusters pack N entities (8 to 64, auto-computed per archetype) in contiguous SoA memory. All positions together, all health values together, so every cache line the CPU loads is 100% useful data for a loop that reads those fields. For 100K entities, the working set shrank from scattered L3/DRAM accesses to a compact ~2.5 MB. That is more than one Zen 4 core's 1 MB L2, but it sits comfortably in a CCD's 32 MB L3, and once the work is split across a few cores each slice fits in a private L2, which is roughly 3x faster than L3 on Zen 4.&lt;/p&gt;
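
&lt;p&gt;A minimal C# sketch of the layout difference follows. The type and field names (&lt;code&gt;UnitAoS&lt;/code&gt;, &lt;code&gt;UnitCluster&lt;/code&gt;, &lt;code&gt;PosX&lt;/code&gt;, &lt;code&gt;Health&lt;/code&gt;) are hypothetical, and managed arrays stand in for the page memory a real cluster would live in. The only point is that a field scan over an SoA cluster reads one contiguous array.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Array-of-Structs: a Health-only scan still pulls PosX/PosY into cache.
public struct UnitAoS
{
    public float PosX, PosY, Health;
}

// Hypothetical SoA cluster: N entity slots, one contiguous array per field.
public sealed class UnitCluster
{
    public readonly float[] PosX;
    public readonly float[] PosY;
    public readonly float[] Health;
    public int Count; // number of occupied slots

    public UnitCluster(int capacity)
    {
        PosX = new float[capacity];
        PosY = new float[capacity];
        Health = new float[capacity];
    }

    // Reads only the Health array: every byte loaded is a Health value.
    public float TotalHealth()
    {
        float sum = 0f;
        foreach (float h in Health.AsSpan(0, Count))
            sum += h;
        return sum;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Summing health over a &lt;code&gt;UnitAoS[]&lt;/code&gt; drags 12 bytes per entity through the cache to use 4 of them; the cluster version touches exactly the 4 it needs.&lt;/p&gt;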

&lt;p&gt;The cluster size isn't a magic constant. An auto-tuning algorithm evaluates every N from 8 to 64 and picks the one that maximizes entities per 8 KB page for a given archetype's component schema. Non-power-of-2 sizes often pack better: N=14 can yield 28 entities per page vs N=16 yielding only 16. The capacity is derived from the data, not from convention.&lt;/p&gt;
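
&lt;p&gt;The search itself is only a few lines. The sketch below uses a deliberately simple cost model that is an assumption, not Typhon's formula: a cluster of n entities occupies &lt;code&gt;headerBytes + n * bytesPerEntity&lt;/code&gt;, and only whole clusters fit in a page. &lt;code&gt;ClusterSizing&lt;/code&gt; and its methods are hypothetical, and 280 bytes per entity is an arbitrary figure chosen so that the 28-versus-16 comparison above falls out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Linq;

public static class ClusterSizing
{
    // Hypothetical cost model: only whole clusters fit in a page.
    public static int EntitiesPerPage(int n, int bytesPerEntity, int headerBytes, int pageBytes)
    {
        int clusterBytes = headerBytes + n * bytesPerEntity;
        int clustersPerPage = pageBytes / clusterBytes;
        return clustersPerPage * n;
    }

    // Evaluate every n from 8 to 64 and keep the one that packs the most entities.
    public static int PickClusterSize(int bytesPerEntity, int headerBytes, int pageBytes) =>
        Enumerable.Range(8, 57)
                  .MaxBy(n => EntitiesPerPage(n, bytesPerEntity, headerBytes, pageBytes));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 280 bytes per entity, no header and an 8,192-byte page, n = 14 packs two clusters (28 entities) while n = 16 packs only one (16 entities); the full search settles on n = 29, a single 8,120-byte cluster. A real schema and page header would shift the answer, which is why it is computed per archetype.&lt;/p&gt;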

&lt;p&gt;&lt;strong&gt;False sharing&lt;/strong&gt; is the other side of layout control. When multiple threads write to adjacent fields, the CPU bounces the shared cache line between cores — a 40-60 cycle penalty per bounce. Typhon wraps mutable per-thread state in 64-byte padded structs. The WAL commit buffer goes further: explicit padding fields isolating the producer's &lt;code&gt;_tailPosition&lt;/code&gt; and the consumer's &lt;code&gt;_drainPosition&lt;/code&gt; onto separate cache lines. Seven unused &lt;code&gt;long&lt;/code&gt; fields between them, suppressed with &lt;code&gt;#pragma warning&lt;/code&gt;, because the correct layout matters more than the linter's opinion.&lt;/p&gt;
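
&lt;p&gt;Both padding patterns can be expressed with layout attributes. In this sketch the type names &lt;code&gt;PaddedCounter&lt;/code&gt; and &lt;code&gt;RingCursors&lt;/code&gt; are hypothetical. Only the two cursor names and the seven-&lt;code&gt;long&lt;/code&gt; gap come from the description above, and 64-byte cache lines are assumed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.InteropServices;

// One per-thread counter per 64-byte cache line.
[StructLayout(LayoutKind.Explicit, Size = 64)]
public struct PaddedCounter
{
    [FieldOffset(0)] public long Value;
}

// Producer and consumer cursors kept 64 bytes apart by seven padding longs.
[StructLayout(LayoutKind.Sequential)]
public struct RingCursors
{
    public long _tailPosition;   // written by producer threads

    // Padding only: never read or written.
#pragma warning disable CS0169
    private long _pad0, _pad1, _pad2, _pad3, _pad4, _pad5, _pad6;
#pragma warning restore CS0169

    public long _drainPosition;  // written by the consumer thread
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;_drainPosition&lt;/code&gt; starts exactly 64 bytes after &lt;code&gt;_tailPosition&lt;/code&gt;, the two can never land on the same 64-byte line, whatever the struct's base address. The same holds for the &lt;code&gt;Value&lt;/code&gt; fields of neighbouring elements in a &lt;code&gt;PaddedCounter&lt;/code&gt; array.&lt;/p&gt;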

&lt;p&gt;The same hardware awareness drives B+Tree node sizing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;StructLayout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LayoutKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Pack&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Index32Chunk&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 256 bytes — fills four cache lines. Adjacent Line Prefetcher (ALP) on&lt;/span&gt;
    &lt;span class="c1"&gt;// Zen 4+/recent Intel automatically fetches paired 64-byte lines within&lt;/span&gt;
    &lt;span class="c1"&gt;// 128-byte regions, so two ALP triggers cover the full node.&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;29&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Control&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;OlcVersion&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// bit 0 = locked, bit 1 = obsolete, bits 2-31 = version&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;PrevChunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;NextChunk&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;LeftValue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;HighKey&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;// B-link upper bound&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Values&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Capacity&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// 29 × 4 = 116 bytes&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Capacity&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;    &lt;span class="c1"&gt;// 29 × 4 = 116 bytes&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This struct is exactly 256 bytes because of the CPU's prefetcher. The Adjacent Line Prefetcher on modern x86 fetches paired 64-byte lines within 128-byte aligned regions, so for a node aligned on a 128-byte boundary, two ALP triggers cover all four lines. A 256-byte node costs effectively the same as a 128-byte node in terms of memory access, but holds nearly twice the keys.&lt;/p&gt;

&lt;p&gt;The capacity of 29 keys isn't a round number because it isn't derived from the algorithm. It's derived from the hardware: 256 bytes of budget minus 24 bytes of header, divided across Keys and Values arrays. Typhon has three B+Tree variants — 16-bit, 32-bit, and 64-bit keys — and all three hit exactly 256 bytes with different capacities (38, 29, and 19 keys respectively). Post #1 mentioned 128-byte nodes. We've since moved to 256 bytes after measuring ALP behavior on Zen 4 — capacity went up, lookup latency stayed flat.&lt;/p&gt;
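
&lt;p&gt;The capacity arithmetic can be checked directly. The post only shows the 32-bit layout, so this sketch assumes, for illustration, that all three variants share &lt;code&gt;Index32Chunk&lt;/code&gt;'s 24-byte header and 4-byte values. Under that assumption, floor((256 - 24) / (keyBytes + 4)) reproduces all three quoted capacities, and any leftover bytes become padding. &lt;code&gt;NodeCapacity&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;public static class NodeCapacity
{
    const int NodeBytes = 256;   // four 64-byte lines, two ALP pairs
    const int HeaderBytes = 24;  // six int fields, as in Index32Chunk
    const int ValueBytes = 4;    // int values (assumed for every variant)

    // Keys and Values are parallel arrays, so each entry costs keyBytes + ValueBytes.
    public static int For(int keyBytes) =>
        (NodeBytes - HeaderBytes) / (keyBytes + ValueBytes);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For 32-bit keys the division is exact (29 × 8 = 232 bytes), which is why &lt;code&gt;Index32Chunk&lt;/code&gt; lands on 256 bytes with no slack.&lt;/p&gt;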

&lt;h2&gt;
  
  
  Principle 2: Eliminate Allocations on Hot Paths
&lt;/h2&gt;

&lt;p&gt;In .NET, every allocation is a future GC event. On hot paths, the cost isn't the allocation itself (~5 ns) — it's the Gen0/Gen1 collection later that pauses unrelated threads. The discipline is simple: allocate nothing in steady state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ref struct&lt;/code&gt; is the primary weapon. A &lt;code&gt;ref struct&lt;/code&gt; lives on the stack and dies when the scope ends; it is never a heap object, so the GC never has to allocate or collect it. Post #1 showed &lt;code&gt;EntityRef&lt;/code&gt; (96 bytes, inline component cache). But ref structs are a systematic discipline in Typhon, not a one-off optimization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;OlcLatch&lt;/code&gt;&lt;/strong&gt;: wraps a single &lt;code&gt;ref int&lt;/code&gt;, the B+Tree node's version field. The entire optimistic lock coupling protocol (read version, validate, try-write-lock) lives in a struct that's basically a typed pointer. Created millions of times per second during tree traversal, at zero GC cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;EpochGuard&lt;/code&gt;&lt;/strong&gt;: &lt;a href="https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization" rel="noopener noreferrer"&gt;RAII&lt;/a&gt; scope for epoch-based page protection. Enter and exit in 3.3 ns. Because it's a &lt;code&gt;ref struct&lt;/code&gt;, it can't be boxed, captured in a closure, or passed to async code — exactly the constraints you want for a scope guard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;WalClaim&lt;/code&gt;&lt;/strong&gt;: a Write-Ahead Log buffer claim containing a &lt;code&gt;Span&amp;lt;byte&amp;gt;&lt;/code&gt; that points directly into native WAL memory. It can't escape to the heap by construction: a struct with a Span field must itself be declared &lt;code&gt;ref struct&lt;/code&gt; (the compiler rejects it otherwise), so the same restrictions carry over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PointInTimeAccessor&lt;/code&gt;&lt;/strong&gt;: a reusable snapshot attached to parallel workers. One per worker, stored in a flat array indexed by worker ID. Zero per-entity dictionary overhead — no &lt;code&gt;Dictionary&amp;lt;EntityId, T&amp;gt;&lt;/code&gt; on the hot path.&lt;/li&gt;
&lt;/ul&gt;
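
&lt;p&gt;The scope-guard shape behind &lt;code&gt;EpochGuard&lt;/code&gt; is easy to reproduce. The sketch below is hypothetical: &lt;code&gt;ScopeGuard&lt;/code&gt; and its nesting counter are illustrative, not Typhon's code. It relies on a C# 8 rule: a &lt;code&gt;using&lt;/code&gt; statement accepts a ref struct that exposes a public &lt;code&gt;Dispose()&lt;/code&gt; method, without the struct implementing &lt;code&gt;IDisposable&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical RAII-style guard: enter on creation, exit on Dispose.
public ref struct ScopeGuard
{
    [ThreadStatic] private static int t_depth; // per-thread nesting depth

    public static int Depth =&gt; t_depth;

    public static ScopeGuard Enter()
    {
        t_depth++;
        return new ScopeGuard();
    }

    // Pattern-based Dispose: no IDisposable needed for `using`.
    public void Dispose() =&gt; t_depth--;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Inside &lt;code&gt;using (ScopeGuard.Enter()) { ... }&lt;/code&gt; the depth is 1, and it drops back to 0 at the closing brace. Boxing the guard (&lt;code&gt;object o = ScopeGuard.Enter();&lt;/code&gt;) or capturing it in a lambda is a compile-time error, which is exactly the guarantee a scope guard wants.&lt;/p&gt;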

&lt;p&gt;For short-lived buffers, &lt;code&gt;stackalloc&lt;/code&gt; with a threshold pattern: stack-allocate when the array is small (under 64 elements), fall back to the heap otherwise. Most arrays stay small, so they never touch the allocator.&lt;/p&gt;
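
&lt;p&gt;The threshold pattern fits in a few lines. This sketch is illustrative: &lt;code&gt;TempBuffers.Median&lt;/code&gt; is a hypothetical helper that keeps the 64-element cut-off described above. Because the &lt;code&gt;stackalloc&lt;/code&gt; appears inside a conditional expression, its type is &lt;code&gt;Span&amp;lt;int&amp;gt;&lt;/code&gt;, and the heap array converts to the same type, so both branches feed identical code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class TempBuffers
{
    // Arrays under 64 elements go on the stack.
    const int StackLimit = 64;

    // Hypothetical example: median of a copy. Requires at least one element.
    public static int Median(int[] values)
    {
        var buffer = values.Length &gt;= StackLimit
            ? new int[values.Length]          // large: heap fallback
            : stackalloc int[values.Length];  // small: stack, freed on return
        values.CopyTo(buffer);
        buffer.Sort();
        return buffer[buffer.Length / 2];
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The span cannot escape the method: the compiler rejects returning a stack-allocated span, which is what keeps the pattern safe.&lt;/p&gt;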

&lt;p&gt;For larger long-lived buffers, the Pinned Object Heap: &lt;code&gt;GC.AllocateArray&amp;lt;byte&amp;gt;(capacity, pinned: true)&lt;/code&gt;. Zero-initialized, never moved or compacted by the GC, and therefore a stable pointer for direct access. Typhon's HashMap uses this for its entire entry array.&lt;/p&gt;

&lt;p&gt;For medium reusable buffers, &lt;code&gt;ArrayPool&amp;lt;T&amp;gt;.Shared&lt;/code&gt;. FPI (full-page image) compression rents 9 KB buffers and returns them in a &lt;code&gt;finally&lt;/code&gt; block. Query execution rents stream arrays sized for the common case (8 slots) and doubles them if needed.&lt;/p&gt;

&lt;p&gt;Four strategies — ref struct for scoped access, stackalloc for small temporaries, POH for large long-lived buffers, ArrayPool for medium reusable buffers. The result: zero hot-path allocations in steady state.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 3: Reduce Memory Indirections
&lt;/h2&gt;

&lt;p&gt;Every pointer chase is a potential cache miss. An L3 hit costs roughly 50 cycles on Zen 4; a miss to DRAM costs several hundred. The goal: minimize the number of hops from "I want this data" to "here's the data."&lt;/p&gt;

&lt;p&gt;Post #1 showed the flagship example — the &lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;SIMD chunk accessor&lt;/a&gt; with its 3-tier lookup (MRU check, Vector256 search, clock-hand eviction). Each tier reduces indirection compared to the next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Epoch-based page protection&lt;/strong&gt; eliminates another class of indirection. The traditional approach: atomic increment on page access, atomic decrement on release. For N page accesses in a transaction, that's 2N atomic operations — each one a potential cache-line bounce. Typhon uses epoch-based protection instead: one stamp when entering a transaction scope, one clear when exiting. Pages accessed within an active epoch can't be evicted. Cost: 2 operations per transaction, regardless of how many pages are touched.&lt;/p&gt;
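
&lt;p&gt;For comparison with per-page reference counting, here is a minimal sketch of classic epoch-based reclamation. It is an illustration under stated assumptions, not Typhon's design: &lt;code&gt;EpochTable&lt;/code&gt;, the per-worker slot array and the retire/reclaim rule are all hypothetical. A worker pays one write on entry and one on exit. The evictor, after unmapping a page, tags it with its retire epoch and reuses it only once no worker that could still hold it remains inside a scope.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Threading;

// Hypothetical epoch table: one slot per worker, 0 means "not in a scope".
public sealed class EpochTable
{
    private long _globalEpoch = 1;
    private readonly long[] _active;

    public EpochTable(int workers) =&gt; _active = new long[workers];

    // Enter: one full-fence write, no matter how many pages the scope touches.
    public void Enter(int worker) =&gt;
        Interlocked.Exchange(ref _active[worker], Volatile.Read(ref _globalEpoch));

    // Exit: one write.
    public void Exit(int worker) =&gt; Volatile.Write(ref _active[worker], 0);

    // Evictor, after unmapping a page: returns the epoch the page was retired in.
    public long Retire() =&gt; Interlocked.Increment(ref _globalEpoch) - 1;

    // Reusable once no active scope entered at or before the retire epoch;
    // scopes that entered later can no longer reach the unmapped page.
    public bool CanReclaim(long retireEpoch)
    {
        for (int i = 0; i != _active.Length; i++)
        {
            long entered = Volatile.Read(ref _active[i]);
            if (entered == 0) continue;              // worker is idle
            if (retireEpoch &gt;= entered) return false;
        }
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Entry uses &lt;code&gt;Interlocked.Exchange&lt;/code&gt; rather than a plain volatile write because the worker's subsequent page reads must not be reordered ahead of its slot update; x86 allows a load to pass an earlier store.&lt;/p&gt;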

&lt;p&gt;&lt;strong&gt;Zone maps&lt;/strong&gt; eliminate entire clusters of indirection. Each indexed field maintains per-cluster min/max bounds. A range query like &lt;code&gt;WHERE Level &amp;gt;= 50&lt;/code&gt; checks two integers per cluster — if the cluster's maximum is below 50, skip every entity in it without loading a single component byte. The impact at different selectivities, measured on 100K entities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Selectivity&lt;/th&gt;
&lt;th&gt;Without zone maps&lt;/th&gt;
&lt;th&gt;With zone maps&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;1.3 ms&lt;/td&gt;
&lt;td&gt;10x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.65 ms&lt;/td&gt;
&lt;td&gt;21x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.16 ms&lt;/td&gt;
&lt;td&gt;84x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1%&lt;/td&gt;
&lt;td&gt;13.4 ms&lt;/td&gt;
&lt;td&gt;0.05 ms&lt;/td&gt;
&lt;td&gt;268x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The float ordering trick makes this work for non-integer types: an IEEE 754 sign-flip converts floats to a representation where integer comparison order equals numeric order, enabling the same two-comparison interval overlap check regardless of field type.&lt;/p&gt;
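
&lt;p&gt;The skip test and the float transform together can be sketched as follows. The names are hypothetical. The transform shown is the common order-preserving variant, which may differ in detail from Typhon's: set the sign bit of positive values and invert every bit of negative ones. NaN is assumed to be excluded from indexed fields.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public static class ZoneMaps
{
    // Map a float to a uint whose unsigned order matches numeric order (no NaN).
    public static uint OrderedBits(float f)
    {
        uint bits = BitConverter.SingleToUInt32Bits(f);
        // Positive: set the sign bit so it sorts above all negatives.
        // Negative: flip every bit so larger magnitudes sort lower.
        return (bits &gt;&gt; 31) == 0 ? bits | 0x8000_0000u : ~bits;
    }

    // Two comparisons per cluster: skip if the query range lies entirely
    // above the cluster's max or entirely below its min.
    public static bool MayMatch(uint clusterMin, uint clusterMax, uint queryMin, uint queryMax) =&gt;
        !(queryMin &gt; clusterMax || clusterMin &gt; queryMax);

    public static bool MayMatch(float clusterMin, float clusterMax, float queryMin, float queryMax) =&gt;
        MayMatch(OrderedBits(clusterMin), OrderedBits(clusterMax),
                 OrderedBits(queryMin), OrderedBits(queryMax));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For a predicate such as &lt;code&gt;Speed &amp;gt;= 50&lt;/code&gt; (a hypothetical float field), the query range is [50, +∞). A cluster whose zone map is [-3.5, 10] is skipped without touching its component data, while one at [-3.5, 60] must be scanned.&lt;/p&gt;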

&lt;p&gt;At the other end of the scale, division elimination saves cycles on every single chunk lookup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Field: precomputed at segment creation&lt;/span&gt;
&lt;span class="c1"&gt;// Replaces expensive division (~20-80 cycles) with multiply+shift (~3-4 cycles)&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;ulong&lt;/span&gt; &lt;span class="n"&gt;_divMagic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Constructor: compute magic multiplier once&lt;/span&gt;
&lt;span class="n"&gt;_divMagic&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0x1_0000_0000U&lt;/span&gt;&lt;span class="n"&gt;L&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;_otherChunkCount&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;_otherChunkCount&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Hot path: every chunk lookup uses this instead of idiv&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)((&lt;/span&gt;&lt;span class="n"&gt;adjusted&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_divMagic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;adjusted&lt;/span&gt; &lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_otherChunkCount&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



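&lt;p&gt;Integer division (&lt;code&gt;idiv&lt;/code&gt; on x64) is notoriously slow: roughly 20 to 80 cycles on many x86 cores depending on operand size, and still well above a multiply on newer cores with faster dividers. The magic multiplier replaces it with a multiply and a shift: 3-4 cycles. The JIT already performs this rewrite when the divisor is a compile-time constant, but the chunk count is only known once a segment is created, so the magic number has to be computed by hand. The precomputation happens once per segment; the benefit repeats on every one of the millions of chunk lookups that follow. Six lines of math, up to 20x fewer cycles for the division step on a hot path. This is a classic systems programming trick that most managed-language developers have never needed, but when your per-entity budget is 2.5 nanoseconds, you need it.&lt;/p&gt;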
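
&lt;p&gt;A self-contained version of the trick is handy for convincing yourself that it is exact. &lt;code&gt;FastDivider&lt;/code&gt; is a hypothetical name; the magic constant is the same ceil(2^32 / d) formula as in the snippet above. With M = ceil(2^32 / d), the shifted product equals n / d exactly whenever n × d stays below 2^32, which comfortably covers chunk indices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public readonly struct FastDivider
{
    private readonly uint _divisor;
    private readonly ulong _magic; // ceil(2^32 / divisor), computed once

    public FastDivider(uint divisor)
    {
        if (divisor == 0) throw new ArgumentOutOfRangeException(nameof(divisor));
        _divisor = divisor;
        _magic = (0x1_0000_0000UL + divisor - 1) / divisor;
    }

    // Multiply + shift instead of idiv; exact while n * divisor fits below 2^32.
    public uint Divide(uint n) =&gt; (uint)((n * _magic) &gt;&gt; 32);

    // Remainder from the quotient: one more multiply and a subtract.
    public uint Remainder(uint n) =&gt; n - Divide(n) * _divisor;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Divide&lt;/code&gt; and &lt;code&gt;Remainder&lt;/code&gt; together play the role of the &lt;code&gt;pageIndex&lt;/code&gt; / &lt;code&gt;offset&lt;/code&gt; pair in the hot path above, with the divisor standing in for the chunks-per-page count.&lt;/p&gt;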

&lt;h2&gt;
  
  
  Principle 4: Let the JIT Help
&lt;/h2&gt;

&lt;p&gt;The JIT compiler is your optimization partner, not your enemy. Write code in patterns it can optimize, and it does work for you that you'd have to do manually in C or Rust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained generics&lt;/strong&gt; give you monomorphization. When you write &lt;code&gt;where TMask : struct, IArchetypeMask&amp;lt;TMask&amp;gt;&lt;/code&gt;, the JIT generates a separate native code path for each value type the method is instantiated with. &lt;code&gt;ArchetypeMask256&lt;/code&gt; (four &lt;code&gt;ulong&lt;/code&gt; fields, bitwise operations) gets fully inlined: no vtable, no virtual dispatch. This is the same optimization Rust gets from generics; in .NET it applies to value-type arguments (reference types share one canonical copy of the code), and the &lt;code&gt;struct&lt;/code&gt; constraint guarantees every instantiation qualifies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;sealed&lt;/code&gt;&lt;/strong&gt; enables devirtualization. &lt;code&gt;DirtyBitmap&lt;/code&gt; and &lt;code&gt;ArchetypeClusterInfo&lt;/code&gt; are both on hot paths and both sealed. The JIT knows no subclass can exist, so it converts virtual calls to direct calls and can inline them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;[MethodImpl(MethodImplOptions.AggressiveInlining)]&lt;/code&gt;&lt;/strong&gt; eliminates call overhead on micro-operations. B+Tree binary search, transaction state validation, every lock acquire/release: the overhead of a method call (save registers, set up stack frame, restore) is 2-5 ns, and the call boundary also stops the JIT from optimizing across it. On a path called millions of times, that compounds.&lt;/p&gt;
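
&lt;p&gt;Applied to the binary search inside a B+Tree node, the attribute looks like this. The helper is hypothetical (it is not Typhon's search routine), and &lt;code&gt;AggressiveInlining&lt;/code&gt; is a request rather than a guarantee: the JIT can still decline in some cases, recursive methods for instance.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.CompilerServices;

public static class NodeSearch
{
    // Returns the index of key among the first `count` sorted keys, or the
    // bitwise complement of the insertion point (Array.BinarySearch convention).
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Find(int[] keys, int count, int key)
    {
        int lo = 0, hi = count - 1;
        while (hi &gt;= lo)
        {
            int mid = lo + ((hi - lo) &gt;&gt; 1);
            int k = keys[mid];
            if (k == key) return mid;
            if (key &gt; k) lo = mid + 1;
            else hi = mid - 1;
        }
        return ~lo;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once inlined at each call site, the loop can be optimized together with the caller's code, for example keeping &lt;code&gt;keys&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt; in registers that the caller already loaded.&lt;/p&gt;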

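&lt;p&gt;&lt;strong&gt;SoA layout enables vectorization.&lt;/strong&gt; When a cluster is fully occupied (all N slots in use), the iteration loop becomes a simple sequential walk over contiguous SoA arrays with no branches, exactly the shape SIMD wants: on AVX2, one instruction processes 8 floats. RyuJIT does not auto-vectorize ordinary scalar loops, so that speedup comes from writing the loop with &lt;code&gt;Vector256&amp;lt;float&amp;gt;&lt;/code&gt; (or &lt;code&gt;Vector&amp;lt;T&amp;gt;&lt;/code&gt;), which the JIT lowers directly to AVX2 instructions. The SoA layout isn't just about cache locality; it's what turns that vector code into straight contiguous loads and stores instead of gathers.&lt;/p&gt;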
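
&lt;p&gt;Written out explicitly with the cross-platform &lt;code&gt;Vector256&lt;/code&gt; API (.NET 7 and later), a branch-free SoA position update looks roughly like this. It is a sketch with hypothetical names; the JIT emits AVX2 instructions for &lt;code&gt;Vector256&lt;/code&gt; operations when the hardware supports them and falls back to a software implementation otherwise.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.Intrinsics;

public static class SoaKernels
{
    // posX[i] += velX[i] * dt for the first `count` entities of a cluster.
    public static void Integrate(float[] posX, float[] velX, float dt, int count)
    {
        var vdt = Vector256.Create(dt);
        int i = 0;

        // Vector body: 8 floats per iteration, contiguous loads and stores.
        for (; count &gt;= i + 8; i += 8)
        {
            var p = Vector256.Create(posX, i);
            var v = Vector256.Create(velX, i);
            (p + v * vdt).CopyTo(posX, i);
        }

        // Scalar tail for the remaining 0-7 entities.
        for (; count &gt; i; i++)
            posX[i] += velX[i] * dt;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With an array-of-structs layout, the same kernel would need gathers or shuffles to assemble each vector, which is the practical payoff of SoA.&lt;/p&gt;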

&lt;p&gt;But the most surprising JIT trick is dead-code elimination through &lt;code&gt;static readonly&lt;/code&gt; fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// TelemetryConfig.cs — field declarations&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;summary&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;/// static readonly fields allow the JIT to eliminate disabled telemetry code paths&lt;/span&gt;
&lt;span class="c1"&gt;/// entirely. When a readonly field is false, the JIT treats guarded blocks as dead&lt;/span&gt;
&lt;span class="c1"&gt;/// code and removes them completely in Tier 1 compilation.&lt;/span&gt;
&lt;span class="c1"&gt;/// &amp;lt;/summary&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;Enabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;EcsEnabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;EcsActive&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// Combined: Enabled &amp;amp;&amp;amp; EcsEnabled&lt;/span&gt;

&lt;span class="c1"&gt;// Static constructor — computed once at startup&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="nf"&gt;TelemetryConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Typhon:Telemetry"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;Enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;section&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;EcsEnabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ecsSection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Enabled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;EcsActive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Enabled&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;EcsEnabled&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// EcsQuery.cs — usage on hot path&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TelemetryConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EcsActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;activity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TyphonActivitySource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;StartActivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ECS.Query.Execute"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;activity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;SetTag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TyphonSpanAttributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EcsArchetype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TArchetype&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;EcsActive&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;, the JIT doesn't just short-circuit the branch: it &lt;strong&gt;eliminates the entire &lt;code&gt;if&lt;/code&gt; block&lt;/strong&gt; from the generated native code. No branch instruction, no condition check, zero cost. By the time a hot method is recompiled at Tier 1, the static constructor has already run, so the JIT reads the &lt;code&gt;static readonly&lt;/code&gt; value and treats it as a constant. The dead branch and everything inside it vanish.&lt;/p&gt;

&lt;p&gt;This gives you zero-cost observability. Full OpenTelemetry tracing when enabled; literally nothing, not even a branch, when disabled (in Tier 1 code; Tier 0 code, which runs before a method gets hot, still performs the check). Most C# developers don't know the JIT does this. It's worth structuring your telemetry and feature flags around this pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Principle 5: Design for the Hardware
&lt;/h2&gt;

&lt;p&gt;The CPU manual is a requirements document. Cache-line size, SIMD register width, TLB coverage, memory bandwidth — these aren't abstract numbers. They drive struct sizing, batch sizes, and allocation strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache-line size (64 bytes on x86, 128 bytes on Apple Silicon)&lt;/strong&gt; drives &lt;code&gt;CacheLinePaddedInt&lt;/code&gt; sizing, B+Tree node alignment, and SoA array alignment. The ViewDeltaRingBuffer aligns each sub-buffer to 64-byte boundaries so that the hardware prefetcher doesn't waste bandwidth loading adjacent unrelated data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SIMD width&lt;/strong&gt; determines batch sizes. Typhon's &lt;code&gt;SimdPredicateEvaluator&lt;/code&gt; uses three-tier CPU dispatch for filtering entities by field values: AVX-512 processes 16 integer comparisons per instruction, AVX2 processes 8, with a scalar fallback for older hardware. The AVX-512 path uses a workaround — .NET doesn't expose 512-bit gather intrinsics, so it performs two 256-bit AVX2 gathers and combines them into a &lt;code&gt;Vector512&lt;/code&gt; for the comparison step. The JIT emits a native &lt;code&gt;vpcmpd&lt;/code&gt; instruction for the 16-wide comparison. On Zen 4 (which double-pumps 512-bit operations), throughput matches two AVX2 iterations but with half the loop overhead.&lt;/p&gt;
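
&lt;p&gt;A simplified picture of three-tier dispatch follows. It is a hypothetical sketch, not &lt;code&gt;SimdPredicateEvaluator&lt;/code&gt;: it compares a contiguous &lt;code&gt;int[]&lt;/code&gt; against a threshold instead of gathering fields, and it needs .NET 8 for &lt;code&gt;Vector512&lt;/code&gt;. Because &lt;code&gt;IsHardwareAccelerated&lt;/code&gt; is a JIT-time constant, the branches for unsupported widths are dropped from the compiled code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Numerics;
using System.Runtime.Intrinsics;

public static class PredicateSketch
{
    // Counts values greater than threshold using the widest accelerated width.
    public static int CountGreaterThan(int[] values, int threshold)
    {
        int i = 0, count = 0;

        if (Vector512.IsHardwareAccelerated)
        {
            // Tier 1: 16 int comparisons per instruction (AVX-512).
            var t = Vector512.Create(threshold);
            for (; values.Length &gt;= i + 16; i += 16)
            {
                var mask = Vector512.GreaterThan(Vector512.Create(values, i), t);
                count += BitOperations.PopCount(mask.ExtractMostSignificantBits());
            }
        }
        else if (Vector256.IsHardwareAccelerated)
        {
            // Tier 2: 8 int comparisons per instruction (AVX2).
            var t = Vector256.Create(threshold);
            for (; values.Length &gt;= i + 8; i += 8)
            {
                var mask = Vector256.GreaterThan(Vector256.Create(values, i), t);
                count += BitOperations.PopCount(mask.ExtractMostSignificantBits());
            }
        }

        // Tier 3: scalar fallback, which also handles the leftover tail.
        for (; values.Length &gt; i; i++)
            if (values[i] &gt; threshold) count++;

        return count;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each comparison yields an all-ones lane where the predicate holds; extracting the lane sign bits and counting them turns a vector of results into a match count without a per-element branch.&lt;/p&gt;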

&lt;p&gt;&lt;strong&gt;Software prefetch&lt;/strong&gt; hides memory latency where it matters most. During HashMap resize, speculative prefetch computes the &lt;em&gt;future&lt;/em&gt; entry's position in the resized table and issues &lt;code&gt;Sse.Prefetch0&lt;/code&gt; to start loading that cache line while the current entry is being processed. The JIT translates this to a &lt;code&gt;prefetcht0&lt;/code&gt; instruction — essentially free to issue, and it hides 100+ cycles of latency per entry.&lt;/p&gt;

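&lt;p&gt;&lt;strong&gt;BMI2 instructions&lt;/strong&gt; accelerate spatial indexing. Morton key encoding (Z-order curves) uses &lt;code&gt;Bmi2.ParallelBitDeposit&lt;/code&gt; to interleave X/Y coordinates at roughly one PDEP per cycle of throughput (PDEP latency is 3 cycles on Zen 4 and recent Intel cores). The scalar fallback costs ~10 cycles. One caveat: AMD cores before Zen 3 implement PDEP in microcode, where it is far slower than the scalar path even though &lt;code&gt;Bmi2.IsSupported&lt;/code&gt; reports true. Morton ordering places spatially adjacent grid cells at nearby array indices, improving cache locality during neighbor queries.&lt;/p&gt;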
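
&lt;p&gt;The encoding and its fallback can be sketched as below. &lt;code&gt;Morton.Encode&lt;/code&gt; is a hypothetical helper that interleaves the low 16 bits of each coordinate, placing x on even bit positions and y on odd ones. The fallback is written for clarity rather than speed; the usual fast scalar version uses shift-and-mask steps instead of a bit-by-bit loop.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.Intrinsics.X86;

public static class Morton
{
    // Z-order key: bit b of x goes to position 2b, bit b of y to 2b + 1.
    public static uint Encode(uint x, uint y)
    {
        if (Bmi2.IsSupported)
            return Bmi2.ParallelBitDeposit(x, 0x5555_5555u)   // even positions
                 | Bmi2.ParallelBitDeposit(y, 0xAAAA_AAAAu);  // odd positions

        // Portable fallback: copy one bit at a time.
        uint result = 0, bit = 1;
        for (int b = 0; b != 16; b++, bit *= 4)
        {
            if ((x &gt;&gt; b) % 2 == 1) result |= bit;      // even position
            if ((y &gt;&gt; b) % 2 == 1) result |= bit * 2;  // odd position
        }
        return result;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, x = 3 and y = 5 interleave to 39 (binary 100111), so the cells (3, 5) and (2, 5) end up one index apart.&lt;/p&gt;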

&lt;p&gt;&lt;strong&gt;TLB coverage&lt;/strong&gt; constrains working set design. Without 2 MB huge pages, the x86 L2 TLB covers only 8-12 MB (Zen 4's 3,072 L2 DTLB entries × 4 KB pages = 12 MB). Every access beyond that risks a 15-20 ns page walk penalty on top of the data access itself. Typhon's cluster storage keeps 100K entities in ~2.5 MB, comfortably within L2 TLB coverage even without huge pages. For larger datasets, the page cache's 8 KB pages and sequential access patterns keep the hardware prefetcher effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory bandwidth (~50 GB/s on Zen 4)&lt;/strong&gt; is the ceiling for bulk scans. If your SoA component scan isn't approaching this number, something is leaving performance on the table — unnecessary indirection, poor alignment, or branches that defeat the prefetcher.&lt;/p&gt;

&lt;p&gt;All measurements in this post were taken on an AMD Ryzen 9 7950X with .NET 10, BenchmarkDotNet, release configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Individual principles are nice. What matters is how they compound. Here's what the engine actually delivers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster iteration (per entity)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CRUD lifecycle (spawn, read, update, destroy, commit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.95 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transaction create-read-commit (100 entities)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.6 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree point lookup (10K entries)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;191 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component read (1 MVCC version)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;703 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component read (50 MVCC versions)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;720 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uncontended RW lock acquire&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page cache hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.5 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk accessor MRU hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.1 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Epoch enter/exit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cascade delete 10K entities&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.6 μs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The version invariance number deserves a callout. Reading a component with 50 MVCC revisions costs essentially the same as reading one with a single revision: 703 ns vs 720 ns, a 2.4% difference that falls within measurement noise. The revision chain design works.&lt;/p&gt;

&lt;p&gt;These principles also scale to parallel execution:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workers&lt;/th&gt;
&lt;th&gt;Tick time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;Efficiency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;~37 ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;~18 ms&lt;/td&gt;
&lt;td&gt;2.1x&lt;/td&gt;
&lt;td&gt;104%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;~10 ms&lt;/td&gt;
&lt;td&gt;3.8x&lt;/td&gt;
&lt;td&gt;95%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;~5.3 ms&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;89% parallel efficiency on 8 workers. At 16 workers (6.7x, 42% efficiency), the run crosses the L3 cache / CCD boundary. The 7950X splits its 16 cores across two CCDs of 8, and each CCD has its own L3 cache. That's a hardware wall, not a software one.&lt;/p&gt;
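
&lt;p&gt;For readers recomputing the table: speedup is the single-worker tick time divided by the N-worker tick time, and efficiency is speedup divided by the worker count. Anything above 100% (as at 2 workers) therefore means superlinear scaling. The tick times shown are rounded, which is why 37 / 18 gives about 2.06 rather than the listed 2.1x. A trivial sketch of both formulas, with illustrative names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;public static class ScalingSketch
{
    // How many times faster N workers finish a tick than one worker.
    public static double Speedup(double singleWorkerMs, double nWorkerMs) =&amp;gt; singleWorkerMs / nWorkerMs;

    // Speedup per worker: 1.0 is perfectly linear scaling.
    public static double Efficiency(double speedup, int workers) =&amp;gt; speedup / workers;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Efficiency(7.1, 8)&lt;/code&gt; gives 0.8875 (the table's 89%), and &lt;code&gt;Efficiency(6.7, 16)&lt;/code&gt; gives about 0.42.&lt;/p&gt;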

&lt;p&gt;To put these numbers in perspective, here's the concurrency cost hierarchy that drives Typhon's design decisions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0: Thread-local&lt;/td&gt;
&lt;td&gt;~2 ns&lt;/td&gt;
&lt;td&gt;TLS counter, local variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1: Uncontended atomic&lt;/td&gt;
&lt;td&gt;5-10 ns&lt;/td&gt;
&lt;td&gt;AccessControl read latch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2: Contended atomic&lt;/td&gt;
&lt;td&gt;20-140 ns&lt;/td&gt;
&lt;td&gt;Multiple writers, same lock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3: System call&lt;/td&gt;
&lt;td&gt;500-1000 ns&lt;/td&gt;
&lt;td&gt;Timestamp via syscall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4: Context switch&lt;/td&gt;
&lt;td&gt;~10,000 ns&lt;/td&gt;
&lt;td&gt;Blocking lock, futex wait&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5: Oversubscription&lt;/td&gt;
&lt;td&gt;100,000+ ns&lt;/td&gt;
&lt;td&gt;More threads than cores&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each level is roughly an order of magnitude more expensive than the previous one. Typhon's &lt;code&gt;AdaptiveWaiter&lt;/code&gt; (spin → yield → sleep progression) keeps most contention at Level 2, avoiding the ~100x jump to Level 4. The cache-line padding from Principle 1 prevents false sharing, so parallel workers touching neighboring data don't bounce each other from Level 1 to Level 2. Every design decision maps to staying as low in this hierarchy as possible.&lt;/p&gt;
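
&lt;p&gt;The spin, yield, sleep progression can be sketched as follows. This is not Typhon's &lt;code&gt;AdaptiveWaiter&lt;/code&gt;: &lt;code&gt;AdaptiveWaiterSketch&lt;/code&gt; and its phase thresholds are assumptions chosen for illustration. The BCL's &lt;code&gt;System.Threading.SpinWait&lt;/code&gt; struct follows the same idea and is usually the right starting point before writing your own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Threading;

// Hypothetical spin, then yield, then sleep waiter (thresholds are illustrative).
public struct AdaptiveWaiterSketch
{
    private const int SpinPhases = 10;   // first 10 waits: busy-spin on the core
    private const int YieldPhases = 20;  // waits 10-19: yield the time slice
    private int _waits;

    public void Wait()
    {
        if (_waits &amp;lt; SpinPhases)
            Thread.SpinWait(1 &amp;lt;&amp;lt; _waits);  // exponential backoff, no kernel involvement
        else if (_waits &amp;lt; YieldPhases)
            Thread.Yield();                 // let another ready thread use this core
        else
            Thread.Sleep(1);                // block: accept the context-switch cost
        _waits++;
    }

    public void Reset() =&amp;gt; _waits = 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A caller loops on its condition, for example &lt;code&gt;while (Volatile.Read(ref flag) == 0) waiter.Wait();&lt;/code&gt;. Short waits are absorbed by spinning, and only long waits pay for a context switch.&lt;/p&gt;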

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Unsafe is unsafe.&lt;/strong&gt; These techniques require &lt;code&gt;unsafe&lt;/code&gt; code — pointer arithmetic, raw memory access, manual layout control. One bug can corrupt the page cache. Roslyn analyzers catch some classes of errors at compile time, but not all. The safety net has holes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity budget.&lt;/strong&gt; Magic multipliers, SIMD evaluators, epoch-based protection, zone maps — each one is simple in isolation. The combination creates a codebase that demands systems-level understanding to navigate. There's no shortcut around understanding the hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not all of this transfers.&lt;/strong&gt; Most .NET applications don't need microsecond latency. Using &lt;code&gt;CacheLinePaddedInt&lt;/code&gt; in a web API is premature optimization. These techniques are for when you've measured, profiled, and confirmed that memory access patterns are your bottleneck — not before.&lt;/p&gt;
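
&lt;p&gt;To make the cost concrete, here is a hypothetical padded counter. Its layout is an assumption, not Typhon's &lt;code&gt;CacheLinePaddedInt&lt;/code&gt;. The value sits 64 bytes into a 128-byte struct, which guarantees that the 64-byte line holding it is shared with nothing else, whatever the struct's alignment. The price is 128 bytes to store 4. That 32x memory overhead is worth paying for a hot counter hammered by several cores, and is pure waste anywhere else.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical layout: Value sits in the middle of 128 bytes, so no neighbor can share its cache line.
[StructLayout(LayoutKind.Explicit, Size = 128)]
public struct PaddedIntSketch
{
    [FieldOffset(64)] public int Value;
}

public static class PaddingSketch
{
    public static int SizeInBytes =&amp;gt; Unsafe.SizeOf&amp;lt;PaddedIntSketch&amp;gt;();

    // One slot per worker: increments never invalidate another worker's cache line.
    public static void Increment(PaddedIntSketch[] perWorker, int worker)
        =&amp;gt; Interlocked.Increment(ref perWorker[worker].Value);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;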

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The next post dives into concurrency: "Deadlock-Free by Construction: How Typhon Eliminates Deadlocks Instead of Detecting Them." Most databases treat deadlocks as a runtime problem — detect the cycle, abort a transaction, retry. Typhon makes deadlocks structurally impossible through a three-pillar mathematical argument. No detection, no timeouts, no retries.&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>performance</category>
      <category>database</category>
    </item>
    <item>
      <title>What Game Engines Know About Data That Databases Forgot</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 05 Apr 2026 22:20:39 +0000</pubDate>
      <link>https://dev.to/nockawa/what-game-engines-know-about-data-that-databases-forgot-10m2</link>
      <guid>https://dev.to/nockawa/what-game-engines-know-about-data-that-databases-forgot-10m2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;💡Typhon is an embedded, persistent, ACID database engine written in .NET that speaks the native language of game servers and real-time simulations: entities, components, and systems.&lt;br&gt;&lt;br&gt;
It delivers full transactional safety with MVCC snapshot isolation at sub-microsecond latency, powered by cache-line-aware storage, zero-copy access, and configurable durability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series: A Database That Thinks Like a Game Engine&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://nockawa.github.io/blog/why-building-database-engine-in-csharp/" rel="noopener noreferrer"&gt;Why I'm Building a Database Engine in C#&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What Game Engines Know About Data That Databases Forgot&lt;/strong&gt; &lt;em&gt;(this post)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Microsecond Latency in a Managed Language &lt;em&gt;(coming soon)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Game servers sit at an uncomfortable intersection. They need the raw throughput of a game engine — tens of thousands of entities updated every tick. But they also need what databases provide: transactions that don't corrupt state, queries that don't scan everything, and durability that survives crashes.&lt;/p&gt;

&lt;p&gt;Today, game server teams pick one side and hack around the other. An &lt;a href="https://en.wikipedia.org/wiki/Entity_component_system" rel="noopener noreferrer"&gt;Entity-Component-System&lt;/a&gt; framework for speed, with manual serialization to a database for persistence. Or a database for safety, with an impedance mismatch every time they touch game state.&lt;/p&gt;

&lt;p&gt;Typhon draws from both traditions. It's a database engine that stores data the way game engines do — and provides the guarantees that game servers need. Here's why those two worlds aren't as far apart as they look.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Fields, One Problem
&lt;/h2&gt;

&lt;p&gt;ECS architecture evolved in game engines. Relational databases evolved in enterprise software. They never talked to each other. But look at what they built:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ECS Concept&lt;/th&gt;
&lt;th&gt;Database Concept&lt;/th&gt;
&lt;th&gt;Shared Principle&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Archetype&lt;/td&gt;
&lt;td&gt;Table&lt;/td&gt;
&lt;td&gt;Homogeneous, fixed-schema storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component&lt;/td&gt;
&lt;td&gt;Column&lt;/td&gt;
&lt;td&gt;Typed, blittable, bulk-iterable data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entity&lt;/td&gt;
&lt;td&gt;Row&lt;/td&gt;
&lt;td&gt;Identity with dynamic composition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System&lt;/td&gt;
&lt;td&gt;Query&lt;/td&gt;
&lt;td&gt;Process all records matching a signature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frame Budget (16 ms)&lt;/td&gt;
&lt;td&gt;Latency SLA&lt;/td&gt;
&lt;td&gt;Hard real-time deadline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An ECS "archetype" is a table. A "component" is a column. A "system" is a query. The vocabulary differs, but the underlying structure is the same. Two fields, separated by decades and industry boundaries, converged on structurally identical solutions because they were solving the same fundamental problem: managing structured data under performance constraints.&lt;/p&gt;

&lt;p&gt;This convergence is why a synthesis is possible at all. It's not an accident — it's driven by the same physics. Data must be laid out for the CPU cache. Access patterns must be predictable. Latency budgets are real.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned From Game Engines
&lt;/h2&gt;

&lt;p&gt;ECS taught the database world something important about how data should be stored. Three lessons Typhon draws directly from game engine architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache locality by default.&lt;/strong&gt; In a traditional row store, reading all player positions means loading entire rows — names, inventories, health, everything. Most of those bytes are wasted. In ECS, components are stored per type: all positions contiguous, all health values contiguous. Reading 10,000 positions is a linear memory scan where every byte is useful.&lt;/p&gt;

&lt;p&gt;This matters more than most developers realize. An L1 cache hit costs roughly 1 nanosecond. A DRAM miss costs 60-70 ns — a &lt;strong&gt;65x penalty&lt;/strong&gt;. When your database layout forces cache misses, no amount of algorithmic cleverness can save you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-ecs-vs-rowstore.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-ecs-vs-rowstore.png" alt="Storage layout comparison — traditional row store vs Typhon's component store" width="800" height="1135"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;
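
&lt;p&gt;A quick way to see the waste is to compare how many bytes each layout pulls through the cache to read N positions. The structs below are illustrative, not Typhon's schema: &lt;code&gt;PlayerRow&lt;/code&gt; stands for a row-store record and &lt;code&gt;Position&lt;/code&gt; for a component stored on its own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.CompilerServices;

// Hypothetical row-store record: the position is 12 of its bytes, the rest is unrelated state.
public struct PlayerRow
{
    public long AccountId;
    public float X, Y, Z;
    public int Health;
    public int Gold;
    public long GuildId;
    public int InventoryCount;
    public int Flags;
}

// The same position stored as its own component.
public struct Position
{
    public float X, Y, Z;
}

public static class LayoutSketch
{
    // Scanning positions in a row store streams whole rows through the cache.
    public static long RowStoreBytes(int count) =&amp;gt; (long)count * Unsafe.SizeOf&amp;lt;PlayerRow&amp;gt;();

    // In a component store, the scan streams only positions.
    public static long ComponentStoreBytes(int count) =&amp;gt; (long)count * Unsafe.SizeOf&amp;lt;Position&amp;gt;();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these fields, &lt;code&gt;PlayerRow&lt;/code&gt; is 48 bytes and &lt;code&gt;Position&lt;/code&gt; is 12. Reading 10,000 positions therefore moves 480,000 bytes through the row store versus 120,000 through the component store. That is 4x the memory traffic for the same useful data, and the gap grows with every field a real row carries.&lt;/p&gt;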

&lt;p&gt;&lt;strong&gt;Zero-copy is the default, not the optimization.&lt;/strong&gt; In a traditional database, reading a record means deserializing from a storage page into a language-level object. In ECS, a component is already in memory in its final layout — you just hand back a pointer. Typhon preserves this: components are blittable &lt;code&gt;unmanaged&lt;/code&gt; structs read directly from pinned memory pages. No serialization, no managed heap allocation, no GC involvement.&lt;/p&gt;
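
&lt;p&gt;Here is a minimal sketch of what reading "directly from pinned memory pages" means in .NET terms. It assumes an 8 KB page and a fixed-size slot per component, and &lt;code&gt;ZeroCopySketch&lt;/code&gt; and this &lt;code&gt;PositionComponent&lt;/code&gt; definition are illustrative. &lt;code&gt;GC.AllocateArray&lt;/code&gt; with &lt;code&gt;pinned: true&lt;/code&gt; places the page where the GC won't move it. &lt;code&gt;MemoryMarshal.AsRef&lt;/code&gt; then reinterprets a slice of its bytes as a &lt;code&gt;ref&lt;/code&gt; to the struct, so reads and writes go straight to the page.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct PositionComponent
{
    public float X, Y, Z;
}

public static class ZeroCopySketch
{
    public const int PageSize = 8192;  // assumed page size

    // Pinned: the GC never relocates this array, so its memory can be addressed directly.
    public static byte[] AllocatePage() =&amp;gt; GC.AllocateArray&amp;lt;byte&amp;gt;(PageSize, pinned: true);

    // Returns a reference into the page itself: no copy, no deserialization, no allocation.
    public static ref PositionComponent At(byte[] page, int slot)
    {
        int size = Unsafe.SizeOf&amp;lt;PositionComponent&amp;gt;();
        return ref MemoryMarshal.AsRef&amp;lt;PositionComponent&amp;gt;(page.AsSpan(slot * size, size));
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;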

&lt;p&gt;&lt;strong&gt;Entity as pure identity.&lt;/strong&gt; In ECS, an entity is just an ID — a 64-bit number with no inherent structure. All data lives externally in component tables. This is the opposite of ORM thinking where the object &lt;em&gt;is&lt;/em&gt; the entity. Typhon inherits this: &lt;code&gt;EntityId&lt;/code&gt; is a lightweight value type, all state lives in typed component storage. This separation is what makes the rest of the architecture possible — per-component versioning, per-component storage modes, independent indexes per component type.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We Learned From Databases
&lt;/h2&gt;

&lt;p&gt;Traditional databases solved problems that ECS never had to face. Here are the capabilities Typhon draws from database architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ACID transactions with per-component MVCC.&lt;/strong&gt; Game engines typically have no isolation. Two systems modifying the same entity in the same tick is a race condition — and in a single-process game, you control the execution order so you can manage it. On a game server with concurrent player sessions, you can't.&lt;/p&gt;

&lt;p&gt;Databases solved this decades ago with MVCC: snapshot isolation where readers never block writers, with conflict detection at commit time. Typhon brings this in — but with a twist. Traditional databases version entire rows. Typhon versions each component independently. An entity's &lt;code&gt;PositionComponent&lt;/code&gt; and &lt;code&gt;InventoryComponent&lt;/code&gt; each maintain their own revision chain: a circular buffer of 12-byte revision entries, each stamped with a 48-bit transaction sequence number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified: finding the visible revision for a snapshot&lt;/span&gt;
&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;WalkRevisions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entityId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsolationFlag&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TSN&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;myTransactionTSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Skip uncommitted revisions from other transactions&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TSN&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;snapshotTSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;rev&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Most recent revision visible to our snapshot&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means a transaction reading a player's position sees a consistent, frozen point-in-time view across &lt;em&gt;all&lt;/em&gt; component types simultaneously — without locking any of them. Writers never block readers. And because revisions are per-component rather than per-entity, updating a player's position doesn't create a new version of their inventory. Less data copied, less garbage to collect.&lt;/p&gt;
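
&lt;p&gt;The visibility rule from the simplified loop can be made self-contained, as below. This is a sketch under stated assumptions, not Typhon's code. &lt;code&gt;Revision&lt;/code&gt; is a hypothetical record in which a &lt;code&gt;long&lt;/code&gt; stands in for the 48-bit TSN, and the chain is a newest-first list instead of a circular buffer. The sketch also covers a case the loop above leaves implicit: when no revision is visible to the snapshot, the read returns nothing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Collections.Generic;

// Hypothetical revision entry (the real one is 12 bytes with a 48-bit TSN).
public readonly record struct Revision(long Tsn, bool Uncommitted, int Value);

public static class MvccSketch
{
    // revisions: one component's chain for one entity, newest first.
    public static Revision? Visible(IReadOnlyList&amp;lt;Revision&amp;gt; revisions, long snapshotTsn, long myTsn)
    {
        foreach (var rev in revisions)
        {
            if (rev.Uncommitted &amp;amp;&amp;amp; rev.Tsn != myTsn)
                continue;          // another transaction's in-flight write
            if (rev.Tsn &amp;lt;= snapshotTsn)
                return rev;        // newest revision visible to this snapshot
        }
        return null;               // nothing visible at this snapshot
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;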

&lt;p&gt;&lt;strong&gt;Indexed selective access.&lt;/strong&gt; This is the big one. ECS systems iterate &lt;em&gt;everything&lt;/em&gt; matching a component signature every tick. That works brilliantly for particle simulations where every particle needs updating. But game servers often touch only a small fraction of their entities each tick:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Total Entities&lt;/th&gt;
&lt;th&gt;Processed Per Tick&lt;/th&gt;
&lt;th&gt;Useful Work&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Battle royale (per-client relevancy)&lt;/td&gt;
&lt;td&gt;50,000 actors&lt;/td&gt;
&lt;td&gt;500–2,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1–4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MMO area of interest&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;200–1,000&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.2–1%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physics (awake bodies only)&lt;/td&gt;
&lt;td&gt;All rigidbodies&lt;/td&gt;
&lt;td&gt;Awake subset&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5–20%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you're processing 1–4% of your entities, scanning everything is doing 25–100x more work than necessary. ECS frameworks recognized this — Unity DOTS added enableable components, Flecs added &lt;code&gt;group_by&lt;/code&gt;, Unreal MassEntity added LOD tiers. These are all clever workarounds for the same underlying issue: ECS was designed for bulk iteration, not selective access.&lt;/p&gt;

&lt;p&gt;Databases solved this with indexes. B+Trees for value-based lookups, spatial trees for area-of-interest queries, selectivity estimation to decide when to scan versus when to seek. Typhon brings these into the component storage model — not as bolted-on workarounds, but as first-class citizens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spatial partitioning.&lt;/strong&gt; For spatial access patterns specifically — the #1 selective access need in game servers — Typhon integrates a two-layer spatial index directly into the component storage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1: Sparse hash map&lt;/strong&gt; — maps coarse grid cells to entity counts. O(1) rejection of empty regions before the tree is even touched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2: Page-backed R-Tree&lt;/strong&gt; — AABB, radius, ray, frustum, and kNN queries. Same node architecture as the B+Trees: optimistic lock coupling (OLC) latching and SoA node layout.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both layers run inside the same transactional model as everything else. No external spatial hash bolted on alongside your ECS. No cache locality destroyed by chasing pointers into a separate data structure.&lt;/p&gt;
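
&lt;p&gt;The Layer 1 idea can be sketched in a few lines. The sketch assumes a 2D grid with a fixed cell size; &lt;code&gt;CoarseGridSketch&lt;/code&gt; is illustrative, and removal and updates are omitted. Each cell lookup is an O(1) hash probe, so a query over an empty region is rejected after checking only the handful of cells it overlaps, without touching the R-Tree.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Generic;

// Hypothetical Layer 1 filter: coarse grid cell to entity count.
public sealed class CoarseGridSketch
{
    private readonly float _cellSize;
    private readonly Dictionary&amp;lt;(int X, int Y), int&amp;gt; _counts = new();

    public CoarseGridSketch(float cellSize) =&amp;gt; _cellSize = cellSize;

    private (int X, int Y) CellOf(float x, float y) =&amp;gt;
        ((int)MathF.Floor(x / _cellSize), (int)MathF.Floor(y / _cellSize));

    public void Add(float x, float y)
    {
        var cell = CellOf(x, y);
        _counts[cell] = _counts.GetValueOrDefault(cell) + 1;
    }

    // False means the box is empty: the query can skip the R-Tree entirely.
    public bool MayContain(float minX, float minY, float maxX, float maxY)
    {
        var (x0, y0) = CellOf(minX, minY);
        var (x1, y1) = CellOf(maxX, maxY);
        for (int cx = x0; cx &amp;lt;= x1; cx++)
            for (int cy = y0; cy &amp;lt;= y1; cy++)
                if (_counts.GetValueOrDefault((cx, cy)) &amp;gt; 0)
                    return true;   // something may be here: fall through to the R-Tree
        return false;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;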

&lt;p&gt;&lt;strong&gt;Durability.&lt;/strong&gt; A game client can afford to lose state on crash — reload the level. A game server cannot. Player inventories, economy state, progression data — all must survive process restarts and crashes. WAL-based crash recovery, checkpointing, configurable fsync — these are database fundamentals that game servers need but ECS frameworks never provided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query planning.&lt;/strong&gt; When you have both indexes and sequential storage, someone needs to decide which access path to use. Databases have decades of work on cost-based query optimization — selectivity estimation, histogram statistics, index selection. Typhon brings a query planner into the ECS world: given a predicate on a component field, it automatically chooses full scan or B+Tree seek based on estimated selectivity.&lt;/p&gt;
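
&lt;p&gt;In its simplest form, the scan-versus-seek decision compares two estimated costs. The sketch below is a hypothetical cost model with illustrative constants, not Typhon's planner. A full scan pays a small cost per row, while an index seek pays for locating the first match plus a higher, random-access cost per matching row.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

public enum AccessPath { FullScan, IndexSeek }

// Hypothetical cost model; the constants are illustrative, not measured.
public static class PlannerSketch
{
    private const double ScanCostPerRow = 1.0;    // sequential, prefetcher-friendly read
    private const double SeekCostPerMatch = 8.0;  // random access per matching row

    public static AccessPath Choose(long totalRows, double estimatedSelectivity)
    {
        double scanCost = totalRows * ScanCostPerRow;
        double descent = Math.Log2(Math.Max(totalRows, 2));   // ~comparisons to locate the first match
        double seekCost = descent + totalRows * estimatedSelectivity * SeekCostPerMatch;
        return seekCost &amp;lt; scanCost ? AccessPath.IndexSeek : AccessPath.FullScan;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these constants, the break-even point is around 12.5% selectivity (1 / 8): a 1% predicate on 100,000 rows picks the seek, and a 50% predicate picks the scan. A real planner would derive the per-row costs and the selectivity estimate from statistics such as histograms.&lt;/p&gt;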

&lt;h2&gt;
  
  
  Purpose-Built for Game Servers
&lt;/h2&gt;

&lt;p&gt;Typhon doesn't glue ECS and database concepts together with duct tape. It synthesizes them into a single model designed for game server workloads.&lt;/p&gt;

&lt;p&gt;A component in Typhon is simultaneously an ECS component and a database schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Component&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;PlayerComponent&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="n"&gt;String64&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;                    &lt;span class="c1"&gt;// B+Tree for fast lookups&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;AccountId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;Experience&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Blittable, unmanaged, fixed-size, stored contiguously per type — that's the ECS side. Typed fields with automatic B+Tree indexes on marked fields — that's the database side. One declaration, both worlds.&lt;/p&gt;

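&lt;p&gt;The query API makes the synthesis concrete. This example queries a &lt;code&gt;Player&lt;/code&gt; component with a &lt;code&gt;Level&lt;/code&gt; field, separate from the declaration above:&lt;br&gt;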
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;topPlayers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Player&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;OrderByDescending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Level&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ExecuteOrdered&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ECS-style typed component access. Database-style predicate filtering with automatic index selection. Inside a transaction with snapshot isolation. The query planner chooses scan vs B+Tree based on selectivity — the developer doesn't have to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-query-flow.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-query-flow.png" alt="How a typed query flows through Typhon — from lambda expression to archetype mask filtering, selectivity estimation, and component reads" width="782" height="1348"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And because game servers have different durability needs for different operations, Typhon lets you choose per unit of work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Position ticks: game-engine speed, batched durability&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tickUow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateUnitOfWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DurabilityMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Deferred&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Legendary item drop: database safety, immediate fsync&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;lootUow&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateUnitOfWork&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DurabilityMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Immediate&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same engine, same API. &lt;code&gt;Deferred&lt;/code&gt; mode gives game-engine-class commit latency for position updates that can be re-simulated on crash. &lt;code&gt;Immediate&lt;/code&gt; mode gives database-class guarantees for a transaction that grants a rare item worth real money. The game server decides per operation — not globally.&lt;/p&gt;
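
&lt;p&gt;The sketch below shows one way such a per-commit choice can work underneath. &lt;code&gt;DurabilityMode&lt;/code&gt; is Typhon's; &lt;code&gt;WalSketch&lt;/code&gt; and its members are hypothetical. Both modes append the record to the log, but only immediate commits wait for &lt;code&gt;FileStream.Flush(flushToDisk: true)&lt;/code&gt;. Deferred commits ride along with the next flush, so a crash can lose them. That loss is acceptable for state that can be re-simulated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.IO;

// Hypothetical write-ahead log with a per-commit durability choice.
public sealed class WalSketch : IDisposable
{
    private readonly FileStream _log;

    // Records written since the last fsync (not yet guaranteed to be on disk).
    public int PendingRecords { get; private set; }

    public WalSketch(string path) =&amp;gt;
        _log = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.Read);

    public void Commit(ReadOnlySpan&amp;lt;byte&amp;gt; record, bool immediate)
    {
        Span&amp;lt;byte&amp;gt; header = stackalloc byte[4];
        BitConverter.TryWriteBytes(header, record.Length);   // length prefix for replay
        _log.Write(header);
        _log.Write(record);
        PendingRecords++;

        if (immediate)
            FlushToDisk();   // Immediate: durable before Commit returns
    }

    // Deferred commits become durable together, e.g. once per tick.
    public void FlushToDisk()
    {
        _log.Flush(flushToDisk: true);
        PendingRecords = 0;
    }

    public void Dispose() =&amp;gt; _log.Dispose();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;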

&lt;h3&gt;
  
  
  Storage Modes: Not All Data Is Equal
&lt;/h3&gt;

&lt;p&gt;A game server doesn't treat all data the same. Player positions change 60 times per second and can be re-simulated on crash. Inventory mutations are rare but must never be lost. AI runtime state — current targets, threat scores, pathfinding waypoints — is recomputed every tick and worthless after a restart.&lt;/p&gt;

&lt;p&gt;Traditional databases treat all data identically. Traditional ECS keeps everything in memory with no durability distinction. Typhon lets you choose per component type:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;MVCC History&lt;/th&gt;
&lt;th&gt;Persisted&lt;/th&gt;
&lt;th&gt;Change Tracking&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Versioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full revision chains&lt;/td&gt;
&lt;td&gt;Yes (WAL + checkpoint)&lt;/td&gt;
&lt;td&gt;Via MVCC&lt;/td&gt;
&lt;td&gt;Inventory, economy, progression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SingleVersion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current state only&lt;/td&gt;
&lt;td&gt;Yes (WAL + checkpoint)&lt;/td&gt;
&lt;td&gt;DirtyBitmap&lt;/td&gt;
&lt;td&gt;Positions, health, frequently-updated state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transient&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current state only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;DirtyBitmap&lt;/td&gt;
&lt;td&gt;AI blackboard, threat scores, pathfinding scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SingleVersion components skip the revision chain overhead entirely — no circular buffer, no per-write allocation. They track changes through a DirtyBitmap instead: one bit per entity, set on write, scanned at the tick fence. This is how game engines track what changed, and it's the right model for data that updates every tick.&lt;/p&gt;
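
&lt;p&gt;A minimal single-threaded sketch of that change-tracking pattern follows. &lt;code&gt;DirtyBitmapSketch&lt;/code&gt; is illustrative, and clearing the bits after the fence scan is an assumption. Marking costs a single OR into a 64-bit word, and the scan skips clean entities 64 at a time using &lt;code&gt;BitOperations.TrailingZeroCount&lt;/code&gt;. Concurrent writers would need an atomic OR instead of &lt;code&gt;|=&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Numerics;

// Hypothetical change tracker: one bit per entity slot (single-threaded sketch).
public sealed class DirtyBitmapSketch
{
    private readonly ulong[] _words;

    public DirtyBitmapSketch(int capacity) =&amp;gt; _words = new ulong[(capacity + 63) / 64];

    // On write: set the slot's bit. Setting an already-set bit is harmless.
    public void MarkDirty(int slot) =&amp;gt; _words[slot &amp;gt;&amp;gt; 6] |= 1UL &amp;lt;&amp;lt; (slot &amp;amp; 63);

    // At the tick fence: report each dirty slot once, then clear for the next tick.
    public void Drain(Action&amp;lt;int&amp;gt; onDirty)
    {
        for (int w = 0; w &amp;lt; _words.Length; w++)
        {
            ulong bits = _words[w];
            while (bits != 0)
            {
                onDirty(w * 64 + BitOperations.TrailingZeroCount(bits));
                bits &amp;amp;= bits - 1;   // clear the lowest set bit
            }
            _words[w] = 0;
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;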

&lt;p&gt;Versioned components get full MVCC with snapshot isolation — readers see consistent historical state, writers don't block readers, conflicts are detected at commit time. This is how databases protect critical data, and it's the right model for things that must never be corrupted.&lt;/p&gt;

&lt;p&gt;Transient components never touch disk at all — no WAL, no checkpoint, no recovery. Pure in-memory storage with the same query and indexing API as everything else. AI blackboard data that's recomputed every tick has no business paying persistence overhead.&lt;/p&gt;

&lt;p&gt;The same engine, the same transaction API, but the storage layer does exactly what each component type needs. This is what "purpose-built for game servers" means in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Views: The Bridge Between ECS Systems and Database Queries
&lt;/h3&gt;

&lt;p&gt;In ECS, a "system" runs every tick, processing all matching entities. In a database, a "materialized view" maintains a cached result set and refreshes it incrementally. Typhon's Views are both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ItemData&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Rarity&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ToView&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Game loop&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;running&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dbe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;CreateQuickTransaction&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Refresh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Microsecond incremental refresh&lt;/span&gt;

    &lt;span class="c1"&gt;// React to changes — like an ECS system, but only for what changed&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetDelta&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Added&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="nf"&gt;SpawnVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;DespawnVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pk&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Modified&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;UpdateVisual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ClearDelta&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The initial &lt;code&gt;ToView()&lt;/code&gt; runs a full query. After that, &lt;code&gt;Refresh()&lt;/code&gt; drains a lock-free ring buffer of changes pushed by the commit path, so only entities whose indexed fields actually changed are re-evaluated. If 100,000 entities match your view but only 12 changed since the last refresh, you perform 12 evaluations, not 100,000.&lt;/p&gt;

&lt;p&gt;This is the iterate-everything problem solved from the database side: don't re-scan, track deltas.&lt;/p&gt;
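
&lt;p&gt;To make the delta-tracking idea concrete, here is a minimal, self-contained sketch. It is not Typhon's implementation: &lt;code&gt;DeltaView&lt;/code&gt;, &lt;code&gt;NotifyChanged&lt;/code&gt; and the &lt;code&gt;int&lt;/code&gt; entity keys are hypothetical, and a &lt;code&gt;ConcurrentQueue&lt;/code&gt; stands in for the engine's lock-free ring buffer. It only shows that &lt;code&gt;Refresh()&lt;/code&gt; re-evaluates the keys that changed, never the whole view.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch, not Typhon's API: an incrementally maintained view.
// ConcurrentQueue stands in for the engine's lock-free ring buffer of changes.
public sealed class DeltaView
{
    private readonly Func&amp;lt;int, bool&amp;gt; _predicate;            // does this entity match?
    private readonly HashSet&amp;lt;int&amp;gt; _members = new();         // entities currently in the view
    private readonly ConcurrentQueue&amp;lt;int&amp;gt; _changes = new(); // keys pushed by the commit path

    public List&amp;lt;int&amp;gt; Added { get; } = new();
    public List&amp;lt;int&amp;gt; Removed { get; } = new();
    public List&amp;lt;int&amp;gt; Modified { get; } = new();
    public int Evaluations { get; private set; }             // predicate calls made by Refresh()
    public int Count =&amp;gt; _members.Count;

    public DeltaView(IEnumerable&amp;lt;int&amp;gt; allKeys, Func&amp;lt;int, bool&amp;gt; predicate)
    {
        _predicate = predicate;
        foreach (int key in allKeys)                         // initial full query, run once
            if (predicate(key)) _members.Add(key);
    }

    // Called on commit for each entity whose indexed field changed.
    public void NotifyChanged(int key) =&amp;gt; _changes.Enqueue(key);

    // Re-evaluates only the queued keys (a key changed twice is evaluated twice here).
    public void Refresh()
    {
        while (_changes.TryDequeue(out int key))
        {
            Evaluations++;
            bool matches = _predicate(key);
            bool wasMember = _members.Contains(key);
            if (matches &amp;amp;&amp;amp; !wasMember) { _members.Add(key); Added.Add(key); }
            else if (!matches &amp;amp;&amp;amp; wasMember) { _members.Remove(key); Removed.Add(key); }
            else if (matches) Modified.Add(key);
        }
    }

    public void ClearDelta() { Added.Clear(); Removed.Clear(); Modified.Clear(); }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The cost of &lt;code&gt;Refresh()&lt;/code&gt; in this sketch is proportional to the number of queued changes, not to &lt;code&gt;Count&lt;/code&gt;: with 100,000 members and 12 changed keys, &lt;code&gt;Evaluations&lt;/code&gt; ends at 12.&lt;/p&gt;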

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;Specializing for game servers means giving things up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blittable components only.&lt;/strong&gt; No &lt;code&gt;string&lt;/code&gt;, no object references, no variable-length arrays inside components. Text uses fixed-size types like &lt;code&gt;String64&lt;/code&gt;. This is the price of zero-copy reads and cache-friendly storage — and it's a constraint game developers are already familiar with from ECS frameworks.&lt;/p&gt;
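
&lt;p&gt;One way to enforce the blittable rule is a check when components are registered. The sketch below is an assumption, not Typhon's validation code: it relies on &lt;code&gt;RuntimeHelpers.IsReferenceOrContainsReferences&amp;lt;T&amp;gt;()&lt;/code&gt;, a real .NET API and effectively the runtime counterpart of the &lt;code&gt;unmanaged&lt;/code&gt; constraint, to reject component structs that carry object references. The component types and &lt;code&gt;ComponentRules&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.CompilerServices;

// Hypothetical component types, for illustration only.
public struct Position { public float X, Y, Z; }    // OK: plain numeric fields
public struct Health { public int Current, Max; }   // OK
public struct Nameplate { public string Name; }     // rejected: holds a string reference
public struct Inventory { public int[] ItemIds; }   // rejected: variable-length array reference

public static class ComponentRules
{
    // True when T contains no object references anywhere, so it can be
    // stored in and read back from raw page bytes.
    public static bool IsBlittableComponent&amp;lt;T&amp;gt;() where T : struct
        =&amp;gt; !RuntimeHelpers.IsReferenceOrContainsReferences&amp;lt;T&amp;gt;();

    // Hypothetical registration-time guard.
    public static void Require&amp;lt;T&amp;gt;() where T : struct
    {
        if (!IsBlittableComponent&amp;lt;T&amp;gt;())
            throw new ArgumentException($"{typeof(T).Name} contains managed references");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;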

&lt;p&gt;&lt;strong&gt;Entity-centric relationships, not SQL JOINs.&lt;/strong&gt; Typhon supports navigation links, 1:N and N:M relationships — but they follow entity references, closer to a graph database than a traditional SQL one. This matches how game servers naturally think about data (an entity &lt;em&gt;has&lt;/em&gt; components, a guild &lt;em&gt;contains&lt;/em&gt; members), but if your mental model is &lt;code&gt;SELECT ... FROM a JOIN b ON a.x = b.y&lt;/code&gt;, it's a different paradigm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema in code, not SQL.&lt;/strong&gt; Components are C# structs with attributes, not DDL statements. Natural for game developers, unfamiliar territory for database administrators. If your team thinks in SQL, this is a paradigm shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll go deeper into the performance philosophy that makes all of this actually fast — data-oriented design, cache-line awareness, and zero-allocation hot paths. The principles that let a managed language hit microsecond-latency transactions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to follow along, the best way is to star &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;the repo&lt;/a&gt; or subscribe to the &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS feed&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>database</category>
      <category>ecs</category>
      <category>gamedev</category>
    </item>
    <item>
      <title>Why I'm Building a Database Engine in C#</title>
      <dc:creator>Loïc Baumann</dc:creator>
      <pubDate>Sun, 29 Mar 2026 13:02:55 +0000</pubDate>
      <link>https://dev.to/nockawa/why-im-building-a-database-engine-in-c-1np0</link>
      <guid>https://dev.to/nockawa/why-im-building-a-database-engine-in-c-1np0</guid>
      <description>&lt;p&gt;When I tell people I'm building an ACID database engine in C#, the first reaction is always the same: &lt;em&gt;"But what about GC pauses?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's a fair question. Very few high-performance database engines are built on .NET. The common assumption is that you need C, C++, or Rust for this class of software, and that managed languages are fundamentally disqualified from the microsecond-latency club.&lt;/p&gt;

&lt;p&gt;After 30 years of building real-time 3D engines and systems software, I chose C# anyway. The project is called &lt;strong&gt;Typhon&lt;/strong&gt;: an embedded ACID database engine targeting 1–2 microsecond transaction commits. And the reasons behind that choice might change how you think about what C# can do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Case Against C# (Let's Steel-Man It)
&lt;/h2&gt;

&lt;p&gt;Before I make my case, let me honestly lay out every argument against choosing C# for this. These are real concerns, not strawmen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The GC is non-deterministic.&lt;/strong&gt; It can pause all your threads whenever it wants. For a database engine that promises microsecond latency, a 10ms Gen2 collection is catastrophic — that's 10,000x your latency budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You don't control memory layout.&lt;/strong&gt; The managed heap decides where objects live. The GC can move them around during compaction. You can't guarantee that your B+Tree nodes sit on cache-line boundaries, or that your page cache buffer won't get relocated mid-transaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JIT warmup is real.&lt;/strong&gt; The first call to any method pays the compilation cost. In a database engine, the first transaction after startup shouldn't be 100x slower than the steady state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Virtual dispatch and bounds checking add overhead.&lt;/strong&gt; Every array access carries a hidden bounds check. Every virtual or interface call is an indirect call through a dispatch table. In a hot loop processing millions of entities, these nanoseconds compound.&lt;/p&gt;

&lt;p&gt;These are all legitimate problems. I won't pretend they aren't. But here's what most people miss: &lt;strong&gt;modern C# has answers for every single one of them.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Most People Don't Know About C#
&lt;/h2&gt;

&lt;p&gt;The C# that most developers know — classes, garbage collection, LINQ — is only half the language. There's a whole other side that the .NET runtime team has been quietly building for a decade, and it looks nothing like what you'd expect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;unsafe&lt;/code&gt; gives you C-level control.&lt;/strong&gt; Raw pointers, pointer arithmetic, &lt;code&gt;stackalloc&lt;/code&gt; for stack buffers, &lt;code&gt;fixed&lt;/code&gt;-size arrays — the JIT generates the same &lt;code&gt;mov&lt;/code&gt;/&lt;code&gt;cmp&lt;/code&gt;/&lt;code&gt;jne&lt;/code&gt; instructions you'd get from C. Not "close to C." The same instructions.&lt;/p&gt;
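
&lt;p&gt;A small illustration of that side of the language, assuming a project compiled with unsafe code enabled (the &lt;code&gt;AllowUnsafeBlocks&lt;/code&gt; project property). &lt;code&gt;RawPointers&lt;/code&gt; and its methods are hypothetical names: the sketch allocates a buffer with &lt;code&gt;stackalloc&lt;/code&gt; and walks it with raw pointer arithmetic, with no bounds checks and no heap buffer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical helpers; requires compiling with unsafe code enabled.
public static unsafe class RawPointers
{
    // Walks the buffer with a raw pointer, C style: no bounds checks.
    public static long Sum(int* p, int count)
    {
        long total = 0;
        int* end = p + count;
        while (p != end)
            total += *p++;
        return total;
    }

    // Fills a stack buffer with i * i and sums it; the buffer never touches the heap.
    public static long SumOfSquares(int n)
    {
        if ((uint)n &amp;gt; 256) throw new ArgumentOutOfRangeException(nameof(n)); // keep stackalloc small
        int* buffer = stackalloc int[n];    // reclaimed automatically when the method returns
        for (int i = 0; i != n; i++)
            buffer[i] = i * i;
        return Sum(buffer, n);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;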

&lt;p&gt;&lt;strong&gt;&lt;code&gt;GCHandle.Alloc(…, GCHandleType.Pinned)&lt;/code&gt; makes the GC irrelevant where it matters.&lt;/strong&gt; You can pin byte arrays so the GC never moves them. Typhon's entire page cache is pinned memory: the GC doesn't touch it, doesn't scan it, doesn't move it. It's just raw bytes at a fixed address, exactly like &lt;code&gt;malloc&lt;/code&gt; in C.&lt;/p&gt;
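
&lt;p&gt;A minimal sketch of pinning with &lt;code&gt;GCHandle&lt;/code&gt;. &lt;code&gt;PinnedPage&lt;/code&gt; and the 8 KB size are assumptions for illustration; a real page cache manages many such buffers. The handle keeps the array at a fixed address until &lt;code&gt;Free()&lt;/code&gt; is called, so the raw address stays valid across collections.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.InteropServices;

// Hypothetical page buffer pinned for its whole lifetime (8 KB assumed).
public sealed class PinnedPage : IDisposable
{
    public const int Size = 8192;
    private GCHandle _handle;

    public byte[] Bytes { get; }

    public PinnedPage()
    {
        Bytes = new byte[Size];
        _handle = GCHandle.Alloc(Bytes, GCHandleType.Pinned); // the GC may no longer move Bytes
    }

    // Stable address of Bytes[0]; valid until Dispose() unpins the array.
    public IntPtr Address =&amp;gt; _handle.AddrOfPinnedObject();

    public void Dispose()
    {
        if (_handle.IsAllocated) _handle.Free();              // a forgotten pin leaks and fragments the heap
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On .NET 5 and later, &lt;code&gt;GC.AllocateArray&amp;lt;byte&amp;gt;(length, pinned: true)&lt;/code&gt; is an alternative: it allocates the array on the pinned object heap, so no handle needs to be held or freed.&lt;/p&gt;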

&lt;p&gt;&lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/ref-struct" rel="noopener noreferrer"&gt;&lt;code&gt;ref struct&lt;/code&gt;&lt;/a&gt; eliminates heap allocations on hot paths.&lt;/strong&gt; A &lt;code&gt;ref struct&lt;/code&gt; can never escape to the heap. It lives on the stack, dies when the scope ends, and the GC never knows it existed. Typhon's entity accessor (&lt;code&gt;EntityRef&lt;/code&gt;) is a 96-byte &lt;code&gt;ref struct&lt;/code&gt; — zero allocation, zero GC pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained generics give you true monomorphization.&lt;/strong&gt; When you write &lt;code&gt;where T : unmanaged&lt;/code&gt;, &lt;code&gt;T&lt;/code&gt; is always a value type, and the JIT generates a separate native code path for each value-type instantiation. &lt;code&gt;sizeof(T)&lt;/code&gt; becomes a constant. Dead branches get eliminated. It's the same optimization Rust gets from generics: not runtime dispatch, but specialization, performed at JIT time rather than ahead of time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware intrinsics are first-class.&lt;/strong&gt; &lt;a href="https://learn.microsoft.com/en-us/dotnet/api/system.runtime.intrinsics" rel="noopener noreferrer"&gt;&lt;code&gt;System.Runtime.Intrinsics&lt;/code&gt;&lt;/a&gt; gives you &lt;code&gt;Vector256&lt;/code&gt; and &lt;code&gt;Sse42.Crc32&lt;/code&gt;, and &lt;code&gt;System.Numerics.BitOperations&lt;/code&gt; adds helpers like &lt;code&gt;TrailingZeroCount&lt;/code&gt;. They map to the same SIMD and bit-manipulation instructions available in C/C++, with the same performance, and runtime feature detection lets you fall back gracefully.&lt;/p&gt;
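
&lt;p&gt;A short example of runtime feature detection with a graceful fallback. &lt;code&gt;SimdCount.CountEqual&lt;/code&gt; is a hypothetical helper: when &lt;code&gt;Vector256.IsHardwareAccelerated&lt;/code&gt; is true it compares eight &lt;code&gt;int&lt;/code&gt; values per step and counts matching lanes with &lt;code&gt;BitOperations.PopCount&lt;/code&gt;; a scalar loop handles the tail, or the whole input on hardware without 256-bit vectors. It assumes .NET 7 or later, where the cross-platform &lt;code&gt;Vector256&lt;/code&gt; helpers used here exist.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class SimdCount
{
    // Hypothetical helper: counts elements equal to target, 8 lanes per step when possible.
    public static int CountEqual(ReadOnlySpan&amp;lt;int&amp;gt; values, int target)
    {
        int count = 0, i = 0;
        if (Vector256.IsHardwareAccelerated)
        {
            var needle = Vector256.Create(target);
            ref int start = ref MemoryMarshal.GetReference(values);
            int lanes = Vector256&amp;lt;int&amp;gt;.Count;                      // 8 ints per 256-bit vector
            for (; i &amp;lt;= values.Length - lanes; i += lanes)
            {
                var block = Vector256.LoadUnsafe(ref start, (nuint)i);
                uint mask = Vector256.Equals(block, needle).ExtractMostSignificantBits();
                count += BitOperations.PopCount(mask);              // one set bit per matching lane
            }
        }
        for (; i &amp;lt; values.Length; i++)                             // scalar tail, or full fallback
            if (values[i] == target) count++;
        return count;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;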

&lt;p&gt;&lt;strong&gt;&lt;code&gt;[StructLayout(Explicit)]&lt;/code&gt; gives you exact memory layout.&lt;/strong&gt; Field offsets, padding, size — you control every byte. Cache-line alignment, false-sharing prevention, bit-packing — it's all there.&lt;/p&gt;
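
&lt;p&gt;Two small illustrations of &lt;code&gt;[StructLayout(LayoutKind.Explicit)]&lt;/code&gt;. Both types are hypothetical: &lt;code&gt;PaddedCounter&lt;/code&gt; pads a single &lt;code&gt;long&lt;/code&gt; to a full 64-byte line to prevent false sharing, and &lt;code&gt;PackedHandle&lt;/code&gt; overlays two 32-bit fields on one 64-bit word.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System.Runtime.InteropServices;

// Hypothetical: one long padded to 64 bytes. In an array of these, the Value fields
// of neighboring elements are always 64 bytes apart, so two threads updating
// adjacent counters never write to the same cache line.
[StructLayout(LayoutKind.Explicit, Size = 64)]
public struct PaddedCounter
{
    [FieldOffset(0)] public long Value;       // bytes 8..63 are padding
}

// Hypothetical: two 32-bit fields overlaid on one 64-bit word.
[StructLayout(LayoutKind.Explicit)]
public struct PackedHandle
{
    [FieldOffset(0)] public ulong Raw;        // all 8 bytes at once
    [FieldOffset(0)] public uint Index;       // low half on little-endian CPUs
    [FieldOffset(4)] public uint Generation;  // high half on little-endian CPUs
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With an array of &lt;code&gt;PaddedCounter&lt;/code&gt;, each thread can call &lt;code&gt;Interlocked.Increment(ref counters[i].Value)&lt;/code&gt; on its own slot without invalidating its neighbors' cache lines.&lt;/p&gt;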

&lt;p&gt;This isn't "C# trying to be C." It's C# providing a genuine systems programming layer on top of a best-in-class managed ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Typhon Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://nockawa.github.io/assets/posts/typhon-blog-architecture.svg" rel="noopener noreferrer"&gt;&lt;br&gt;
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fnockawa.github.io%2Fassets%2Fposts%2Ftyphon-blog-architecture.png" alt="Typhon Engine architecture — five layers from API to Concurrency, with components discussed in this post highlighted with ★" width="800" height="1418"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Theory is nice; now let's look at real code.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hardware-accelerated WAL checksums
&lt;/h3&gt;

&lt;p&gt;Every page written to the Write-Ahead Log needs a CRC32C checksum. Here's what that looks like in C# — calling CPU instructions by name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="nf"&gt;ComputePartial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReadOnlySpan&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ArmCrc32&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Arm64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IsSupported&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeArm64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;ComputeSoftware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="nf"&gt;ComputeSse42X64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ReadOnlySpan&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;ulong&lt;/span&gt; &lt;span class="n"&gt;crc64&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="kt"&gt;byte&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;MemoryMarshal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetReference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;aligned&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;aligned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;crc64&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;X64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadUnaligned&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;ulong&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
        &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;+=&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;crc32&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;crc64&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;crc32&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Sse42&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Crc32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ptr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;++;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;crc32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Sse42.X64.Crc32()&lt;/code&gt; compiles to a single x86 &lt;code&gt;crc32&lt;/code&gt; instruction. Each &lt;code&gt;IsSupported&lt;/code&gt; check is a JIT-time constant, so the JIT eliminates the dead branches, and what executes is the same code a C programmer would write, with an explicit fallback chain (32-bit SSE4.2, ARM64 CRC32, then software) for CPUs without 64-bit SSE4.2 support. Result: &lt;strong&gt;~1.3 µs per 8 KB page&lt;/strong&gt;.&lt;/p&gt;
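
&lt;p&gt;The snippet calls a &lt;code&gt;ComputeSoftware&lt;/code&gt; fallback without showing it. A common way to write one is a byte-at-a-time, table-driven CRC32C; the sketch below is that textbook version, not necessarily Typhon's. It uses the reflected Castagnoli polynomial &lt;code&gt;0x82F63B78&lt;/code&gt; and, like the raw &lt;code&gt;crc32&lt;/code&gt; instruction loop, performs no initial or final inversion, so a caller computing a standard CRC32C seeds with &lt;code&gt;0xFFFFFFFF&lt;/code&gt; and inverts the result.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;

// Hypothetical software fallback: textbook table-driven CRC32C (Castagnoli).
public static class Crc32CSoftware
{
    private const uint Polynomial = 0x82F63B78u;   // reflected form of 0x1EDC6F41
    private static readonly uint[] Table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i &amp;lt; 256; i++)
        {
            uint c = i;
            for (int bit = 0; bit &amp;lt; 8; bit++)
                c = (c &amp;amp; 1) != 0 ? (c &amp;gt;&amp;gt; 1) ^ Polynomial : c &amp;gt;&amp;gt; 1;
            table[i] = c;
        }
        return table;
    }

    // Same contract as the crc32 instruction: no initial or final inversion.
    public static uint ComputeSoftware(uint crc, ReadOnlySpan&amp;lt;byte&amp;gt; data)
    {
        foreach (byte b in data)
            crc = Table[(crc ^ b) &amp;amp; 0xFF] ^ (crc &amp;gt;&amp;gt; 8);
        return crc;
    }

    // Standard CRC32C of a whole buffer: seed with all ones, invert at the end.
    public static uint Compute(ReadOnlySpan&amp;lt;byte&amp;gt; data) =&amp;gt; ~ComputeSoftware(0xFFFFFFFFu, data);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The standard CRC32C check value for the ASCII string &lt;code&gt;"123456789"&lt;/code&gt; is &lt;code&gt;0xE3069283&lt;/code&gt;, which makes a convenient test that the hardware and software paths agree.&lt;/p&gt;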

&lt;h3&gt;
  
  
  The SIMD chunk accessor
&lt;/h3&gt;

&lt;p&gt;This is Typhon's page cache hot path: a 16-slot cache that finds your data in one of three tiers. The two fast tiers are shown here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// === ULTRA FAST PATH: MRU check ===&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mru&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mruSlot&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_pageIndices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mru&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;headerOffset&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="n"&gt;_rootHeaderOffset&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_otherHeaderOffset&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;*)&lt;/span&gt;&lt;span class="n"&gt;_baseAddresses&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mru&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;headerOffset&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;_stride&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// === FAST PATH: SIMD search through all 16 cached slots ===&lt;/span&gt;
&lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_pageIndices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;v0&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mask0&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ExtractMostSignificantBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask0&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BitOperations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TrailingZeroCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;GetFromSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirty&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;mask1&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Vector256&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Equals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;ExtractMostSignificantBits&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask1&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="n"&gt;BitOperations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;TrailingZeroCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mask1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;GetFromSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pageIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirty&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;_pageIndices&lt;/code&gt; array is a &lt;code&gt;fixed int[16]&lt;/code&gt;: 64 bytes, exactly one cache line when 64-byte aligned, packed for SIMD. On AVX2 hardware, one &lt;code&gt;Vector256.Equals&lt;/code&gt; compares 8 page indices in a single instruction. The MRU fast path handles the common case (repeated access to the same page) with a single branch: branch-predictor friendly, near-zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-copy entity reads
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;EntityRef&lt;/code&gt; is a &lt;code&gt;ref struct&lt;/code&gt; — stack-only, 96 bytes, with an inline fixed array caching component locations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;unsafe&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;EntityRef&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;EntityId&lt;/span&gt; &lt;span class="n"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ArchetypeMetadata&lt;/span&gt; &lt;span class="n"&gt;_archetype&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;ArchetypeEngineState&lt;/span&gt; &lt;span class="n"&gt;_engineState&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;Transaction&lt;/span&gt; &lt;span class="n"&gt;_tx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="kt"&gt;ushort&lt;/span&gt; &lt;span class="n"&gt;_enabledBits&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;_writable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;fixed&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;_locations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;16&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;  &lt;span class="c1"&gt;// inline component chunk IDs&lt;/span&gt;

    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MethodImpl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MethodImplOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AggressiveInlining&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="n"&gt;Read&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;Comp&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;comp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;unmanaged&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;byte&lt;/span&gt; &lt;span class="n"&gt;slot&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_archetype&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;GetSlot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_componentTypeId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;chunkId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_locations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_engineState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SlotToComponentTable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;_tx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadEcsComponentData&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunkId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Read&amp;lt;T&amp;gt;&lt;/code&gt; call goes from method call → slot lookup → chunk ID → page cache → pointer arithmetic → &lt;code&gt;ref readonly T&lt;/code&gt; pointing directly into a pinned memory page. Zero copies. Zero allocations. Zero GC involvement. The &lt;code&gt;where T : unmanaged&lt;/code&gt; constraint means the JIT knows the exact layout — it compiles to pointer arithmetic, nothing more.&lt;/p&gt;
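
&lt;p&gt;The same zero-copy idea can be sketched in safe code with &lt;code&gt;MemoryMarshal.AsRef&amp;lt;T&amp;gt;&lt;/code&gt;, which reinterprets a slice of bytes as a reference to an unmanaged struct. &lt;code&gt;PageReader&lt;/code&gt;, &lt;code&gt;Vec3&lt;/code&gt;, and the back-to-back component layout are assumptions for illustration, not Typhon's page format. Writing through the returned reference changes the underlying bytes, which shows that nothing was copied.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical component stored directly in page memory.
[StructLayout(LayoutKind.Sequential)]
public struct Vec3 { public float X, Y, Z; }

public static class PageReader
{
    // Hypothetical layout: fixed-size components packed back to back from offset 0.
    public static ref T At&amp;lt;T&amp;gt;(Span&amp;lt;byte&amp;gt; page, int index) where T : unmanaged
        =&amp;gt; ref MemoryMarshal.AsRef&amp;lt;T&amp;gt;(page.Slice(index * Unsafe.SizeOf&amp;lt;T&amp;gt;()));

    // Read-only access: the returned ref points into the page, no copy of T is made.
    public static ref readonly T ReadAt&amp;lt;T&amp;gt;(ReadOnlySpan&amp;lt;byte&amp;gt; page, int index) where T : unmanaged
        =&amp;gt; ref MemoryMarshal.AsRef&amp;lt;T&amp;gt;(page.Slice(index * Unsafe.SizeOf&amp;lt;T&amp;gt;()));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;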

&lt;h3&gt;
  
  
  JIT-specialized hash functions
&lt;/h3&gt;

&lt;p&gt;Even the hash functions exploit the JIT. Since &lt;code&gt;sizeof(TKey)&lt;/code&gt; is a constant in each specialized instantiation of a constrained generic, the dead branches vanish:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;MethodImpl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MethodImplOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AggressiveInlining&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="k"&gt;internal&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt; &lt;span class="n"&gt;ComputeHash&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;TKey&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;unmanaged&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;FastHash32&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;XxHash32_8Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;As&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;XxHash32_Bytes&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;*)&lt;/span&gt;&lt;span class="n"&gt;Unsafe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;AsPointer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TKey&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you call &lt;code&gt;ComputeHash&amp;lt;int&amp;gt;(42)&lt;/code&gt;, the JIT generates &lt;em&gt;just&lt;/em&gt; the 4-byte path. The other two branches are completely eliminated. This is real monomorphization, not runtime dispatch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Productivity Argument
&lt;/h2&gt;

&lt;p&gt;A database engine is more than its hot path. Around the core engine sits a large shell of infrastructure: configuration management, structured logging, telemetry, dependency injection, testing, benchmarking.&lt;/p&gt;

&lt;p&gt;In C or Rust, you'd build much of this yourself or stitch together crates/libraries of varying quality. In .NET, this is production-grade and free: &lt;code&gt;ILogger&lt;/code&gt; and &lt;a href="https://opentelemetry.io/docs/languages/net/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; for observability, &lt;a href="https://github.com/dotnet/BenchmarkDotNet" rel="noopener noreferrer"&gt;BenchmarkDotNet&lt;/a&gt; for rigorous micro-benchmarks, NUnit for testing, &lt;code&gt;IConfiguration&lt;/code&gt; for settings. All well-documented, all interoperable, all maintained by Microsoft or battle-tested OSS communities.&lt;/p&gt;

&lt;p&gt;For a solo developer building a database engine, this is a genuine competitive advantage. I spend my time on concurrency primitives and page cache eviction, not on reinventing a logging framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's the Memory Layout, Not the Language
&lt;/h2&gt;

&lt;p&gt;Here's the insight that years of working on real-time 3D engines taught me: &lt;strong&gt;the bottleneck in a database engine is memory access patterns, not instruction throughput.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A cache miss to DRAM on a Ryzen 9 7950X costs 61–73 nanoseconds. At the chip's 4.5–5.7 GHz clocks, that's roughly 275–415 CPU cycles doing &lt;em&gt;nothing&lt;/em&gt; but waiting for data. A CAS operation hitting L1 costs 1.4 nanoseconds. The ratio is roughly &lt;strong&gt;50:1&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;No amount of "zero-cost abstractions" in your language can save you if your data structures cause cache misses. Conversely, if your data layout is cache-friendly — contiguous, aligned, predictable access patterns — the language barely matters. C# with &lt;code&gt;unsafe&lt;/code&gt; generates identical machine code to C on hot paths. The JIT is that good.&lt;/p&gt;

&lt;p&gt;What matters is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache-line awareness&lt;/strong&gt;: Typhon's B+Tree nodes are 128 bytes, or two cache lines. Zen4's hardware prefetchers pull in the second line automatically. This alone cut insert latency by &lt;strong&gt;53%&lt;/strong&gt; and lookup latency by &lt;strong&gt;30%&lt;/strong&gt; versus 64-byte nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data-oriented design&lt;/strong&gt;: Structure of Arrays over Array of Structures. SIMD-friendly layouts. Blittable types only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimizing indirections&lt;/strong&gt;: Every pointer chase is a potential cache miss. The SIMD chunk accessor's MRU hit avoids the chase entirely.&lt;/li&gt;
&lt;/ul&gt;
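&lt;p&gt;Here is a hypothetical illustration of the Structure-of-Arrays point. The types are invented for this example and do not reflect Typhon's actual component layout. The example sums one field across many entities stored both ways. In the AoS version, every 16-byte element is pulled into cache but only its 4-byte &lt;code&gt;Health&lt;/code&gt; field is used. In the SoA version, the values are packed, so every byte of each fetched cache line is useful. The tight loop over a plain &lt;code&gt;int[]&lt;/code&gt; is also an easy target for SIMD.&lt;/p&gt;

```csharp
// Hypothetical entity data, invented for this example (not Typhon's layout).

// Array of Structures: all fields of one unit sit next to each other
// (3 floats + 1 int = 16 bytes per element).
public struct UnitAoS
{
    public float X, Y, Z;
    public int Health;
}

// Structure of Arrays: one dense array per field.
public sealed class UnitsSoA
{
    public readonly float[] X, Y, Z;
    public readonly int[] Health;

    public UnitsSoA(int count)
    {
        X = new float[count];
        Y = new float[count];
        Z = new float[count];
        Health = new int[count];
    }
}

public static class HealthQueries
{
    // AoS: a 64-byte cache line holds about 4 units, but this query only
    // uses 16 of those bytes (the four Health fields).
    public static long TotalHealth(UnitAoS[] units)
    {
        long total = 0;
        for (int i = 0; i != units.Length; i++)
        {
            total += units[i].Health;
        }
        return total;
    }

    // SoA: Health values are contiguous, so a 64-byte cache line
    // delivers 16 useful ints.
    public static long TotalHealth(UnitsSoA units)
    {
        int[] health = units.Health;
        long total = 0;
        for (int i = 0; i != health.Length; i++)
        {
            total += health[i];
        }
        return total;
    }
}
```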

&lt;p&gt;The language you write in matters far less than the memory layout you design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;All measurements on a Ryzen 9 7950X, .NET 10.0, BenchmarkDotNet, release configuration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRUD lifecycle MVCC (spawn, read, update, destroy, commit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.2 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;830K ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90 reads/10 updates workload (100 ops per tx, MVCC)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22 µs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4.5M entity-ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree lookup (hit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;267 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.7M ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B+Tree sequential scan (per key)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.1 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;479M keys/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Uncontended lock acquire&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.8 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128M ops/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Page cache hit&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.3 ns&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Context: an uncontended CAS on Zen4 costs 1.4 ns, and a DRAM round-trip costs 61–73 ns. Typhon's lock acquire (7.8 ns) costs about as much as 5–6 CAS operations (7.8 / 1.4 ≈ 5.6). That is tight, considering it handles shared/exclusive arbitration with waiter tracking. The 267 ns B+Tree lookup is consistent with 6–7 dependent memory accesses averaging roughly 40 ns each. That points to a traversal that mixes L2/L3 hits with some DRAM misses.&lt;/p&gt;

&lt;p&gt;These are early alpha numbers. There's room to improve. But they validate the core thesis: &lt;strong&gt;C# is not the bottleneck.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs
&lt;/h2&gt;

&lt;p&gt;No choice is without cost. Here's what I'd tell someone considering the same path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory safety is on you.&lt;/strong&gt; In &lt;code&gt;unsafe&lt;/code&gt; blocks, you can corrupt memory, dereference bad pointers, and overflow buffers, and the compiler won't save you. &lt;a href="https://learn.microsoft.com/en-us/dotnet/api/system.span-1" rel="noopener noreferrer"&gt;&lt;code&gt;Span&amp;lt;T&amp;gt;&lt;/code&gt;&lt;/a&gt; is the memory-safe alternative. Every access is bounds-checked, but the JIT can often eliminate those checks in simple loops, so it is usually only slightly slower.&lt;/p&gt;
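&lt;p&gt;Here is a minimal sketch of the safe alternative. It assumes a page is exposed as a plain &lt;code&gt;byte[]&lt;/code&gt;, and the &lt;code&gt;PageReader&lt;/code&gt; name is hypothetical. Slicing with &lt;code&gt;AsSpan&lt;/code&gt; and decoding with &lt;code&gt;BinaryPrimitives&lt;/code&gt; are both bounds-checked. A bad offset therefore throws an exception, whereas a raw &lt;code&gt;byte*&lt;/code&gt; dereference would silently read past the end of the page.&lt;/p&gt;

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper (not Typhon's API): read a little-endian 32-bit field
// stored at `offset` inside a page buffer, with bounds checks.
public static class PageReader
{
    public static int ReadInt32(byte[] page, int offset)
    {
        // Throws ArgumentOutOfRangeException if offset is past the end.
        var slice = page.AsSpan(offset);
        // Throws ArgumentOutOfRangeException if fewer than 4 bytes remain.
        return BinaryPrimitives.ReadInt32LittleEndian(slice);
    }
}
```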

&lt;p&gt;&lt;strong&gt;The GC hasn't been a problem, but it could be.&lt;/strong&gt; Because the page cache is pinned and hot paths use &lt;code&gt;ref struct&lt;/code&gt; types, Gen2 collections are rare and cheap. But I won't pretend this is guaranteed. A workload that allocates heavily in managed code between transactions could still see pauses. The answer is discipline: &lt;strong&gt;don't allocate on hot paths&lt;/strong&gt;. C# makes allocation-free code possible; it just doesn't enforce it.&lt;/p&gt;
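&lt;p&gt;One common way to keep a hot path allocation-free is to let a long-lived owner hold scratch buffers and reuse them on every call, rather than allocating per transaction. This sketch uses hypothetical names and is not Typhon's actual code. The standard &lt;code&gt;GC.GetAllocatedBytesForCurrentThread()&lt;/code&gt; API is a handy way to verify that a steady-state call allocates nothing.&lt;/p&gt;

```csharp
// Hypothetical per-transaction step, shown two ways (not Typhon code).
public sealed class ScratchStep
{
    // Long-lived buffer, allocated once when the owner is created.
    private readonly long[] _scratch;

    public ScratchStep(int capacity)
    {
        _scratch = new long[capacity];
    }

    // Allocating version: every call creates a new array that becomes
    // garbage as soon as the method returns.
    public static long SumSquaresAllocating(int count)
    {
        var tmp = new long[count];
        long total = 0;
        for (int i = 0; i != count; i++)
        {
            tmp[i] = (long)i * i;
            total += tmp[i];
        }
        return total;
    }

    // Allocation-free version: reuses the preallocated buffer. A count
    // larger than the capacity throws IndexOutOfRangeException.
    public long SumSquaresReusing(int count)
    {
        long total = 0;
        for (int i = 0; i != count; i++)
        {
            _scratch[i] = (long)i * i;
            total += _scratch[i];
        }
        return total;
    }
}
```

&lt;p&gt;Read &lt;code&gt;GC.GetAllocatedBytesForCurrentThread()&lt;/code&gt; before and after a warmed-up call to &lt;code&gt;SumSquaresReusing&lt;/code&gt;, and the difference should be zero. A check like this is cheap to put in a test, so allocation regressions on hot paths surface early.&lt;/p&gt;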

&lt;p&gt;&lt;strong&gt;"But Rust would give you compile-time safety."&lt;/strong&gt; True — the borrow checker catches ownership and lifetime bugs that &lt;code&gt;unsafe&lt;/code&gt; C# can't. But C# has a trick Rust doesn't: &lt;strong&gt;&lt;a href="https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/tutorials/how-to-write-csharp-analyzer-code-fix" rel="noopener noreferrer"&gt;Roslyn analyzers&lt;/a&gt;&lt;/strong&gt;. I wrote a custom analyzer suite (TYPHON001–007) that enforces domain-specific safety rules as compiler errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[NoCopy]&lt;/code&gt; attribute + analyzer: performance-critical structs like &lt;code&gt;ChunkAccessor&lt;/code&gt; &lt;strong&gt;cannot be passed by value&lt;/strong&gt;; the compiler reports an error if you forget &lt;code&gt;ref&lt;/code&gt;. This is similar in spirit to Rust's non-&lt;code&gt;Copy&lt;/code&gt; types, which can never be silently duplicated, but it is scoped to the types that actually matter.&lt;/li&gt;
&lt;li&gt;Ownership tracking: if you create a &lt;code&gt;ChunkAccessor&lt;/code&gt; or &lt;code&gt;Transaction&lt;/code&gt; and don't dispose it, that's a &lt;strong&gt;compiler error&lt;/strong&gt;, not a runtime leak. The analyzer tracks ownership transfers through assignments, returns, and &lt;code&gt;ref&lt;/code&gt;/&lt;code&gt;out&lt;/code&gt; parameters. Annotating a method with &lt;code&gt;[return: TransfersOwnership]&lt;/code&gt; tells the analyzer that ownership of the returned value passes to the caller.&lt;/li&gt;
&lt;li&gt;Disposal completeness: if your type holds a critical disposable field and your &lt;code&gt;Dispose()&lt;/code&gt; method misses it, or has an early return that skips it, that's a compiler error.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is a compile-time error in Typhon — TYPHON001&lt;/span&gt;
&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChunkAccessor&lt;/span&gt; &lt;span class="n"&gt;accessor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// ✗ Error: must be passed by ref&lt;/span&gt;

&lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ref&lt;/span&gt; &lt;span class="n"&gt;ChunkAccessor&lt;/span&gt; &lt;span class="n"&gt;accessor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// ✓ OK&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't get Rust's safety for free in C#. But you can &lt;strong&gt;build the exact subset you need&lt;/strong&gt; as compiler errors, tailored to your domain. And unlike Rust's borrow checker, these rules carry domain context in the diagnostics: "causes page cache deadlock" is more actionable than "value moved here."&lt;/p&gt;

&lt;p&gt;Rust's ecosystem for the surrounding infrastructure (logging, DI, configuration, testing) is also less mature than .NET's, and as a solo developer, my velocity matters. I chose the language where I ship faster.&lt;/p&gt;

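&lt;p&gt;&lt;strong&gt;JIT warmup is real but manageable.&lt;/strong&gt; The first few transactions after a cold start are slower. For an embedded engine (no separate server process), this is acceptable, since the host application typically has its own warmup phase. For a server database, you'd want to precompile with ReadyToRun or Native AOT. Tiered compilation, which is on by default, first compiles methods in a fast, unoptimized tier, and that is part of why the first calls are slower.&lt;/p&gt;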

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll explain why an ACID database engine borrows its storage architecture from game engines — specifically the Entity-Component-System pattern. Game engines and databases are solving the same fundamental problem: managing structured data with extreme performance constraints. They just evolved completely different solutions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you want to follow along, the best way is to star &lt;a href="https://github.com/nockawa/Typhon" rel="noopener noreferrer"&gt;the repo&lt;/a&gt; or subscribe to the &lt;a href="https://nockawa.github.io/feed.xml" rel="noopener noreferrer"&gt;RSS feed&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Post #1 in a series about building a database engine in C#. Next up: "What Game Engines Know About Data That Databases Forgot".&lt;/em&gt;&lt;/p&gt;

</description>
      <category>csharp</category>
      <category>dotnet</category>
      <category>database</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
