<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kush V</title>
    <description>The latest articles on DEV Community by Kush V (@kusoroadeolu).</description>
    <link>https://dev.to/kusoroadeolu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3795697%2F68bc5722-fd21-4eb5-96ad-18a39f1620bf.png</url>
      <title>DEV Community: Kush V</title>
      <link>https://dev.to/kusoroadeolu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kusoroadeolu"/>
    <language>en</language>
    <item>
      <title>A Practical Exploration of Transactional Map Implementations in Java</title>
      <dc:creator>Kush V</dc:creator>
      <pubDate>Mon, 16 Mar 2026 18:43:58 +0000</pubDate>
      <link>https://dev.to/kusoroadeolu/a-practical-exploration-of-transactional-map-implementations-in-java-3mjb</link>
      <guid>https://dev.to/kusoroadeolu/a-practical-exploration-of-transactional-map-implementations-in-java-3mjb</guid>
      <description>&lt;p&gt;This writeup documents the transactional maps I’ve implemented over the last few weeks. It mainly focuses on design, lifecycle, and implementation details. No benchmarks or perf numbers included.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; The original writeup can be found &lt;a href="https://github.com/kusoroadeolu/articles/blob/main/transactional-maps/implementations.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Optimistic Transactional Map
&lt;/h2&gt;

&lt;p&gt;This is mainly inspired by the approach described in &lt;a href="https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf" rel="noopener noreferrer"&gt;the transactional collections classes paper&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;READ COMMITTED&lt;/strong&gt; globally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SERIALIZABLE per-key&lt;/strong&gt; when writes are involved&lt;/li&gt;
&lt;li&gt;No dirty reads, no phantom reads per key&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, V&amp;gt;&lt;/code&gt; -&amp;gt; The underlying map&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;KeyToLockers&amp;lt;K&amp;gt;&lt;/code&gt; -&amp;gt; Maps each key to operation-specific &lt;code&gt;GuardedTxSet&lt;/code&gt;s&lt;/li&gt;
&lt;li&gt;Each &lt;code&gt;GuardedTxSet&lt;/code&gt; contains:

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;ReentrantReadWriteLock&lt;/code&gt; (split into read/write locks)&lt;/li&gt;
&lt;li&gt;A set of transactions waiting on that operation&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
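&lt;p&gt;As a rough sketch (type and field names here are illustrative, not the actual implementation), each per-key, per-operation locker pairs a read/write lock with the set of transactions registered on it:&lt;/p&gt;

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical shape of a per-key, per-operation GuardedTxSet.
class GuardedTxSet<T> {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final Set<T> waitingTxs = ConcurrentHashMap.newKeySet();

    // Readers lock immediately at schedule time and register themselves.
    void registerReader(T tx) {
        lock.readLock().lock();
        waitingTxs.add(tx);
    }

    // At commit the transaction is removed from the set.
    void deregister(T tx) {
        waitingTxs.remove(tx);
    }
}
```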

&lt;h3&gt;
  
  
  Transaction Lifecycle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Schedule Phase (when operations are called)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;// Acquires READ lock immediately&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Does NOT acquire any lock yet&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For reads (GET/CONTAINS/SIZE):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Immediately acquire the &lt;strong&gt;read lock&lt;/strong&gt; for that operation type on that key&lt;/li&gt;
&lt;li&gt;Add transaction to the &lt;code&gt;GuardedTxSet&lt;/code&gt; for that key + operation&lt;/li&gt;
&lt;li&gt;Multiple readers can proceed concurrently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For writes (PUT/REMOVE):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No locks acquired yet (lazy intent declaration)&lt;/li&gt;
&lt;li&gt;Just record the operation&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Validation Phase
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;For reads:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already validated (holding read lock since schedule time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For writes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort all write keys by &lt;code&gt;identityHashCode()&lt;/code&gt; (a consistent global lock-acquisition order prevents deadlock)&lt;/li&gt;
&lt;li&gt;For each key being written:

&lt;ol&gt;
&lt;li&gt;Acquire the &lt;strong&gt;write lock&lt;/strong&gt; for that key&lt;/li&gt;
&lt;li&gt;For each conflicting read operation type (GET, CONTAINS):

&lt;ul&gt;
&lt;li&gt;Check if this transaction holds the &lt;strong&gt;read lock&lt;/strong&gt; for that operation&lt;/li&gt;
&lt;li&gt;If yes, &lt;strong&gt;release the read lock first&lt;/strong&gt; (prevent upgrade deadlock)&lt;/li&gt;
&lt;li&gt;Acquire the &lt;strong&gt;write lock&lt;/strong&gt; for that operation type&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Check if SIZE lock is needed:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PUT&lt;/code&gt; on new key -&amp;gt; acquire SIZE write lock (size will increase)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PUT&lt;/code&gt; on existing key -&amp;gt; release CONTAINS write lock if held (size unchanged, CONTAINS always true)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REMOVE&lt;/code&gt; on existing key -&amp;gt; acquire SIZE write lock (size will decrease)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REMOVE&lt;/code&gt; on non-existent key -&amp;gt; release CONTAINS write lock if held (size unchanged, CONTAINS always false)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ul&gt;
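&lt;p&gt;The two deadlock-avoidance tricks above (canonical key ordering, and releasing the read lock before taking the write lock) can be sketched like this; the helper names are mine:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the deadlock-avoidance steps in the validation phase.
class UpgradeSketch {
    // Sorting write keys into one canonical order means every transaction
    // acquires locks in the same sequence, ruling out lock-order deadlock.
    static <K> List<K> canonicalOrder(List<K> writeKeys) {
        List<K> sorted = new ArrayList<>(writeKeys);
        sorted.sort(Comparator.comparingInt(System::identityHashCode));
        return sorted;
    }

    // ReentrantReadWriteLock cannot upgrade read -> write directly, so
    // the read lock must be released before the write lock is requested.
    static void upgrade(ReentrantReadWriteLock rw) {
        if (rw.getReadHoldCount() > 0) {
            rw.readLock().unlock(); // release first to avoid self-deadlock
        }
        rw.writeLock().lock();
    }
}
```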

&lt;h4&gt;
  
  
  3. Commit Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Apply all operations to the underlying map&lt;/li&gt;
&lt;li&gt;Release all held locks (both read and write)&lt;/li&gt;
&lt;li&gt;Remove the transaction from all &lt;code&gt;GuardedTxSet&lt;/code&gt;s
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Unique Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Readers block writers, writers block readers, but readers never block readers&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The lock upgrade implementation (release read -&amp;gt; acquire write) prevents self-deadlock&lt;/li&gt;
&lt;li&gt;Semantic awareness: the &lt;strong&gt;SIZE&lt;/strong&gt; lock is only acquired when the map's size will actually change. The &lt;strong&gt;CONTAINS&lt;/strong&gt; lock is acquired but only held when the answer could change: for &lt;strong&gt;PUT&lt;/strong&gt;, when the key does not already exist; for &lt;strong&gt;REMOVE&lt;/strong&gt;, when it does&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Pessimistic Transactional Map
&lt;/h2&gt;

&lt;p&gt;This is mainly inspired by the approach described in &lt;a href="https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf" rel="noopener noreferrer"&gt;the transactional collections classes paper&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;READ COMMITTED&lt;/strong&gt; globally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SERIALIZABLE per key&lt;/strong&gt; when writes are involved&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, V&amp;gt;&lt;/code&gt; -&amp;gt; The underlying map&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;KeyToLockers&amp;lt;K&amp;gt;&lt;/code&gt; -&amp;gt; Maps each key to operation-specific &lt;code&gt;GuardedTxSet&lt;/code&gt;s&lt;/li&gt;
&lt;li&gt;Each &lt;code&gt;GuardedTxSet&lt;/code&gt; contains:

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;ReentrantLock&lt;/code&gt; (write lock)&lt;/li&gt;
&lt;li&gt;An &lt;code&gt;AtomicInteger&lt;/code&gt; reader count&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;Latch&lt;/code&gt; with status (FREE/HELD), parent transaction, and &lt;code&gt;CountDownLatch&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
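&lt;p&gt;A sketch of what the latch piece might look like (illustrative names; the real class also coordinates with the reader count):&lt;/p&gt;

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shape of the latch described above: a status flag, the
// transaction that owns it, and a CountDownLatch readers can park on.
class TxLatch {
    static final int FREE = 0, HELD = 1;
    final AtomicInteger status = new AtomicInteger(FREE);
    final AtomicReference<Object> parentTx = new AtomicReference<>();
    volatile CountDownLatch gate = new CountDownLatch(0); // open while FREE

    void acquire(Object tx) {
        status.set(HELD);
        parentTx.set(tx);
        gate = new CountDownLatch(1); // readers arriving now must wait
    }

    void release() {
        status.set(FREE);
        parentTx.set(null);
        gate.countDown(); // wake any parked readers
    }
}
```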

&lt;h3&gt;
  
  
  Transaction Lifecycle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Schedule Phase
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;// Does NOT acquire lock, just registers&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Does NOT acquire lock, just records&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For reads:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add transaction to the appropriate &lt;code&gt;GuardedTxSet&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;No locks acquired (lazy intent declaration)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For writes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Just record the operation&lt;/li&gt;
&lt;li&gt;No locks acquired&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Validation Phase
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;For writes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort all write keys by &lt;code&gt;identityHashCode()&lt;/code&gt; for deadlock prevention&lt;/li&gt;
&lt;li&gt;For each write key:

&lt;ol&gt;
&lt;li&gt;Acquire write locks for GET and CONTAINS operation types on that key&lt;/li&gt;
&lt;li&gt;Check if key exists in underlying map&lt;/li&gt;
&lt;li&gt;Determine if SIZE lock needed (same logic as optimistic):

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PUT&lt;/code&gt; on new key -&amp;gt; acquire SIZE lock&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REMOVE&lt;/code&gt; on existing key -&amp;gt; acquire SIZE lock&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Set latch status to HELD with this transaction as parent&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Drain existing readers&lt;/strong&gt;: spin-wait while &lt;code&gt;readerCount &amp;gt; 0&lt;/code&gt;
&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For reads:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check the latch for the operation type on that key&lt;/li&gt;
&lt;li&gt;If latch is HELD by another transaction:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Waits&lt;/strong&gt; on the &lt;code&gt;CountDownLatch&lt;/code&gt; (blocks until writer releases)&lt;/li&gt;
&lt;li&gt;When awakened, recheck whether the latch has been reacquired; if not, increment &lt;code&gt;readerCount&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;If latch is FREE:

&lt;ul&gt;
&lt;li&gt;Just increment &lt;code&gt;readerCount&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
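&lt;p&gt;A deterministic, state-only walk-through of the reader/writer protocol above (no real parking or spinning; the names are mine):&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the pessimistic reader/writer handshake: readers
// bump a counter when the latch is free; a writer takes the latch and
// must see the counter drain to zero before mutating. Illustrative only.
class DrainSketch {
    final AtomicInteger readerCount = new AtomicInteger();
    volatile boolean held = false;

    // Reader path: only proceed when no writer holds the latch.
    boolean tryRead() {
        if (held) return false;        // would park on the CountDownLatch
        readerCount.incrementAndGet();
        return true;
    }

    void finishRead() { readerCount.decrementAndGet(); }

    // Writer path: take the latch, then drain readers before mutating.
    boolean tryWrite() {
        held = true;
        return readerCount.get() == 0; // real code spin-waits here
    }
}
```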

&lt;h4&gt;
  
  
  3. Commit Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Apply all operations&lt;/li&gt;
&lt;li&gt;For writers: set the latch to FREE and count it down (waking any waiting readers)&lt;/li&gt;
&lt;li&gt;Release all locks&lt;/li&gt;
&lt;li&gt;For readers: decrement &lt;code&gt;readerCount&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Remove from &lt;code&gt;GuardedTxSet&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Unique Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Readers use atomic counters rather than locks, which is cheaper than full lock acquisition&lt;/li&gt;
&lt;li&gt;Writers must &lt;strong&gt;drain readers&lt;/strong&gt; before proceeding (spin on &lt;code&gt;readerCount&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;The latch mechanism prevents TOCTOU bugs: the status check and the await happen against a single atomic read&lt;/li&gt;
&lt;li&gt;More OS context switches due to readers parking/unparking on &lt;code&gt;CountDownLatch&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Read Uncommitted Transactional Map
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;READ UNCOMMITTED&lt;/strong&gt; -&amp;gt; dirty reads allowed (can see uncommitted writes)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, V&amp;gt;&lt;/code&gt; -&amp;gt; The underlying map&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LockHolder&amp;lt;K, V&amp;gt;&lt;/code&gt; -&amp;gt; Per-key &lt;code&gt;ReentrantLock&lt;/code&gt;s for writes&lt;/li&gt;
&lt;li&gt;Per-transaction &lt;strong&gt;store buffer&lt;/strong&gt; (&lt;code&gt;Map&amp;lt;K, V&amp;gt;&lt;/code&gt;) -&amp;gt; local cache of writes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transaction Lifecycle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Schedule Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;All operations just record intent&lt;/li&gt;
&lt;li&gt;No locks acquired for reads OR writes&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Validation Phase
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;For writes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort write keys by &lt;code&gt;identityHashCode()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Acquire per-key locks in sorted order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For reads:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No validation is needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Commit Phase
&lt;/h4&gt;

&lt;p&gt;Before committing any operations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Snapshot current map size&lt;/span&gt;
&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;txMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Pre-populate store buffer with current values for each operation with a key&lt;/span&gt;
&lt;span class="n"&gt;storeBuf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;txMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then for each operation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For writes (PUT/REMOVE):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply to underlying map immediately&lt;/li&gt;
&lt;li&gt;Update store buffer with new value&lt;/li&gt;
&lt;li&gt;Track size delta:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PUT&lt;/code&gt; on new key: &lt;code&gt;delta++&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REMOVE&lt;/code&gt; on existing key: &lt;code&gt;delta--&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For reads (GET):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First check &lt;strong&gt;store buffer&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If not found, perform &lt;strong&gt;dirty read&lt;/strong&gt; from underlying map&lt;/li&gt;
&lt;li&gt;Return result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For CONTAINS:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check store buffer only (already populated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For SIZE:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Return &lt;code&gt;snapshotSize + delta&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
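&lt;p&gt;The commit-time bookkeeping above can be sketched as follows (hypothetical names; &lt;code&gt;PUT&lt;/code&gt; only, for brevity):&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;

// Sketch of the read-uncommitted commit path: a per-transaction store
// buffer plus a size delta tracked against a snapshot. Illustrative only.
class DirtyTx<K, V> {
    final ConcurrentMap<K, V> map;
    final Map<K, V> storeBuf = new HashMap<>();
    final int snapshotSize;
    int delta = 0;

    DirtyTx(ConcurrentMap<K, V> map) {
        this.map = map;
        this.snapshotSize = map.size(); // snapshot before applying ops
    }

    void put(K key, V val) {
        if (map.put(key, val) == null) delta++; // new key grows the map
        storeBuf.put(key, val);                 // read-your-own-writes
    }

    V get(K key) {
        V buffered = storeBuf.get(key);
        return buffered != null ? buffered : map.get(key); // dirty read
    }

    int size() { return snapshotSize + delta; }
}
```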

&lt;h3&gt;
  
  
  Unique Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No reader blocking at all&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dirty reads&lt;/strong&gt;: you may see data that other transactions have not yet committed&lt;/li&gt;
&lt;li&gt;Store buffer provides &lt;strong&gt;read-your-own-writes&lt;/strong&gt; consistency within a transaction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  4. Copy-on-Write Transactional Map
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;READ COMMITTED&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Repeatable reads aren't guaranteed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AtomicReference&amp;lt;ConcurrentMap&amp;lt;K, V&amp;gt;&amp;gt;&lt;/code&gt; -&amp;gt; Ref to current map snapshot&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transaction Lifecycle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Schedule Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Just record operations in local list&lt;/li&gt;
&lt;li&gt;No locks, no map access&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Validation Phase
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Retry loop:&lt;/strong&gt;&lt;br&gt;
Validation is a simple copy-and-CAS retry loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;txMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;                      &lt;span class="c1"&gt;// Get current snapshot&lt;/span&gt;
    &lt;span class="n"&gt;underlyingMap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;          &lt;span class="c1"&gt;// COPY entire map&lt;/span&gt;
    &lt;span class="c1"&gt;// Apply all operations to local copy&lt;/span&gt;
    &lt;span class="n"&gt;txs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forEach&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tryValidate&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hasWrite&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;txMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;compareAndSet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ConcurrentHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;underlyingMap&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;tryValidate():&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply the operation to the local copy of the underlying map&lt;/li&gt;
&lt;li&gt;Store result in child transaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If CAS fails:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Another transaction committed between snapshot and CAS&lt;/li&gt;
&lt;li&gt;Retry: re-copy map, re-apply operations, re-attempt CAS&lt;/li&gt;
&lt;/ul&gt;
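&lt;p&gt;Distilled to a single &lt;code&gt;put&lt;/code&gt; operation, the whole copy/apply/CAS cycle looks like this (a sketch with my own names, not the actual code):&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Minimal copy-on-write commit: copy the snapshot, apply the operation,
// CAS the new map in, and retry if another transaction won the race.
class CowSketch<K, V> {
    final AtomicReference<Map<K, V>> ref = new AtomicReference<>(new HashMap<>());

    void commitPut(K key, V val) {
        Map<K, V> prev, copy;
        do {
            prev = ref.get();            // current snapshot
            copy = new HashMap<>(prev);  // copy the ENTIRE map
            copy.put(key, val);          // apply this tx's operations
        } while (!ref.compareAndSet(prev, copy)); // retry on conflict
    }

    V read(K key) { return ref.get().get(key); } // lock-free read
}
```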

&lt;h4&gt;
  
  
  3. Commit Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;CAS succeeded, so operations already applied to new map&lt;/li&gt;
&lt;li&gt;Just complete futures with stored results&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Unique Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entire map copied on every write transaction&lt;/strong&gt; -&amp;gt; doesn't scale with map size&lt;/li&gt;
&lt;li&gt;Under high contention: lots of CAS retries -&amp;gt; lots of copies -&amp;gt; terrible write-heavy performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best read performance&lt;/strong&gt; when map is mostly read (no blocking, no locking)&lt;/li&gt;
&lt;li&gt;Simple implementation but hard on memory and GC under high write contention&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5. Flat Combined Transactional Maps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SERIALIZABLE&lt;/strong&gt; (all operations are fully serialized through the combiner)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Idea
&lt;/h3&gt;

&lt;p&gt;Instead of threads fighting for locks, they:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enqueue their operation&lt;/li&gt;
&lt;li&gt;Spin waiting for a "combiner" thread to execute it&lt;/li&gt;
&lt;li&gt;If no combiner exists, try to become one&lt;/li&gt;
&lt;li&gt;Combiner executes operations for &lt;strong&gt;all&lt;/strong&gt; waiting threads in one critical section&lt;/li&gt;
&lt;/ol&gt;
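&lt;p&gt;In miniature, the four steps look like this (a coarse sketch that uses a shared queue and &lt;code&gt;tryLock&lt;/code&gt; for combiner election rather than the paper's publication list):&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Toy flat combining: publish an operation, then either wait for a
// combiner to run it or become the combiner and drain everyone's
// operations inside one critical section. Structure is illustrative.
class MiniCombiner<K, V> {
    record Op<K, V>(Function<Map<K, V>, Object> fn, CompletableFuture<Object> result) {}

    final Map<K, V> map = new HashMap<>();  // only ever touched by the combiner
    final Queue<Op<K, V>> pending = new ConcurrentLinkedQueue<>();
    final ReentrantLock combinerLock = new ReentrantLock();

    Object apply(Function<Map<K, V>, Object> fn) {
        Op<K, V> op = new Op<>(fn, new CompletableFuture<>());
        pending.add(op);                         // 1. enqueue the operation
        while (!op.result().isDone()) {          // 2. spin until someone runs it
            if (combinerLock.tryLock()) {        // 3. ...or become the combiner
                try {
                    Op<K, V> next;
                    while ((next = pending.poll()) != null) {
                        next.result().complete(next.fn().apply(map)); // 4. combine
                    }
                } finally {
                    combinerLock.unlock();
                }
            }
        }
        return op.result().join();
    }
}
```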

&lt;h3&gt;
  
  
  Implementation Variants
&lt;/h3&gt;

&lt;h4&gt;
  
  
  A. FlatCombinedTxMap (using combiners)
&lt;/h4&gt;

&lt;p&gt;Uses one of four combiner types:&lt;/p&gt;




&lt;h5&gt;
  
  
  UnboundCombiner
&lt;/h5&gt;

&lt;p&gt;This is based on the approach described in &lt;a href="https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf" rel="noopener noreferrer"&gt;this paper&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread-local &lt;code&gt;Node&amp;lt;E, R&amp;gt;&lt;/code&gt; for each thread&lt;/li&gt;
&lt;li&gt;Shared publication list (accessible via an atomic ref &lt;code&gt;Node&lt;/code&gt; head)&lt;/li&gt;
&lt;li&gt;Per-node &lt;code&gt;StatefulAction&lt;/code&gt; with:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Action&amp;lt;E, R&amp;gt; action&lt;/code&gt; -&amp;gt; a volatile field; it doubles as the memory barrier and completion signal&lt;/li&gt;
&lt;li&gt;&lt;code&gt;R result&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AtomicInteger status&lt;/code&gt; (ACTIVE/INACTIVE)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dummy sentinel node marks end of queue&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Enqueue:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isInactive&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setActive&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;prevHead&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAndSet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Atomic swap&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setNext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prevHead&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Combine:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;             &lt;span class="c1"&gt;// Spin until result ready&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tryLock&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;scanCombineApply&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;           &lt;span class="c1"&gt;// Process all nodes&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;enqueueIfInactive&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;          &lt;span class="c1"&gt;// Re-enqueue if removed&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;scanCombineApply:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traverse from head to DUMMY&lt;/li&gt;
&lt;li&gt;For each node with non-null action:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;node.statefulAction.apply(e)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;node.action = null&lt;/code&gt; (signals completion)&lt;/li&gt;
&lt;li&gt;Set age to current count&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Every &lt;code&gt;threshold&lt;/code&gt; passes, scan and &lt;strong&gt;dequeue aged nodes&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;(currentCount - node.age) &amp;gt;= threshold&lt;/code&gt;: unlink, set INACTIVE&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A Livelock Bug I Fixed:&lt;/strong&gt;&lt;br&gt;
Threads could get stuck when the combiner removed their node before they noticed their action had been applied. Fixed by forcing combiners to always re-enqueue and apply their own action rather than trusting another combiner to do it.&lt;/p&gt;


&lt;h5&gt;
  
  
  NodeCyclingCombiner (Reuses nodes)
&lt;/h5&gt;

&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt; Reuses nodes instead of cleaning them up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thread-local &lt;code&gt;Node&amp;lt;E, R&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Each node has &lt;code&gt;AtomicInteger status&lt;/code&gt; (NOT_COMBINER/IS_COMBINER)&lt;/li&gt;
&lt;li&gt;Shared &lt;code&gt;AtomicReference&amp;lt;Node&amp;gt;&lt;/code&gt; tail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Reset current node&lt;/span&gt;
&lt;span class="nc"&gt;Node&lt;/span&gt; &lt;span class="n"&gt;newTail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;newTail&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;NOT_COMBINER&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Swap, my node becomes tail, get previous tail&lt;/span&gt;
&lt;span class="n"&gt;curNode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAndSet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newTail&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curNode&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;              &lt;span class="c1"&gt;// Previous tail becomes my node&lt;/span&gt;

&lt;span class="n"&gt;curNode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;curNode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setNext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newTail&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Wait to become combiner&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curNode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;NOT_COMBINER&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;//Node chain should look something like this given 3 threads(T1, T2, T3 with the inital node T0),  waiting for their result to be applied, assuming natural order&lt;/span&gt;
&lt;span class="c1"&gt;//TO -&amp;gt; T1 -&amp;gt; T2 -&amp;gt; T3 &lt;/span&gt;
&lt;span class="c1"&gt;// T3 node will be marked as the combiner when Thread 1, finishes applying &lt;/span&gt;

&lt;span class="c1"&gt;// Now I'm the combiner, traverse and apply&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;curNode&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;next&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;next&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lazySet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;IS_COMBINER&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Make next last node in the queue combiner&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Unique Property:&lt;/strong&gt; Nodes cycle between threads, no cleanup needed.&lt;/p&gt;




&lt;h5&gt;
  
  
  AtomicArrayCombiner (A bounded combiner)
&lt;/h5&gt;

&lt;p&gt;&lt;strong&gt;Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AtomicReferenceArray&amp;lt;Node&amp;gt;&lt;/code&gt; of size &lt;code&gt;capacity&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AtomicLong cellNum&lt;/code&gt; -&amp;gt; monotonically increasing cell assignment&lt;/li&gt;
&lt;li&gt;Per-thread &lt;code&gt;ThreadLocal&amp;lt;Node&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Enqueue:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;cell&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cellNum&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAndIncrement&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// CAS into array slot&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;cells&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;compareAndSet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cell&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;yield&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// Slot occupied, retry&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Combine:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statefulAction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isApplied&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tryLock&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;scanCombineApply&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// Scan entire array&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;idle&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;scanCombineApply:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Node&lt;/span&gt; &lt;span class="n"&gt;curr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cells&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;curr&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;curr&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;statefulAction&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;cells&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setOpaque&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Clear and apply every slot to prevent a situation where waiters spin on an unapplied node forever&lt;/span&gt;
        &lt;span class="c1"&gt;//Best when working with a fixed capacity of threads&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Unique Property:&lt;/strong&gt; Fixed memory, no pointer chasing, but potential false sharing.&lt;/p&gt;




&lt;h5&gt;
  
  
  SynchronizedCombiner (Baseline)
&lt;/h5&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;unlock&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A plain lock used as a baseline to measure whether flat combining actually helps.&lt;/p&gt;




&lt;h4&gt;
  
  
  B. SegmentedCombinedTxMap (Partitioned)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt; Instead of one combiner for the entire map, &lt;strong&gt;one combiner per key + one for SIZE&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Structures:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, Combiner&amp;lt;Map&amp;lt;K, V&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt; -&amp;gt; Maps each key to its own combiner&lt;/li&gt;
&lt;li&gt;Separate &lt;code&gt;sizeCombiner&lt;/code&gt; for SIZE operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Group operations by key&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="nl"&gt;keyToFuture:&lt;/span&gt;
    &lt;span class="n"&gt;combiner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getCombiner&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Get or create combiner for this key&lt;/span&gt;
    &lt;span class="n"&gt;combiner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;combine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Separately handle SIZE operations&lt;/span&gt;
&lt;span class="n"&gt;sizeCombiner&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;combine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* apply size ops */&lt;/span&gt; &lt;span class="o"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Unique Properties:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better parallelism -&amp;gt; different keys can be combined concurrently&lt;/li&gt;
&lt;li&gt;More overhead -&amp;gt; one combiner per key&lt;/li&gt;
&lt;li&gt;Writes to the same key are still serialized, but writes across different keys proceed in parallel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Surprisingly, the benchmarks show a single combiner scaling better than a segmented one. My speculation: a single combiner benefits from batching, since spinning waiters accumulate operations and applying them in one pass amortizes the cost of lock contention. Under a segmented combiner, batches rarely form, because each operation in a transaction typically targets a different key.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. MVCC Transactional Map
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Isolation Level
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot Isolation&lt;/strong&gt; -&amp;gt; each transaction reads from a consistent snapshot taken at begin time&lt;/li&gt;
&lt;li&gt;No dirty reads, no non-repeatable reads&lt;/li&gt;
&lt;li&gt;SIZE reads are dirty (no version chain maintained for size)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Idea
&lt;/h3&gt;

&lt;p&gt;Rather than blocking readers with locks, each key maintains a &lt;strong&gt;version chain&lt;/strong&gt;, an ordered list of all historical values written to that key. Each version has a &lt;code&gt;beginTs&lt;/code&gt; and &lt;code&gt;endTs&lt;/code&gt; defining the epoch range in which it is visible. Readers find the version that overlaps their snapshot epoch without acquiring any locks. Writers append new versions and conflict only with other concurrent writers on the same key.&lt;/p&gt;

&lt;p&gt;This is based on the approach described in &lt;a href="https://www.vldb.org/pvldb/vol10/p781-Wu.pdf" rel="noopener noreferrer"&gt;the VLDB paper&lt;/a&gt;.&lt;/p&gt;
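&lt;p&gt;As a minimal sketch (the &lt;code&gt;Version&lt;/code&gt; record and its fields here are illustrative stand-ins, not the actual tx-map classes), the visibility rule reduces to a half-open interval check:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Illustrative sketch of MVCC version visibility; not the actual tx-map classes.
public class VersionVisibilitySketch {
    static final long INF = Long.MAX_VALUE;

    // A committed value plus the epoch window [beginTs, endTs) in which it is visible
    record Version&amp;lt;V&amp;gt;(long beginTs, long endTs, V value) {
        boolean visibleAt(long snapshotEpoch) {
            return snapshotEpoch &amp;gt;= beginTs &amp;amp;&amp;amp; snapshotEpoch &amp;lt; endTs;
        }
    }

    public static void main(String[] args) {
        Version&amp;lt;String&amp;gt; oldV = new Version&amp;lt;&amp;gt;(0, 101, "old");
        Version&amp;lt;String&amp;gt; newV = new Version&amp;lt;&amp;gt;(102, INF, "new");
        System.out.println(oldV.visibleAt(100)); // true:  a tBegin of 100 sees "old"
        System.out.println(newV.visibleAt(105)); // true:  a tCommit of 105 sees "new"
        System.out.println(oldV.visibleAt(105)); // false: "old" was closed at epoch 101
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;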

&lt;h3&gt;
  
  
  Core Data Structures
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, VersionChain&amp;lt;V&amp;gt;&amp;gt;&lt;/code&gt; — maps each key to its version chain&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentMap&amp;lt;K, KeyStatus&amp;gt;&lt;/code&gt; — per-key write lock (CAS-based, not a real lock)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EpochTracker&lt;/code&gt; — global epoch counter, tracks active transaction begin epochs for GC&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GCThread&lt;/code&gt; — background thread that prunes unreachable versions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AtomicInteger(size)&lt;/code&gt; — dirty global size counter&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Transaction Lifecycle
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Begin Phase
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;tBegin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;epochTracker&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentEpoch&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Snapshot epoch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The transaction records the current global epoch as its &lt;code&gt;tBegin&lt;/code&gt;. This is the epoch from which it reads: any version visible at &lt;code&gt;tBegin&lt;/code&gt; is part of its snapshot. The epoch tracker also registers this &lt;code&gt;tBegin&lt;/code&gt; so the GC knows the oldest epoch still in use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For writes (PUT/REMOVE):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attempt to acquire the &lt;code&gt;KeyStatus&lt;/code&gt; write lock for that key via CAS&lt;/li&gt;
&lt;li&gt;If the key is already locked by another transaction, &lt;strong&gt;we abort immediately&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Check that &lt;code&gt;tBegin&lt;/code&gt; still overlaps the latest version on the chain (late-arriving transaction check):
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!(&lt;/span&gt;&lt;span class="n"&gt;tBegin&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;beginTs&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;tBegin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endTs&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt; &lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If both checks pass, we record the write operation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For reads (GET/CONTAINS):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check if the key's &lt;code&gt;KeyStatus&lt;/code&gt; is held by another transaction; if so, abort&lt;/li&gt;
&lt;li&gt;Snapshot the overlapping version at &lt;code&gt;tBegin&lt;/code&gt; immediately:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;versionChain&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;findOverlap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tBegin&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Record the read operation and the version seen&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Validation Phase
&lt;/h4&gt;

&lt;p&gt;At commit time, a &lt;code&gt;tCommit&lt;/code&gt; epoch is assigned via &lt;code&gt;epochTracker.newEpoch()&lt;/code&gt;. Each read operation then sequentially re-checks whether the version it saw at &lt;code&gt;tBegin&lt;/code&gt; is still the overlapping version at &lt;code&gt;tCommit&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Version&lt;/span&gt; &lt;span class="n"&gt;overlapAtCommit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;versionChain&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;findOverlap&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tCommit&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;overlapAtCommit&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Someone wrote to this key between tBegin and tCommit&lt;/span&gt;
&lt;span class="c1"&gt;//A quick example&lt;/span&gt;
&lt;span class="c1"&gt;//tBegin = 100, tCommit = 105&lt;/span&gt;
&lt;span class="c1"&gt;//key = "A", versions = [(0,101,"old"), (102,INF,"new")] //INF = INFINITY&lt;/span&gt;
&lt;span class="c1"&gt;// tBegin should see version "old", while tCommit should see version "new"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches read-write conflicts: if any key you read was written by a concurrent transaction between your begin and commit, you abort.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Commit Phase
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;For each write operation: append a new version to the version chain with &lt;code&gt;beginTs = tCommit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set the previous latest version's &lt;code&gt;endTs = tCommit&lt;/code&gt; (closing its visibility window)&lt;/li&gt;
&lt;li&gt;Release all &lt;code&gt;KeyStatus&lt;/code&gt; write locks&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;epochTracker.leaveEpoch(tBegin)&lt;/code&gt; so GC knows this epoch might no longer be visible&lt;/li&gt;
&lt;li&gt;If the version chain depth crosses the configured threshold, submit a cleanup request to the GC thread&lt;/li&gt;
&lt;/ul&gt;
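&lt;p&gt;The install step above can be sketched roughly like this (a hedged sketch: &lt;code&gt;installWrite&lt;/code&gt; and the mutable &lt;code&gt;endTs&lt;/code&gt; field are my stand-ins for the real classes, and the &lt;code&gt;KeyStatus&lt;/code&gt; write lock is assumed to already be held):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayDeque;
import java.util.Deque;

public class CommitSketch {
    static final class Version&amp;lt;V&amp;gt; {
        final long beginTs;
        final V value;
        volatile long endTs;
        Version(long beginTs, long endTs, V value) {
            this.beginTs = beginTs; this.endTs = endTs; this.value = value;
        }
    }

    // Install one written key at commit time; the per-key write lock is assumed held
    static &amp;lt;V&amp;gt; void installWrite(Deque&amp;lt;Version&amp;lt;V&amp;gt;&amp;gt; chain, V newValue, long tCommit) {
        Version&amp;lt;V&amp;gt; latest = chain.peekLast();
        if (latest != null) latest.endTs = tCommit; // close the previous visibility window
        chain.addLast(new Version&amp;lt;&amp;gt;(tCommit, Long.MAX_VALUE, newValue)); // open the new one
    }

    public static void main(String[] args) {
        Deque&amp;lt;Version&amp;lt;String&amp;gt;&amp;gt; chain = new ArrayDeque&amp;lt;&amp;gt;();
        installWrite(chain, "old", 1);
        installWrite(chain, "new", 102);
        System.out.println(chain.peekFirst().endTs); // 102: the old window was closed at tCommit
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;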




&lt;h3&gt;
  
  
  Version Chains
&lt;/h3&gt;

&lt;p&gt;Each key's history is stored in a &lt;code&gt;VersionChain&amp;lt;V&amp;gt;&lt;/code&gt;. Two independent implementations exist.&lt;/p&gt;

&lt;h4&gt;
  
  
  QueueVersionChain
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Backed by a &lt;code&gt;ConcurrentLinkedDeque&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;findOverlap()&lt;/code&gt; does a &lt;strong&gt;descending linear scan&lt;/strong&gt;: it starts from the tail of the deque, where newer versions live, cutting traversal time for recent reads&lt;/li&gt;
&lt;li&gt;Size tracked with an &lt;code&gt;AtomicLong&lt;/code&gt; to avoid O(N) &lt;code&gt;size()&lt;/code&gt; calls on the deque&lt;/li&gt;
&lt;li&gt;Better write performance, lower overhead per version&lt;/li&gt;
&lt;/ul&gt;
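&lt;p&gt;A rough sketch of the descending scan (the &lt;code&gt;Version&lt;/code&gt; record and method shape are assumptions, not the real chain's API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedDeque;

public class QueueChainSketch {
    record Version&amp;lt;V&amp;gt;(long beginTs, long endTs, V value) {}

    // Newer versions sit at the tail, so a descending scan usually exits early
    static &amp;lt;V&amp;gt; Version&amp;lt;V&amp;gt; findOverlap(ConcurrentLinkedDeque&amp;lt;Version&amp;lt;V&amp;gt;&amp;gt; chain, long tBegin) {
        Iterator&amp;lt;Version&amp;lt;V&amp;gt;&amp;gt; it = chain.descendingIterator();
        while (it.hasNext()) {
            Version&amp;lt;V&amp;gt; v = it.next();
            if (tBegin &amp;gt;= v.beginTs() &amp;amp;&amp;amp; tBegin &amp;lt; v.endTs()) return v;
        }
        return null; // nothing visible at tBegin
    }

    public static void main(String[] args) {
        ConcurrentLinkedDeque&amp;lt;Version&amp;lt;String&amp;gt;&amp;gt; chain = new ConcurrentLinkedDeque&amp;lt;&amp;gt;();
        chain.addLast(new Version&amp;lt;&amp;gt;(0, 101, "old"));
        chain.addLast(new Version&amp;lt;&amp;gt;(102, Long.MAX_VALUE, "new"));
        System.out.println(findOverlap(chain, 100).value()); // old
        System.out.println(findOverlap(chain, 105).value()); // new
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;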

&lt;h4&gt;
  
  
  NavigableVersionChain
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Backed by a &lt;code&gt;ConcurrentSkipListMap&lt;/code&gt; keyed by &lt;code&gt;beginTs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;findOverlap()&lt;/code&gt; uses &lt;code&gt;floorEntry(tBegin)&lt;/code&gt;, O(log N)&lt;/li&gt;
&lt;li&gt;More expensive writes due to skip list insertion&lt;/li&gt;
&lt;li&gt;Pays off as version chains grow longer under sustained write load&lt;/li&gt;
&lt;/ul&gt;
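&lt;p&gt;The &lt;code&gt;floorEntry&lt;/code&gt; lookup can be sketched like this (again with an illustrative &lt;code&gt;Version&lt;/code&gt; record, not the real implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class NavigableChainSketch {
    record Version&amp;lt;V&amp;gt;(long beginTs, long endTs, V value) {}

    // O(log N): newest version with beginTs &amp;lt;= tBegin, kept only if its window is still open
    static &amp;lt;V&amp;gt; Version&amp;lt;V&amp;gt; findOverlap(ConcurrentSkipListMap&amp;lt;Long, Version&amp;lt;V&amp;gt;&amp;gt; chain, long tBegin) {
        Map.Entry&amp;lt;Long, Version&amp;lt;V&amp;gt;&amp;gt; e = chain.floorEntry(tBegin);
        if (e == null) return null;
        Version&amp;lt;V&amp;gt; v = e.getValue();
        return tBegin &amp;lt; v.endTs() ? v : null;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap&amp;lt;Long, Version&amp;lt;String&amp;gt;&amp;gt; chain = new ConcurrentSkipListMap&amp;lt;&amp;gt;();
        chain.put(0L, new Version&amp;lt;&amp;gt;(0, 101, "old"));
        chain.put(102L, new Version&amp;lt;&amp;gt;(102, Long.MAX_VALUE, "new"));
        System.out.println(findOverlap(chain, 100).value()); // old
        System.out.println(findOverlap(chain, 101)); // null: epoch 101 falls in the gap between windows
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;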

&lt;p&gt;Both implementations maintain a &lt;code&gt;MinVisibleEpoch&lt;/code&gt; cache to short-circuit GC scans when no prunable versions exist, a small optimization that avoids a full traversal when nothing has changed.&lt;/p&gt;




&lt;h3&gt;
  
  
  Epoch Tracking
&lt;/h3&gt;

&lt;p&gt;The epoch tracker serves two purposes: assigning monotonically increasing commit epochs, and tracking the minimum &lt;code&gt;tBegin&lt;/code&gt; of all active transactions so the GC knows which versions are safe to delete.&lt;/p&gt;

&lt;p&gt;Three implementations exist:&lt;/p&gt;

&lt;h4&gt;
  
  
  DefaultEpochTracker
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentHashMap&amp;lt;Long, AtomicLong&amp;gt;&lt;/code&gt; mapping epoch -&amp;gt; active transaction count&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;currentEpoch()&lt;/code&gt; registers the transaction in the map and increments the counter&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leaveEpoch()&lt;/code&gt; decrements; removes the entry when count hits zero&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minVisibleEpoch()&lt;/code&gt; streams over the key set to find the minimum&lt;/li&gt;
&lt;li&gt;Suitable for virtual threads since keys are shared across threads&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Long2LongEpochTracker
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Synchronized &lt;code&gt;Long2LongOpenHashMap&lt;/code&gt; (from fastutil) mapping thread ID -&amp;gt; current epoch&lt;/li&gt;
&lt;li&gt;No boxing on values, which avoids allocation pressure on the hot path&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;leaveEpoch()&lt;/code&gt; writes a sentinel value (&lt;code&gt;-1&lt;/code&gt;) rather than removing the entry&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minVisibleEpoch()&lt;/code&gt; scans values, skipping sentinels&lt;/li&gt;
&lt;li&gt;This is best paired with pooled platform threads&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  LongToArrayEpochTracker &lt;em&gt;(default)&lt;/em&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ConcurrentHashMap&amp;lt;Long, long[]&amp;gt;&lt;/code&gt; mapping thread ID to a single-element &lt;code&gt;long[]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Same thread-keyed design as &lt;code&gt;Long2LongEpochTracker&lt;/code&gt;, but avoids boxing via the &lt;code&gt;long[]&lt;/code&gt; trick without pulling in fastutil&lt;/li&gt;
&lt;li&gt;This is best paired with pooled platform threads&lt;/li&gt;
&lt;/ul&gt;
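&lt;p&gt;A minimal sketch of the &lt;code&gt;long[]&lt;/code&gt; trick (method names are mine, and the real tracker presumably adds stronger memory ordering than this):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.concurrent.ConcurrentHashMap;

public class LongArrayTrackerSketch {
    static final long NOT_ACTIVE = -1L;

    // thread ID -&amp;gt; single-element long[] holding that thread's current epoch.
    // Each array is written only by its owning thread, so there is no boxing and no CAS.
    final ConcurrentHashMap&amp;lt;Long, long[]&amp;gt; epochs = new ConcurrentHashMap&amp;lt;&amp;gt;();

    void enterEpoch(long epoch) {
        epochs.computeIfAbsent(Thread.currentThread().threadId(), id -&amp;gt; new long[]{NOT_ACTIVE})[0] = epoch;
    }

    void leaveEpoch() {
        long[] slot = epochs.get(Thread.currentThread().threadId());
        if (slot != null) slot[0] = NOT_ACTIVE; // sentinel write instead of removal
    }

    long minVisibleEpoch() {
        long min = Long.MAX_VALUE;
        for (long[] slot : epochs.values()) {
            long e = slot[0];
            if (e != NOT_ACTIVE &amp;amp;&amp;amp; e &amp;lt; min) min = e;
        }
        return min;
    }

    public static void main(String[] args) {
        LongArrayTrackerSketch t = new LongArrayTrackerSketch();
        t.enterEpoch(7);
        System.out.println(t.minVisibleEpoch()); // 7
        t.leaveEpoch();
        System.out.println(t.minVisibleEpoch() == Long.MAX_VALUE); // true: no active transactions
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;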




&lt;h3&gt;
  
  
  GC Thread
&lt;/h3&gt;

&lt;p&gt;Version chains grow unboundedly without cleanup. The GC thread handles pruning old versions that no transaction can ever see again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A single platform daemon thread continuously drains a bounded queue (&lt;code&gt;LinkedBlockingQueue&lt;/code&gt;) of cleanup requests&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;ScheduledExecutorService&lt;/code&gt; (virtual thread) refreshes a cached &lt;code&gt;minVisibleEpoch&lt;/code&gt; from an &lt;code&gt;EpochTracker&lt;/code&gt; every 100ms&lt;/li&gt;
&lt;li&gt;Writer transactions submit a cleanup request when &lt;code&gt;versionChain.size() % threshold == 0&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why cached epoch reads:&lt;/strong&gt;&lt;br&gt;
Reading &lt;code&gt;minVisibleEpoch()&lt;/code&gt; on every write-transaction commit was a hotspot: it scans the epoch tracker under contention. Decoupling this into a scheduled read trades precision for significantly lower write-path overhead, though versions may survive slightly longer than necessary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pruning logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// A version is prunable if:&lt;/span&gt;
&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endTs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The latest version is always preserved regardless of its timestamps, since new transactions may still need it. The &lt;code&gt;MinVisibleEpoch&lt;/code&gt; cache per version chain short-circuits the scan entirely if no version has an &lt;code&gt;endTs&lt;/code&gt; below the current GC epoch.&lt;/p&gt;




&lt;h3&gt;
  
  
  Unique Properties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Readers never acquire locks&lt;/strong&gt; -&amp;gt; unlike the optimistic and pessimistic implementations where readers hold read locks or increment atomic counters, MVCC readers are completely non-blocking. A read is just a version chain traversal with no shared mutable state touched&lt;/li&gt;
&lt;li&gt;Write-write conflicts are detected at commit time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abort-on-conflict instead of wait-on-conflict&lt;/strong&gt; -&amp;gt; when a write lock is already held, the transaction aborts immediately rather than parking. This keeps latency bounded, but abort rates climb sharply under high write contention, unlike the other transactional map implementations (except copy-on-write), which force writers to wait&lt;/li&gt;
&lt;li&gt;The queue chain favors writes (O(1) append, cheap traversal for short chains) while the navigable chain favors reads (O(log N) lookup) at the cost of more expensive skip list insertions.&lt;/li&gt;
&lt;li&gt;The shared-epoch &lt;code&gt;DefaultEpochTracker&lt;/code&gt; works correctly with any thread model but contends on &lt;code&gt;computeIfAbsent()&lt;/code&gt; calls, which can hurt performance when frequent. The thread-keyed trackers (&lt;code&gt;LongToArray&lt;/code&gt;, &lt;code&gt;Long2Long&lt;/code&gt;) eliminate that contention entirely, since each key is owned by exactly one thread&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;Isolation Level&lt;/th&gt;
&lt;th&gt;When Locks Acquired&lt;/th&gt;
&lt;th&gt;Readers Block Writers?&lt;/th&gt;
&lt;th&gt;Writers Block Readers?&lt;/th&gt;
&lt;th&gt;Probably Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimistic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;READ COMMITTED (per-key SERIALIZABLE)&lt;/td&gt;
&lt;td&gt;Reads: eager, Writes: lazy&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Balanced workloads with strong consistency needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pessimistic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;READ COMMITTED (per-key SERIALIZABLE)&lt;/td&gt;
&lt;td&gt;Both lazy&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Similar to optimistic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read Uncommitted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;READ UNCOMMITTED&lt;/td&gt;
&lt;td&gt;Writes only at validation&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Write-heavy and weak consistency is OK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copy-on-Write&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;READ COMMITTED&lt;/td&gt;
&lt;td&gt;Never (uses CAS)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Read heavy or small map size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flat Combined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SERIALIZABLE&lt;/td&gt;
&lt;td&gt;N/A (combiners)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;High contention, batch benefits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Segmented Flat Combined&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SERIALIZABLE&lt;/td&gt;
&lt;td&gt;N/A (per-key combiners)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Outperformed by a single flat combined map&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MVCC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SNAPSHOT&lt;/td&gt;
&lt;td&gt;Reads: eager, Writes: eager&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Best for read heavy scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Links/References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;MVCC Paper:&lt;/strong&gt;  &lt;a href="https://www.vldb.org/pvldb/vol10/p781-Wu.pdf" rel="noopener noreferrer"&gt;https://www.vldb.org/pvldb/vol10/p781-Wu.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat Combining Paper:&lt;/strong&gt; &lt;a href="https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf" rel="noopener noreferrer"&gt;https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transactional Collections Paper:&lt;/strong&gt; &lt;a href="https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf" rel="noopener noreferrer"&gt;https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GitHub:&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Optimistic Tx-Map: &lt;a href="https://github.com/kusoroadeolu/tx-map" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pessimistic/Copy-On-Write/Read Uncommitted Tx-Map: &lt;a href="https://github.com/kusoroadeolu/tx-map/tree/pessimistic-txmap" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map/tree/pessimistic-txmap&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Flat Combining Tx-Map: &lt;a href="https://github.com/kusoroadeolu/tx-map/tree/fc-txmap" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map/tree/fc-txmap&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MVCC Tx-Map: &lt;a href="https://github.com/kusoroadeolu/tx-map/tree/mvcc-txmap" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map/tree/mvcc-txmap&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>java</category>
      <category>concurrency</category>
      <category>database</category>
      <category>datastructure</category>
    </item>
    <item>
      <title>Improving my MVCC Transactional Map</title>
      <dc:creator>Kush V</dc:creator>
      <pubDate>Sat, 14 Mar 2026 00:57:12 +0000</pubDate>
      <link>https://dev.to/kusoroadeolu/improving-my-mvcc-transactional-map-2f7j</link>
      <guid>https://dev.to/kusoroadeolu/improving-my-mvcc-transactional-map-2f7j</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This post walks through the performance journey of an MVCC transactional map I've been building in Java, from a few thousand ops/s to a few million. It's mostly numbers and profiling observations, so if that's not your thing, fair warning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The original writeup can be found &lt;a href="https://github.com/kusoroadeolu/articles/blob/main/transactional-maps/mvcc-optimization.md" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;JMH version: 1.37&lt;/li&gt;
&lt;li&gt;JVM: Java 25, HotSpot 64-Bit Server VM (25+37-LTS-3491)&lt;/li&gt;
&lt;li&gt;Benchmark mode: Throughput (ops/s)&lt;/li&gt;
&lt;li&gt;Warmup: 10 iterations × 1s each&lt;/li&gt;
&lt;li&gt;Measurement: 5 iterations × 1s each&lt;/li&gt;
&lt;li&gt;Forks: 2&lt;/li&gt;
&lt;li&gt;Thread configuration: 1, 2, 4, 8 (platform threads)&lt;/li&gt;
&lt;li&gt;CPU: Intel(R) Core(TM) i5-10300H @ 2.50GHz, 8 cores&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Journey
&lt;/h2&gt;

&lt;p&gt;Initially my MVCC txMap had good read throughput and decent write throughput, but the error margins on the write numbers were bad, so I decided to investigate. While investigating, I encountered an issue. A quick note before we continue: &lt;code&gt;ActiveTransactions&lt;/code&gt; just keeps tabs on all currently active transactions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;//A map keeping track of all active transactions&lt;/span&gt;
&lt;span class="nc"&gt;ActiveTransactions&lt;/span&gt; &lt;span class="n"&gt;activeTxns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mvccMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;activeTransactions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;copy&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Copied the entire map on active txns, could be thousands &lt;/span&gt;
&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;activeTxns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findMinVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; 
&lt;span class="n"&gt;versionChain&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;removeUnreachableVersions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// ActiveTransactions method call&lt;/span&gt;
&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;findMinActiveEpoch&lt;/span&gt;&lt;span class="o"&gt;(){&lt;/span&gt;
    &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HashSet&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt; &lt;span class="c1"&gt;//Copy to set to prevent duplicates and min traversals, caused an OOME!!&lt;/span&gt;
    &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;min&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="o"&gt;){&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;){&lt;/span&gt;
            &lt;span class="n"&gt;min&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;min&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This line of code caused an OOME under contention: the active transactions map can grow very large, and copying and iterating it every time we want to prune an old version is bound to cause problems. To fix this, I used a single writer that automatically tracks the minimum visible epoch (tBegin) across all active transactions. The main tradeoff was that under contention the single writer could not keep up, but it was still better than copying the map on every write transaction.&lt;/p&gt;
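&lt;p&gt;A minimal sketch of that single-writer idea (the class and event names below are mine, not the project's): begin/end events go onto a queue, one dedicated thread maintains a multiset of begin epochs, and everyone else reads the current minimum through a volatile field, so nothing ever copies the active-transactions map.&lt;/p&gt;

```java
import java.util.TreeMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch, not the project's actual class.
class SingleWriterEpochTracker {
    record Event(long tBegin, boolean begin) {}

    private final LinkedBlockingQueue<Event> events = new LinkedBlockingQueue<>();
    private final TreeMap<Long, Integer> beginEpochs = new TreeMap<>(); // touched only by the writer thread
    private volatile long minVisibleEpoch = Long.MAX_VALUE;

    SingleWriterEpochTracker() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    Event e = events.take();
                    if (e.begin()) {
                        beginEpochs.merge(e.tBegin(), 1, Integer::sum);
                    } else {
                        // Drop the epoch once its last transaction ends.
                        beginEpochs.compute(e.tBegin(), (k, c) -> c == null || c <= 1 ? null : c - 1);
                    }
                    minVisibleEpoch = beginEpochs.isEmpty() ? Long.MAX_VALUE : beginEpochs.firstKey();
                }
            } catch (InterruptedException ignored) {
                // shut down quietly
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    void onBegin(long tBegin) { events.offer(new Event(tBegin, true)); }  // txn begin
    void onEnd(long tBegin)   { events.offer(new Event(tBegin, false)); } // commit/abort
    long minVisibleEpoch()    { return minVisibleEpoch; }                 // lock-free read
}
```

&lt;p&gt;The queue is exactly where the tradeoff above shows up: under heavy contention the writer thread falls behind the event stream, so the published minimum lags, but the write path itself stays cheap.&lt;/p&gt;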

&lt;p&gt;However, write throughput dropped to ~20k ops/s after the single writer fix, so I profiled to find out why. The culprit was &lt;code&gt;SerialTransactionKeeper#remove&lt;/code&gt;. Instead of submitting a remove request to the queue, it was removing a non-existent entry, meaning every transaction commit was doing an O(N) traversal for nothing. And since nothing was actually being removed from the map, the &lt;code&gt;minVisibleEpoch&lt;/code&gt; never advanced, so the GC epoch reclamation was doing meaningless work the entire time. Fixing this brought numbers back to a reasonable range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,248.762&lt;/td&gt;
&lt;td&gt;± 921.068&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,205.265&lt;/td&gt;
&lt;td&gt;± 413.555&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,407.078&lt;/td&gt;
&lt;td&gt;± 528.927&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,566.666&lt;/td&gt;
&lt;td&gt;± 610.823&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,021.587&lt;/td&gt;
&lt;td&gt;± 487.228&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,475.348&lt;/td&gt;
&lt;td&gt;± 388.932&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,693.183&lt;/td&gt;
&lt;td&gt;± 532.851&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;13,806.692&lt;/td&gt;
&lt;td&gt;± 4,179.536&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;454,138.457&lt;/td&gt;
&lt;td&gt;± 92,558.035&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;449,543.650&lt;/td&gt;
&lt;td&gt;± 226,056.899&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;623,248.848&lt;/td&gt;
&lt;td&gt;± 177,717.798&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;464,862.805&lt;/td&gt;
&lt;td&gt;± 234,641.969&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;369,462.166&lt;/td&gt;
&lt;td&gt;± 79,723.306&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;445,274.916&lt;/td&gt;
&lt;td&gt;± 171,877.221&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;728,382.685&lt;/td&gt;
&lt;td&gt;± 174,790.985&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;902,537.739&lt;/td&gt;
&lt;td&gt;± 93,291.275&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After another round of profiling, I realized I was calling &lt;code&gt;findOverlap()&lt;/code&gt; frequently on my read-heavy transactions, and each call is an O(n) traversal that only gets worse as the version queue per key grows. So I switched to a navigable map as my version chain, reducing the traversal per call to O(log n) at the cost of more expensive writes, and the numbers showed significant improvement.&lt;/p&gt;
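&lt;p&gt;For illustration, assuming a version chain keyed by commit epoch (the class below is a sketch with my own names, not the project's code), a navigable map turns the visibility lookup into a single &lt;code&gt;floorEntry&lt;/code&gt; call:&lt;/p&gt;

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch: versions keyed by commit epoch in a navigable map,
// so a snapshot read is floorEntry -> O(log n) instead of a linear scan.
class VersionChain<V> {
    private final ConcurrentSkipListMap<Long, V> versions = new ConcurrentSkipListMap<>();

    void commit(long epoch, V value) {
        versions.put(epoch, value); // writes pay the skip-list insertion cost
    }

    // Latest version visible to a transaction that began at tBegin, or null.
    V visibleAt(long tBegin) {
        Map.Entry<Long, V> e = versions.floorEntry(tBegin);
        return e == null ? null : e.getValue();
    }

    // Drop versions no active transaction can still see, keeping the newest
    // version at or below minVisibleEpoch as the visible baseline.
    void removeUnreachableVersions(long minVisibleEpoch) {
        Long keep = versions.floorKey(minVisibleEpoch);
        if (keep != null) {
            versions.headMap(keep, false).clear();
        }
    }
}
```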

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;601,083.402&lt;/td&gt;
&lt;td&gt;± 165,550.604&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;642,442.703&lt;/td&gt;
&lt;td&gt;± 187,617.695&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;688,103.749&lt;/td&gt;
&lt;td&gt;± 89,658.431&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;690,432.423&lt;/td&gt;
&lt;td&gt;± 101,032.453&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;454,926.218&lt;/td&gt;
&lt;td&gt;± 184,083.441&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;594,926.342&lt;/td&gt;
&lt;td&gt;± 92,822.567&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;836,488.562&lt;/td&gt;
&lt;td&gt;± 80,008.830&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;979,821.239&lt;/td&gt;
&lt;td&gt;± 89,332.281&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After looking through my code again, I realized I never actually started the worker thread for my &lt;code&gt;SerialTransactionKeeper&lt;/code&gt; class. After starting the thread and benchmarking again, I was basically back to square one.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;37,911.129&lt;/td&gt;
&lt;td&gt;± 13,675.765&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;89,182.952&lt;/td&gt;
&lt;td&gt;± 16,428.530&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,197,666.295&lt;/td&gt;
&lt;td&gt;± 370,356.847&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;868,822.634&lt;/td&gt;
&lt;td&gt;± 334,063.540&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,849.386&lt;/td&gt;
&lt;td&gt;± 1,326.803&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;9,548.036&lt;/td&gt;
&lt;td&gt;± 6,833.268&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;25,872.105&lt;/td&gt;
&lt;td&gt;± 13,197.781&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,132,311.163&lt;/td&gt;
&lt;td&gt;± 285,098.340&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Reading through my profile data, I realized that garbage collecting on a writer transaction thread was causing a lot of inconsistency across all benchmarks and thread counts, so I moved it to a separate worker thread. Now a writer txn submits a cleanup request to the GC thread when the version chain depth reaches a certain threshold.&lt;/p&gt;
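&lt;p&gt;A rough sketch of that handoff (the names and the threshold value are my assumptions, not the project's API): the writer path only enqueues a cleanup request once the chain is deep enough, and a dedicated daemon thread drains the queue and does the pruning.&lt;/p&gt;

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of moving version-chain GC off the writer path.
class GcWorker {
    record CleanupRequest(String key, long minVisibleEpoch) {}

    static final int DEPTH_THRESHOLD = 32; // assumed value

    private final LinkedBlockingQueue<CleanupRequest> queue = new LinkedBlockingQueue<>();

    GcWorker(Consumer<CleanupRequest> pruner) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    pruner.accept(queue.take()); // pruning happens off the writer path
                }
            } catch (InterruptedException ignored) {
                // shut down quietly
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the writer path after a commit; cheap unless the chain is deep.
    void maybeSubmit(String key, int chainDepth, long minVisibleEpoch) {
        if (chainDepth >= DEPTH_THRESHOLD) {
            queue.offer(new CleanupRequest(key, minVisibleEpoch));
        }
    }
}
```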

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;864,770.978&lt;/td&gt;
&lt;td&gt;± 123,348.212&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,187,432.216&lt;/td&gt;
&lt;td&gt;± 171,323.762&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;827,029.790&lt;/td&gt;
&lt;td&gt;± 525,292.066&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;661,419.334&lt;/td&gt;
&lt;td&gt;± 215,284.213&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;476,757.401&lt;/td&gt;
&lt;td&gt;± 91,269.885&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;622,144.154&lt;/td&gt;
&lt;td&gt;± 155,951.737&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;767,387.142&lt;/td&gt;
&lt;td&gt;± 171,878.778&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;819,022.141&lt;/td&gt;
&lt;td&gt;± 337,489.007&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As a minor optimization, I decided to cache the &lt;code&gt;minVisibleEpoch&lt;/code&gt; timestamp in my version chain, to prevent redundant iterations through version chains.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,384,520.967&lt;/td&gt;
&lt;td&gt;± 151,058.220&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,559,840.856&lt;/td&gt;
&lt;td&gt;± 361,048.003&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,328,777.375&lt;/td&gt;
&lt;td&gt;± 619,468.965&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;843,850.257&lt;/td&gt;
&lt;td&gt;± 241,581.930&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;527,538.045&lt;/td&gt;
&lt;td&gt;± 160,021.700&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;836,844.450&lt;/td&gt;
&lt;td&gt;± 95,255.683&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;944,841.844&lt;/td&gt;
&lt;td&gt;± 321,298.094&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;764,114.028&lt;/td&gt;
&lt;td&gt;± 472,135.579&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Looking at my profiled data again, I realized submitting requests to the GC thread was still a hotspot, so I spread out how often requests are submitted: cleanup requests now go out only when &lt;code&gt;depth % threshold == 0&lt;/code&gt; rather than every time &lt;code&gt;depth &amp;gt; threshold&lt;/code&gt;, spacing out GC cleanup submissions across writes.&lt;/p&gt;
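&lt;p&gt;The difference between the two policies is easy to see in isolation (the threshold value here is assumed): the old check fires on every write once the chain passes the threshold, while the modulo check fires once per &lt;code&gt;threshold&lt;/code&gt; writes.&lt;/p&gt;

```java
// Contrast of the two submission policies, as standalone predicates.
class GcSubmitPolicy {
    // Old policy: fires on every write once the chain is past the threshold.
    static boolean submitEveryTime(int depth, int threshold) {
        return depth > threshold;
    }

    // New policy: fires once every `threshold` writes.
    static boolean submitSpacedOut(int depth, int threshold) {
        return depth > 0 && depth % threshold == 0;
    }
}
```

&lt;p&gt;With a threshold of 32 and 128 consecutive writes to one key, the first policy would submit 96 cleanup requests while the second submits only 4.&lt;/p&gt;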

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,499,641.094&lt;/td&gt;
&lt;td&gt;± 186,580.267&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,800,303.740&lt;/td&gt;
&lt;td&gt;± 160,148.052&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,420,702.483&lt;/td&gt;
&lt;td&gt;± 566,203.194&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;828,145.076&lt;/td&gt;
&lt;td&gt;± 393,441.472&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;654,820.291&lt;/td&gt;
&lt;td&gt;± 132,849.682&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;895,029.640&lt;/td&gt;
&lt;td&gt;± 125,955.480&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;987,045.255&lt;/td&gt;
&lt;td&gt;± 437,727.470&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;853,466.979&lt;/td&gt;
&lt;td&gt;± 556,489.979&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Write numbers improved slightly and error margins tightened up. Next I tried segmenting my active transactions class just to measure the difference, and the results were a bit surprising.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;601,762.093&lt;/td&gt;
&lt;td&gt;± 137,771.491&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,005,846.389&lt;/td&gt;
&lt;td&gt;± 96,703.066&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,700,100.197&lt;/td&gt;
&lt;td&gt;± 144,301.351&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,321,714.189&lt;/td&gt;
&lt;td&gt;± 346,860.821&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;321,688.060&lt;/td&gt;
&lt;td&gt;± 95,079.876&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;549,510.063&lt;/td&gt;
&lt;td&gt;± 160,634.881&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;826,926.873&lt;/td&gt;
&lt;td&gt;± 149,437.015&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,026,862.457&lt;/td&gt;
&lt;td&gt;± 520,060.221&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My scaling basically inverted: for both operations, throughput was now lowest at one thread and highest at more than 2 threads.&lt;/p&gt;

&lt;p&gt;I then realized that I'd been using virtual threads as my background worker threads, which is a terrible idea and was probably contributing to the high variance, so I switched the background workers to platform threads instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Segmented Transaction Keeper&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;543,983.955&lt;/td&gt;
&lt;td&gt;± 133,134.608&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,020,562.096&lt;/td&gt;
&lt;td&gt;± 131,954.153&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,632,501.353&lt;/td&gt;
&lt;td&gt;± 197,194.523&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,444,733.689&lt;/td&gt;
&lt;td&gt;± 652,337.205&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;335,078.800&lt;/td&gt;
&lt;td&gt;± 128,993.334&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;580,449.127&lt;/td&gt;
&lt;td&gt;± 110,305.596&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;864,819.719&lt;/td&gt;
&lt;td&gt;± 159,350.383&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,120,983.469&lt;/td&gt;
&lt;td&gt;± 783,465.405&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Serial Transaction Keeper&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,251,410.402&lt;/td&gt;
&lt;td&gt;± 170,242.439&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,619,555.645&lt;/td&gt;
&lt;td&gt;± 107,851.864&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,345,362.000&lt;/td&gt;
&lt;td&gt;± 445,030.583&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;784,240.432&lt;/td&gt;
&lt;td&gt;± 407,925.752&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;640,249.753&lt;/td&gt;
&lt;td&gt;± 110,217.595&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;813,339.356&lt;/td&gt;
&lt;td&gt;± 155,722.320&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,011,854.484&lt;/td&gt;
&lt;td&gt;± 452,289.869&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;927,350.663&lt;/td&gt;
&lt;td&gt;± 194,550.049&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Moving on from that minor hiccup, I changed my version-tracking strategy to map-based epoch tracking rather than a single-writer background thread, since the latter had become a bottleneck. The results improved significantly, but oddly, throughput at 2 threads was very bad, and I had no idea why.&lt;/p&gt;
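&lt;p&gt;By map-based epoch tracking I mean something along these lines (a sketch with illustrative names): each transaction registers its begin epoch directly in a concurrent map, and the minimum is found by scanning the values. There is no background writer to fall behind, but the scan itself is O(n).&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of map-based epoch tracking; identifiers are mine.
class MapEpochTracker {
    private final ConcurrentHashMap<Long, Long> beginEpochs = new ConcurrentHashMap<>(); // txnId -> tBegin

    void register(long txnId, long tBegin) { beginEpochs.put(txnId, tBegin); }
    void deregister(long txnId)            { beginEpochs.remove(txnId); }

    long minVisibleEpoch() {
        long min = Long.MAX_VALUE;
        for (long tBegin : beginEpochs.values()) { // full scan: the hotspot found later
            if (tBegin < min) min = tBegin;
        }
        return min;
    }
}
```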

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,191,449.356&lt;/td&gt;
&lt;td&gt;± 405,088.856&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;195,198.411&lt;/td&gt;
&lt;td&gt;± 37,480.037&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,733,823.527&lt;/td&gt;
&lt;td&gt;± 526,463.941&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,832,819.684&lt;/td&gt;
&lt;td&gt;± 791,310.719&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;898,844.327&lt;/td&gt;
&lt;td&gt;± 184,090.618&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;58,322.618&lt;/td&gt;
&lt;td&gt;± 6,505.121&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;118,490.002&lt;/td&gt;
&lt;td&gt;± 21,500.755&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;5,211,745.483&lt;/td&gt;
&lt;td&gt;± 672,226.456&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To find the issue, I looked at my profile data and found &lt;code&gt;minVisibleEpoch()&lt;/code&gt; as a hotspot at 2 threads. The problem is that we scan the entire map for the minimum active epoch; ideally this shouldn't take too long, but as the map grows it can become a bottleneck.&lt;/p&gt;

&lt;p&gt;I then decided to use a navigable map for cheaper reads at the cost of more expensive writes, and the numbers actually improved a lot, with stable variance across all thread counts.&lt;/p&gt;
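&lt;p&gt;Concretely, something like the following sketch (again, the names are mine): keeping active begin epochs in a concurrent ordered multiset makes &lt;code&gt;minVisibleEpoch()&lt;/code&gt; a &lt;code&gt;firstEntry()&lt;/code&gt; lookup instead of a full scan, while registration pays the skip-list insertion cost.&lt;/p&gt;

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch: active begin epochs as an ordered multiset,
// so the minimum is O(log n) to read rather than an O(n) scan.
class ActiveEpochs {
    // epoch -> number of active transactions that began at that epoch
    private final ConcurrentSkipListMap<Long, Integer> epochs = new ConcurrentSkipListMap<>();

    void register(long tBegin)   { epochs.merge(tBegin, 1, Integer::sum); }
    void deregister(long tBegin) { epochs.compute(tBegin, (k, c) -> c == null || c <= 1 ? null : c - 1); }

    // Minimum visible epoch across all active transactions; MAX_VALUE if none.
    long minVisibleEpoch() {
        Map.Entry<Long, Integer> first = epochs.firstEntry();
        return first == null ? Long.MAX_VALUE : first.getKey();
    }
}
```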

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,189,669.131&lt;/td&gt;
&lt;td&gt;± 219,454.082&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,209,368.581&lt;/td&gt;
&lt;td&gt;± 118,708.165&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,390,759.022&lt;/td&gt;
&lt;td&gt;± 138,734.722&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,831,714.755&lt;/td&gt;
&lt;td&gt;± 209,719.251&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;863,360.164&lt;/td&gt;
&lt;td&gt;± 174,821.023&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;695,208.420&lt;/td&gt;
&lt;td&gt;± 182,427.969&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;905,598.677&lt;/td&gt;
&lt;td&gt;± 107,143.027&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,438,306.517&lt;/td&gt;
&lt;td&gt;± 376,373.683&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I then moved &lt;code&gt;minVisibleEpoch()&lt;/code&gt; reads off the writer transaction path by caching the value and refreshing it from the GC thread at 100ms intervals. The tradeoff is precision: the GC works off a slightly stale epoch, but the writer path stays clean.&lt;/p&gt;
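&lt;p&gt;The shape of that change, as a sketch (class and method names are illustrative, not the actual implementation):&lt;/p&gt;

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Sketch: a scheduled GC task refreshes a cached min epoch every 100ms;
// writers only read the cache, so the expensive scan is off the hot path.
// The cached value may be up to 100ms stale.
class CachedMinEpoch implements AutoCloseable {
    private final AtomicLong cached = new AtomicLong(Long.MAX_VALUE);
    private final ScheduledExecutorService gc = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "epoch-gc");
        t.setDaemon(true);
        return t;
    });

    CachedMinEpoch(LongSupplier expensiveMinScan) {
        gc.scheduleAtFixedRate(() -> cached.set(expensiveMinScan.getAsLong()),
                0, 100, TimeUnit.MILLISECONDS);
    }

    long minVisibleEpoch() { return cached.get(); } // O(1) on the hot path

    @Override public void close() { gc.shutdownNow(); }
}
```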

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,426,304.887&lt;/td&gt;
&lt;td&gt;± 259,332.240&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,138,292.934&lt;/td&gt;
&lt;td&gt;± 71,983.527&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,297,350.055&lt;/td&gt;
&lt;td&gt;± 173,150.260&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,581,116.656&lt;/td&gt;
&lt;td&gt;± 142,358.242&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;831,511.799&lt;/td&gt;
&lt;td&gt;± 272,854.879&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;651,383.441&lt;/td&gt;
&lt;td&gt;± 125,000.038&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;864,550.813&lt;/td&gt;
&lt;td&gt;± 102,841.781&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,133,963.803&lt;/td&gt;
&lt;td&gt;± 371,369.242&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now that minimum-active-epoch reads were scheduled, cached, and off the hot path, I moved back to a plain concurrent hashmap and compared results.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,033,144.625&lt;/td&gt;
&lt;td&gt;± 384,180.964&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,588,018.927&lt;/td&gt;
&lt;td&gt;± 270,740.481&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,729,477.638&lt;/td&gt;
&lt;td&gt;± 222,670.776&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,155,201.736&lt;/td&gt;
&lt;td&gt;± 367,418.564&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;806,951.062&lt;/td&gt;
&lt;td&gt;± 230,957.839&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;736,237.004&lt;/td&gt;
&lt;td&gt;± 198,644.974&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,163,394.531&lt;/td&gt;
&lt;td&gt;± 183,420.198&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,664,437.392&lt;/td&gt;
&lt;td&gt;± 348,898.556&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After looking at my queue version chain, I realized that calling &lt;code&gt;size()&lt;/code&gt; to check version-chain depth on each write txn was killing performance, since we had to scan the whole queue (including its dead nodes) to compute the size, restarting from the head whenever we hit a dead node. So I switched to a long adder that tracks the approximate size, making &lt;code&gt;size()&lt;/code&gt; calls cheap.&lt;/p&gt;
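&lt;p&gt;A minimal sketch of the approximate-size idea (names are illustrative): bump a &lt;code&gt;LongAdder&lt;/code&gt; on append/prune so depth checks never touch the queue's O(N) &lt;code&gt;size()&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.LongAdder;

// Sketch: the LongAdder is updated alongside queue mutations, so depth()
// is cheap. Under concurrent mutation the count can be momentarily stale,
// which is fine for a depth heuristic.
class CountedVersionChain<E> {
    private final ConcurrentLinkedQueue<E> versions = new ConcurrentLinkedQueue<>();
    private final LongAdder approxSize = new LongAdder();

    void append(E version) {
        versions.add(version);
        approxSize.increment();
    }

    boolean prune() {
        E removed = versions.poll();
        if (removed != null) {
            approxSize.decrement();
            return true;
        }
        return false;
    }

    long depth() { return approxSize.sum(); } // cheap, possibly slightly stale
}
```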

&lt;p&gt;&lt;strong&gt;Queue version chain&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,290,892.594&lt;/td&gt;
&lt;td&gt;± 491,972.988&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,604,327.428&lt;/td&gt;
&lt;td&gt;± 208,550.678&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,891,341.883&lt;/td&gt;
&lt;td&gt;± 250,730.085&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,937,383.908&lt;/td&gt;
&lt;td&gt;± 501,990.327&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;932,853.659&lt;/td&gt;
&lt;td&gt;± 165,703.963&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;966,232.277&lt;/td&gt;
&lt;td&gt;± 279,759.248&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,135,401.769&lt;/td&gt;
&lt;td&gt;± 380,147.279&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,620,440.108&lt;/td&gt;
&lt;td&gt;± 446,644.708&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I decided to use my queue version chain as the default version chain in the next benchmark.&lt;/p&gt;

&lt;p&gt;The hotspot across all benchmarks was now the &lt;code&gt;computeIfAbsent()&lt;/code&gt; call whenever a transaction requested its current epoch (tBegin) at creation time. To reduce contention on this path, I built a thread-local epoch tracker: not a literal &lt;code&gt;ThreadLocal&lt;/code&gt;, but a map keyed by thread ID instead of epoch. Each thread maps to the epoch of the transaction it's currently hosting, and resets to a sentinel value once the transaction ends, signaling it's no longer active.&lt;/p&gt;

&lt;p&gt;This was a meaningful shift from the previous DefaultEpochTracker, which tracked all active transactions regardless of thread, making it more suitable for virtual threads but inherently more contended. Since keys here can't actually be contested (each key belongs to exactly one thread), lock wait time dropped significantly and performance improved as a result. The tradeoff is that it works best with a small, fixed pool of platform threads.&lt;/p&gt;
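&lt;p&gt;A sketch of the thread-ID-keyed tracker described above (names and the sentinel choice are illustrative):&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: each platform thread owns exactly one slot, set on txn begin
// and reset to a sentinel on txn end, so no two threads ever contend on
// the same key. The min scan still walks the map, but it's bounded by
// the (small, fixed) thread count.
class ThreadEpochTracker {
    static final long INACTIVE = Long.MIN_VALUE; // sentinel: no txn hosted on this thread

    private final ConcurrentHashMap<Long, Long> epochByThread = new ConcurrentHashMap<>();

    void begin(long epoch) { epochByThread.put(Thread.currentThread().getId(), epoch); }

    void end() { epochByThread.put(Thread.currentThread().getId(), INACTIVE); }

    long minActiveEpoch() {
        long min = Long.MAX_VALUE;
        for (long e : epochByThread.values()) {
            if (e != INACTIVE && e < min) min = e;
        }
        return min;
    }
}
```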

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,887,534.127&lt;/td&gt;
&lt;td&gt;± 554,342.591&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,366,272.494&lt;/td&gt;
&lt;td&gt;± 324,415.185&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,162,652.894&lt;/td&gt;
&lt;td&gt;± 276,947.370&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,449,227.477&lt;/td&gt;
&lt;td&gt;± 443,072.582&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,271,287.026&lt;/td&gt;
&lt;td&gt;± 498,230.994&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,578,855.855&lt;/td&gt;
&lt;td&gt;± 213,555.495&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,340,979.654&lt;/td&gt;
&lt;td&gt;± 316,184.797&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,519,457.062&lt;/td&gt;
&lt;td&gt;± 700,083.591&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The variance was actually worse in some scenarios, sometimes &amp;gt;30% of the score. Rerunning these benchmarks, I noticed something odd in my memory-allocation profile: a lot of memory was being allocated but barely reclaimed by the GC during the runs, leading to high variance and numbers that tanked in unusual ways. The profile showed most of the allocation happened when starting a transaction, which wasn't too helpful. So I looked at actual memory usage while running the benchmarks and found around 95% of my memory was in use.&lt;/p&gt;

&lt;p&gt;My first suspect was my version chains, since stale versions might not be cleaned up by the GC. I wrote a simple unit test (no idea why I didn't test my version chains earlier) to see if versions were being cleaned up, and they actually were not. The issue lay in a simple check I added earlier to prevent redundant O(N) lookups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;    &lt;span class="nd"&gt;@Override&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;removeUnreachableVersions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;tBegin&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tBegin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;epoch&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//This simple line here, the issue was that minVisibleEpoch was always initialized as Long.MAX_VALUE, even if the epoch could be updated while non-visible version were getting pruned, the GC would never actually get the chance to prune those versions, because beginTs(seen epoch) would always be less than Long.MAX_VALUE&lt;/span&gt;

    &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;reset&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;//Reset the holder to Long.MAX_VALUE, to prevent a situation where we are sitting on an older end ts, from a version pruned a while back&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;latest&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Version&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;E&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;versionMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;entrySet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;removeIf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;shouldRemove&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endTs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;tBegin&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;ls&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;shouldRemove&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endTs&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;epoch&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;epoch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endTs&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;shouldRemove&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;});&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was changed to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;epoch&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MAX_VALUE&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;tBegin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;minVisibleEpoch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;epoch&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;//Actually gave the gc a chance to prune the older versions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
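&lt;p&gt;The failure mode is easy to reproduce in isolation: with the sentinel value, the guard is vacuously true for every possible &lt;code&gt;tBegin&lt;/code&gt;.&lt;/p&gt;

```java
// Minimal reproduction of the bug: any long is <= Long.MAX_VALUE, so the
// buggy guard skips pruning unconditionally until the sentinel is
// excluded explicitly.
class SentinelGuardDemo {
    static boolean buggySkipsPrune(long tBegin, long minVisibleEpoch) {
        return tBegin <= minVisibleEpoch; // always true when minVisibleEpoch == Long.MAX_VALUE
    }

    static boolean fixedSkipsPrune(long tBegin, long minVisibleEpoch) {
        return minVisibleEpoch != Long.MAX_VALUE && tBegin <= minVisibleEpoch;
    }
}
```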



&lt;p&gt;After this, my numbers improved significantly and variance dropped a bit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Version Chain&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,022,274.294&lt;/td&gt;
&lt;td&gt;± 397,643.609&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,250,631.595&lt;/td&gt;
&lt;td&gt;± 201,546.981&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,583,840.733&lt;/td&gt;
&lt;td&gt;± 444,871.511&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,291,496.019&lt;/td&gt;
&lt;td&gt;± 304,662.994&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,152,546.075&lt;/td&gt;
&lt;td&gt;± 350,295.696&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,933,701.271&lt;/td&gt;
&lt;td&gt;± 210,834.860&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,209,719.728&lt;/td&gt;
&lt;td&gt;± 611,222.951&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,058,722.339&lt;/td&gt;
&lt;td&gt;± 265,986.981&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,467,899.289&lt;/td&gt;
&lt;td&gt;± 343,129.882&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,224,346.814&lt;/td&gt;
&lt;td&gt;± 276,528.783&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,987,672.075&lt;/td&gt;
&lt;td&gt;± 142,924.001&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,474,317.537&lt;/td&gt;
&lt;td&gt;± 332,782.403&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,540,089.406&lt;/td&gt;
&lt;td&gt;± 107,245.569&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,941,961.678&lt;/td&gt;
&lt;td&gt;± 335,538.070&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,310,156.822&lt;/td&gt;
&lt;td&gt;± 319,971.721&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,599,563.143&lt;/td&gt;
&lt;td&gt;± 712,225.130&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I was still a bit skeptical about the variance, even though it was pretty reasonable. I realized a lot of memory was getting allocated in my thread-local epoch tracker under contention due to &lt;code&gt;long&lt;/code&gt; boxing when updating epoch values, so I tried &lt;strong&gt;fastutil&lt;/strong&gt;'s synchronized &lt;code&gt;Long2LongHashMap&lt;/code&gt; to avoid boxing and allocations under high contention. Rerunning the benchmarks, memory allocation on that hot path dropped to basically zero; writes under contention did suffer a bit, but variance for both write-heavy and read-heavy workloads was pretty good.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Version Chain&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,917,121.556&lt;/td&gt;
&lt;td&gt;± 191,182.041&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,592,747.908&lt;/td&gt;
&lt;td&gt;± 192,348.108&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,935,324.852&lt;/td&gt;
&lt;td&gt;± 380,159.702&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,587,901.962&lt;/td&gt;
&lt;td&gt;± 253,365.744&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,410,564.690&lt;/td&gt;
&lt;td&gt;± 394,036.694&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,425,199.868&lt;/td&gt;
&lt;td&gt;± 109,061.792&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,138,869.830&lt;/td&gt;
&lt;td&gt;± 106,605.873&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,006,265.789&lt;/td&gt;
&lt;td&gt;± 154,944.074&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,789,016.664&lt;/td&gt;
&lt;td&gt;± 258,145.402&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,247,223.769&lt;/td&gt;
&lt;td&gt;± 212,243.516&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,308,032.198&lt;/td&gt;
&lt;td&gt;± 234,208.698&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,395,297.149&lt;/td&gt;
&lt;td&gt;± 187,661.268&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,285,734.168&lt;/td&gt;
&lt;td&gt;± 222,636.258&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,738,846.900&lt;/td&gt;
&lt;td&gt;± 503,307.766&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,274,919.824&lt;/td&gt;
&lt;td&gt;± 46,946.253&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,893,349.970&lt;/td&gt;
&lt;td&gt;± 213,244.636&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The throughput was great, though after some research I found a generics trick: use primitive arrays as the generic value type instead of boxed &lt;code&gt;long&lt;/code&gt; values. I decided to try this out with CHM and compare it to the synchronized long2long version&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ConcurrentMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;map&lt;/span&gt; &lt;span class="c1"&gt;//Instead of this, we could do&lt;/span&gt;
&lt;span class="nc"&gt;ConcurrentMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="o"&gt;[]&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;map&lt;/span&gt; &lt;span class="c1"&gt;//No boxing for values&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
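&lt;p&gt;In use, the trick looks roughly like this sketch (names are illustrative): allocate one &lt;code&gt;long[1]&lt;/code&gt; cell per key once, then mutate it in place so epoch updates allocate nothing after the first insert.&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the primitive-array trick with ConcurrentHashMap. Note that a
// plain write into the array element carries no happens-before guarantee
// on its own; this pattern fits a single-writer-per-key design like the
// thread-ID-keyed tracker, where each slot is written by its owner thread.
class LongCellMap {
    private final ConcurrentMap<Long, long[]> cells = new ConcurrentHashMap<>();

    void set(long key, long value) {
        // first call allocates the cell; later calls reuse it with zero allocation
        cells.computeIfAbsent(key, k -> new long[1])[0] = value;
    }

    long get(long key) {
        long[] cell = cells.get(key);
        return cell == null ? Long.MIN_VALUE : cell[0]; // MIN_VALUE as an illustrative "absent" marker
    }
}
```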



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Version Chain&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,921,045.464&lt;/td&gt;
&lt;td&gt;± 379,095.211&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,490,400.560&lt;/td&gt;
&lt;td&gt;± 235,824.152&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,351,486.812&lt;/td&gt;
&lt;td&gt;± 330,871.332&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,961,814.543&lt;/td&gt;
&lt;td&gt;± 271,271.445&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,130,101.194&lt;/td&gt;
&lt;td&gt;± 365,901.882&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,967,520.781&lt;/td&gt;
&lt;td&gt;± 267,639.783&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;6,100,773.757&lt;/td&gt;
&lt;td&gt;± 555,229.766&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;5,384,214.753&lt;/td&gt;
&lt;td&gt;± 396,929.168&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,948,891.898&lt;/td&gt;
&lt;td&gt;± 461,806.810&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,256,910.691&lt;/td&gt;
&lt;td&gt;± 203,279.439&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,523,596.142&lt;/td&gt;
&lt;td&gt;± 196,569.443&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,408,471.703&lt;/td&gt;
&lt;td&gt;± 192,463.826&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,943,223.429&lt;/td&gt;
&lt;td&gt;± 276,821.883&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,636,191.583&lt;/td&gt;
&lt;td&gt;± 311,031.690&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,064,740.729&lt;/td&gt;
&lt;td&gt;± 416,730.295&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,574,077.728&lt;/td&gt;
&lt;td&gt;± 549,287.881&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So far, I'd been testing throughput for my MVCC map under "best case" scenarios, i.e. no retries on abort, so I decided to test with retries enabled. Note that this run is baselined against my map with a &lt;code&gt;Long2ArrayEpochTracker&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Version Chain&lt;/th&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Cnt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Units&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,255,294.022&lt;/td&gt;
&lt;td&gt;± 417,518.780&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,026,195.253&lt;/td&gt;
&lt;td&gt;± 382,190.857&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,727,360.460&lt;/td&gt;
&lt;td&gt;± 469,491.078&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,310,257.104&lt;/td&gt;
&lt;td&gt;± 195,179.750&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,222,921.041&lt;/td&gt;
&lt;td&gt;± 334,896.201&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,352,625.744&lt;/td&gt;
&lt;td&gt;± 481,297.585&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;4,512,409.390&lt;/td&gt;
&lt;td&gt;± 385,141.320&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;readHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;3,414,450.281&lt;/td&gt;
&lt;td&gt;± 391,240.406&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,887,463.431&lt;/td&gt;
&lt;td&gt;± 507,767.083&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_1thread&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,330,658.185&lt;/td&gt;
&lt;td&gt;± 214,850.891&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,083,535.454&lt;/td&gt;
&lt;td&gt;± 214,987.876&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_2threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,192,807.369&lt;/td&gt;
&lt;td&gt;± 222,511.666&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2,093,721.029&lt;/td&gt;
&lt;td&gt;± 118,319.047&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_4threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,337,540.905&lt;/td&gt;
&lt;td&gt;± 198,100.953&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;queue&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,800,086.692&lt;/td&gt;
&lt;td&gt;± 403,405.040&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;writeHeavy_8threads&lt;/td&gt;
&lt;td&gt;nav&lt;/td&gt;
&lt;td&gt;thrpt&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1,028,280.423&lt;/td&gt;
&lt;td&gt;± 285,679.830&lt;/td&gt;
&lt;td&gt;ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;To fully understand this drop on the write-heavy bench, I compared profile data from this benchmark to the runs without retries. While everything looked similar on the CPU side, memory was a different story: usage spiked from ~16GB to almost ~30GB at every iteration. Because aborts were frequent, every retry had to create a new transaction object, meaning more memory allocated for the transaction object, its read/write operation objects, and its completable values. The result: more pressure on the GC, more GC pauses, lower throughput, and higher variance.&lt;/p&gt;
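&lt;p&gt;One way to relieve that allocation pressure is to reuse a single transaction object across retries, clearing its read/write sets on abort instead of allocating fresh ones. A minimal sketch of the idea (the &lt;code&gt;ReusableTx&lt;/code&gt; and &lt;code&gt;runWithRetries&lt;/code&gt; names are hypothetical, not taken from the actual implementation):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a transaction that clears its read/write sets on
// retry instead of allocating a fresh object each time. The backing
// arrays survive across retries, so steady-state retries allocate nothing.
final class ReusableTx {
    final List reads = new ArrayList();   // raw types to keep the sketch short
    final List writes = new ArrayList();

    void reset() {        // called on abort instead of "new ReusableTx()"
        reads.clear();
        writes.clear();
    }
}

public class RetryLoop {
    // Runs the body until it commits; a null result signals an abort here.
    static Object runWithRetries(ReusableTx tx, Function body) {
        while (true) {
            tx.reset();                      // reuse, don't reallocate
            Object result = body.apply(tx);
            if (result != null) return result;
        }
    }
}
```

&lt;p&gt;Whether this pays off depends on how much transaction state can safely be recycled across attempts, but it directly targets the allocation churn seen in the profiles.&lt;/p&gt;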

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;QueueVersionChain&lt;/code&gt; paired with &lt;code&gt;Long2ArrayEpochTracker&lt;/code&gt; ended up being the best all-round configuration: reads sit comfortably in the multi-million ops/s range across thread counts, and writes improved significantly from the initial ~2k. A few things are still unresolved, though. The 2-thread throughput anomaly with map-based epoch tracking is the main one, and the retry memory pressure under high abort rates is worth revisiting; I'd likely make transaction objects reusable to reduce GC pressure on retries. &lt;/p&gt;

&lt;p&gt;If you want to check out the MVCC implementation, the benchmark code, or the benchmarks under more realistic workloads (i.e. Zipfian key distributions), you can visit the &lt;strong&gt;GitHub&lt;/strong&gt; repository: &lt;a href="https://github.com/kusoroadeolu/tx-map/tree/mvcc-txmap" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map/tree/mvcc-txmap&lt;/a&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>concurrency</category>
      <category>database</category>
      <category>benchmarking</category>
    </item>
    <item>
      <title>What Happens When You Give a Map Transactional Semantics</title>
      <dc:creator>Kush V</dc:creator>
      <pubDate>Fri, 27 Feb 2026 02:09:41 +0000</pubDate>
      <link>https://dev.to/kusoroadeolu/what-happens-when-you-give-a-java-map-transactional-semantics-1ham</link>
      <guid>https://dev.to/kusoroadeolu/what-happens-when-you-give-a-java-map-transactional-semantics-1ham</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The goal of this writeup isn't necessarily to sell you on the idea of in-memory transactional maps, but rather to document my findings about the 5-6 different transactional map strategies I implemented in &lt;a href="https://github.com/kusoroadeolu/tx-map" rel="noopener noreferrer"&gt;this repo&lt;/a&gt;.&lt;br&gt;
Before we continue: the main idea of transactional collections/maps is to let developers achieve performance in large atomic regions similar to what they'd get by interleaving multiple operations under mutual-exclusion primitives, while providing atomicity and different isolation levels based on preference.&lt;br&gt;
With that said, let's get into the implementations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; The original writeup can be found &lt;a href="https://github.com/kusoroadeolu/articles/tree/main/transactional-maps" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I initially started building these transactional maps with a single implementation: an optimistic-style transactional map with semantic concurrency control. Most of the ideas came from this &lt;a href="https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf" rel="noopener noreferrer"&gt;paper&lt;/a&gt;. &lt;br&gt;
The main idea was pretty clear: &lt;strong&gt;transactional semantics for a map using optimistic concurrency control&lt;/strong&gt;. However, the paper assumed the reader had an underlying STM (Software Transactional Memory) runtime, hence all the talk about rollbacks, aborts, open-nested and closed-nested transactions, and memory-level conflicts that the STM runtime handled for them.&lt;br&gt;
Many of the paper's ideas were very smart, and a good number of them carried over into the transactional maps I built. There were, however, significant limitations that prevented me from making the near 1:1 copy of the paper's design that I had initially planned.&lt;br&gt;
These limitations, rather than being roadblocks, ended up pushing me to explore different synchronization strategies and ultimately build 5 distinct transactional map implementations, each with its own tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Transactional Map Implementations
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimistic Transactional Map:&lt;/strong&gt; At first glance, the name could imply that this map carries out optimistic reads; its transactional semantics, however, are quite different. &lt;br&gt;
This map's commit phase is split in two: validation and commit. During validation, the transaction tries to acquire all semantically meaningful locks for the transaction, e.g. if a put targets a key that already holds a value in the map, neither the contains-key lock nor the size lock needs to be acquired.&lt;br&gt;
The map's semantics require readers to eagerly state their intent by acquiring read locks for their semantics at a scheduled time (before commit). Writes, however, lazily state their intent and are serialized through one lock per key. This map promises &lt;strong&gt;READ COMMITTED&lt;/strong&gt; isolation across the entire map but &lt;strong&gt;SERIALIZABLE&lt;/strong&gt; isolation per key when writes are involved, meaning no dirty reads can ever happen on any key in the map, and non-repeatable or phantom reads are impossible per key.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pessimistic Transactional Map:&lt;/strong&gt; This map's semantics require readers to lazily state their intent by incrementing an atomic counter at validation time. Writes lazily state their intent as well and are serialized through one lock per key. This map also promises &lt;strong&gt;READ COMMITTED&lt;/strong&gt; isolation across the entire map but &lt;strong&gt;SERIALIZABLE&lt;/strong&gt; isolation per key.&lt;br&gt;
&lt;strong&gt;NOTE:&lt;/strong&gt; For both the optimistic and pessimistic maps, writers block readers and readers block writers, but readers never block each other.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Read Uncommitted Transactional Map:&lt;/strong&gt; This map's semantics do not require readers to state their intent at all. Writes are still serialized through one lock per key and lazily state their intent. This map promises &lt;strong&gt;READ UNCOMMITTED&lt;/strong&gt; isolation, meaning transactions can read data written by transactions that haven't fully committed yet, i.e. dirty reads. In the &lt;a href="https://people.csail.mit.edu/mcarbin/papers/ppopp07.pdf" rel="noopener noreferrer"&gt;paper&lt;/a&gt;, this is regarded as an &lt;strong&gt;open-nested&lt;/strong&gt; transaction.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copy On Write Transactional Map:&lt;/strong&gt; This map's semantics closely resemble those of &lt;strong&gt;CopyOnWriteArrayList&lt;/strong&gt;. A transaction copies the last-seen map reference onto its thread's stack; if the copy was modified, it tries to replace the shared map reference using CAS, and if that fails, it recopies the map, reapplies its modifications, and retries the CAS until it succeeds. This map promises &lt;strong&gt;READ COMMITTED&lt;/strong&gt; isolation across the entire map. Non-repeatable and phantom reads are possible: if you read a value, another thread commits, and you read again, you might see the new value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flat Combined Transactional Map:&lt;/strong&gt; I wanted to see how a fully serialized approach would compare, which led me to flat combining.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
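&lt;p&gt;The copy-on-write commit loop described above can be sketched in a few lines. This is a minimal, illustrative version (the &lt;code&gt;CowMap&lt;/code&gt; class is hypothetical and uses raw types; the real implementation also carries transactional bookkeeping):&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the copy-on-write commit described above: copy the
// last-seen map, apply the write, then CAS the shared reference; on
// failure, recopy from the fresh reference and try again.
public class CowMap {
    private final AtomicReference shared = new AtomicReference(new HashMap());

    public Object get(Object key) {
        return ((Map) shared.get()).get(key);   // reads never block
    }

    public void put(Object key, Object value) {
        while (true) {
            Map snapshot = (Map) shared.get();
            Map copy = new HashMap(snapshot);   // private copy on this thread's stack
            copy.put(key, value);
            if (shared.compareAndSet(snapshot, copy)) return;  // publish, or retry
        }
    }
}
```

&lt;p&gt;Reads are wait-free against whatever snapshot was last published, which is exactly why this design trades write throughput for read throughput.&lt;/p&gt;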

&lt;h3&gt;
  
  
  Flat Combining
&lt;/h3&gt;

&lt;p&gt;Flat combining is a synchronization technique that uses a single &lt;strong&gt;coarse-grained lock&lt;/strong&gt; for all operations rather than fine-grained locking around critical sections. To avoid every thread contending for the lock on the data structure, which would kill performance, each thread enqueues its proposed operation onto a queue as a node, storing the operation in a thread-local variable, then spins or waits until a combiner processes its request; if no combiner is active, any thread can become one. The combiner traverses the queued operations, performs them on behalf of the waiting threads, and then releases the lock. &lt;br&gt;
My explanation is a pretty simplified version of what flat combining actually entails; if you're interested in the full details, you can check out the paper &lt;a href="https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
There are two possible combiners my transactional map could use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unbound Combiner:&lt;/strong&gt; This implementation is faithful to what the paper describes: each thread stores its intent in a thread-local variable, enqueues its action onto a shared publication queue, and then either becomes the combiner or waits for the combiner to perform the operation on its behalf.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semaphore Combiner:&lt;/strong&gt; This implementation cycles nodes across threads. Each thread replaces its thread-local node with the current node at the head of the combiner publication queue, then sets that node as its tail. A combining node then scans the queue and executes all actions on behalf of the waiting threads except the node at the tail of the queue, then informs the thread holding the tail node that it is the next combiner. This combiner does not use a lock. &lt;/li&gt;
&lt;/ul&gt;
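&lt;p&gt;To make the unbound-combiner flow concrete, here's a stripped-down sketch of the publish-then-combine loop. Everything here (class and method names, the use of &lt;code&gt;CompletableFuture&lt;/code&gt; for result handoff) is my own illustration, not the repo's actual code:&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Hand-wavy sketch of the unbound-combiner idea: publish an operation,
// then either become the combiner (if the coarse lock is free) or spin
// until some combiner has applied it for you.
public class FlatCombined {
    static final class Op {
        final Function action;                       // takes the map, returns a result
        final CompletableFuture result = new CompletableFuture();
        Op(Function a) { action = a; }
    }

    private final Map map = new HashMap();           // only touched under the combiner lock
    private final ConcurrentLinkedQueue queue = new ConcurrentLinkedQueue();
    private final AtomicBoolean lock = new AtomicBoolean(false);

    public Object submit(Function action) {
        Op op = new Op(action);
        queue.add(op);                               // publish intent
        while (!op.result.isDone()) {
            if (lock.compareAndSet(false, true)) {   // become the combiner
                try {
                    Op next;
                    while ((next = (Op) queue.poll()) != null) {
                        next.result.complete(next.action.apply(map));
                    }
                } finally {
                    lock.set(false);
                }
            } else {
                Thread.onSpinWait();                 // wait for the active combiner
            }
        }
        return op.result.join();
    }
}
```

&lt;p&gt;The single coarse lock serializes every map access, while the publication queue keeps waiting threads off the lock itself; the batching behavior falls out of the combiner draining the whole queue in one pass.&lt;/p&gt;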

&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;To evaluate the properties of all my transactional maps, I ran two benchmarks, both primarily measuring throughput: a disjoint-key benchmark, where each thread operates exclusively on its own key (no key contention by design), and a contention benchmark, where all threads compete over a small fixed pool of 4 keys across three workload profiles: read heavy (90% get operations), balanced (50/50), and write heavy (90% put operations). In this writeup I'll focus only on the read-heavy and write-heavy portions of the contention benchmarks.&lt;br&gt;
It's important to point out that combining operations are fully serialized, so these benchmarks may not be measuring what we think they're measuring and could yield misleading results. I won't dwell on the flat combined numbers until later, when I'll share how I plan to improve them and, more importantly, reduce their error margins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; These benchmarks were run using JMH (Java Microbenchmark Harness)&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;JMH version: 1.37&lt;/li&gt;
&lt;li&gt;JVM: Java 25, HotSpot 64-Bit Server VM (25+37-LTS-3491)&lt;/li&gt;
&lt;li&gt;Benchmark mode: Throughput (ops/s)&lt;/li&gt;
&lt;li&gt;Warmup: 5 iterations × 1s each&lt;/li&gt;
&lt;li&gt;Measurement: 5 iterations × 1s each&lt;/li&gt;
&lt;li&gt;Forks: 2&lt;/li&gt;
&lt;li&gt;Thread configuration: 1, 2, 4, 8 (Platform threads)&lt;/li&gt;
&lt;li&gt;CPU Specs: Intel(R) Core(TM) i5-10300H CPU @ 2.50GHz, 8 cores&lt;/li&gt;
&lt;/ul&gt;
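&lt;p&gt;For reference, the setup above corresponds roughly to the following JMH annotations. This is a schematic reconstruction, not the repo's benchmark class; the map under test and the key pool are placeholders:&lt;/p&gt;

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.*;

// Roughly how the setup above translates into JMH annotations; the map
// under test and the 4-key contention pool are stand-ins here.
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1)        // 5 iterations x 1s
@Measurement(iterations = 5, time = 1)   // 5 iterations x 1s
@Fork(2)
@State(Scope.Benchmark)
public class ContentionBench {
    ConcurrentHashMap map;   // stand-in for the transactional map under test
    Integer[] keys;          // the small fixed contention pool

    @Setup
    public void setup() {
        map = new ConcurrentHashMap();
        keys = new Integer[] {0, 1, 2, 3};
        for (Integer k : keys) map.put(k, k);
    }

    @Benchmark
    @Threads(8)
    public Object readHeavy_8threads() {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        Integer k = keys[r.nextInt(4)];
        // 90% gets, 10% puts
        return r.nextInt(10) != 0 ? map.get(k) : map.put(k, r.nextInt());
    }
}
```

&lt;p&gt;The other thread counts and workload mixes would be separate &lt;code&gt;@Benchmark&lt;/code&gt; methods (or a parameterized state) in the same shape.&lt;/p&gt;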

&lt;h3&gt;
  
  
  Read Heavy
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D419529405%26format%3Dimage" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D419529405%26format%3Dimage" alt="Read Heavy - 1 Thread" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1075497262%26format%3Dimage" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1075497262%26format%3Dimage" alt="Read Heavy - 8 Threads" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Under contention, the copy-on-write implementation's throughput grows from ~3.7M ops/s at 1 thread to 6.4M at 8 threads (+72%), since readers don't block each other at all. This is a known strength of copy-on-write implementations, which sacrifice write throughput for high read throughput. &lt;/p&gt;

&lt;p&gt;The pessimistic transactional map's throughput drops from ~1.2M ops/s at 1 thread to ~865k at 8 threads. This is mainly due to the strong isolation guarantees this implementation provides, blocking readers while writes are active and vice versa. Context switches from multiple reader threads waking up after a writer has notified them contribute to the drop as well.&lt;/p&gt;

&lt;p&gt;For the optimistic transactional map, the story is similar: throughput drops from ~1.27M ops/s at 1 thread to ~701k ops/s at 8 threads, though reads scale worse than in the pessimistic map. The problems are similar, but here it isn't only reader threads performing OS context switches; writers now also park/unpark due to lock contention.&lt;/p&gt;

&lt;p&gt;The read uncommitted implementation scales more gracefully, with throughput growing from ~2.0M ops/s at 1 thread to ~3.1M at 8 threads, though it offers much weaker isolation guarantees than the previous three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write Heavy
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1590911224%26format%3Dimage" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1590911224%26format%3Dimage" alt="Write Heavy - 1 Thread" width="600" height="371"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1945883913%26format%3Dimage" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2Fe%2F2PACX-1vQCZMJbJvC9LBh-MxqySWYzYu_FaZ5z2hAX2eQyxyzRh_QEvyBJcoQrWem-sMDmwYjZzxN1XQjZcHtD%2Fpubchart%3Foid%3D1945883913%26format%3Dimage" alt="Write Heavy - 8 Threads" width="600" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Copy-on-write's throughput drops ~60%, from 2.4M ops/s at 1 thread to 941k ops/s at 8 threads. Unlike reads, write operations cause multiple CAS retries, and the overhead of recopying the shared map on each retry under high contention becomes a major bottleneck.&lt;/p&gt;

&lt;p&gt;The pessimistic transactional map's throughput drops ~60%, from ~927k ops/s at 1 thread to ~388k ops/s at 8 threads. Since all writes per key are serialized under one lock to uphold its isolation guarantees, the overhead of repeated OS context switches due to lock contention becomes the primary bottleneck, as threads spend more time waiting to acquire locks than actually doing work.&lt;/p&gt;

&lt;p&gt;For the optimistic transactional map, the story is again similar: throughput drops from ~883k ops/s at 1 thread to ~304k ops/s at 8 threads, scaling worse than the pessimistic map for similar reasons; much more time is spent waiting to acquire locks than doing actual work.&lt;/p&gt;

&lt;p&gt;The read uncommitted implementation offers a more interesting picture: throughput only drops ~18%, from ~1.36M ops/s at 1 thread to ~1.11M ops/s at 8 threads, the best write performance among these implementations thanks to its weak isolation guarantees.&lt;/p&gt;

&lt;p&gt;From these numbers we can see that no single implementation wins across all workloads; the right choice depends entirely on your read/write ratio and how much isolation you're willing to trade for throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Flat Combining
&lt;/h3&gt;

&lt;p&gt;Looking back at the flat combining numbers, both implementations show terrible variance on the read- and write-heavy workloads, so I want to outline what I plan to do to improve them.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Experimenting with multiple idle strategies. Right now, while idle, both implementations spin &lt;strong&gt;n&lt;/strong&gt; times before rechecking their results and potentially trying to become the combiner. This can waste work, since a thread's result could be ready much earlier, or the combiner might have forfeited its status earlier as well. Two other strategies I'd implement and benchmark against are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plain Busy Waiting:&lt;/strong&gt; Threads just spin on their results continuously, without spinning &lt;strong&gt;n&lt;/strong&gt; times before checking their results or trying to acquire the lock. Constant CAS &lt;code&gt;tryLock()&lt;/code&gt; failures in the unbound combiner could plausibly make throughput and its variance worse than just using a global lock.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timed Parks:&lt;/strong&gt; Threads park for a fixed amount of time before rechecking their result, though OS context switches might become a bottleneck here.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using atomic arrays instead of a linked publication queue: Right now, both combiners use a linked publication queue to share nodes for the combiner to process. Linked queues are susceptible to pointer chasing, which puts pressure on the GC and JVM. Instead, we could use an &lt;strong&gt;AtomicReferenceArray&lt;/strong&gt; with a fixed capacity to store nodes for the combiner to traverse. However, this could leave nodes susceptible to false sharing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Increasing the threshold of the publication queue in both implementations: One of flat combining's strengths is its ability to batch-apply requests for waiting threads. Raising this threshold, while increasing wait time, could reduce the number of serial handoffs needed per combiner and the number of failed CAS operations to acquire the lock and become the combiner.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
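&lt;p&gt;As a concrete example of the idle-strategy point, the current spin-&lt;strong&gt;n&lt;/strong&gt;-times approach and a timed-park fallback can be combined like this (a hypothetical sketch; the spin budget is an arbitrary placeholder, not a tuned value):&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// A tiny sketch of a hybrid idle strategy: spin cheaply for a fixed
// budget, then fall back to a timed park before re-spinning. The budget
// and park duration are arbitrary placeholders for benchmarking.
public class IdleStrategy {
    static final int SPIN_BUDGET = 64;   // placeholder tuning knob

    // Waits for a result slot to be filled by the combiner.
    static Object awaitResult(AtomicReference slot) {
        int spins = 0;
        while (true) {
            Object r = slot.get();
            if (r != null) return r;
            if (spins != SPIN_BUDGET) {
                spins++;
                Thread.onSpinWait();             // cheap spin first
            } else {
                LockSupport.parkNanos(1_000L);   // timed park after the budget
                spins = 0;                       // then re-spin
            }
        }
    }
}
```

&lt;p&gt;Plain busy waiting is this loop with the park branch removed; pure timed parking removes the spin branch instead.&lt;/p&gt;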

&lt;h3&gt;
  
  
  Further Experimentation
&lt;/h3&gt;

&lt;p&gt;We've compared multiple implementations of these transactional maps against each other in this writeup; here are some ways I plan to push them further:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;MVCC Transactional Maps: MVCC (Multi-Version Concurrency Control) is a very common technique in relational databases, providing &lt;strong&gt;READ COMMITTED&lt;/strong&gt; isolation guarantees while ensuring readers and writers never block each other.&lt;/li&gt;
&lt;li&gt;Partitioned Flat Combined Transactional Maps: Rather than wrapping the whole transactional map in a combiner, we instead map each key and the &lt;strong&gt;size&lt;/strong&gt; value to its own combiner; transactions submit their read/write requests per key and wait on a response.&lt;/li&gt;
&lt;/ol&gt;
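&lt;p&gt;To give a feel for the MVCC direction, here's a bare-bones sketch of a per-key version chain: writers prepend epoch-stamped versions, and readers walk the chain for the newest version visible at their snapshot, so neither ever blocks the other. The class and field names here are hypothetical, not from the repo:&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Bare-bones MVCC sketch: each key holds a chain of versions stamped
// with a commit epoch. Writers prepend; readers walk the chain for the
// newest version visible at their snapshot, never blocking writers.
public class VersionChain {
    static final class Version {
        final long epoch; final Object value; final Version next;
        Version(long e, Object v, Version n) { epoch = e; value = v; next = n; }
    }

    final AtomicLong clock = new AtomicLong();
    final AtomicReference head = new AtomicReference();  // newest version first

    void commit(Object value) {              // writer: prepend a new version
        long epoch = clock.incrementAndGet();
        while (true) {
            Version h = (Version) head.get();
            if (head.compareAndSet(h, new Version(epoch, value, h))) return;
        }
    }

    Object readAt(long snapshot) {           // reader: first version visible at snapshot
        for (Version v = (Version) head.get(); v != null; v = v.next) {
            if (snapshot >= v.epoch) return v.value;
        }
        return null;                         // key did not exist at this snapshot
    }
}
```

&lt;p&gt;A real implementation also needs garbage collection of old versions and conflict detection at commit, which is where the epoch trackers from the first article come in.&lt;/p&gt;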

&lt;p&gt;As I implement these, I'll be sure to document the performance/throughput of every experiment I perform here.&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/kusoroadeolu/tx-map" rel="noopener noreferrer"&gt;https://github.com/kusoroadeolu/tx-map&lt;/a&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>concurrency</category>
      <category>database</category>
      <category>datastructures</category>
    </item>
  </channel>
</rss>
