<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shiyam</title>
    <description>The latest articles on DEV Community by Shiyam (@shyam-s00).</description>
    <link>https://dev.to/shyam-s00</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3957496%2Fa31635b4-e53e-451c-9f05-37bf8d66fb0b.jpg</url>
      <title>DEV Community: Shiyam</title>
      <link>https://dev.to/shyam-s00</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shyam-s00"/>
    <language>en</language>
    <item>
      <title>How I Built a Lock-Free Actor Model in Go to Hit 30k+ RPS (Zero Allocs)</title>
      <dc:creator>Shiyam</dc:creator>
      <pubDate>Fri, 29 May 2026 14:32:42 +0000</pubDate>
      <link>https://dev.to/shyam-s00/how-i-built-a-lock-free-actor-model-in-go-to-hit-30k-rps-zero-allocs-4d2a</link>
      <guid>https://dev.to/shyam-s00/how-i-built-a-lock-free-actor-model-in-go-to-hit-30k-rps-zero-allocs-4d2a</guid>
      <description>&lt;h2&gt;
  
  
  How I Built a Lock-Free Actor Model in Go to Hit 30k+ RPS (Zero Allocs)
&lt;/h2&gt;

&lt;p&gt;When it comes to building an API traffic simulator or a load-testing tool, the hardest problem isn’t sending the HTTP requests—it’s measuring them.&lt;/p&gt;

&lt;p&gt;Most developers reach for traditional tools like JMeter (which dedicates an OS thread to every virtual user and consumes a lot of memory) or for tools scripted in Python or JavaScript (Locust, k6), which add their own scripting overhead.&lt;/p&gt;

&lt;p&gt;My primary motivation for building an open-source tool like &lt;strong&gt;Gopher-Glide (&lt;code&gt;gg&lt;/code&gt;)&lt;/strong&gt; was simple: I wanted something incredibly lightweight, easy to use, and capable of running standard &lt;code&gt;.http&lt;/code&gt; files straight from my IDE. &lt;/p&gt;

&lt;p&gt;But simplicity shouldn't come at the cost of power. I wanted to see if I could build a tool this simple that could still match or exceed the raw performance of industry-standard tools like &lt;code&gt;k6&lt;/code&gt;, &lt;code&gt;hey&lt;/code&gt;, or &lt;code&gt;Locust&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To achieve that kind of scale, I had to build a custom execution core in Go. I call it the &lt;strong&gt;Hive Engine&lt;/strong&gt;. Here is how I used a pure-Go Actor Model and lock-free atomics to hit &lt;code&gt;0 allocs/op&lt;/code&gt; on the hot path.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Mutex Contention and GC Pauses
&lt;/h2&gt;

&lt;p&gt;In Go, it’s trivially easy to spin up 10,000 goroutines to fire off HTTP requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;go&lt;/span&gt; &lt;span class="n"&gt;sendRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem arises when those 10,000 goroutines all need to report their metrics (latency, status codes, bytes transferred) back to a central state to display on a live terminal UI.&lt;/p&gt;

&lt;p&gt;If you use a &lt;code&gt;sync.Mutex&lt;/code&gt; to protect a shared metrics map, your 10,000 goroutines will spend most of their time queued for the lock instead of sending requests. Every update is serialized through a single point, and that contention caps throughput.&lt;/p&gt;

&lt;p&gt;If you allocate new metric objects on the heap for every request and pass them through Go channels, the Garbage Collector (GC) has to run far more often. Go's collector is mostly concurrent, but each cycle still includes brief Stop-The-World phases and steals CPU time from the goroutines doing the work, which inflates your tail latency percentiles (P99).&lt;/p&gt;
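
&lt;p&gt;For contrast, here is a minimal sketch of the contended baseline described above. It is not code from Gopher-Glide; &lt;code&gt;lockedMetrics&lt;/code&gt; and its methods are hypothetical names used only to show where the goroutines queue up.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMetrics is the contended baseline: one mutex guards one shared map,
// so every goroutine that records a result queues for the same lock.
type lockedMetrics struct {
	mu       sync.Mutex
	byStatus map[int]uint64
}

// newLockedMetrics builds the shared state once, before any load is sent.
func newLockedMetrics() *lockedMetrics {
	m := new(lockedMetrics)
	m.byStatus = make(map[int]uint64)
	return m
}

// record is called once per finished request.
func (m *lockedMetrics) record(status int) {
	m.mu.Lock()
	m.byStatus[status]++
	m.mu.Unlock()
}

// count reads one counter under the same lock.
func (m *lockedMetrics) count(status int) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.byStatus[status]
}

func main() {
	m := newLockedMetrics()
	var wg sync.WaitGroup
	for i := 0; i != 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.record(200) // all 10,000 goroutines serialize on this call
		}()
	}
	wg.Wait()
	fmt.Println(m.count(200)) // 10000
}
```

&lt;p&gt;The result is correct, which is exactly why this pattern is so common. The cost only shows up as lost throughput once the goroutine count climbs.&lt;/p&gt;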

&lt;h2&gt;
  
  
  The Solution: The Actor Model
&lt;/h2&gt;

&lt;p&gt;To solve this, I designed the Hive Engine using a lightweight implementation of the &lt;strong&gt;Actor Model&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;In the Hive Engine, no mutable state sits behind a shared lock. Instead, the architecture is split into three isolated tiers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Queen:&lt;/strong&gt; The central director. It reads your traffic profile (e.g., ramping up to 5,000 RPS) and calculates exactly how many requests need to be dispatched every millisecond.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Hatchery:&lt;/strong&gt; The distributor. It receives micro-batches of work from the Queen and assigns them to available workers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Worker Bees (Actors):&lt;/strong&gt; Isolated goroutines holding persistent, keep-alive HTTP connections.&lt;/li&gt;
&lt;/ol&gt;
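
&lt;p&gt;The arithmetic behind the first two tiers can be sketched as follows. This is an illustration under my own assumptions, not the engine's actual code: &lt;code&gt;tickPacer&lt;/code&gt; and &lt;code&gt;assign&lt;/code&gt; are hypothetical names, the tick is fixed at 1 ms, and the even split across workers is only one possible policy.&lt;/p&gt;

```go
package main

import "fmt"

// tickPacer plays the Queen's role in this sketch: it turns a target rate in
// requests per second into a whole number of requests for each 1 ms tick,
// carrying the fractional remainder forward so rounding never loses requests.
type tickPacer struct {
	carry int // leftover work, in thousandths of a request
}

// next returns how many requests to dispatch in the coming 1 ms tick.
func (p *tickPacer) next(targetRPS int) int {
	p.carry += targetRPS // one tick is 1/1000 s, so it earns targetRPS/1000 requests
	n := p.carry / 1000
	p.carry %= 1000
	return n
}

// assign plays the Hatchery's role: it splits one micro-batch across workers
// as evenly as possible. It assumes workers is positive.
func assign(batch, workers int) []int {
	out := make([]int, workers)
	for i := range out {
		out[i] = batch / workers
	}
	for i := 0; i != batch%workers; i++ {
		out[i]++ // hand the remainder to the first few workers
	}
	return out
}

func main() {
	var p tickPacer
	total := 0
	for tick := 0; tick != 1000; tick++ { // simulate one second of ticks
		total += p.next(1500)
	}
	fmt.Println(total)        // 1500
	fmt.Println(assign(7, 3)) // [3 2 2]
}
```

&lt;p&gt;At 1,500 RPS a tick is worth 1.5 requests, so the pacer alternates between 1 and 2 and still lands on exactly 1,500 after a second.&lt;/p&gt;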

&lt;p&gt;By ensuring that each virtual client runs in its own isolated goroutine, we avoid the cost of dedicating an OS thread to every client. Goroutines are cheap to create and to switch between, so the OS doesn't have to context-switch heavy threads, and the Go runtime handles the network I/O multiplexing natively.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Secret Sauce: Lock-Free Atomics (&lt;code&gt;0 allocs/op&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;So how do the Worker Bees report their metrics without locking or triggering the GC? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharded, lock-free atomics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of creating a new metric struct on the heap for every request, the Hive Engine allocates a fixed-size, pre-warmed array of metric buckets when the simulation starts.&lt;/p&gt;

&lt;p&gt;When an Actor finishes an HTTP request, it doesn't acquire a mutex. Instead, it uses &lt;code&gt;sync/atomic&lt;/code&gt; to perform a lock-free hardware-level &lt;code&gt;AddUint64&lt;/code&gt; operation directly onto its assigned shard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Update this worker's shard without a lock or a heap allocation&lt;/span&gt;
&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddUint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;metricsShard&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TotalRequests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AddUint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;metricsShard&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TotalBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bytesRead&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because these counters are pre-allocated and updated via hardware atomics, the metrics hot path generates exactly &lt;strong&gt;&lt;code&gt;0 allocs/op&lt;/code&gt;&lt;/strong&gt;. It gives the Garbage Collector nothing to clean up.&lt;/p&gt;
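
&lt;p&gt;One way to check a claim like this yourself is &lt;code&gt;testing.AllocsPerRun&lt;/code&gt; from the standard library. The sketch below is not taken from the Gopher-Glide repository; &lt;code&gt;shard&lt;/code&gt;, &lt;code&gt;sample&lt;/code&gt;, &lt;code&gt;recordAtomic&lt;/code&gt; and &lt;code&gt;recordAlloc&lt;/code&gt; are hypothetical stand-ins that compare an in-place atomic update with a per-request heap allocation.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// shard is a stand-in for one pre-allocated metrics bucket.
type shard struct {
	requests atomic.Uint64
	bytes    atomic.Uint64
}

// sample is what a per-request metric object might look like.
type sample struct {
	latencyNanos uint64
	bytes        uint64
	status       int
}

var (
	preallocated shard   // allocated once, before the hot path runs
	sink         *sample // package-level, so stored samples escape to the heap
)

// recordAtomic updates the counters in place and allocates nothing.
func recordAtomic(n uint64) {
	preallocated.requests.Add(1)
	preallocated.bytes.Add(n)
}

// recordAlloc creates a fresh heap object for every request.
func recordAlloc(n uint64) {
	s := new(sample)
	s.bytes = n
	s.status = 200
	sink = s
}

func main() {
	// AllocsPerRun reports the average number of allocations per call.
	atomicAllocs := testing.AllocsPerRun(1000, func() { recordAtomic(512) })
	heapAllocs := testing.AllocsPerRun(1000, func() { recordAlloc(512) })
	fmt.Println("atomic path:", atomicAllocs)
	fmt.Println("heap path:", heapAllocs)
}
```

&lt;p&gt;The atomic path should report zero allocations and the heap path at least one per call. The same comparison can be written as a benchmark with &lt;code&gt;b.ReportAllocs()&lt;/code&gt; if you want it in the &lt;code&gt;allocs/op&lt;/code&gt; column of &lt;code&gt;go test -bench&lt;/code&gt;.&lt;/p&gt;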

&lt;p&gt;Every 100ms, the UI simply sweeps over these integer counters to calculate the live RPS and latency distributions.&lt;/p&gt;
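
&lt;p&gt;Putting the write side and the sweep together, a minimal sketch of sharded counters might look like this. The names (&lt;code&gt;shard&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, &lt;code&gt;sweep&lt;/code&gt;) are mine, and so is the padding, which is an assumption meant to keep neighbouring shards off the same cache line. The typed &lt;code&gt;atomic.Uint64&lt;/code&gt; used here needs Go 1.19 or newer and is equivalent to calling &lt;code&gt;atomic.AddUint64&lt;/code&gt; on a plain field.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shard holds the counters written by one worker. The padding rounds the
// struct up to 64 bytes, a common cache line size (an assumption, not a
// detail confirmed by the article).
type shard struct {
	requests atomic.Uint64
	bytes    atomic.Uint64
	_        [48]byte
}

// run starts one goroutine per shard. Each worker records perWorker fake
// requests of bodySize bytes into its own shard and nowhere else.
func run(shards []shard, perWorker int, bodySize uint64) {
	var wg sync.WaitGroup
	for i := range shards {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for n := 0; n != perWorker; n++ {
				shards[i].requests.Add(1)
				shards[i].bytes.Add(bodySize)
			}
		}(i)
	}
	wg.Wait()
}

// sweep is the read side a UI tick would call: it sums every shard using
// atomic loads, so it never blocks a worker.
func sweep(shards []shard) (requests, bytes uint64) {
	for i := range shards {
		requests += shards[i].requests.Load()
		bytes += shards[i].bytes.Load()
	}
	return requests, bytes
}

func main() {
	shards := make([]shard, 8) // allocated once, before any load is generated
	run(shards, 10000, 512)
	requests, bytes := sweep(shards)
	fmt.Println(requests, bytes) // 80000 40960000
}
```

&lt;p&gt;A live display would call &lt;code&gt;sweep&lt;/code&gt; on a timer and subtract the previous totals to get the rate for that interval. This sketch only sums once, after the workers finish.&lt;/p&gt;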

&lt;h2&gt;
  
  
  The Result: Gopher-Glide
&lt;/h2&gt;

&lt;p&gt;By combining the Actor Model with lock-free atomics, the Hive Engine comfortably pushes &lt;strong&gt;30,000+ RPS per core&lt;/strong&gt;, scaling linearly to &lt;strong&gt;roughly 89,000 RPS&lt;/strong&gt; on standard multi-core developer hardware.&lt;/p&gt;

&lt;p&gt;To see this engine in action, visit &lt;a href="https://gopherglide.dev" rel="noopener noreferrer"&gt;https://gopherglide.dev&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Instead of writing JS or Python scripts, &lt;code&gt;gg&lt;/code&gt; lets you test your APIs using the exact same &lt;code&gt;.http&lt;/code&gt; files you already use in your IDE.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run your existing API requests under heavy load, instantly&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;gg &lt;span class="nt"&gt;--hive-engine&lt;/span&gt; &lt;span class="nt"&gt;--profile&lt;/span&gt; flash-sale &lt;span class="nt"&gt;--http-file&lt;/span&gt; api.http
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Try it out!
&lt;/h3&gt;

&lt;p&gt;If you're interested in the code, or just need a wildly fast API simulator, check out the repository:&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/shyam-s00/gopher-glide" rel="noopener noreferrer"&gt;Gopher-Glide on GitHub&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://gopherglide.dev" rel="noopener noreferrer"&gt;Full Documentation &amp;amp; Benchmarks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’d love to hear how the engine handles your local workloads, and if you have any feedback on the Go actor implementation! Drop a star if you find it useful. ⭐&lt;/p&gt;

</description>
      <category>go</category>
      <category>architecture</category>
      <category>showdev</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
