Shiyam

Posted on May 29

How I Built a Lock-Free Actor Model in Go to Hit 30k+ RPS (Zero Allocs)

#go #architecture #showdev #performance


When it comes to building an API traffic simulator or a load-testing tool, the hardest problem isn’t sending the HTTP requests—it’s measuring them.

Most developers reach for traditional tools like JMeter (which runs each virtual user on its own OS thread and can consume a lot of memory) or for tools that drive load from Python or JavaScript scripts (Locust, k6), which bring their own runtime overhead.

My primary motivation for building Gopher-Glide (gg), an open-source load-testing tool, was simple: I wanted something lightweight, easy to use, and capable of running standard .http files straight from my IDE.

But simplicity shouldn't come at the cost of power. I wanted to see if I could build a tool this simple that could still match or exceed the raw performance of industry-standard tools like k6, hey, or Locust.

To achieve that kind of scale, I had to build a custom execution core in Go. I call it the Hive Engine. Here is how I used a pure-Go Actor Model and lock-free atomics to hit 0 allocs/op on the hot path.

The Problem: Mutex Contention and GC Pauses

In Go, it’s trivially easy to spin up 10,000 goroutines to fire off HTTP requests:

```go
for i := 0; i < 10000; i++ {
    go sendRequest(client, req)
}
```

The problem arises when those 10,000 goroutines all need to report their metrics (latency, status codes, bytes transferred) back to a central state to display on a live terminal UI.

If you use a sync.Mutex to protect a shared metrics map, your 10,000 goroutines spend most of their time queued on that single lock instead of sending requests. This contention caps throughput no matter how many cores you add.
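For contrast, here is a minimal sketch of the mutex-guarded pattern described above. It assumes a single shared map keyed by status code; the `lockedMetrics` type and its `record` method are hypothetical and not taken from Gopher-Glide. Every goroutine takes the same lock for every request, so all reporters are serialized even though each update is tiny.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedMetrics is a hypothetical metrics store guarded by one mutex.
// Every reporting goroutine contends for mu on every request.
type lockedMetrics struct {
	mu           sync.Mutex
	statusCounts map[int]uint64
	totalBytes   uint64
}

func newLockedMetrics() *lockedMetrics {
	return &lockedMetrics{statusCounts: make(map[int]uint64)}
}

// record updates the shared state under the global lock.
func (m *lockedMetrics) record(status int, bytes uint64) {
	m.mu.Lock()
	m.statusCounts[status]++
	m.totalBytes += bytes
	m.mu.Unlock()
}

func main() {
	m := newLockedMetrics()
	var wg sync.WaitGroup
	// 10,000 goroutines, each reporting 10 requests through the same lock.
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 10; j++ {
				m.record(200, 512)
			}
		}()
	}
	wg.Wait()
	fmt.Println(m.statusCounts[200], m.totalBytes)
}
```

The result is correct, but every `record` call is a serialization point; adding cores adds waiters, not throughput.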

If you allocate a new metric object on the heap for every request and pass it through a Go channel, the Garbage Collector (GC) has to run far more often. Its mark work and brief Stop-The-World phases steal CPU from the goroutines doing the measuring, which inflates your tail latency percentiles (P99).
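One way to see that allocation cost is `testing.AllocsPerRun` from the standard library, which reports the average number of heap allocations per call of a function. The sketch below assumes a hypothetical `requestMetric` struct sent by pointer over a buffered channel; because the pointer is handed off through the channel, escape analysis places every `requestMetric` on the heap.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// requestMetric is a hypothetical per-request record.
type requestMetric struct {
	status  int
	bytes   uint64
	latency time.Duration
}

// allocsPerSend measures heap allocations when each request publishes a
// freshly allocated *requestMetric on a channel.
func allocsPerSend(runs int) float64 {
	// AllocsPerRun calls f runs+1 times (one warm-up), so this buffer
	// is large enough that sends never block.
	ch := make(chan *requestMetric, runs+1)
	allocs := testing.AllocsPerRun(runs, func() {
		// The pointer escapes through the channel: one heap allocation per call.
		ch <- &requestMetric{status: 200, bytes: 512, latency: 3 * time.Millisecond}
	})
	// Drain the channel; the drained metrics are garbage the GC must reclaim.
	for len(ch) > 0 {
		<-ch
	}
	return allocs
}

func main() {
	fmt.Printf("allocs per request: %.0f\n", allocsPerSend(1000))
}
```

At tens of thousands of requests per second, every one of those allocations adds to the GC's workload.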

The Solution: The Actor Model

To solve this, I designed the Hive Engine using a lightweight implementation of the Actor Model.

In the Hive Engine, there is no lock-protected shared state. Instead, the architecture is split into three isolated tiers:

The Queen: The central director. It reads your traffic profile (e.g., ramping up to 5,000 RPS) and calculates exactly how many requests need to be dispatched every millisecond.
The Hatchery: The distributor. It receives micro-batches of work from the Queen and assigns them to available workers.
The Worker Bees (Actors): Isolated goroutines holding persistent, keep-alive HTTP connections.

Because each virtual client runs in its own isolated goroutine, we avoid the thread-per-user bottleneck of tools like JMeter. Goroutines are multiplexed onto a small pool of OS threads, so the OS doesn't have to context-switch thousands of heavy threads, and the Go runtime's network poller handles the I/O multiplexing natively.

The Secret Sauce: Lock-Free Atomics (`0 allocs/op`)

So how do the Worker Bees report their metrics without locking or triggering the GC?

Sharded, lock-free atomics.

Instead of creating a new metric struct on the heap for every request, the Hive Engine allocates a fixed-size, pre-warmed array of metric buckets when the simulation starts.

When an Actor finishes an HTTP request, it doesn't acquire a mutex. Instead, it uses sync/atomic to perform a lock-free hardware-level AddUint64 operation directly onto its assigned shard.

```go
// Count the request and its bytes without a lock; no heap allocation occurs
atomic.AddUint64(&metricsShard.TotalRequests, 1)
atomic.AddUint64(&metricsShard.TotalBytes, uint64(bytesRead))
```
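The two calls above leave the shard type implicit. A self-contained sketch, assuming one shard per worker and a 64-byte cache line, could look like this; `MetricsShard`, `shardedMetrics`, and the padding size are assumptions rather than Gopher-Glide's actual types. The padding keeps counters that different cores update off a shared cache line (false sharing), which would otherwise make atomics on "separate" shards contend anyway.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// MetricsShard holds counters owned by one worker. Padding rounds the
// struct up to 64 bytes (a common cache-line size) so the hot counters
// of neighbouring shards never share a cache line.
type MetricsShard struct {
	TotalRequests uint64
	TotalBytes    uint64
	_             [48]byte
}

// shardedMetrics is allocated once at startup; recording never allocates.
type shardedMetrics struct {
	shards []MetricsShard
}

func newShardedMetrics(n int) *shardedMetrics {
	return &shardedMetrics{shards: make([]MetricsShard, n)}
}

// record is called by worker id after each request.
func (m *shardedMetrics) record(id int, bytesRead int) {
	s := &m.shards[id]
	atomic.AddUint64(&s.TotalRequests, 1)
	atomic.AddUint64(&s.TotalBytes, uint64(bytesRead))
}

// totals sums every shard with atomic loads; safe while workers run.
func (m *shardedMetrics) totals() (reqs, bytes uint64) {
	for i := range m.shards {
		reqs += atomic.LoadUint64(&m.shards[i].TotalRequests)
		bytes += atomic.LoadUint64(&m.shards[i].TotalBytes)
	}
	return reqs, bytes
}

func main() {
	const workers = 8
	m := newShardedMetrics(workers)
	var wg sync.WaitGroup
	for id := 0; id < workers; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				m.record(id, 256)
			}
		}(id)
	}
	wg.Wait()
	reqs, bytes := m.totals()
	fmt.Println(reqs, bytes)
}
```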

Because these counters are pre-allocated and updated in place with hardware atomics, the metrics-reporting hot path generates exactly 0 allocs/op. The Garbage Collector has nothing to clean up from it.
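A zero-allocation claim is easy to check. This sketch assumes a hypothetical padded `shard` layout and uses `testing.AllocsPerRun` to measure the recording path; a `go test -bench` benchmark that calls `b.ReportAllocs()` reports the same figure in its `allocs/op` column.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// shard is a hypothetical pre-allocated counter block; 48 bytes of
// padding round it up to a 64-byte cache line.
type shard struct {
	requests uint64
	bytes    uint64
	_        [48]byte
}

// shards is allocated once, before the measured hot path runs.
var shards = make([]shard, 4)

// recordRequest is the hot path: two atomic adds on pre-allocated memory.
func recordRequest(id, bytesRead int) {
	atomic.AddUint64(&shards[id].requests, 1)
	atomic.AddUint64(&shards[id].bytes, uint64(bytesRead))
}

// hotPathAllocs reports the average heap allocations per recordRequest call.
func hotPathAllocs() float64 {
	return testing.AllocsPerRun(10000, func() {
		recordRequest(1, 512)
	})
}

func main() {
	fmt.Printf("allocs/op: %.0f\n", hotPathAllocs())
}
```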

Every 100ms, the UI reads these integer counters with atomic loads and compares them with the previous sweep to calculate the live RPS and latency distributions.

The Result: Gopher-Glide

By combining the Actor Model with lock-free atomics, the Hive Engine comfortably pushes 30,000+ RPS per core and scales to roughly 89,000 RPS on standard multi-core developer hardware.

To see the engine in action, visit https://gopherglide.dev.

Instead of writing JS or Python scripts, gg lets you test your APIs using the exact same .http files you already use in your IDE.

```shell
# Run your existing API requests under heavy load, instantly
$ gg --hive-engine --profile flash-sale --http-file api.http
```

Try it out!

If you're interested in the code, or just need a wildly fast API simulator, check out the repository:
👉 Gopher-Glide on GitHub
👉 Full Documentation & Benchmarks

I’d love to hear how the engine handles your local workloads, and if you have any feedback on the Go actor implementation! Drop a star if you find it useful. ⭐
