DEV Community: Shiyam

Why tracking "frequent lately" breaks standard patterns

Shiyam — Wed, 15 Jul 2026 14:21:18 +0000

If you've ever had to build high-scale rate limiting, telemetry sampling, or hot-key detection, you've likely run into a very specific, painful problem.

The requirement always boils down to one question: "How often has this thing been happening lately?"

And the system must answer this question in nanoseconds, for millions of distinct things, using a fixed amount of memory, accessed by many threads concurrently, forever.

To understand why this is so hard to build, let's look at a real-world analogy.

The Bouncer Analogy

Imagine you run the door at an enormous, very busy club. Thousands of faces walk up to the door every minute. Your job is to spot the regulars of the last hour. You don't care about the person who came every night in 2019 and stopped, and you don't care about the person who just showed up once tonight. You only care about who is showing up a lot, recently.

Now, add the engineering constraints:

You cannot write anything down per person. There are millions of possible faces, but your notepad is one single page, fixed size, bought once. (Fixed Memory)
You must answer instantly. The line cannot stop while you think. (Nanosecond Latency)
Memory must fade on its own. Last hour's regular who stopped coming must automatically stop counting as a regular. You don't get a nightly "erase the notepad" break, because the club never closes. (Intrinsic Decay)
Several of you work the door at once. You can't huddle to compare notes between guests. (Lock-free Concurrency)

If you look closely, this is actually three questions fused into one:

How often? (Frequency)
How recently? (Freshness / Decay)
Have I seen this one in the current shift yet? (First-sighting diversity)

Why the obvious solutions fail

When we try to solve this in software, we usually reach for standard tools. But under extreme scale, they all break down:

1. "Just use a hashmap of counters" (`map[key]count`)

Memory is unbounded. Every new key (new IP, new log template) grows the map.
No notion of time. To fix this, you have to bolt on timestamps and run a background cleanup sweep. Now your cleanup thread is racing your writer threads, which means you need locks, which kills your latency.

2. "Count in fixed windows and reset"

The window boundary is a cliff. If you wipe the counters every minute, a key that fired 10,000 times at 11:59:59 looks completely new at 12:00:00.
Global pauses. Wiping the data is a global mutation that fights concurrent writers.
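The cliff is easy to reproduce. Below is a small sketch of a fixed-window counter; `FixedWindow` and its fields are hypothetical names, and the type is deliberately single-threaded so the boundary behavior stays in focus.

```go
package main

import "fmt"

// FixedWindow is a hypothetical counter that resets at every window boundary.
// It is not safe for concurrent use; a real one would need a lock or a swap.
type FixedWindow struct {
	windowSec int64             // window length in seconds
	current   int64             // index of the window the counts belong to
	counts    map[string]uint64 // per-key counts for the current window only
}

func NewFixedWindow(windowSec int64) *FixedWindow {
	return &FixedWindow{windowSec: windowSec, counts: make(map[string]uint64)}
}

// Hit records one event at unix second ts and returns the key's count
// within the window that ts falls into.
func (w *FixedWindow) Hit(key string, ts int64) uint64 {
	idx := ts / w.windowSec
	if idx != w.current {
		// Boundary crossed: all history is discarded at once.
		w.counts = make(map[string]uint64)
		w.current = idx
	}
	w.counts[key]++
	return w.counts[key]
}

func main() {
	w := NewFixedWindow(60)
	var last uint64
	for i := 0; i < 10000; i++ {
		last = w.Hit("noisy-key", 59) // last second of the first minute
	}
	fmt.Println("count at :59 =", last)
	fmt.Println("count at :00 =", w.Hit("noisy-key", 60)) // one second later
}
```

One second after 10,000 hits, the same key reports a count of 1.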

3. "Use a probabilistic sketch (Count-Min Sketch)"

This gets us close—it gives us fixed memory and fast counts! But:

CMS has no clock. Counts only ever grow. "Frequent lately" is not a question it can answer natively. You have to bolt on a background decay loop, which brings back the synchronization nightmares.
It errs in the dangerous direction. Hash collisions in a CMS only ever inflate a count. If you use this for rate limiting, inflating a rare key's count means you accidentally punish a well-behaved user.

The Pattern

If you look at the standard answers, they all share the same structural challenge: Frequency and recency are treated as two separate facts requiring two separate systems.

Every standard approach ends up maintaining "time" as a separate mechanism from "counting" (via sweeps, window rotations, or halving loops). And the synchronization between the two is exactly where the memory blow-ups, the pauses, and the race conditions come from.

What's next?

I've been obsessing over this problem space recently. Is it possible to design a mechanism where recency is an intrinsic property of reading the state—requiring no background mutation, no reset pause, and no cleanup sweep—all while remaining lock-free?

I'm currently in the experimentation phase of designing a new primitive to solve this exact intersection of constraints. It's too early to share the exact mechanics before the math proves out in simulation, but I'll be posting my findings and (hopefully) some open-source code as I progress.

Have you come across such problems in your own systems? How did you solve them? Did you have to make a tough memory vs. speed tradeoff? Let me know in the comments!

From a Go CLI to a full developer ecosystem: Gopher Glide for IDEs

Shiyam — Thu, 02 Jul 2026 18:00:23 +0000

If you build backend systems, you probably test your APIs locally using standard .http files right inside your editor. It’s fast, native, and frictionless.

But what happens when you need to know if that same endpoint will survive a massive traffic spike?

Historically, this required a brutal context switch. You had to leave your IDE, boot up a heavy tool like JMeter or k6, and manually rewrite the exact same request from scratch using JavaScript, Python, or XML.

I built Gopher-Glide (gg) to kill that redundancy.

The vision was simple: What if your existing .http files were all you needed to stress-test your architecture? What if load testing wasn't a separate phase of development, but a native extension of the code editor you already live in?

Today, I’m incredibly excited to announce that this vision is a reality. With the latest milestone release, the launch of the VS Code and Open VSX extensions, Gopher-Glide has evolved from a standalone CLI tool into a complete developer ecosystem.

The Journey to an Ecosystem
When the core Gopher-Glide engine was first released, the focus was purely on extreme performance. It was built as a lock-free Actor Model in Go that achieves 0 allocs/op on the hot path. This allowed a standard developer laptop to blast 30,000+ RPS without garbage collection pauses destroying the latency percentiles.

But it quickly became apparent that raw speed isn’t enough. Developer Experience (DX) is what actually drives productivity.

So, the ecosystem began to expand. First, a native JetBrains plugin was launched. Today, the official VS Code extension is being released. And because it is also published to the Open VSX Registry, Gopher-Glide now runs natively inside editors that install extensions from Open VSX, such as Cursor and VSCodium.

The Workflow: Unprecedented Productivity
By placing the gg engine directly at the heart of the editor, the load-testing workflow fundamentally changes:

Write: Write your API request in a simple .http file.
Execute: Highlight the request in VS Code or JetBrains, and click run.
Visualize: Gopher-Glide opens a beautiful Native UI panel right inside the editor, visualizing the traffic in real-time.
Validate: Standard tools only tell you if an API is slow. Gopher-Glide natively diffs your JSON payloads under load to tell you if the API silently started returning empty arrays when the database got overwhelmed.
Zero context switching. Zero new scripting languages to learn. You never leave your editor.

What this opens up for the future
By tightly coupling a high-performance Go engine with the editor environment, this opens doors that traditional load testing tools simply can't access.

Because gg integrates directly with tools like Cursor, it steps into a future where AI can dynamically generate edge-case payload mutations in .http files, which are immediately executed at scale. Because the core engine remains a standalone binary, the exact same .http files used in the editor today can be executed in CI/CD pipelines tomorrow to catch schema regressions before they merge.

Gopher-Glide is no longer just a traffic generator; it is the heart of a unified API testing ecosystem.

Try it out today
Gopher-Glide is 100% free, open-source, and requires no cloud subscriptions or SaaS accounts. It destroys your servers, not your RAM.

🌐 Explore the documentation: https://gopherglide.dev
💻 VS Code: https://marketplace.visualstudio.com/items?itemName=gopherglide.gg-plugin
💻 Open VSX: https://open-vsx.org/extension/gopherglide/gg-plugin
🚀 JetBrains: https://plugins.jetbrains.com/plugin/30983-gopher-glide

If this workflow resonates with you, I would love to hear your feedback in the comments, or feel free to drop a star on the GitHub repo.

Let's crash some servers! 🚀

Simulating API Traffic Shouldn't Break Your Flow: Bringing Gopher-Glide Natively to JetBrains IDEs

Shiyam — Sun, 21 Jun 2026 16:03:02 +0000

If your API development loop is anything like mine, it looks something like this: write code, restart server, send a quick cURL or use the IDE’s HTTP client to make sure it returns a 200 OK.

But what happens when you need to know if that endpoint will survive a sudden spike of 500 requests per second?

Historically, simulating real-world traffic meant breaking your flow. You’d have to open a terminal, write a JMX file, or craft a custom script, and then watch a wall of text scroll by. The friction is high, which is why so many developers put off benchmarking until right before production.

I wanted to fix that. That’s why I built the Gopher-Glide (gg) CLI, and today, I’m thrilled to announce a massive architectural revamp of the Gopher-Glide JetBrains Plugin.

🛠 The Problem with Embedded Terminals

In earlier versions, the plugin simply launched the gg CLI’s interactive Terminal UI inside the JetBrains terminal widget. It worked, but TUIs rely on rapid ANSI escape code redraws. Running a heavy simulation at 24 frames per second inside the IDE's terminal caused massive CPU spikes and sometimes even froze the editor.

So, I completely overhauled the architecture.

✨ Enter the Native Dashboard

Instead of embedding a terminal, the plugin now drives Gopher-Glide under the hood in headless mode, piping JSON metrics directly into a 100% native JetBrains Tool Window.

Now, when you run a simulation, you get a beautiful, smooth dashboard docked at the bottom of your screen. It features real-time RPS charts, latency percentiles (p50/p95/p99), and a stage progression timeline. Because it uses native Swing components instead of a terminal redraw loop, the CPU overhead is practically zero.

⚡ Zero-Config Execution

I wanted the barrier to entry to be completely non-existent. You don't even need to write a configuration file to use it.
Got a standard .http file where you test your routes?

Click the green "Run GG" gutter icon next to your request.
A native popup appears with 21 built-in traffic profiles (e.g., E-Commerce Wave, Chaos/DDoS, Auto-Scaling Spikes).
Hit run.

That’s it. You are benchmarking your API.

📸 Catch Regressions Before They Merge

Benchmarking is useless if you don't remember the baseline. The new plugin introduces a native Snaps Panel to manage your performance data right next to your code.

Record: Tick a box when running a test to record a snapshot of the performance.
Compare: Select two historical snapshots in the tool window and diff them to see exactly what changed.
Assert: Run an assertion between two snapshots directly in the IDE to see if your latest code changes violated your latency or error-rate thresholds.

Want to automate it? I added a "Generate CI Workflow" right-click action that drops a pre-configured GitHub Actions YAML file into your project. It will automatically run your simulations, assert against the main branch, and drop a performance report as a PR comment.

Give it a Spin!

I've spent the last few weeks polishing this to feel like a first-class feature of the JetBrains ecosystem. If you use IntelliJ IDEA, GoLand, WebStorm, or PyCharm, I'd love for you to try it out.

You can search for "Gopher Glide" in the JetBrains Marketplace, or read more about it on the website: gopherglide.dev.

Drop any feedback, feature requests, or bugs in the comments below. Happy simulating! 🚀

Beyond Brute Force: Adaptive Backpressure in API Traffic Simulation

Shiyam — Thu, 11 Jun 2026 15:03:35 +0000

If you've ever used a traditional load testing tool like k6, JMeter, or Locust, you've probably experienced the "Wall of Red."

You point your tool at a staging server, dial the concurrency up to simulate a major traffic spike, and suddenly your terminal is flooded with `connection timed out` and `socket: too many open files` errors. The load tester reports an 80% failure rate, and you conclude that your server can't handle the traffic.

But what if the server wasn't the only thing failing? What if your load testing tool was fundamentally misrepresenting reality by forcing the server into a catastrophic deadlock that wouldn't actually happen in production?

That is exactly why I built Gopher-Glide (gg). It is an open-source, pure-Go API traffic simulator (gopherglide.dev) designed to solve this exact problem.

In this post, I'll explain the architectural flaw shared by most modern load testers (The Closed Model), and show you how I used Mathematical Adaptive Backpressure to build an engine that extracts 3x more successful requests from a saturated server while using 40% less memory than k6.

The Problem: The "Closed Model" Brute Force

Most popular load testing tools operate on a Closed Model. To simulate 10,000 concurrent users, they spin up 10,000 independent "Virtual Users" (VUs) — usually backed by embedded JavaScript Virtual Machines or heavy OS threads.

When you ask a Closed Model tool to push 30,000 Requests Per Second (RPS), it blindly loops those VUs as fast as it can. But what happens when the target server (e.g., your NGINX proxy) hits its physical limit and begins to queue connections?

Latency spikes. The server takes 500ms to respond instead of 10ms.
The VUs get blocked. Because the VUs are stuck waiting for the slow server, the load tester isn't hitting its 30,000 RPS target.
The tool panics and spawns more. To try and hit the target RPS, the tool furiously spawns even more concurrent connections.
Catastrophic Deadlock. The server, already drowning in queued connections, is slammed with thousands of new ones. It completely locks up, dropping everything.

The load tester reports a 75% timeout rate. But in reality, an intelligent production edge-proxy (like Cloudflare or an API Gateway) would have gracefully shed the excess load, allowing the server to process at least some traffic successfully. The load tester didn't simulate reality; it simulated a DDoS attack.

The Solution: The Open Model & Adaptive Backpressure

I designed Gopher-Glide to act as a true Open Model load generator.

Instead of heavy Virtual Users, gg uses an asynchronous Actor Model built on Go's ultra-lightweight goroutines. It completely decouples generating traffic from waiting for responses.

But the real magic is how gg protects the target server using Adaptive Backpressure.

As gg pushes traffic, a lock-free metrics subsystem continuously tracks the P50 response latency. If the server begins to slow down, gg uses that latency to calculate how many concurrent connections the target rate now requires. If that required concurrency crosses what the server and network can physically sustain, gg instantly engages a "Smooth Trim."

Instead of blindly opening thousands of dead-end sockets and forcing the target server into a total deadlock, gg gracefully throttles the excess traffic locally within the engine itself.

The "Mic Drop" Benchmark: gg vs. k6

To prove this architecture, I ran a saturation benchmark. I pointed both Gopher-Glide and Grafana k6 at a local NGINX server, and asked both tools to push an impossible 30,000 RPS for 30 seconds (attempting ~900,000 total requests).

Both engines correctly identified the physical limit of the target server: over 30 seconds, the NGINX server was physically only capable of accepting around 92,000 network connections.

But the outcomes of those 92,000 connections were vastly different.

🧠 Goodput Extraction

Metric	Gopher Glide (`gg`)	`k6`
Total Requests Sent	92,059	92,184
Successful Responses	76,140	25,753
Failure Rate	17.29%	72.06%

When k6 hit the server's limit, its Closed Model panicked and just kept violently spawning virtual users. It forced the NGINX server into a total deadlock where 72% of the connections timed out or were refused.

When gg detected the server slowing down, its Adaptive Backpressure instantly engaged. Because it stopped slamming the network with useless dead-end connections, the NGINX server was actually able to breathe. gg extracted 3x MORE successful responses from the exact same struggling server, out of the exact same 92k connection budget.

⚡ Memory Efficiency

Engine	Peak Memory (RAM)	Efficiency
Gopher Glide (`gg`)	1.42 GB	40% less RAM required.
`k6`	2.38 GB	Heavy JavaScript VM bloat.

Because k6 had to spin up thousands of heavy Goja JavaScript VMs to maintain its blocked Virtual Users, its memory ballooned to 2.38 GB.

Gopher-Glide simply parked its lightweight Goroutines and throttled the excess load locally, capping out at a completely stable 1.42 GB.
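The headline figures can be re-derived from the two tables above. This short snippet just redoes the arithmetic; the helper names are mine.

```go
package main

import "fmt"

// failureRate returns the percentage of sent requests that did not succeed.
func failureRate(sent, ok int) float64 {
	return float64(sent-ok) / float64(sent) * 100
}

// savings returns how much smaller a is than b, as a percentage of b.
func savings(a, b float64) float64 {
	return (b - a) / b * 100
}

func main() {
	fmt.Printf("gg failure rate: %.2f%%\n", failureRate(92059, 76140))
	fmt.Printf("k6 failure rate: %.2f%%\n", failureRate(92184, 25753))
	fmt.Printf("goodput ratio:   %.2fx\n", 76140.0/25753.0)
	fmt.Printf("memory savings:  %.1f%%\n", savings(1.42, 2.38))
}
```

It prints 17.29%, 72.06%, 2.96x, and 40.3%, which match the rounded "3x" and "40%" claims.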

Stop testing load. Start simulating reality.

When building a high-traffic system, the goal isn't to see how quickly you can crash your server. The goal is to see how your architecture behaves under stress.

By natively mimicking the graceful load-shedding behavior of an intelligent edge proxy, Gopher-Glide ensures that your CI/CD runner is dedicated to maximizing successful Goodput, rather than fighting a JavaScript VM's garbage collector.

If you want to run high-fidelity API traffic simulations using nothing but the standard .http REST Client files already sitting in your IDE, check out the links below:

🌐 Website & Documentation: gopherglide.dev
💻 GitHub Repository: shyam-s00/gopher-glide

No JavaScript. No Python. No YAML configs. Just mathematically sound, pure-Go concurrency.

How I Built a Lock-Free Actor Model in Go to Hit 30k+ RPS (Zero Allocs)

Shiyam — Fri, 29 May 2026 14:32:42 +0000

When it comes to building an API traffic simulator or a load-testing tool, the hardest problem isn’t sending the HTTP requests—it’s measuring them.

Most developers reach for traditional tools like JMeter (which uses heavy OS threads and consumes massive memory) or write scripts in interpreted languages like Python or JavaScript (Locust, k6) which introduce their own performance overheads.

My primary motivation for building an open-source tool like Gopher-Glide (gg) was simple: I wanted something incredibly lightweight, easy to use, and capable of running standard .http files straight from my IDE.

But simplicity shouldn't come at the cost of power. I wanted to see if I could build a tool this simple that could still match or exceed the raw performance of industry-standard tools like k6, hey, or Locust.

To achieve that kind of scale, I had to build a custom execution core in Go. I call it the Hive Engine. Here is how I used a pure-Go Actor Model and lock-free atomics to hit 0 allocs/op on the hot path.

The Problem: Mutex Contention and GC Pauses

In Go, it’s trivially easy to spin up 10,000 goroutines to fire off HTTP requests:

for i := 0; i < 10000; i++ {
    go sendRequest(client, req)
}

The problem arises when those 10,000 goroutines all need to report their metrics (latency, status codes, bytes transferred) back to a central state to display on a live terminal UI.

If you use a sync.Mutex to protect a shared metrics map, your 10,000 goroutines can end up spending most of their time waiting in line for the lock instead of sending requests. This contention destroys throughput.

If you allocate new metric objects on the heap for every request and pass them through Go channels, the allocation rate keeps the Garbage Collector (GC) running almost constantly. Its CPU usage, the assist work it pushes onto your goroutines, and its brief Stop-The-World phases then show up directly in your tail latency percentiles (P99).

The Solution: The Actor Model

To solve this, I designed the Hive Engine using a lightweight implementation of the Actor Model.

In the Hive Engine, there is no lock-protected shared state on the hot path. Instead, the architecture is split into three isolated tiers:

The Queen: The central director. It reads your traffic profile (e.g., ramping up to 5,000 RPS) and calculates exactly how many requests need to be dispatched every millisecond.
The Hatchery: The distributor. It receives micro-batches of work from the Queen and assigns them to available workers.
The Worker Bees (Actors): Isolated goroutines holding persistent, keep-alive HTTP connections.

By ensuring that each virtual client runs in its own isolated goroutine, we avoid the scheduling bottlenecks of thread-per-user designs. The OS doesn't have to context-switch heavy threads, and the Go runtime handles the network I/O multiplexing natively.
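Here is a stripped-down sketch of how three tiers like these can be wired together with channels. It is not the Hive Engine's code: the function names are hypothetical, time is simulated as a loop of millisecond ticks so the result is deterministic, and the workers only count jobs where a real one would send an HTTP request.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// queen emits one batch size per simulated millisecond tick. A fractional
// carry keeps the long-run total at rps*ticks/1000 even when the per-tick
// quota is not a whole number.
func queen(rps float64, ticks int, batches chan<- int) {
	carry := 0.0
	for i := 0; i < ticks; i++ {
		carry += rps / 1000.0
		n := int(carry)
		carry -= float64(n)
		if n > 0 {
			batches <- n
		}
	}
	close(batches)
}

// hatchery expands each micro-batch into individual jobs for the workers.
func hatchery(batches <-chan int, jobs chan<- struct{}) {
	for n := range batches {
		for i := 0; i < n; i++ {
			jobs <- struct{}{}
		}
	}
	close(jobs)
}

// run wires the tiers together and returns how many jobs the workers handled.
func run(rps float64, ticks, workers int) uint64 {
	batches := make(chan int, 16)
	jobs := make(chan struct{}, 256)
	var done atomic.Uint64
	var wg sync.WaitGroup

	go queen(rps, ticks, batches)
	go hatchery(batches, jobs)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				// A real worker would send an HTTP request here.
				done.Add(1)
			}
		}()
	}
	wg.Wait()
	return done.Load()
}

func main() {
	// 2,500 RPS for 1,000 simulated milliseconds, handled by 8 workers.
	fmt.Println("requests dispatched:", run(2500, 1000, 8))
}
```

At 2,500 RPS the per-millisecond quota is 2.5, so the carry alternates batches of 2 and 3 and the run dispatches exactly 2,500 requests.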

The Secret Sauce: Lock-Free Atomics (`0 allocs/op`)

So how do the Worker Bees report their metrics without locking or triggering the GC?

Sharded, lock-free atomics.

Instead of creating a new metric struct on the heap for every request, the Hive Engine allocates a fixed-size, pre-warmed array of metric buckets when the simulation starts.

When an Actor finishes an HTTP request, it doesn't acquire a mutex. Instead, it uses sync/atomic to perform a lock-free hardware-level AddUint64 operation directly onto its assigned shard.

// Increment the request count without a lock, avoiding GC entirely
atomic.AddUint64(&metricsShard.TotalRequests, 1)
atomic.AddUint64(&metricsShard.TotalBytes, uint64(bytesRead))

Because these counters are pre-allocated and updated via hardware atomics, the hot path generates exactly 0 allocs/op. The Garbage Collector literally has nothing to clean up.

Every 100ms, the UI simply sweeps over these integer counters to calculate the live RPS and latency distributions.

The Result: Gopher-Glide

By combining the Actor Model with lock-free atomics, the Hive Engine comfortably pushes 30,000+ RPS per core, scaling linearly to roughly 89,000 RPS on standard multi-core developer hardware.

To see this engine in action, visit https://gopherglide.dev

Instead of writing JS or Python scripts, gg lets you test your APIs using the exact same .http files you already use in your IDE.

# Run your existing API requests under heavy load, instantly
$ gg --hive-engine --profile flash-sale --http-file api.http

Try it out!

If you're interested in the code, or just need a wildly fast API simulator, check out the repository:
👉 Gopher-Glide on GitHub
👉 Full Documentation & Benchmarks

I’d love to hear how the engine handles your local workloads, and if you have any feedback on the Go actor implementation! Drop a star if you find it useful. ⭐
