<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sumedhvats</title>
    <description>The latest articles on DEV Community by Sumedhvats (@sumedhvats).</description>
    <link>https://dev.to/sumedhvats</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3017143%2Fb59d4173-ce6c-4f15-9642-50d8ff4c27aa.png</url>
      <title>DEV Community: Sumedhvats</title>
      <link>https://dev.to/sumedhvats</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sumedhvats"/>
    <language>en</language>
    <item>
      <title>Production-Ready Rate Limiter in Go: From Side Project to Distributed System</title>
      <dc:creator>Sumedhvats</dc:creator>
      <pubDate>Mon, 03 Nov 2025 09:59:11 +0000</pubDate>
      <link>https://dev.to/sumedhvats/production-ready-rate-limiter-in-go-from-side-project-to-distributed-system-1h3c</link>
      <guid>https://dev.to/sumedhvats/production-ready-rate-limiter-in-go-from-side-project-to-distributed-system-1h3c</guid>
      <description>&lt;h2&gt;
  
  
  A deep dive into three algorithms, atomic Redis operations, and building a high-performance, flexible library from scratch.
&lt;/h2&gt;

&lt;p&gt;When you're building a new service, rate limiting is one of those things you &lt;em&gt;know&lt;/em&gt; you need, but you often start with something simple. Maybe it's a basic in-memory counter. But what happens when your service grows? When you move from a single server to a distributed system, that simple counter breaks down. You're stuck rewriting your rate limiting logic.&lt;/p&gt;

&lt;p&gt;Most Go rate limiters I found forced me into a single algorithm (usually token bucket) or locked me into a specific storage backend. This was the problem I set out to solve.&lt;/p&gt;

&lt;p&gt;I decided to build &lt;strong&gt;&lt;a href="https://github.com/sumedhvats/rate-limiter-go" rel="noopener noreferrer"&gt;rate-limiter-go&lt;/a&gt;&lt;/strong&gt;, a library that scales with you. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multiple battle-tested algorithms&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pluggable storage&lt;/strong&gt; (in-memory or Redis)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic Redis operations&lt;/strong&gt; for concurrency-safe, production-ready limiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this post, I'm going to walk you through the journey of building it: the algorithms I explored, the edge cases I found, and the final high-performance library.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: The Quest for the "Perfect" Algorithm
&lt;/h2&gt;

&lt;p&gt;Rate limiting seems simple, but there are many ways to do it, each with critical trade-offs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Naive Start: Fixed Window
&lt;/h3&gt;

&lt;p&gt;This is the most intuitive approach. You divide time into fixed "windows" (e.g., one minute) and allow a certain number of requests in that window.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mental Model:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set a limit (e.g., 100 requests per minute).&lt;/li&gt;
&lt;li&gt;If the time is &lt;code&gt;12:24:02&lt;/code&gt;, the window is &lt;code&gt;12:24:00&lt;/code&gt; to &lt;code&gt;12:24:59&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;All requests in this period increment a single counter.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;counter &amp;gt; 100&lt;/code&gt;, reject.&lt;/li&gt;
&lt;li&gt;At &lt;code&gt;12:25:00&lt;/code&gt;, the counter resets to 0.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Burst Errors&lt;/strong&gt;&lt;br&gt;
This algorithm has a major flaw. Imagine your limit is 100 requests/minute. A user could send 100 requests at &lt;code&gt;12:24:59&lt;/code&gt; (which are allowed) and then &lt;em&gt;another&lt;/em&gt; 100 requests at &lt;code&gt;12:25:00&lt;/code&gt; (which are also allowed, as it's a new window).&lt;/p&gt;

&lt;p&gt;This user just sent &lt;strong&gt;200 requests in two seconds&lt;/strong&gt;, effectively doubling your intended rate limit and bypassing your protection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to Use It:&lt;/strong&gt; Simple, low-traffic, or single-node setups where absolute precision isn't critical.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. The "Smooth" Approach: Token Bucket
&lt;/h3&gt;

&lt;p&gt;This algorithm is a classic for a reason. It's designed to handle bursts gracefully while maintaining a steady average rate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mental Model:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Each user gets a "bucket" with a maximum capacity (e.g., 100 tokens).&lt;/li&gt;
&lt;li&gt;The bucket is refilled at a constant rate (e.g., 10 tokens per second).&lt;/li&gt;
&lt;li&gt;Every request tries to consume one token.&lt;/li&gt;
&lt;li&gt;If a token is available, the request is allowed.&lt;/li&gt;
&lt;li&gt;If the bucket is empty, the request is rejected.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is much better. It allows a user to "save up" tokens to send a short burst (up to the bucket capacity), but they can't exceed the steady-state refill rate over the long term.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Implementation Edge Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clock Skew:&lt;/strong&gt; In a distributed system, different servers will have different clocks, leading to inconsistent refill calculations. &lt;strong&gt;Solution:&lt;/strong&gt; Use Redis server time (&lt;code&gt;TIME&lt;/code&gt; command) as the single source of truth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Float Precision:&lt;/strong&gt; Refill rates are often fractional (e.g., 1.66 tokens/sec), so token counts accumulate floating-point error over many refills. &lt;strong&gt;Solution:&lt;/strong&gt; round values to a consistent precision before comparing them against the limit.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to Use It:&lt;/strong&gt; This is ideal for most public APIs. It provides smooth flow control and allows for legitimate, short-term bursts of traffic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3. The Balanced Approach: Sliding Window Counter
&lt;/h3&gt;

&lt;p&gt;This was the algorithm that struck the best balance for me. It solves the "burst error" of the Fixed Window but is simpler to implement and often more performant than a Token Bucket.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mental Model:&lt;/strong&gt;&lt;br&gt;
This algorithm smooths out the rate by considering a &lt;em&gt;weighted average&lt;/em&gt; of the &lt;strong&gt;previous&lt;/strong&gt; window and the &lt;strong&gt;current&lt;/strong&gt; window.&lt;/p&gt;

&lt;p&gt;Imagine a 1-minute window (limit 100).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's &lt;code&gt;12:25:15&lt;/code&gt; (so, we are 25% of the way through the current window).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Previous Window (&lt;code&gt;12:24&lt;/code&gt;):&lt;/strong&gt; Had 80 requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Current Window (&lt;code&gt;12:25&lt;/code&gt;):&lt;/strong&gt; Has 10 requests so far.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We don't just look at the &lt;code&gt;10&lt;/code&gt; requests. We calculate a weighted count:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weight of Previous Window: 75% (since 75% of the sliding window is still in the past)&lt;/li&gt;
&lt;li&gt;Weight of Current Window: 25%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Weighted Count = (80 requests * 75%) + (10 requests * 25%)&lt;/code&gt;&lt;br&gt;
&lt;code&gt;Weighted Count = 60 + 2.5 = 62.5&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The user's current effective count is 62.5, so they can continue making requests. This approach gracefully "slides" the count from one window to the next, largely eliminating the boundary burst problem. (The estimate assumes requests in the previous window were evenly spread, which is a close approximation in practice.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;When to Use It:&lt;/strong&gt; My recommendation for most general-purpose, distributed rate limiting. It provides excellent accuracy and performance without the complexity of token management.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Part 2: From Theory to Production Library
&lt;/h2&gt;

&lt;p&gt;Knowing the algorithms is one thing; implementing them in a production-ready way is another. Here were my core design goals for &lt;strong&gt;&lt;a href="https://github.com/sumedhvats/rate-limiter-go" rel="noopener noreferrer"&gt;rate-limiter-go&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Storage Backend Abstraction
&lt;/h3&gt;

&lt;p&gt;I wanted to start with in-memory storage for development and scale to Redis in production &lt;em&gt;without changing my application code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I defined a simple &lt;code&gt;Storage&lt;/code&gt; interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Storage&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;
    &lt;span class="n"&gt;Increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, I can initialize my limiter with either backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Development: in-memory&lt;/span&gt;
&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewMemoryStorage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c"&gt;// Production: Redis (same interface)&lt;/span&gt;
&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRedisStorage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"redis-cluster:6379"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Same limiter code works with both&lt;/span&gt;
&lt;span class="n"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSlidingWindowLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Atomic Redis Operations
&lt;/h3&gt;

&lt;p&gt;In a concurrent system, you can't just &lt;code&gt;GET&lt;/code&gt; a value, check it, and then &lt;code&gt;SET&lt;/code&gt; it. This is a classic race condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# !! RACE CONDITION !!&lt;/span&gt;
&lt;span class="c"&gt;# Client 1 GETS count (99)&lt;/span&gt;
&lt;span class="c"&gt;# Client 2 GETS count (99)&lt;/span&gt;
&lt;span class="c"&gt;# Client 1 increments to 100, SETS 100. (Allowed)&lt;/span&gt;
&lt;span class="c"&gt;# Client 2 increments to 100, SETS 100. (Also Allowed)&lt;/span&gt;
&lt;span class="c"&gt;# !! We just allowed 101 requests !!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The solution is to perform all operations &lt;strong&gt;atomically&lt;/strong&gt;. I used &lt;strong&gt;Lua scripts&lt;/strong&gt;, which Redis guarantees will run without interruption.&lt;/p&gt;

&lt;p&gt;Here is the (simplified) Lua script for the Fixed Window algorithm. It gets, checks, increments, and sets the expiry all in one atomic step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;-- Fixed Window example (simplified)
local key       = KEYS[1]
local limit     = tonumber(ARGV[1])
local increment = tonumber(ARGV[2])
local ttl       = tonumber(ARGV[3])
local current = tonumber(redis.call('GET', key) or '0')
if current + increment &gt; limit then
    return 0  -- Denied
end
redis.call('INCRBY', key, increment)
if current == 0 then
    redis.call('EXPIRE', key, ttl)  -- set expiry only on the window's first request
end
return 1  -- Allowed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;No race conditions. No approximate counting. Just correctness.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. High-Performance In-Memory Storage
&lt;/h3&gt;

&lt;p&gt;For the in-memory backend, the obvious choice is a &lt;code&gt;sync.Mutex&lt;/code&gt; wrapping a &lt;code&gt;map[string]int&lt;/code&gt;. However, Go's documentation mentions &lt;code&gt;sync.Map&lt;/code&gt; is optimized for a specific case: "when a given key is written once but read many times."&lt;/p&gt;

&lt;p&gt;A rate limiter cache is the &lt;em&gt;opposite&lt;/em&gt;: keys are read and written to on almost &lt;em&gt;every request&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My implementation for in-memory storage uses &lt;code&gt;sync.Map&lt;/code&gt; but leverages its &lt;code&gt;CompareAndSwap&lt;/code&gt; (CAS) atomic operations to safely increment counters under high concurrency, which performs better than a single, global mutex blocking all goroutines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Putting It All Together
&lt;/h2&gt;

&lt;p&gt;Here's what the final library looks like in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Start: 5 Lines to Rate Limiting
&lt;/h3&gt;

&lt;p&gt;This is all it takes to add rate limiting to any function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/sumedhvats/rate-limiter-go/pkg/limiter"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/sumedhvats/rate-limiter-go/pkg/storage"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// 1. Create in-memory storage&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewMemoryStorage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// 2. Create limiter: 10 requests per minute&lt;/span&gt;
    &lt;span class="n"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSlidingWindowLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Rate&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Window&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c"&gt;// 3. Check if request is allowed&lt;/span&gt;
    &lt;span class="n"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;rateLimiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"user:alice"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// 4. Deny or allow&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;allowed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Rate limit exceeded!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// 5. Allow&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Request allowed!"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Most Common Use Case: HTTP Middleware
&lt;/h3&gt;

&lt;p&gt;Of course, the most common need is for an HTTP API. I built a middleware that handles everything automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/sumedhvats/rate-limiter-go/middleware"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/sumedhvats/rate-limiter-go/pkg/limiter"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/sumedhvats/rate-limiter-go/pkg/storage"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Use Redis for a distributed system&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewRedisStorage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"localhost:6379"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// 100 requests per minute per IP&lt;/span&gt;
    &lt;span class="n"&gt;rateLimiter&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewSlidingWindowLimiter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Rate&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Window&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Minute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c"&gt;// Apply middleware&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServeMux&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataHandler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// The middleware automatically uses IP address as the key&lt;/span&gt;
    &lt;span class="n"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RateLimitMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Limiter&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rateLimiter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})(&lt;/span&gt;&lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ListenAndServe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;":8080"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;dataHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Data served successfully"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This middleware automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracts the client IP (handling &lt;code&gt;X-Forwarded-For&lt;/code&gt; proxies).&lt;/li&gt;
&lt;li&gt;Returns a &lt;code&gt;429 Too Many Requests&lt;/code&gt; JSON error.&lt;/li&gt;
&lt;li&gt;Adds standard rate limit headers (&lt;code&gt;X-RateLimit-Limit&lt;/code&gt;, &lt;code&gt;X-RateLimit-Remaining&lt;/code&gt;, &lt;code&gt;X-RateLimit-Reset&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part 4: The Proof: Does it Scale?
&lt;/h2&gt;

&lt;p&gt;I built this for performance, so I benchmarked it heavily. Here are the results on my 12th Gen Intel i5.&lt;/p&gt;

&lt;p&gt;This first test shows a realistic, concurrent load with many different keys (e.g., many different users hitting the API).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multiple Keys (Realistic Load) - Concurrent&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Algorithm&lt;/th&gt;&lt;th&gt;Time/op&lt;/th&gt;&lt;th&gt;Memory/op&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Sliding Window&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;&lt;strong&gt;68 ns/op&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;100 B/op&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Token Bucket&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;76 ns/op&lt;/td&gt;&lt;td&gt;160 B/op&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Fixed Window&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;130 ns/op&lt;/td&gt;&lt;td&gt;261 B/op&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;This test shows how the system scales when hammering the cache with 10,000 unique keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability (10K Keys) - Concurrent&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Algorithm&lt;/th&gt;&lt;th&gt;Time/op&lt;/th&gt;&lt;th&gt;Throughput&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Token Bucket&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;56 ns/op&lt;/td&gt;&lt;td&gt;~17M ops/sec&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Sliding Window&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;74 ns/op&lt;/td&gt;&lt;td&gt;~13M ops/sec&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Fixed Window&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;95 ns/op&lt;/td&gt;&lt;td&gt;~10M ops/sec&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Key Insights:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sliding Window&lt;/strong&gt; and &lt;strong&gt;Token Bucket&lt;/strong&gt; are the clear winners, each sustaining &lt;strong&gt;13-17 million operations per second&lt;/strong&gt; on a single machine.&lt;/li&gt;
&lt;li&gt;They are incredibly lightweight, using 100-160 bytes per operation.&lt;/li&gt;
&lt;li&gt;Throughput holds steady as the number of distinct keys grows into the tens of thousands.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this library was a fantastic journey through algorithm design, concurrency patterns in Go, and atomic database operations with Redis.&lt;/p&gt;

&lt;p&gt;I started with a simple goal: create a rate limiter that wouldn't need to be rewritten when a project scaled. The result is a library that lets you choose the right algorithm for the job, scales from a single in-memory instance to a distributed Redis cluster, and operates with atomic, concurrency-safe guarantees.&lt;/p&gt;

&lt;p&gt;If you want to check out the code, contribute, or use the library in your own project, you can find it on GitHub.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pkg.go.dev/github.com/sumedhvats/rate-limiter-go" rel="noopener noreferrer"&gt;Go reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://goreportcard.com/report/github.com/sumedhvats/rate-limiter-go" rel="noopener noreferrer"&gt;Go Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/sumedhvats/rate-limiter-go" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>algorithms</category>
      <category>opensource</category>
      <category>go</category>
    </item>
  </channel>
</rss>
