<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: amogh tyagi</title>
    <description>The latest articles on DEV Community by amogh tyagi (@amogh_tyagi).</description>
    <link>https://dev.to/amogh_tyagi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3870596%2F0725c781-8fc9-4374-93ff-d77728b22d10.png</url>
      <title>DEV Community: amogh tyagi</title>
      <link>https://dev.to/amogh_tyagi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amogh_tyagi"/>
    <language>en</language>
    <item>
      <title>Rate Limiting at Scale: Building Fixed Window and Token Bucket in Go</title>
      <dc:creator>amogh tyagi</dc:creator>
      <pubDate>Fri, 10 Apr 2026 03:51:26 +0000</pubDate>
      <link>https://dev.to/amogh_tyagi/rate-limiting-at-scale-building-fixed-window-and-token-bucket-in-go-3m7i</link>
      <guid>https://dev.to/amogh_tyagi/rate-limiting-at-scale-building-fixed-window-and-token-bucket-in-go-3m7i</guid>
      <description>&lt;p&gt;Rate limiting is one of those things every backend engineer knows they need but few actually build from scratch. Most reach for a library. I built mine — two algorithms, Redis-backed, with Lua scripting for atomicity. Here's what the tradeoffs actually look like when you're writing the implementation instead of just configuring it.&lt;/p&gt;

&lt;p&gt;Why build it instead of using a library?&lt;/p&gt;

&lt;p&gt;Mostly to understand what the library is doing. Rate limiting looks simple until you think about concurrent requests hitting the same counter at the same millisecond. That's where the interesting problems live — and a library abstracts all of that away from you.&lt;/p&gt;

&lt;p&gt;I implemented two algorithms: fixed window and token bucket. They solve the same problem differently, and the difference matters depending on your traffic pattern.&lt;/p&gt;

&lt;p&gt;Fixed window&lt;/p&gt;

&lt;p&gt;The simplest mental model: you get N requests per time window. Window resets, counter resets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fw&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;FixedWindow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;windowSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;fw&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem with naive fixed window is the boundary attack. If your window resets every minute, a client can send 100 requests just before the reset at 11:59:59 and another 100 just after it at 12:00:00: 200 requests in about two seconds, double your intended limit. This is a well-known flaw, and it's why sliding window algorithms exist. Fixed window is fast and simple, but you need to know what you're trading off.&lt;/p&gt;

&lt;p&gt;Token bucket&lt;/p&gt;

&lt;p&gt;Token bucket is more nuanced. You have a bucket with a maximum capacity. Tokens refill at a constant rate. Each request consumes a token. If the bucket is empty, the request is rejected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tb&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;TokenBucket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Unix&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;refillRate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles bursts gracefully. A client that's been idle accumulates tokens up to the bucket capacity, then can burst at full speed until empty. It's a better model for real API traffic, which arrives in bursts rather than at a perfectly uniform rate.&lt;/p&gt;

&lt;p&gt;The complexity cost: you need to track both token count and last refill timestamp, and compute the refill delta on every request. That's two reads and a write per check, which is where atomicity becomes critical.&lt;/p&gt;

&lt;p&gt;The Redis + Lua atomicity problem&lt;/p&gt;

&lt;p&gt;Here's the race condition that bites you if you're not careful. With token bucket:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read current token count
Compute new count based on elapsed time
Write new count back
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If two requests hit simultaneously, both read the same token count, both compute independently, and both write — one of the writes gets lost. You've now allowed more requests than you should have.&lt;/p&gt;

&lt;p&gt;The fix is making the read-compute-write a single atomic operation. Redis supports this via Lua scripts, which execute atomically on the Redis server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'HMGET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_refill'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;capacity&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;last_refill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tonumber&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;

&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last_refill&lt;/span&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;math.min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;refill_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'HMSET'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_tokens&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_refill'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No locks. No transactions. The entire check-and-decrement happens in one Redis round trip, atomically. This is the right way to do distributed rate limiting.&lt;/p&gt;

&lt;p&gt;Fixed window Lua is simpler&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;
&lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'INCR'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'EXPIRE'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;INCR is already atomic in Redis, but wrapping it with the EXPIRE logic in Lua ensures the TTL gets set exactly once on the first request — no race between increment and expire.&lt;/p&gt;

&lt;p&gt;CI with GitHub Actions&lt;/p&gt;

&lt;p&gt;Every push runs the test suite automatically. The workflow is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;6379:6379&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-go@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;go-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.21'&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;go test ./...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key part is spinning up a real Redis instance in the CI environment as a service container. Testing against a real Redis rather than a mock means your Lua scripts actually get executed and validated — mocks won't catch scripting errors.&lt;/p&gt;

&lt;p&gt;Fixed window vs token bucket: when to use which&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Fixed window&lt;/th&gt;&lt;th&gt;Token bucket&lt;/th&gt;&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;&lt;td&gt;Implementation&lt;/td&gt;&lt;td&gt;Simple&lt;/td&gt;&lt;td&gt;More complex&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Burst handling&lt;/td&gt;&lt;td&gt;Poor&lt;/td&gt;&lt;td&gt;Good&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Boundary vulnerability&lt;/td&gt;&lt;td&gt;Yes&lt;/td&gt;&lt;td&gt;No&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Redis ops per request&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1 (via Lua)&lt;/td&gt;&lt;/tr&gt;
    &lt;tr&gt;&lt;td&gt;Best for&lt;/td&gt;&lt;td&gt;Internal services, simple APIs&lt;/td&gt;&lt;td&gt;Public APIs, user-facing rate limits&lt;/td&gt;&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;For most public APIs, token bucket is the right default. Fixed window is fine for internal service-to-service limits where you control both sides and traffic is predictable.&lt;/p&gt;

&lt;p&gt;What I'd add next&lt;/p&gt;

&lt;p&gt;Sliding window log — the theoretically correct algorithm that tracks individual request timestamps. More memory-intensive than either of these, but eliminates the boundary problem of fixed window without the refill complexity of token bucket. Also a sliding window counter, which approximates it cheaply using two fixed windows.&lt;/p&gt;

&lt;p&gt;The full source&lt;/p&gt;

&lt;p&gt;Both algorithms, Redis store, Lua scripts, and CI config are on GitHub. The code is designed to be readable — if you're implementing your own, the Lua scripts are the part worth studying.&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>backend</category>
      <category>go</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Building a Production WebSocket Chat Server in Go — What I Learned</title>
      <dc:creator>amogh tyagi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 22:02:43 +0000</pubDate>
      <link>https://dev.to/amogh_tyagi/building-a-production-websocket-chat-server-in-go-what-i-learned-5969</link>
      <guid>https://dev.to/amogh_tyagi/building-a-production-websocket-chat-server-in-go-what-i-learned-5969</guid>
      <description>&lt;p&gt;When I decided to build WireRoom, I had two goals: learn how real deployment works end to end, and understand WebSockets beyond the "it's like HTTP but persistent" explanation everyone gives.&lt;br&gt;
Here's what I actually ran into.&lt;/p&gt;

&lt;p&gt;What WireRoom does&lt;br&gt;
Users sign in — via Google, GitHub, or a plain username/password — and get dropped into shareable rooms with short alphanumeric codes. The first person in becomes the host, can kick participants, and can transfer host privileges. Messages are real-time. The whole thing runs on Go + WebSockets in production.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkh1c1g85sttoj7l7otd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxkh1c1g85sttoj7l7otd.png" alt="WireRoom login screen showing OAuth options for Google and GitHub alongside username/password auth" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why Go for a chat server?&lt;br&gt;
Go's concurrency model is a natural fit for WebSocket servers. Each connection is a long-lived, stateful thing — you need to read from it, write to it, and track it. Goroutines make this straightforward: spin one up per client, let the runtime handle scheduling. Compare this to Node.js where you're managing an event loop and callback chains the moment things get complex.&lt;br&gt;
I used Gorilla WebSocket — the de facto Go library for this. It wraps the upgrade handshake cleanly and gives you a Conn type you can read and write on directly.&lt;/p&gt;

&lt;p&gt;The architecture: goroutine per client&lt;br&gt;
Every client gets two goroutines: one for reading, one for writing. The reader blocks on conn.ReadMessage() and forwards messages to a central hub. The writer blocks on a channel and flushes messages out as they arrive.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight go"&gt;&lt;code&gt;func (c *Client) readPump() {
    defer func() {
        c.hub.unregister &amp;lt;- c
        c.conn.Close()
    }()
    for {
        _, message, err := c.conn.ReadMessage()
        if err != nil {
            break
        }
        c.hub.broadcast &amp;lt;- message
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The hub is a single goroutine that owns all shared state: the client map, room assignments, host tracking, everything. All mutations go through it via channels. This is the design decision that kept the code race-free without fighting mutexes everywhere.&lt;/p&gt;

&lt;p&gt;The host system&lt;br&gt;
This was the most interesting backend problem. WireRoom has room ownership — first user in becomes host, with the ability to kick others or transfer the crown. That means the state isn't just "who's connected" but "who owns this room" and "what can they do."&lt;br&gt;
The hub handles this entirely. A kick event isn't just closing a connection — it's updating the host map, broadcasting a system message to the room, and gracefully closing the target client's goroutines in the right order.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs32psh2a2k0tw9yxdmbt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs32psh2a2k0tw9yxdmbt.png" alt="Participants panel showing two users, host crown icon, and kick/make host dropdown menu" width="364" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;System events like joins, leaves, and host transfers get broadcast to the room as distinct message types so the frontend can render them differently from chat messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx586ghkkkn2e4mldxvb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx586ghkkkn2e4mldxvb9.png" alt="WireRoom chat room with live messages, system events for join and host assignment, and participant sidebar" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auth: OAuth plus passwords&lt;br&gt;
I implemented two auth paths — OAuth via Google and GitHub, and a plain username/password fallback. The OAuth flow uses a state token for CSRF protection: generate a random token, store it in a cookie, send it to the provider, verify it matches on the redirect back. Without this, an attacker can trick a user into completing an OAuth flow the attacker initiated, linking the wrong account.&lt;br&gt;
The password path stores credentials in Supabase PostgreSQL and validates them on login. Both paths converge on the same session: once you're in, the WebSocket layer doesn't care how you authenticated.&lt;/p&gt;

&lt;p&gt;Deployment: Railway over Render&lt;br&gt;
Render's free tier spins down services after inactivity. For a WebSocket server that's a dealbreaker — the first user after idle gets a cold start, and persistent connections can't survive a process restart. Railway keeps services alive and deploys straight from GitHub. Supabase handled PostgreSQL with connection pooling, so I didn't have to think about database connections under concurrent load.&lt;/p&gt;

&lt;p&gt;What I'd add next&lt;br&gt;
Emoji reactions. The infrastructure handles it — adding a new message type to the hub is trivial. It's just not shipped yet.&lt;/p&gt;

&lt;p&gt;What building this taught me&lt;br&gt;
The Go race detector (go test -race) is not optional. Use it from day one. The goroutine-per-client model with a central hub absorbed everything I threw at it — dozens of concurrent connections, rapid room switching, abrupt disconnects, host transfers mid-session. None of it caused issues once the hub ownership model was right.&lt;br&gt;
The full source is on GitHub.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>go</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
