<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gtio</title>
    <description>The latest articles on DEV Community by Gtio (@gtoxlili).</description>
    <link>https://dev.to/gtoxlili</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879044%2F71fe7f47-f683-4016-aae0-83bfcd48a6bf.jpeg</url>
      <title>DEV Community: Gtio</title>
      <link>https://dev.to/gtoxlili</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gtoxlili"/>
    <language>en</language>
    <item>
      <title>I needed resumable LLM streams in Go — so I built streamhub</title>
      <dc:creator>Gtio</dc:creator>
      <pubDate>Tue, 14 Apr 2026 17:10:55 +0000</pubDate>
      <link>https://dev.to/gtoxlili/i-needed-resumable-llm-streams-in-go-so-i-built-streamhub-349g</link>
      <guid>https://dev.to/gtoxlili/i-needed-resumable-llm-streams-in-go-so-i-built-streamhub-349g</guid>
      <description>&lt;p&gt;If you've built anything that streams LLM responses over SSE, you've probably hit this: the user refreshes the page, or their network blips, or the load balancer routes the reconnect to a different instance — and the stream is just gone. The generation keeps burning tokens on your backend, but the client sees nothing.&lt;/p&gt;

&lt;p&gt;In the JS/TS world this is mostly solved. Vercel shipped &lt;a href="https://github.com/vercel/resumable-stream" rel="noopener noreferrer"&gt;resumable-stream&lt;/a&gt;, there's &lt;a href="https://github.com/zirkelc/ai-resumable-stream" rel="noopener noreferrer"&gt;ai-resumable-stream&lt;/a&gt;, Ably has a whole &lt;a href="https://ably.com/blog/token-streaming-for-ai-ux" rel="noopener noreferrer"&gt;token streaming product&lt;/a&gt;. But if your backend is in Go? Nothing.&lt;/p&gt;

&lt;p&gt;I ran into this while working on a project where the LLM worker and the HTTP handler live in different processes. I needed something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;persists chunks so reconnecting clients can replay what they missed&lt;/li&gt;
&lt;li&gt;delivers cancel signals across instances (user clicks "stop" on one node, generation stops on another)&lt;/li&gt;
&lt;li&gt;prevents duplicate producers (two requests racing to start the same session)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/gtoxlili/streamhub" rel="noopener noreferrer"&gt;streamhub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;p&gt;Two Redis primitives, that's it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis Streams&lt;/strong&gt; store chunks. New subscribers read history first, then get live data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis Pub/Sub&lt;/strong&gt; carries cancel signals. Fast, fire-and-forget.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each producer gets a generation ID that acts as a fencing token — if a stale producer tries to write after losing ownership, the writes are rejected.&lt;/p&gt;
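&lt;p&gt;The fencing check itself is tiny. Here it is in isolation, with an in-memory atomic counter standing in for the generation ID streamhub keeps in Redis (names here are illustrative, not the library's API):&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// currentGen stands in for the generation ID streamhub keeps in Redis;
// an atomic counter is enough to show the fencing idea.
var currentGen atomic.Int64

var errFenced = errors.New("stale generation: write rejected")

// register starts a new generation, implicitly fencing off any
// producer still holding an older token.
func register() int64 {
	return currentGen.Add(1)
}

// write succeeds only while gen is still the latest generation.
func write(gen int64, chunk string) error {
	if gen != currentGen.Load() {
		return errFenced
	}
	return nil
}

func main() {
	stale := register() // producer A starts the session
	fresh := register() // producer B takes over; A is now fenced

	fmt.Println(write(stale, "late chunk")) // prints the fencing error
	fmt.Println(write(fresh, "hello"))      // nil error: accepted
}
```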

&lt;h2&gt;What the code looks like&lt;/h2&gt;

&lt;p&gt;Producer side:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"chat:123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// called when someone cancels this session&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="c"&gt;// another instance already owns this&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"hello"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" world"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consumer side (can be a completely different process):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"chat:123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unsubscribe&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;128&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;unsubscribe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// replays existing chunks first, then streams live&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fprint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flusher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cancel from anywhere:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;hub&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"chat:123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Cancel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Why not just use X?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Just use Redis Streams directly"&lt;/strong&gt; — you can, but you'll end up reimplementing subscriber fan-out, replay-then-live handoff, generation fencing, and the cancel side-channel. That's what streamhub is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use Centrifuge/Centrifugo"&lt;/strong&gt; — great project, but it's a full real-time messaging framework. If all you need is to make your LLM streams durable, it's a lot of surface area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use vercel/resumable-stream"&lt;/strong&gt; — TypeScript only, tightly coupled to the Vercel AI SDK.&lt;/p&gt;

&lt;h2&gt;Status&lt;/h2&gt;

&lt;p&gt;Early days. The API surface might still change. If you're dealing with this same problem in Go, I'd appreciate feedback: &lt;a href="https://github.com/gtoxlili/streamhub" rel="noopener noreferrer"&gt;github.com/gtoxlili/streamhub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>redis</category>
      <category>ai</category>
      <category>streaming</category>
    </item>
  </channel>
</rss>
