If you've built anything that streams LLM responses over SSE, you've probably hit this: the user refreshes the page, or their network blips, or the load balancer routes the reconnect to a different instance — and the stream is just gone. The generation keeps burning tokens on your backend, but the client sees nothing.
In the JS/TS world this is mostly solved. Vercel shipped resumable-stream, there's ai-resumable-stream, and Ably has a whole token streaming product. But if your backend is in Go? I couldn't find anything comparable.
I ran into this while working on a project where the LLM worker and the HTTP handler live in different processes. I needed something that:
- persists chunks so reconnecting clients can replay what they missed
- delivers cancel signals across instances (user clicks "stop" on one node, generation stops on another)
- prevents duplicate producers (two requests racing to start the same session)
So I built streamhub.
How it works
Two Redis primitives, that's it:
- Redis Streams store chunks. New subscribers read history first, then get live data.
- Redis Pub/Sub carries cancel signals. Fast, fire-and-forget.
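To make the replay-then-live handoff concrete, here is a minimal in-memory sketch. It is not streamhub's code and it doesn't talk to Redis: `memStream`, its cursor-based `Subscribe`, and the integer offset are hypothetical stand-ins. The point it illustrates is that history and live data are the same append-only log, so a subscriber reading from a cursor has no gap between the two phases.

```go
package main

import (
	"fmt"
	"sync"
)

// memStream is a hypothetical in-memory stand-in for one Redis Stream.
type memStream struct {
	mu     sync.Mutex
	cond   *sync.Cond
	chunks []string // append-only log: history and live data in one place
	closed bool
}

func newMemStream() *memStream {
	s := &memStream{}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Publish appends a chunk and wakes any subscriber waiting for new data.
func (s *memStream) Publish(chunk string) {
	s.mu.Lock()
	s.chunks = append(s.chunks, chunk)
	s.mu.Unlock()
	s.cond.Broadcast()
}

// Close marks the stream finished; subscribers drain what is left and stop.
func (s *memStream) Close() {
	s.mu.Lock()
	s.closed = true
	s.mu.Unlock()
	s.cond.Broadcast()
}

// Subscribe yields every chunk from offset `from` onward: first whatever is
// already in the log (replay), then new chunks as they arrive (live).
func (s *memStream) Subscribe(from int) <-chan string {
	if from < 0 {
		from = 0
	}
	out := make(chan string)
	go func() {
		defer close(out)
		next := from
		for {
			s.mu.Lock()
			for next >= len(s.chunks) && !s.closed {
				s.cond.Wait() // caught up: block until Publish or Close
			}
			if next >= len(s.chunks) { // closed and fully drained
				s.mu.Unlock()
				return
			}
			batch := append([]string(nil), s.chunks[next:]...)
			s.mu.Unlock() // never send on the channel while holding the lock
			for _, c := range batch {
				out <- c
			}
			next += len(batch)
		}
	}()
	return out
}

func main() {
	s := newMemStream()
	s.Publish("hello") // produced before the subscriber connects
	sub := s.Subscribe(0)
	s.Publish(" world") // produced after
	s.Close()
	for c := range sub {
		fmt.Print(c)
	}
	fmt.Println()
}
```

In this sketch a reconnecting client would pass the index after the last chunk it saw instead of 0. The subscriber goroutine here only exits when the stream is closed and the channel is drained, so a real implementation would also need a way to unsubscribe early.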
Each producer gets a generation ID that acts as a fencing token — if a stale producer tries to write after losing ownership, the writes are rejected.
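The fencing idea can be sketched in a few lines. This is an in-memory illustration, not streamhub's implementation: `fencedLog`, `Acquire`, and `Append` are hypothetical names, and the assumption that ownership changes by bumping a counter is mine. Against Redis, the compare-and-append step would also have to be atomic (a Lua script is one common way to get that), which a mutex stands in for here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStale is returned when a writer presents an outdated generation ID.
var errStale = errors.New("stale generation")

// fencedLog is a hypothetical stand-in for one session's state.
type fencedLog struct {
	mu         sync.Mutex
	generation uint64 // generation ID of the current owner; 0 means no owner yet
	chunks     []string
}

// Acquire hands ownership to a new producer and returns its generation ID.
// Every earlier generation is stale from this point on.
func (l *fencedLog) Acquire() uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.generation++
	return l.generation
}

// Append stores chunk only if gen is still the current generation.
func (l *fencedLog) Append(gen uint64, chunk string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if gen == 0 || gen != l.generation {
		return errStale
	}
	l.chunks = append(l.chunks, chunk)
	return nil
}

func main() {
	var session fencedLog

	old := session.Acquire()
	fmt.Println("old producer write:", session.Append(old, "hello"))

	// Ownership moves, for example because the old producer was presumed dead.
	current := session.Acquire()

	fmt.Println("old producer write after takeover:", session.Append(old, " stale"))
	fmt.Println("new producer write:", session.Append(current, " world"))
	fmt.Println("stored chunks:", session.chunks)
}
```

The property that matters is that the check happens where the data lives, not in the producer: a stale producer may not know it lost ownership, so it can't be trusted to stop on its own.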
What the code looks like
Producer side:
stream, created, err := hub.Register("chat:123", func() {
	// called when someone cancels this session
})
if err != nil {
	return // registration failed
}
if !created {
	return // another instance already owns this
}
defer stream.Close()

stream.Publish("hello")
stream.Publish(" world")
Consumer side (can be a completely different process):
stream := hub.Get("chat:123")
chunks, unsubscribe := stream.Subscribe(128)
defer unsubscribe()

for chunk := range chunks {
	// replays existing chunks first, then streams live
	fmt.Fprint(w, chunk)
	w.(http.Flusher).Flush()
}
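The loop above writes raw chunks to the response. If the endpoint is meant to be a proper SSE stream that a browser's EventSource can resume, each chunk needs `id:` and `data:` framing, and the handler has to honor the `Last-Event-ID` request header. Here is a rough sketch of that part using only the standard library. The helper names (`formatSSE`, `resumeOffset`, `sseHandler`) are hypothetical, and it assumes chunks are numbered with sequential integers held in a slice; Redis Stream entry IDs look different, and the post doesn't say how streamhub exposes them.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// formatSSE frames one chunk as an SSE event. A chunk containing newlines
// must be split, because every line needs its own "data:" field.
func formatSSE(id int, chunk string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "id: %d\n", id)
	for _, line := range strings.Split(chunk, "\n") {
		fmt.Fprintf(&b, "data: %s\n", line)
	}
	b.WriteString("\n") // blank line terminates the event
	return b.String()
}

// resumeOffset maps a Last-Event-ID header value to the index of the first
// chunk to send. A missing or malformed value means "start from the beginning".
func resumeOffset(lastEventID string) int {
	n, err := strconv.Atoi(lastEventID)
	if err != nil || n < 0 {
		return 0
	}
	return n + 1
}

// sseHandler replays chunks starting after the client's Last-Event-ID.
// The fixed slice stands in for whatever the stream subscription yields.
func sseHandler(chunks []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		for i := resumeOffset(r.Header.Get("Last-Event-ID")); i < len(chunks); i++ {
			fmt.Fprint(w, formatSSE(i, chunks[i]))
			flusher.Flush()
		}
	}
}

func main() {
	// Simulate a reconnect from a client that already received chunk 0.
	req := httptest.NewRequest(http.MethodGet, "/stream", nil)
	req.Header.Set("Last-Event-ID", "0")
	rec := httptest.NewRecorder()
	sseHandler([]string{"hello", " world"})(rec, req)
	fmt.Print(rec.Body.String())
}
```

Checking the `http.Flusher` assertion with the two-value form avoids a panic when the handler sits behind middleware that wraps the ResponseWriter without implementing Flush.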
Cancel from anywhere:
hub.Get("chat:123").Cancel()
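On the producer side, the cancel callback typically needs to stop a generation loop that is already running. One way to wire that up, sketched here under assumptions, is to hand a `context.CancelFunc` to the callback and have the loop check the context between tokens. `cancelBus` is a hypothetical in-memory stand-in for the Pub/Sub channel, not streamhub's API; like Pub/Sub it stores nothing, so a cancel published when nobody is listening is simply dropped.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// cancelBus is a hypothetical in-memory stand-in for per-session Pub/Sub.
type cancelBus struct {
	mu       sync.Mutex
	handlers map[string][]func()
}

func newCancelBus() *cancelBus {
	return &cancelBus{handlers: make(map[string][]func())}
}

// OnCancel registers fn to run when a cancel is published for sessionID.
func (b *cancelBus) OnCancel(sessionID string, fn func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers[sessionID] = append(b.handlers[sessionID], fn)
}

// Cancel notifies the current listeners and reports how many were reached.
// Fire-and-forget: with no listeners the signal is lost.
func (b *cancelBus) Cancel(sessionID string) int {
	b.mu.Lock()
	fns := append([]func(){}, b.handlers[sessionID]...)
	b.mu.Unlock()
	for _, fn := range fns {
		fn()
	}
	return len(fns)
}

// generate emits tokens until they run out or ctx is cancelled, and returns
// how many tokens were emitted.
func generate(ctx context.Context, tokens []string, emit func(string)) int {
	n := 0
	for _, tok := range tokens {
		select {
		case <-ctx.Done():
			return n // stop was requested; no more tokens are emitted
		default:
		}
		emit(tok)
		n++
	}
	return n
}

// demo runs a generation that gets cancelled after its second token.
func demo() (string, int) {
	bus := newCancelBus()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	bus.OnCancel("chat:123", cancel)

	out := ""
	n := generate(ctx, []string{"a", "b", "c", "d"}, func(tok string) {
		out += tok
		if tok == "b" {
			bus.Cancel("chat:123") // simulates "stop" arriving from another node
		}
	})
	return out, n
}

func main() {
	out, n := demo()
	fmt.Printf("emitted %q (%d tokens)\n", out, n)
}
```

With a real LLM client the same context would normally be passed into the request itself, so cancellation also aborts the upstream call rather than just the loop that forwards tokens.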
Why not just use X?
"Just use Redis Streams directly" — you can, but you'll end up reimplementing subscriber fan-out, replay-then-live handoff, generation fencing, and the cancel side-channel. That's what streamhub is.
"Use Centrifuge/Centrifugo" — great project, but it's a full real-time messaging framework. If all you need is to make your LLM streams durable, it's a lot of surface area.
"Use vercel/resumable-stream" — TypeScript only, tightly coupled to the Vercel AI SDK.
Status
Early days. The API surface might still change. If you're dealing with this same problem in Go, I'd appreciate feedback: github.com/gtoxlili/streamhub
Top comments (1)
Hey, Zak here. I work on the AI streaming product at Ably. Nice writeup and thanks for the callout! It's good to see LLM response streaming getting some proper attention rather than being treated as a solved problem. Looks like you've tackled a lot of the problems in streamhub already.
SSE is one of those transports that looks resumable on the surface (there's a Last-Event-ID header in the spec) but the AI project demos and examples almost universally skip over what actually needs to happen on the server for that header to mean anything. Most of the "here's how to stream from an LLM" tutorials stop at the point where tokens are sent to the browser, and quietly assume nothing ever goes wrong with the HTTP connection. The bits that get hard once you take resume seriously:

And then there's the fact that SSE is unidirectional. So anything that needs to flow back to the producer (like steering messages mid-generation, or cancellation and interrupts) has to be sent on an out-of-band channel. Which is fine until you need to figure out which instance is actually running the generation you want to cancel, across an autoscaling group. The routing problem is a real problem, and a lot of folks resort to writing cancellation messages to the datastore, and polling for them.

These are the problems we're solving in the Ably product, so I'd love to show you the Go SDKs we're working on if you're up for it.