RAXXO Studios

Originally published at raxxo.shop

Server-Sent Events Beat WebSockets for 80% of My AI Streaming UIs: 5 Patterns

  • SSE handles 80% of AI streaming UIs with one HTTP/2 connection and zero WebSocket plumbing

  • EventSource auto-reconnects after about 3 seconds with no client retry logic

  • 5 patterns: chat token paint, agent task feeds, cron dashboards, image generation, dev hot reload

  • WebSockets still win for sub-50ms duplex, binary frames, or true bidirectional flows

  • EventSource adds 14 bytes overhead per message vs 2-6 for WS frames, irrelevant under 100 msg/sec

I shipped 14 AI streaming UIs across raxxo.shop, the Lab tools, and three client projects last quarter. Twelve of them use Server-Sent Events. Two use WebSockets. The split surprised me, because I started every one of them assuming I needed WebSockets.

LLM inference is one-way streaming. Tokens flow from server to browser. The browser does not interrupt. That is the textbook SSE use case, and yet every "build a Claude clone" tutorial reaches for WebSockets out of habit. Here is what I actually use, with the 5 patterns that cover most of my streaming work.

Why Server-Sent Events Beat WebSockets for AI Streaming

The full WebSocket dance is a connection upgrade, ping/pong heartbeats, manual reconnect logic, frame queueing, and a parallel auth path, because the browser WebSocket API cannot attach custom headers to the upgrade request. For LLM token streams, none of that earns its keep.

Server-Sent Events are plain HTTP. The browser opens a text/event-stream response, the server pushes lines, the connection stays open. EventSource handles reconnection in roughly 3 seconds with no client code. Cookies, CORS, and HTTP/2 multiplexing all work the way the rest of your stack already works.
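
To show how plain it really is, here is a minimal sketch with zero dependencies, just Node's http module, that pushes a timestamp every second. The three headers are the whole handshake.

// Minimal SSE sketch: a long-lived HTTP response, no framework
import { createServer } from 'node:http'

createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  })
  // Each event is one or more "data:" lines followed by a blank line
  const timer = setInterval(() => {
    res.write(`data: ${new Date().toISOString()}\n\n`)
  }, 1000)
  req.on('close', () => clearInterval(timer))
}).listen(3000)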

Framing costs more, though. SSE messages carry a data: prefix and a double newline, around 14 bytes per event once you count the optional fields. WebSocket frame headers cost 2 to 6 bytes. At 60 tokens per second from Claude, that roughly 8-byte difference is 480 bytes per second. Not a problem. At 100,000 messages per second on a trading feed, it is. Pick the right tool for the throughput you actually have.
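
The arithmetic, spelled out (the framing numbers are the approximate ones from above, not exact for every event):

const sseFraming = 14    // "data: " prefix, delimiter newlines, optional fields
const wsFraming = 6      // small WebSocket frame header, worst case
const tokensPerSec = 60  // typical Claude streaming rate
const extraBytesPerSec = (sseFraming - wsFraming) * tokensPerSec // = 480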

When WebSockets still win:

  1. Sub-50ms latency duplex (multiplayer games, voice rooms, collaborative cursors)

  2. Binary frames at scale (audio chunks, video, protobuf)

  3. True bidirectional flows where the client constantly pushes back

Every other streaming UI I have built fits SSE.

Pattern 1: Stream Claude API to Browser With EventSource

The Claude API streams via SSE already. You proxy that stream straight to the browser. No queueing, no message broker, no Redis pub/sub.


// Hono backend on Vercel
import { Hono } from 'hono'
import { streamSSE } from 'hono/streaming'
import Anthropic from '@anthropic-ai/sdk'

const app = new Hono()
const anthropic = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

app.post('/chat', async (c) => {
  const { prompt } = await c.req.json()
  return streamSSE(c, async (stream) => {
    // .stream() returns an async-iterable MessageStream
    const response = anthropic.messages.stream({
      model: 'claude-opus-4-7',
      max_tokens: 4096,
      messages: [{ role: 'user', content: prompt }]
    })
    for await (const chunk of response) {
      if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
        await stream.writeSSE({ data: chunk.delta.text })
      }
    }
    await stream.writeSSE({ event: 'done', data: '' })
  })
})


// React frontend (EventSource is GET-only; see the POST note below)
const es = new EventSource('/chat')
es.onmessage = (e) => setText(prev => prev + e.data)
es.addEventListener('done', () => es.close())


That is the full streaming chat flow. Around 20 lines of server, 3 of client. I covered the broader Hono setup at Hono: The Tiny Framework That Runs My Entire Backend, which pairs perfectly with SSE because Hono's streamSSE helper handles all the framing.

A note on POST: standard EventSource only supports GET. For prompts longer than ~2KB I either send a session ID via GET, or use the @microsoft/fetch-event-source library, which adds POST support without giving up auto-reconnect.
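
A sketch of the POST variant with @microsoft/fetch-event-source. The /chat endpoint and the setText state setter mirror the snippets above; the rest follows the library's documented options:

import { fetchEventSource } from '@microsoft/fetch-event-source'

// POST body carries the full prompt; named events and auto-reconnect still work
await fetchEventSource('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt }),
  onmessage(ev) {
    if (ev.event === 'done') return
    setText(prev => prev + ev.data)
  },
})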

Pattern 2: Long-Running Agent Task Progress Feeds

Claude Code agents run multi-step plans. The user wants to see "Reading file 1 of 12", "Running tests", "Writing patch". Each step is a discrete event, not a token stream, but the shape is the same: server pushes, client paints.


app.get('/agent/run/:id', async (c) => {
  const id = c.req.param('id')
  return streamSSE(c, async (stream) => {
    const events = subscribe(`agent:${id}`)  // Postgres LISTEN, Redis sub, whatever
    for await (const evt of events) {
      await stream.writeSSE({
        event: evt.type,        // 'step', 'tool_call', 'error', 'done'
        data: JSON.stringify(evt),
        id: String(evt.seq)     // echoed back in Last-Event-ID on reconnect
      })
      if (evt.type === 'done') break
    }
  })
})


The id field on each event is the killer feature. EventSource sends the last received ID back in the Last-Event-ID request header when it reconnects, so your backend can replay missed events from a queue. The 1M context window matters for agents like this, because long agent runs can dump huge context payloads in their final event. I wrote about that at The 1M Context Window Actually Changes How I Code.
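
Here is a sketch of the resume path. replayFrom() is a hypothetical helper that yields queued events after a sequence number, then switches to live ones; everything else is the same endpoint as above:

app.get('/agent/run/:id', async (c) => {
  // EventSource sets this header automatically when it reconnects
  const lastSeq = Number(c.req.header('Last-Event-ID') ?? '0')
  return streamSSE(c, async (stream) => {
    // replayFrom() is a stand-in: missed events from the queue first, then live
    for await (const evt of replayFrom(`agent:${c.req.param('id')}`, lastSeq)) {
      await stream.writeSSE({
        event: evt.type,
        data: JSON.stringify(evt),
        id: String(evt.seq)
      })
      if (evt.type === 'done') break
    }
  })
})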

Real numbers from a deploy bot I shipped two weeks ago: 47 events per agent run on average, 3.2 seconds for the average reconnect to complete a full replay, zero events lost across 1,200 runs.

Pattern 3: Server-Side Cron Dashboards With SSE

Build status, deploy events, last-100 syndication results. The data updates every few seconds and every connected dashboard wants the same view. Classic pub/sub, but you do not need WS.


// Cron writes to a fanout channel (node-cron; the 6-field schedule fires every 30s)
import cron from 'node-cron'

cron.schedule('*/30 * * * * *', async () => {
  const status = await checkBuilds()
  publish('builds:fanout', status)
})

// SSE endpoint subscribes per browser
app.get('/dashboard/builds', (c) =>
  streamSSE(c, async (stream) => {
    const sub = subscribe('builds:fanout')
    for await (const status of sub) {
      await stream.writeSSE({ data: JSON.stringify(status) })
    }
  })
)


I run six dashboards on this exact pattern. Vercel deploys, Shopify product sync, blog syndication results, GitHub Actions, Cloudflare cache hit rate, and a custom one for raxxo.shop sales pulse. They all hit the same Redis fanout, each browser opens one SSE connection, and the server fans out without per-tab state.
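
The subscribe() helper these snippets keep hand-waving is small. One way to write it with node-redis, assuming JSON payloads on the channel (publish is the mirror image with JSON.stringify):

import { createClient } from 'redis'

// Wraps a Redis pub/sub channel as an async iterable for the for-await loops above
async function* subscribe(channel: string) {
  const sub = createClient()
  await sub.connect()
  const queue: string[] = []
  let wake: (() => void) | undefined
  await sub.subscribe(channel, (message) => {
    queue.push(message)
    wake?.()
  })
  try {
    while (true) {
      while (queue.length > 0) yield JSON.parse(queue.shift()!)
      await new Promise<void>((resolve) => { wake = resolve })
    }
  } finally {
    await sub.unsubscribe(channel)
    await sub.disconnect()
  }
}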

This was 1,400 lines of WebSocket code at my previous job. It is now 80 lines of SSE.

Pattern 4: AI Image Generation Progress Without Polling

Image gen jobs take 8 to 40 seconds. The classic browser pattern is to poll every 2 seconds, which is wasteful and laggy. SSE is much cleaner.


app.post('/imagine', async (c) => {
  const { prompt } = await c.req.json()
  const jobId = crypto.randomUUID()
  enqueueJob(jobId, prompt)
  return c.json({ jobId })
})

app.get('/imagine/:id/stream', (c) =>
  streamSSE(c, async (stream) => {
    const sub = subscribe(`job:${c.req.param('id')}`)
    for await (const evt of sub) {
      await stream.writeSSE({
        event: evt.phase,       // 'queued', 'progress', 'preview', 'done'
        data: JSON.stringify(evt)
      })
      if (evt.phase === 'done') break
    }
  })
)


Phases I push: queued with queue position, progress every 5% with a percentage, preview with a low-res blurhash thumbnail at 30%, done with the final URL. The browser shows a progressive blur-to-sharp reveal that feels twice as fast as a polling spinner, even though the underlying generation time is identical.
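
The client side as a sketch; the payload field names (percent, blurhash, url) and the state setters are my assumptions, not a fixed contract:

// Kick off the job, then subscribe to its phase stream
const { jobId } = await fetch('/imagine', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt })
}).then((r) => r.json())

const es = new EventSource(`/imagine/${jobId}/stream`)
es.addEventListener('progress', (e) => setPercent(JSON.parse(e.data).percent))
es.addEventListener('preview', (e) => setBlurhash(JSON.parse(e.data).blurhash))
es.addEventListener('done', (e) => {
  setFinalUrl(JSON.parse(e.data).url)
  es.close()
})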

Average connection time per job: 22 seconds. Average events received: 14. Average bytes per event: 180 (mostly preview thumbnails encoded as blurhash strings).

Pattern 5: Hot Reload And Dev Preview Events Without WebSocket Plumbing

Vite uses WebSockets for HMR because it needs bidirectional flow: the HMR client sends messages back to the server, not just the other way around. For my own lighter dev tools, the pure server-push pattern is enough.

I built a preview server for the Lab blog drafts that watches the markdown folder, rebuilds on save, and tells every open preview tab to reload. Three lines of chokidar, one SSE endpoint, one EventSource on the client.


import chokidar from 'chokidar'

// Any save under blog-drafts/ publishes a reload event to the fanout channel
chokidar.watch('blog-drafts/').on('change', (path) => {
  publish('reload', { path, ts: Date.now() })
})

app.get('/dev/reload', (c) =>
  streamSSE(c, async (stream) => {
    const sub = subscribe('reload')
    for await (const evt of sub) {
      await stream.writeSSE({ data: JSON.stringify(evt) })
    }
  })
)


new EventSource('/dev/reload').onmessage = () => location.reload()


This replaced a 600-line WebSocket dev server I had been maintaining for two years. It does the same job in 12 lines. No reconnect logic, no ping/pong, no upgrade dance. If the dev server restarts, EventSource reconnects in about 3 seconds and the next file save triggers a reload. The Model Context Protocol world has the same shape: request, stream of events, done. I broke that down at MCP Servers Are How Claude Actually Talks to Everything.
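
If about 3 seconds feels slow for dev reloads, the server can shrink the delay with the SSE retry field. A sketch, leaning on the raw write() the Hono stream also exposes (the 500ms value is arbitrary):

app.get('/dev/reload', (c) =>
  streamSSE(c, async (stream) => {
    // A bare "retry:" line tells EventSource how long to wait before reconnecting
    await stream.write('retry: 500\n\n')
    for await (const evt of subscribe('reload')) {
      await stream.writeSSE({ data: JSON.stringify(evt) })
    }
  })
)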

Bottom Line

If your AI UI is one-way streaming (tokens, progress events, preview pushes), use Server-Sent Events. The code is shorter, the auth is simpler, the reconnect is automatic, and HTTP/2 makes the connection cost negligible. WebSockets earn their place when the client genuinely talks back at high frequency, when frames are binary and large, or when 50ms latency matters more than developer time.

I keep a running stack of these patterns in the RAXXO Blueprint, the same playbook I use to ship Lab tools each week. The next stream you build, try the EventSource version first. If the patterns above do not cover it, then reach for WebSockets with a clear reason. Most of the time, you will not need to.
