I built a 20 kB React hook that doesn't care which AI you use — here's how streaming actually works


Most React AI chat libraries are secretly backend libraries.

They stream directly from OpenAI, or through their own cloud, or via a framework-specific server
adapter. The React hook is just a thin client on top of one particular provider. Switch from Claude
to GPT-4? Rewrite the frontend. Migrate off Vercel? Rewrite the frontend. Add Groq for a faster path?
Rewrite the frontend.

But here's the thing: streaming AI chat is fundamentally just three events:

data: {"type":"text","text":"Hello"}
data: {"type":"text","text":", world"}
data: {"type":"done"}

That's it. text, done, error. Your React component shouldn't need to know anything more than that.
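
In TypeScript terms, the whole wire contract fits in one small union (a sketch of the shape implied above; the type names are illustrative, not the library's exports):

// Illustrative chunk shape for the three events above.
type StreamChunk =
  | { type: 'text'; text: string }
  | { type: 'done' }
  | { type: 'error'; error: string }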

So I built react-ai-stream (https://github.com/trimooo/react-ai-stream) — a backend-agnostic
streaming hook that speaks this protocol. Any server that produces those three events works,
regardless of which LLM is behind it.


The architecture

Here's the full picture:

React UI
  └─ useAIChat() hook (useSyncExternalStore)
       ├─ Zustand store           → messages · loading · error
       └─ SSE parser + normalizer → ReadableStream → StreamChunk
                    │
                    │  HTTP POST + SSE stream
                    ▼
Your server (/api/chat)
  Next.js · Express · FastAPI · Go · Rails
                    │
                    ▼
Anthropic · OpenAI · Groq · Custom

The boundary in the middle is everything. The React layer speaks {type, text} over SSE. The server
speaks whatever the LLM provider requires. Neither knows about the other's implementation.


How streaming actually works

Most tutorials skip the networking part. Here's what's actually happening.

Server-Sent Events (SSE) is a one-directional streaming protocol over plain HTTP: the client makes a
single request, and the server keeps the connection open and keeps sending data:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}

The double newline (\n\n) is the event delimiter. Your API route receives the user's messages, calls
the LLM, and re-emits tokens in this format.
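
A minimal sketch of that re-emitting step as a Web-standard route handler (Next.js App Router style); streamTokensFromLLM is a stand-in for whatever provider call you actually make:

// Stand-in for a real provider SDK call (Anthropic, OpenAI, Groq, ...).
async function* streamTokensFromLLM(messages: unknown[]) {
  yield 'Hello'
  yield ', world'
}

export async function POST(req: Request) {
  const { messages } = await req.json()
  const encoder = new TextEncoder()
  const sse = (chunk: object) => encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`)

  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const token of streamTokensFromLLM(messages)) {
          controller.enqueue(sse({ type: 'text', text: token }))
        }
        controller.enqueue(sse({ type: 'done' }))
      } catch (err) {
        controller.enqueue(sse({ type: 'error', error: String(err) }))
      } finally {
        controller.close()
      }
    },
  })

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  })
}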

The buffering problem nobody talks about

Here's where most implementations have a subtle bug. Network chunks don't align with SSE event
boundaries. One reader.read() call might return half an event. The next call might return three
events and the beginning of a fourth.

The correct pattern:

// `reader` is response.body.getReader(); decode bytes incrementally.
const decoder = new TextDecoder()
let buf = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  buf += decoder.decode(value, { stream: true })
  const parts = buf.split('\n\n')
  buf = parts.pop() ?? '' // ← preserve the incomplete tail
  for (const part of parts) {
    // process complete events
  }
}

The critical invariant: buf = parts.pop() keeps the incomplete trailing event. If you write buf = ''
inside the loop (I've seen this in production code), you silently drop buffered content. No error.
The message just ends mid-sentence sometimes.
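
For completeness, turning one complete event into a chunk is just a prefix strip plus JSON.parse (a sketch; the library's real normalizer may handle more cases):

// A complete SSE event block; the data lines carry the JSON payload.
function parseEvent(part: string): StreamChunk | null {
  const data = part
    .split('\n')
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice('data:'.length).trim())
    .join('')
  return data ? (JSON.parse(data) as StreamChunk) : null
}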


10 lines to a streaming chat

'use client'
import { useAIChat } from '@react-ai-stream/react'
import { Chat } from '@react-ai-stream/ui'
import '@react-ai-stream/ui/styles'

export default function Page() {
  const { messages, sendMessage, loading, stop } = useAIChat({
    endpoint: '/api/chat', // any streaming endpoint
  })
  // Wire the hook state into the prebuilt Chat component.
  return <Chat messages={messages} sendMessage={sendMessage} loading={loading} stop={stop} />
}

The hook has no dependency on the UI package. You can wire messages to any component — Tailwind,
shadcn/ui, a floating widget, a sidebar panel. The prebuilt Chat component from @react-ai-stream/ui
is opt-in.
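
For example, a bare-bones renderer wired straight to the hook's state might look like this (the role and content field names are an assumption for illustration, not a documented contract):

// Minimal custom renderer; role/content are assumed message fields.
function PlainChat({ messages }: { messages: { role: string; content: string }[] }) {
  return (
    <ul>
      {messages.map((m, i) => (
        <li key={i}>
          <strong>{m.role}:</strong> {m.content}
        </li>
      ))}
    </ul>
  )
}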


Why "backend-agnostic" is the right abstraction

Compare these two approaches:

Coupled approach — OpenAI SDK in the browser:

// Your LLM choice is now in your bundle.
// Your API key is exposed.
// Switching providers requires a frontend deploy.
import OpenAI from 'openai'
const client = new OpenAI({ apiKey: process.env.NEXT_PUBLIC_KEY, dangerouslyAllowBrowser: true })

Decoupled approach — hook speaks HTTP:

// The frontend doesn't know or care what's behind this endpoint.
// It could be GPT-4 today, Claude tomorrow, a local Llama next week.
const chat = useAIChat({ endpoint: '/api/chat' })

The server-side API route handles provider selection. It might route to Anthropic by default, fall
back to Groq during an outage, and serve EU traffic to a region-compliant endpoint — all without
touching the React component.

This also means you can run three providers simultaneously in complete isolation:

const claude = useAIChat({ endpoint: '/api/chat?provider=anthropic' })
const gpt = useAIChat({ endpoint: '/api/chat?provider=openai' })
const groq = useAIChat({ endpoint: '/api/chat?provider=groq' })

Each instance has its own message history, loading state, and abort controller. No shared context
required.
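
On the server, that ?provider= switch can be a small lookup table in front of the same SSE re-emitting logic (a sketch; the per-provider functions are placeholders, and streamAsSSE just condenses the re-emitting logic from the earlier route sketch):

type TokenStream = (messages: unknown[], signal: AbortSignal) => AsyncIterable<string>

// Placeholders: each would wrap the real provider SDK and yield plain text tokens.
const streamFromAnthropic: TokenStream = async function* () { yield '(anthropic tokens)' }
const streamFromOpenAI: TokenStream = async function* () { yield '(openai tokens)' }
const streamFromGroq: TokenStream = async function* () { yield '(groq tokens)' }

const providers: Record<string, TokenStream> = {
  anthropic: streamFromAnthropic,
  openai: streamFromOpenAI,
  groq: streamFromGroq,
}

// Condensed version of the re-emitting logic from the earlier route sketch.
function streamAsSSE(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder()
  const sse = (chunk: object) => encoder.encode(`data: ${JSON.stringify(chunk)}\n\n`)
  const stream = new ReadableStream({
    async start(controller) {
      for await (const token of tokens) controller.enqueue(sse({ type: 'text', text: token }))
      controller.enqueue(sse({ type: 'done' }))
      controller.close()
    },
  })
  return new Response(stream, { headers: { 'Content-Type': 'text/event-stream' } })
}

export async function POST(req: Request) {
  const name = new URL(req.url).searchParams.get('provider') ?? 'anthropic'
  const streamTokens = providers[name] ?? streamFromAnthropic
  const { messages } = await req.json()
  return streamAsSSE(streamTokens(messages, req.signal))
}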


The React rendering challenge

The naive implementation of streaming into React state has a real performance problem:

// This fires a state update — and a re-render — for every token.
// At 50 tokens/second, that's 50 re-renders/second.
setResponse(prev => prev + token)

React 18 automatically batches updates that land in the same task, but streamed tokens arrive in
separate chunks spread over time, so each one can schedule its own render. During fast streaming
you can get tens of renders per second from a single useAIChat call.

The library solves this by using Zustand's createStore (the vanilla, framework-agnostic version)
combined with useSyncExternalStore:

// The store lives outside React (createStore is the vanilla,
// React-free export from 'zustand/vanilla').
// It mutates at whatever rate tokens arrive.
// useSyncExternalStore decides when React re-renders.
const storeRef = useRef(
  createStore(() => ({ messages: [], loading: false, error: null }))
)
const state = useSyncExternalStore(
  storeRef.current.subscribe,
  storeRef.current.getState,
  storeRef.current.getState // snapshot for server rendering
)

The mutation rate and the render rate are decoupled. The store can receive 100 tokens/second while
React batches updates efficiently.
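
Concretely, the streaming loop can write each token straight into the vanilla store, outside React's render cycle (a sketch; the store shape and update logic here are assumptions, not the library's internals):

import { createStore } from 'zustand/vanilla'

type Message = { role: 'user' | 'assistant'; content: string }

// A vanilla store that lives outside React.
const chatStore = createStore(() => ({ messages: [] as Message[] }))

// Called once per token; no React state setter involved.
function appendToken(token: string) {
  chatStore.setState((s) => {
    const last = s.messages[s.messages.length - 1]
    if (!last || last.role !== 'assistant') {
      return { messages: [...s.messages, { role: 'assistant', content: token }] }
    }
    return {
      messages: [...s.messages.slice(0, -1), { ...last, content: last.content + token }],
    }
  })
}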

This also enables true isolation. Each useAIChat() call creates its own store instance via a ref.
Three hook calls → three completely independent stores → three isolated chat instances. No context
provider wrapping needed, no cross-component re-renders.


How abort propagates end-to-end

The stop button works through a chain of signals most people don't trace all the way:

user clicks Stop
→ abortController.abort()
→ fetch rejects (AbortError)
→ stream loop catches isAbortError() — true
→ loading → false, no error surfaced
→ partial response preserved in messages
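
In code, the tail of that chain is just an error-type check (a sketch; isAbortError here is a local helper written for illustration, not necessarily the library's export):

// Distinguish a deliberate Stop from a genuine failure.
function isAbortError(err: unknown): boolean {
  return err instanceof DOMException && err.name === 'AbortError'
}

// Hypothetical shape of the stream loop's error handling.
async function runStream(readLoop: () => Promise<void>) {
  try {
    await readLoop() // the buffered reader.read() loop from earlier
  } catch (err) {
    if (isAbortError(err)) {
      return // user pressed Stop: keep the partial message, clear loading, surface no error
    }
    throw err // real failures still become the hook's error state
  }
}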

On the server side, req.signal reflects this abort too. Forwarding it to the upstream LLM call
cancels token generation before it completes:

const upstream = await fetch(LLM_API_URL, {
  method: 'POST', // a request with a body must be POST, not the default GET
  headers: { 'Content-Type': 'application/json' },
  signal: req.signal, // ← the user stopping the stream cancels the LLM call
  body: JSON.stringify({ messages, stream: true }),
})

That's waste reduction at the infrastructure level, not just UI polish.


What's in the library

Three packages, all MIT, ~20 kB total:

@react-ai-stream/core: SSE parser, chunk normalizer, Zustand store factory, abort utils (no React dependency)
@react-ai-stream/react: useAIChat hook, AIChatProvider context
@react-ai-stream/ui: prebuilt chat UI components, with syntax highlighting for code blocks
Built with: TypeScript strict mode, tsup (ESM + CJS), Vitest (34 tests), Turborepo monorepo.


Try it

npm install @react-ai-stream/react

The architecture page (https://react-ai-stream-docs.vercel.app/architecture) and How streaming works
(https://react-ai-stream-docs.vercel.app/concepts/streaming-explained) have the full technical detail
if you want to go deeper.


What I'd like to hear

If you've built AI chat in React, I'm curious: what was the hardest part? Provider coupling,
streaming reliability, render performance, something else? The answer will probably shape what this
library focuses on next.
