<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: devleo</title>
    <description>The latest articles on DEV Community by devleo (@trimooo).</description>
    <link>https://dev.to/trimooo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3922573%2Fc00e6f64-aa6d-4b8f-b8a7-1df15404cd51.jpeg</url>
      <title>DEV Community: devleo</title>
      <link>https://dev.to/trimooo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trimooo"/>
    <language>en</language>
    <item>
      <title>I built a 20 kB React hook that doesn't care which AI you use — here's how streaming actually works</title>
      <dc:creator>devleo</dc:creator>
      <pubDate>Sat, 09 May 2026 23:43:09 +0000</pubDate>
      <link>https://dev.to/trimooo/i-built-a-20-kb-react-hook-that-doesnt-care-which-ai-you-use-heres-how-streaming-actually-works-432g</link>
      <guid>https://dev.to/trimooo/i-built-a-20-kb-react-hook-that-doesnt-care-which-ai-you-use-heres-how-streaming-actually-works-432g</guid>
      <description>&lt;p&gt;`---&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhswprxyhhoa76gyn4q3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhswprxyhhoa76gyn4q3.png" alt="cover image" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Most React AI chat libraries are secretly backend libraries.&lt;/p&gt;

&lt;p&gt;They stream directly from OpenAI, or through their own cloud, or via a framework-specific server&lt;br&gt;
  adapter. The React hook is just a thin client on top of one particular provider. Switch from Claude&lt;br&gt;
  to GPT-4? Rewrite the frontend. Migrate off Vercel? Rewrite the frontend. Add Groq for a faster path?&lt;br&gt;
   Rewrite the frontend.&lt;/p&gt;

&lt;p&gt;But here's the thing: streaming AI chat is fundamentally just three events:&lt;/p&gt;

&lt;p&gt;data: {"type":"text","text":"Hello"}&lt;br&gt;
  data: {"type":"text","text":", world"}&lt;br&gt;
  data: {"type":"done"}&lt;/p&gt;

&lt;p&gt;That's it. text, done, error. Your React component shouldn't need to know anything more than that.&lt;/p&gt;
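
&lt;p&gt;As a rough TypeScript sketch, the whole wire protocol fits in one small union. (StreamChunk is the name the normalizer uses; the exact shape of the error variant is my assumption.)&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// The only events the React layer ever sees.
type StreamChunk =
  | { type: 'text'; text: string }    // a token (or run of tokens) to append
  | { type: 'done' }                  // the model finished this response
  | { type: 'error'; error: string }  // something failed upstream (field name assumed)
&lt;/code&gt;&lt;/pre&gt;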

&lt;p&gt;So I built react-ai-stream (&lt;a href="https://github.com/trimooo/react-ai-stream" rel="noopener noreferrer"&gt;https://github.com/trimooo/react-ai-stream&lt;/a&gt;) — a backend-agnostic&lt;br&gt;
  streaming hook that speaks this protocol. Any server that produces those three events works,&lt;br&gt;
  regardless of which LLM is behind it.&lt;/p&gt;




&lt;h2&gt;The architecture&lt;/h2&gt;

&lt;p&gt;Here's the full picture:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;              React UI
                 │
          useAIChat() hook
        (useSyncExternalStore)
         │                │
  Zustand store     SSE parser + normalizer
  messages ·        ReadableStream → StreamChunk
  loading · error         │
         └────────┬───────┘
                  │
        HTTP POST + SSE stream
                  │
        Your server (/api/chat)
  Next.js · Express · FastAPI · Go · Rails
                  │
     Anthropic · OpenAI · Groq · Custom
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The boundary in the middle is everything. The React layer speaks {type, text} over SSE. The server&lt;br&gt;
  speaks whatever the LLM provider requires. Neither knows about the other's implementation.&lt;/p&gt;




&lt;h2&gt;How streaming actually works&lt;/h2&gt;

&lt;p&gt;Most tutorials skip the networking part. Here's what's actually happening.&lt;/p&gt;

&lt;p&gt;Server-Sent Events (SSE) is a one-directional HTTP protocol: the server opens a connection and keeps&lt;br&gt;
  sending data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache

data: {"type":"text","text":"Hello"}

data: {"type":"text","text":", world"}

data: {"type":"done"}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The double newline (\n\n) is the event delimiter. Your API route receives the user's messages, calls&lt;br&gt;
  the LLM, and re-emits tokens in this format.&lt;/p&gt;
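
&lt;p&gt;Here's a minimal sketch of that route as a fetch-style handler (Next.js App Router shown; callLLM is a stub standing in for your actual provider SDK, not part of the library):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// app/api/chat/route.ts (illustrative only)

// Stub async generator standing in for a real provider call. Replace with your LLM SDK.
async function* callLLM(messages: unknown[], opts: { signal?: AbortSignal }) {
  for (const token of ['Hello', ', world']) {
    if (opts.signal?.aborted) return
    yield token
  }
}

export async function POST(req: Request) {
  const { messages } = await req.json()
  const encoder = new TextEncoder()

  const stream = new ReadableStream({
    async start(controller) {
      const send = (event: object) =&amp;gt;
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`))
      try {
        for await (const token of callLLM(messages, { signal: req.signal })) {
          send({ type: 'text', text: token })
        }
        send({ type: 'done' })
      } catch (err) {
        send({ type: 'error', error: String(err) })
      } finally {
        controller.close()
      }
    },
  })

  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  })
}
&lt;/code&gt;&lt;/pre&gt;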

&lt;h3&gt;The buffering problem nobody talks about&lt;/h3&gt;

&lt;p&gt;Here's where most implementations have a subtle bug. Network chunks don't align with SSE event&lt;br&gt;
  boundaries. One reader.read() call might return half an event. The next call might return three&lt;br&gt;
  events and the beginning of a fourth.&lt;/p&gt;

&lt;p&gt;The correct pattern:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// res is the fetch() Response for the streaming request
const reader = res.body!.getReader()
const decoder = new TextDecoder()
let buf = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  buf += decoder.decode(value, { stream: true })
  const parts = buf.split('\n\n')
  buf = parts.pop() ?? ''   // ← preserve the incomplete tail
  for (const part of parts) {
    // each complete event looks like: data: {"type":"text","text":"..."}
    const line = part.trim()
    if (line.startsWith('data: ')) {
      const chunk = JSON.parse(line.slice('data: '.length))
      // handle chunk.type: 'text' | 'done' | 'error'
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The critical invariant: buf = parts.pop() keeps the incomplete trailing event. If you write buf = ''&lt;br&gt;
  inside the loop (I've seen this in production code), you silently drop buffered content. No error.&lt;br&gt;
  The message just ends mid-sentence sometimes.&lt;/p&gt;
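
&lt;p&gt;To see the invariant in action, trace one event that arrives split across two reads (a purely illustrative pair of chunks):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Two network chunks that together hold one complete event plus the start of another.
const chunk1 = 'data: {"type":"text","text":"Hel'
const chunk2 = 'lo"}\n\ndata: {"type":"te'

let buf = ''

buf += chunk1
let parts = buf.split('\n\n')   // one fragment, no complete event yet
buf = parts.pop() ?? ''         // the whole fragment stays buffered

buf += chunk2
parts = buf.split('\n\n')       // ['data: {"type":"text","text":"Hello"}', 'data: {"type":"te']
buf = parts.pop() ?? ''         // the complete "Hello" event is processed; the partial one waits

// With buf = '' instead, the 'Hel' fragment would have been dropped on the floor.
&lt;/code&gt;&lt;/pre&gt;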




&lt;h2&gt;10 lines to a streaming chat&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;'use client'
import { useAIChat } from '@react-ai-stream/react'
import { Chat } from '@react-ai-stream/ui'
import '@react-ai-stream/ui/styles'

export default function Page() {
  const { messages, sendMessage, loading, stop } = useAIChat({
    endpoint: '/api/chat',   // any streaming endpoint
  })
  // Prop names below are illustrative; see the @react-ai-stream/ui docs for the exact Chat API.
  return &amp;lt;Chat messages={messages} onSend={sendMessage} loading={loading} onStop={stop} /&amp;gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The hook has no dependency on the UI package. You can wire messages to any component — Tailwind,&lt;br&gt;
  shadcn/ui, a floating widget, a sidebar panel. The prebuilt &amp;lt;Chat /&amp;gt; is opt-in.&lt;/p&gt;
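
&lt;p&gt;A bare-bones custom renderer might look like this; note that the message item shape ({ role, content }) and the sendMessage(text) signature are my assumptions, so check the hook's types for the real ones:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;'use client'
import { useAIChat } from '@react-ai-stream/react'

export function TinyChat() {
  const { messages, sendMessage, loading } = useAIChat({ endpoint: '/api/chat' })

  return (
    &amp;lt;div&amp;gt;
      {/* message shape ({ role, content }) is assumed for illustration */}
      {messages.map((m, i) =&amp;gt; (
        &amp;lt;p key={i}&amp;gt;{m.role}: {m.content}&amp;lt;/p&amp;gt;
      ))}
      {/* sendMessage(text) signature assumed */}
      &amp;lt;button disabled={loading} onClick={() =&amp;gt; sendMessage('Hello!')}&amp;gt;
        Ask
      &amp;lt;/button&amp;gt;
    &amp;lt;/div&amp;gt;
  )
}
&lt;/code&gt;&lt;/pre&gt;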




&lt;p&gt;Why "backend-agnostic" is the right abstraction&lt;/p&gt;

&lt;p&gt;Compare these two approaches:&lt;/p&gt;

&lt;p&gt;Coupled approach — OpenAI SDK in the browser:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Your LLM choice is now in your bundle.
// Your API key is exposed.
// Switching providers requires a frontend deploy.
import OpenAI from 'openai'
const client = new OpenAI({ apiKey: process.env.NEXT_PUBLIC_KEY, dangerouslyAllowBrowser: true })
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Decoupled approach — hook speaks HTTP:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// The frontend doesn't know or care what's behind this endpoint.
// It could be GPT-4 today, Claude tomorrow, a local Llama next week.
const chat = useAIChat({ endpoint: '/api/chat' })
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The server-side API route handles provider selection. It might route to Anthropic by default, fall&lt;br&gt;
  back to Groq during an outage, and serve EU traffic to a region-compliant endpoint — all without&lt;br&gt;
  touching the React component.&lt;/p&gt;
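
&lt;p&gt;A sketch of what that selection could look like server-side (the streamAnthropic / streamOpenAI / streamGroq helpers are placeholders for your own provider calls, not part of the library):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Each helper should return a Response whose body is the three-event SSE stream shown earlier.
type StreamFn = (messages: unknown[], signal: AbortSignal) =&amp;gt; Promise&amp;lt;Response&amp;gt;
declare const streamAnthropic: StreamFn
declare const streamOpenAI: StreamFn
declare const streamGroq: StreamFn

export async function POST(req: Request) {
  const provider = new URL(req.url).searchParams.get('provider') ?? 'anthropic'
  const { messages } = await req.json()

  switch (provider) {
    case 'openai':
      return streamOpenAI(messages, req.signal)
    case 'groq':
      return streamGroq(messages, req.signal)
    default:
      return streamAnthropic(messages, req.signal)   // default / fallback provider
  }
}
&lt;/code&gt;&lt;/pre&gt;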

&lt;p&gt;This also means you can run three providers simultaneously in complete isolation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const claude = useAIChat({ endpoint: '/api/chat?provider=anthropic' })
const gpt    = useAIChat({ endpoint: '/api/chat?provider=openai' })
const groq   = useAIChat({ endpoint: '/api/chat?provider=groq' })
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each instance has its own message history, loading state, and abort controller. No shared context&lt;br&gt;
  required.&lt;/p&gt;




&lt;h2&gt;The React rendering challenge&lt;/h2&gt;

&lt;p&gt;The naive implementation of streaming into React state has a real performance problem:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// This fires a state update — and a re-render — for every token.
// At 50 tokens/second, that's 50 re-renders/second.
setResponse(prev =&amp;gt; prev + token)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;React 18 batches some updates, but async loop callbacks aren't always batched. During fast streaming&lt;br&gt;
  you can get tens of renders per second from a single useAIChat call.&lt;/p&gt;

&lt;p&gt;The library solves this by using Zustand's createStore (the vanilla, framework-agnostic version)&lt;br&gt;
  combined with useSyncExternalStore:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Inside the hook: the store lives outside React.
// It mutates at whatever rate tokens arrive.
// useSyncExternalStore decides when React re-renders.
const storeRef = useRef(
  // zustand's vanilla createStore takes an initializer; state shape simplified here
  createStore(() =&amp;gt; ({ messages: [], loading: false, error: null }))
)
const state = useSyncExternalStore(
  storeRef.current.subscribe,
  storeRef.current.getState
)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The mutation rate and the render rate are decoupled. The store can receive 100 tokens/second while&lt;br&gt;
  React batches updates efficiently.&lt;/p&gt;

&lt;p&gt;This also enables true isolation. Each useAIChat() call creates its own store instance via a ref.&lt;br&gt;
  Three hook calls → three completely independent stores → three isolated chat instances. No&lt;br&gt;
  &amp;lt;AIChatProvider&amp;gt; wrapping needed, no cross-component re-renders.&lt;/p&gt;




&lt;h2&gt;How abort propagates end-to-end&lt;/h2&gt;

&lt;p&gt;The stop button works through a chain of signals most people don't trace all the way:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;user clicks Stop
  → abortController.abort()
    → fetch rejects (AbortError)
      → stream loop catches isAbortError() — true
        → loading → false, no error surfaced
          → partial response preserved in messages
&lt;/code&gt;&lt;/pre&gt;
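
&lt;p&gt;On the client, the whole chain reduces to a pattern like this (a standalone sketch; isAbortError is written inline here rather than imported from the core package):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Standalone sketch of the client side of the abort chain.
const controller = new AbortController()

function isAbortError(err: unknown): boolean {
  return err instanceof DOMException &amp;amp;&amp;amp; err.name === 'AbortError'
}

async function streamChat(messages: unknown[]) {
  try {
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages }),
      signal: controller.signal,     // wired to the Stop button
    })
    // ...read res.body with the buffered SSE loop shown earlier...
  } catch (err) {
    if (isAbortError(err)) return    // user pressed Stop: keep the partial message, surface no error
    throw err                        // real failures still propagate
  }
}

// The Stop button handler simply calls controller.abort()
&lt;/code&gt;&lt;/pre&gt;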

&lt;p&gt;On the server side, req.signal reflects this abort too. Forwarding it to the upstream LLM call&lt;br&gt;
  cancels token generation before it completes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;const upstream = await fetch(LLM_API_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  signal: req.signal,   // ← the user stopping the stream cancels the LLM call
  body: JSON.stringify({ messages, stream: true }),
})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's waste reduction at the infrastructure level, not just UI polish.&lt;/p&gt;




&lt;h2&gt;What's in the library&lt;/h2&gt;

&lt;p&gt;Three packages, all MIT, ~20 kB total:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;@react-ai-stream/core: SSE parser, chunk normalizer, Zustand store factory, abort utils — no React dep&lt;/li&gt;
&lt;li&gt;@react-ai-stream/react: useAIChat hook, AIChatProvider context&lt;/li&gt;
&lt;li&gt;@react-ai-stream/ui: prebuilt UI components, including &amp;lt;Chat /&amp;gt;, with syntax highlighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with: TypeScript strict mode, tsup (ESM + CJS), Vitest (34 tests), Turborepo monorepo.&lt;/p&gt;




&lt;h2&gt;Try it&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;npm install @react-ai-stream/react
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;Live demo (&lt;a href="https://react-ai-stream-example.vercel.app" rel="noopener noreferrer"&gt;https://react-ai-stream-example.vercel.app&lt;/a&gt;) — three models streaming in parallel via
Groq&lt;/li&gt;
&lt;li&gt;Docs (&lt;a href="https://react-ai-stream-docs.vercel.app" rel="noopener noreferrer"&gt;https://react-ai-stream-docs.vercel.app&lt;/a&gt;) — quickstart, provider setup, API reference&lt;/li&gt;
&lt;li&gt;GitHub (&lt;a href="https://github.com/trimooo/react-ai-stream" rel="noopener noreferrer"&gt;https://github.com/trimooo/react-ai-stream&lt;/a&gt;) — source, examples, architecture deep-dive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture page (&lt;a href="https://react-ai-stream-docs.vercel.app/architecture" rel="noopener noreferrer"&gt;https://react-ai-stream-docs.vercel.app/architecture&lt;/a&gt;) and How streaming works&lt;br&gt;
  (&lt;a href="https://react-ai-stream-docs.vercel.app/concepts/streaming-explained" rel="noopener noreferrer"&gt;https://react-ai-stream-docs.vercel.app/concepts/streaming-explained&lt;/a&gt;) have the full technical detail&lt;br&gt;
   if you want to go deeper.&lt;/p&gt;




&lt;h2&gt;What I'd like to hear&lt;/h2&gt;

&lt;p&gt;If you've built AI chat in React, I'm curious: what was the hardest part? Provider coupling,&lt;br&gt;
  streaming reliability, render performance, something else? The answer will probably shape what this&lt;br&gt;
  library focuses on next.&lt;/p&gt;


</description>
      <category>react</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
