DEV Community

zhongqiyue

I Built a Streaming AI Chat Client Without Losing My Mind

It started with a simple idea: a chat interface where the AI responds in real-time, token by token. You know, that satisfying typewriter effect you see in every AI product these days. I thought, "How hard can it be?"

Spoiler: it was hard. Not because of the AI itself, but because of the streaming pipeline between the API and the browser. I spent an entire weekend debugging state, connection drops, and flickering UI before finding an approach that didn't make me want to throw my laptop out the window.

Let me walk you through what I tried, what broke, and the technique that finally worked.

The Naive First Attempt

I started with the obvious: call the AI API, wait for the full response, then display it. In React, that looked like:

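import { useState } from 'react';

// Inside the chat component:
const [messages, setMessages] = useState([]);

const sendMessage = async (text) => {
  // Nothing is rendered until the complete reply has arrived
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: text })
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  setMessages(prev => [...prev, { role: 'assistant', content: data.reply }]);
};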

It worked, but users hated it. The UI sat there with no feedback for 3-10 seconds while the AI generated its full reply. People kept clicking "Send" multiple times because they thought the app was broken. UX disaster.

What I Tried (and Failed at)

Polling

I considered having the server return a job ID, then polling for partial results every second. But that meant server-side state management, cleanup of stale jobs, and, worst of all, updates arriving in unpredictable chunks. Not a smooth streaming experience.

WebSockets

Next up: WebSockets. I set up a backend with Socket.IO and thought I was golden. But then reality hit:

  • Connection management: reconnection logic, heartbeat, handling browser tabs closing
  • State synchronization: what if the AI finishes between two socket events? Race conditions.
  • Complexity: my simple chat app suddenly needed a whole infrastructure layer.

I spent two days just on reconnection. Two days. And then I realized: for a single-response-per-user chat, WebSockets are overkill. They shine for interactive bidirectional communication (like collaborative editing), not for one-shot streaming.

The Approach That Finally Worked: ReadableStream with Manual SSE Parsing

The breakthrough came when I realized that many AI APIs (including OpenAI, Anthropic, and the service I used – Interwest AI – which has a /api/chat/stream endpoint) already send their responses as Server-Sent Events (SSE) over HTTP. I just needed to consume them properly.

The key technique: use fetch() with response.body.getReader() to read a stream of bytes, then parse the SSE protocol myself. No external libraries, no WebSocket complexity.

Here's the core of it:

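// Async generator: yields content tokens as they arrive from an SSE endpoint.
// Assumes OpenAI-style chunks (choices[0].delta.content) terminated by [DONE].
async function* streamChat(messages) {
  const response = await fetch('https://ai.interwestinfo.com/api/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages })
  });

  if (!response.ok) throw new Error(`HTTP ${response.status}`);

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact when they span two chunks
    buffer += decoder.decode(value, { stream: true });

    // SSE is line-based; lines may end in \n or \r\n
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // keep the trailing incomplete line for the next chunk

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // skip blank lines and other fields
      const data = line.slice(6);
      if (data === '[DONE]') return;
      // Some chunks (role or finish markers) carry no content; skip those
      const token = JSON.parse(data).choices?.[0]?.delta?.content;
      if (token) yield token;
    }
  }
}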

This is an async generator – it yields tokens as they arrive. Then in my React component, I consume it without blocking the UI:

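const sendMessage = async (text) => {
  const userMsg = { role: 'user', content: text };

  // Add the user message and an empty assistant placeholder in one update
  setMessages(prev => [...prev, userMsg, { role: 'assistant', content: '' }]);

  try {
    for await (const token of streamChat([...messages, userMsg])) {
      // The placeholder is the last message (assumes one stream at a time).
      // Copy it instead of mutating state in place: an in-place += duplicates
      // tokens when React runs the updater twice in Strict Mode.
      setMessages(prev => {
        const last = prev[prev.length - 1];
        return [...prev.slice(0, -1), { ...last, content: last.content + token }];
      });
    }
  } catch (err) {
    // Tokens rendered so far stay on screen; only the rest is lost
    console.error('Stream failed:', err);
  }
};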

It works beautifully. The UI stays responsive, tokens appear in real-time, and I didn't need WebSockets or polling.
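
The line-buffering step is the part most sensitive to where the network happens to split chunks, so it can be worth exercising on its own. Here's a minimal sketch of one way to do that, assuming you pull the parsing into a pure function. The names createSSEParser and parseInChunks are made up for illustration, and the parser is simplified: it only handles single-line data fields with \n or \r\n line endings, not the full SSE spec.

```javascript
// Hypothetical helper: incremental parser that returns the payload of every
// complete `data:` line seen so far and buffers any trailing partial line.
function createSSEParser() {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // last element is an incomplete line (or '')
    const payloads = [];
    for (const rawLine of lines) {
      // Tolerate \r\n line endings
      const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
      if (!line.startsWith('data:')) continue; // blank lines, comments, other fields
      let data = line.slice(5);
      if (data.startsWith(' ')) data = data.slice(1); // one optional space after the colon
      payloads.push(data);
    }
    return payloads;
  };
}

// Hypothetical test helper: replays `text` in fixed-size chunks,
// simulating a network that splits the stream at awkward positions.
function parseInChunks(text, size) {
  const feed = createSSEParser();
  const out = [];
  for (let i = 0; i < text.length; i += size) {
    out.push(...feed(text.slice(i, i + size)));
  }
  return out;
}

// Made-up sample stream mixing \n and \r\n endings
const sample = 'data: {"t":"Hel"}\n\ndata: {"t":"lo"}\r\n\r\ndata: [DONE]\n\n';
const expected = ['{"t":"Hel"}', '{"t":"lo"}', '[DONE]'];

// Every chunk size, from one character at a time up to the whole stream,
// should produce the same payloads.
for (let size = 1; size <= sample.length; size++) {
  const got = parseInChunks(sample, size);
  if (JSON.stringify(got) !== JSON.stringify(expected)) {
    throw new Error(`chunk size ${size} produced ${JSON.stringify(got)}`);
  }
}
```

Because the parser only ever acts on complete lines, the result shouldn't depend on chunk size. If a check like this fails for some size, the buffering has a boundary bug.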

Lessons Learned & Trade-offs

Pros

  • Simple: No WebSocket server, no reconnection logic. HTTP all the way.
  • Cancelable: You can abort the fetch with AbortController if the user hits Stop or leaves the page.
  • Backend-agnostic: The transport works with any API that streams SSE (ChatGPT, Claude, any custom AI). Only the JSON payload parsing differs between providers.
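
To make the cancelation point concrete, here's a rough sketch of wiring an AbortController into the request. The startStream helper and its fetchImpl parameter are hypothetical names (fetchImpl only exists so the helper can be tried without a network); AbortController and the signal option are standard.

```javascript
// Hypothetical helper: starts a streaming POST and returns a cancel() handle.
// fetchImpl is injectable for testing; it defaults to the global fetch.
function startStream(url, body, fetchImpl = fetch) {
  const controller = new AbortController();
  const responsePromise = fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
    // Aborting rejects a pending fetch and errors an in-flight body read
    signal: controller.signal,
  });
  return {
    responsePromise,
    cancel: () => controller.abort(),
  };
}
```

A "Stop" button would call cancel(). In the code that reads the stream, the abort surfaces as an error whose name is 'AbortError', so you can catch it and treat it as a user action rather than a failure.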

Cons

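  • Browser support: Reading response.body as a ReadableStream works in current versions of Chrome, Firefox, and Safari. If you need to support older browsers, you'll need a polyfill or a non-streaming fallback.
  • No built-in reconnection: fetch won't reconnect for you. If the connection drops mid-stream, you keep the tokens already rendered but lose the rest of the response unless you retry the request yourself. The browser's EventSource API reconnects automatically but only supports GET requests, and libraries like Socket.IO handle reconnection for WebSockets.
  • Only one-way: The server can stream only in response to a request, and the client can't send anything more on the same connection. For chat, that's fine.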
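
For the browser-support caveat, one possible fallback is to feature-detect the stream and deliver the whole body as a single chunk when it's missing. This is a sketch under that assumption, and canStream and readText are hypothetical names.

```javascript
// Hypothetical helper: true when the response exposes a readable byte stream.
function canStream(response) {
  return Boolean(
    response && response.body && typeof response.body.getReader === 'function'
  );
}

// Hypothetical helper: calls onChunk with decoded text as it arrives, or once
// with the full body when streaming is unavailable.
async function readText(response, onChunk) {
  if (!canStream(response)) {
    onChunk(await response.text()); // no typewriter effect, but still works
    return;
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
  const tail = decoder.decode(); // flush any bytes still held by the decoder
  if (tail) onChunk(tail);
}
```

The SSE parsing stays the same either way; in the fallback case it just receives the whole stream in one call.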

When to use WebSockets instead

  • If you need the server to initiate messages (e.g., live notifications during streaming)
  • If you have a high-frequency data feed (like stock prices)
  • If you absolutely need reconnection without losing state

But for most AI chat apps? HTTP streaming with SSE is simpler and has fewer moving parts to break.

What I'd Do Differently Next Time

  1. Test with throttled networks early – slow 3G revealed bugs in my buffer parsing.
  2. Use a custom hook for stream state management (loading, error, abort).
  3. Ignore fancy libraries – native fetch + ReadableStream was all I needed.
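
For the custom hook idea, the state handling could live in a plain reducer so it's easy to reason about and test outside React. This is only a sketch of what I have in mind; the state shape and action names are made up.

```javascript
// Hypothetical state shape for a single streamed reply
const initialStreamState = { status: 'idle', text: '', error: null };

// Pure reducer: status moves idle -> streaming -> done | aborted | error
function streamReducer(state, action) {
  switch (action.type) {
    case 'start':
      return { status: 'streaming', text: '', error: null };
    case 'token':
      // Ignore tokens that arrive after an abort or error
      if (state.status !== 'streaming') return state;
      return { ...state, text: state.text + action.token };
    case 'done':
      return state.status === 'streaming' ? { ...state, status: 'done' } : state;
    case 'abort':
      return state.status === 'streaming' ? { ...state, status: 'aborted' } : state;
    case 'error':
      // Keep the partial text so the UI can still show it
      return { ...state, status: 'error', error: action.error };
    default:
      return state;
  }
}
```

Inside a hook, this would plug into React's useReducer(streamReducer, initialStreamState), with the streaming loop dispatching 'token' for each chunk and 'done', 'abort', or 'error' when it ends.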

Final Thoughts

If you're building an AI interface right now, resist the urge to add WebSockets unless you absolutely need them. Streaming over HTTP is often enough, and it saves you from a whole class of headaches.

I'm curious: what's your streaming strategy? Are you using WebSockets, SSE, or something else? Hit me up in the comments – I'd love to hear how others tackle this.
