DEV Community

brian austin


How to stream Claude API responses to the browser in real-time (Node.js + SSE)

If you've ever used Claude or ChatGPT and watched text appear word-by-word, that's server-sent events (SSE) in action. Here's how to build exactly that — a streaming Claude API endpoint that pushes tokens to the browser in real time.

No WebSockets. No polling. Just a clean 40-line Node.js implementation.

Why streaming matters

Without streaming, your users stare at a blank screen for 3-8 seconds waiting for the full response. With streaming, they see text appear immediately — perceived performance goes from "broken" to "fast".

The difference in user retention is measurable.

The server side (Node.js/Express)

const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');

const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.use(express.json());
app.use(express.static(__dirname)); // serve index.html and client.js alongside server.js

app.post('/chat/stream', async (req, res) => {
  const { message } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('Access-Control-Allow-Origin', '*');

  try {
    const stream = client.messages.stream({ // .stream() returns the stream directly; no await needed
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: message }]
    });

    for await (const event of stream) {
      if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
        // Send each token as an SSE event
        res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
      }
    }

    // Signal completion
    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
    res.end();
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
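Those res.write calls follow the SSE wire format: a data: field terminated by a blank line. If you end up writing events from several places, a tiny helper keeps the framing consistent — a sketch, with sseEvent being a hypothetical name, not part of the code above:

```javascript
// Format a payload as a single SSE event: a "data:" line plus the
// blank line that terminates the event. (Hypothetical helper.)
function sseEvent(payload) {
  const data = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${data}\n\n`;
}

// In the route handler you could then write:
//   res.write(sseEvent({ text: event.delta.text }));
//   res.write(sseEvent('[DONE]'));
```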

The browser side

async function streamMessage(message) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;

        try {
          const parsed = JSON.parse(data);
          if (parsed.text) {
            // Append to your UI element
            document.getElementById('response').textContent += parsed.text;
          }
        } catch (e) {
          // Skip malformed events
        }
      }
    }
  }
}

// Usage
document.getElementById('send').addEventListener('click', () => {
  document.getElementById('response').textContent = '';
  streamMessage(document.getElementById('input').value);
});
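The split-and-buffer logic in the read loop is the part most likely to hide bugs, so it can help to factor it into a standalone function you can test without a network. This parseSSE helper is a sketch (not part of the original client): it takes the accumulated buffer, returns the complete data: payloads, and hands back whatever partial line should stay buffered for the next chunk:

```javascript
// Split buffered SSE text into complete events plus the leftover partial line.
// Returns { events, rest }: `events` holds the "data:" payload strings,
// `rest` is the trailing incomplete line to carry into the next chunk.
function parseSSE(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // incomplete line stays buffered
  const events = [];
  for (const line of lines) {
    if (line.startsWith('data: ')) events.push(line.slice(6));
  }
  return { events, rest };
}
```

In the read loop you'd then JSON-parse each returned event that isn't [DONE] and append it to the UI, exactly as above.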

The HTML

<!DOCTYPE html>
<html>
<body>
  <textarea id="input" placeholder="Ask anything..."></textarea>
  <button id="send">Send</button>
  <div id="response"></div>
  <script src="client.js"></script>
</body>
</html>

Run it

npm install express @anthropic-ai/sdk
export ANTHROPIC_API_KEY=your_key_here
node server.js

Save the HTML as index.html next to server.js, then open http://localhost:3000 and watch tokens stream in real time.

The gotchas

Buffer flushing: Some Node.js setups buffer SSE responses. If streaming feels chunky, add res.flushHeaders() right after setting headers.

NGINX reverse proxy: NGINX buffers proxied responses by default, which defeats streaming entirely. If you're behind NGINX, add this to your location block:

proxy_buffering off;
proxy_cache off;
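If you can't touch the NGINX config, the same effect is available from the app side: NGINX honors an X-Accel-Buffering: no response header and skips buffering for that one response. A sketch, where setSSEHeaders is a hypothetical helper collecting the headers from the server code above:

```javascript
// Set the SSE response headers, including X-Accel-Buffering so NGINX
// (and some other proxies) won't buffer this response.
// (setSSEHeaders is a hypothetical helper, not part of the code above.)
function setSSEHeaders(res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // opt out of proxy buffering
}
```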

Error recovery: The SSE connection drops on network errors. Add a reconnect loop on the client:

function reconnectOnError(message) {
  let attempts = 0;
  const tryStream = async () => {
    try {
      await streamMessage(message);
    } catch (e) {
      if (attempts++ < 3) {
        // A retry restarts the stream from scratch, so clear the partial output
        document.getElementById('response').textContent = '';
        setTimeout(tryStream, 1000 * attempts); // simple linear backoff
      }
    }
  };
  tryStream();
}

Token cost: Streaming doesn't change your token count. You pay the same either way — it's purely a UX improvement.
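One related caveat: if the user closes the tab mid-stream, the server keeps generating (and you keep paying) until the response finishes. You can cancel the upstream request when the browser disconnects — a sketch, assuming the abort() method on the MessageStream returned by @anthropic-ai/sdk, with wireAbort as a hypothetical helper:

```javascript
// Cancel the upstream Claude request when the client connection closes,
// so no further tokens are generated for a response nobody will read.
// (wireAbort is a hypothetical helper; stream.abort() is the
// MessageStream cancellation method in @anthropic-ai/sdk.)
function wireAbort(req, stream) {
  req.on('close', () => stream.abort());
}

// Inside the route handler:
//   const stream = client.messages.stream({ ... });
//   wireAbort(req, stream);
```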

Skipping the API key overhead

If you're building a prototype or just want to try this pattern without managing your own Anthropic API key, SimplyLouie exposes a flat-rate Claude endpoint at $2/month — same streaming API, no per-token billing surprises.

The endpoint is compatible with this exact code pattern. Swap `client = new Anthropic({ apiKey })` for `client = new Anthropic({ apiKey: YOUR_LOUIE_KEY, baseURL: 'https://simplylouie.com/api' })`.

What to build with this

  • Chat interfaces (obviously)
  • Code generation with live preview
  • Document summarization with progress indication
  • Any UI where the response is long enough that waiting feels broken

The streaming pattern is one of those things that seems optional until your users experience it — then it becomes non-negotiable.

What are you building with Claude streaming? Drop it in the comments.
