DEV Community

brian austin


How to stream Claude API responses to the browser in real-time (Node.js + SSE)

If you've ever used Claude or ChatGPT and watched text appear word-by-word, that's server-sent events (SSE) in action. Here's how to build exactly that — a streaming Claude API endpoint that pushes tokens to the browser in real time.

No WebSockets. No polling. Just a clean 40-line Node.js implementation.

Why streaming matters

Without streaming, your users stare at a blank screen for 3-8 seconds waiting for the full response. With streaming, they see text appear immediately — perceived performance goes from "broken" to "fast".

The difference in user retention is measurable.

The server side (Node.js/Express)

const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');

const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.use(express.json());
app.use(express.static(__dirname)); // serve index.html and client.js alongside server.js

app.post('/chat/stream', async (req, res) => {
  const { message } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('Access-Control-Allow-Origin', '*');

  try {
    const stream = client.messages.stream({ // .stream() returns the stream directly; no await needed
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 1024,
      messages: [{ role: 'user', content: message }]
    });

    for await (const event of stream) {
      if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
        // Send each token as an SSE event
        res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
      }
    }

    // Signal completion
    res.write('data: [DONE]\n\n');
    res.end();
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
    res.end();
  }
});

app.listen(3000, () => console.log('Server running on port 3000'));
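Those res.write calls follow the SSE wire format: a data: field terminated by a blank line. If you end up writing events from several places, a tiny helper keeps the framing consistent — a sketch, with sseEvent being a hypothetical name, not part of the code above:

```javascript
// Format a payload as a single SSE event: a "data:" line plus the
// blank line that terminates the event. (Hypothetical helper.)
function sseEvent(payload) {
  const data = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${data}\n\n`;
}

// In the route handler you could then write:
//   res.write(sseEvent({ text: event.delta.text }));
//   res.write(sseEvent('[DONE]'));
```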

The browser side

async function streamMessage(message) {
  const response = await fetch('/chat/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;

        try {
          const parsed = JSON.parse(data);
          if (parsed.text) {
            // Append to your UI element
            document.getElementById('response').textContent += parsed.text;
          }
        } catch (e) {
          // Skip malformed events
        }
      }
    }
  }
}

// Usage
document.getElementById('send').addEventListener('click', () => {
  document.getElementById('response').textContent = '';
  streamMessage(document.getElementById('input').value);
});
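The split-and-buffer logic in the read loop is the part most likely to hide bugs, so it can help to factor it into a standalone function you can test without a network. This parseSSE helper is a sketch (not part of the original client): it takes the accumulated buffer, returns the complete data: payloads, and hands back whatever partial line should stay buffered for the next chunk:

```javascript
// Split buffered SSE text into complete events plus the leftover partial line.
// Returns { events, rest }: `events` holds the "data:" payload strings,
// `rest` is the trailing incomplete line to carry into the next chunk.
function parseSSE(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // incomplete line stays buffered
  const events = [];
  for (const line of lines) {
    if (line.startsWith('data: ')) events.push(line.slice(6));
  }
  return { events, rest };
}
```

In the read loop you'd then JSON-parse each returned event that isn't [DONE] and append it to the UI, exactly as above.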

The HTML

<!DOCTYPE html>
<html>
<body>
  <textarea id="input" placeholder="Ask anything..."></textarea>
  <button id="send">Send</button>
  <div id="response"></div>
  <script src="client.js"></script>
</body>
</html>

Run it

npm install express @anthropic-ai/sdk
export ANTHROPIC_API_KEY=your_key_here
node server.js

Save the HTML as index.html next to server.js, then open http://localhost:3000 and watch tokens stream in real time.

The gotchas

Buffer flushing: Some Node.js setups buffer SSE responses. If streaming feels chunky, add res.flushHeaders() right after setting headers.

NGINX reverse proxy: NGINX buffers proxied responses by default, which defeats streaming entirely. If you're behind NGINX, add this to your location block:

proxy_buffering off;
proxy_cache off;
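If you can't touch the NGINX config, the same effect is available from the app side: NGINX honors an X-Accel-Buffering: no response header and skips buffering for that one response. A sketch, where setSSEHeaders is a hypothetical helper collecting the headers from the server code above:

```javascript
// Set the SSE response headers, including X-Accel-Buffering so NGINX
// (and some other proxies) won't buffer this response.
// (setSSEHeaders is a hypothetical helper, not part of the code above.)
function setSSEHeaders(res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // opt out of proxy buffering
}
```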

Error recovery: The SSE connection drops on network errors. Add a reconnect loop on the client:

function reconnectOnError(message) {
  let attempts = 0;
  const tryStream = async () => {
    try {
      await streamMessage(message);
    } catch (e) {
      if (attempts++ < 3) {
        // A retry restarts the stream from scratch, so clear the partial output
        document.getElementById('response').textContent = '';
        setTimeout(tryStream, 1000 * attempts); // simple linear backoff
      }
    }
  };
  tryStream();
}

Token cost: Streaming doesn't change your token count. You pay the same either way — it's purely a UX improvement.
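One related caveat: if the user closes the tab mid-stream, the server keeps generating (and you keep paying) until the response finishes. You can cancel the upstream request when the browser disconnects — a sketch, assuming the abort() method on the MessageStream returned by @anthropic-ai/sdk, with wireAbort as a hypothetical helper:

```javascript
// Cancel the upstream Claude request when the client connection closes,
// so no further tokens are generated for a response nobody will read.
// (wireAbort is a hypothetical helper; stream.abort() is the
// MessageStream cancellation method in @anthropic-ai/sdk.)
function wireAbort(req, stream) {
  req.on('close', () => stream.abort());
}

// Inside the route handler:
//   const stream = client.messages.stream({ ... });
//   wireAbort(req, stream);
```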

Skipping the API key overhead

If you're building a prototype or just want to try this pattern without managing your own Anthropic API key, SimplyLouie exposes a flat-rate Claude endpoint at $2/month — same streaming API, no per-token billing surprises.

The endpoint is compatible with this exact code pattern. Swap `client = new Anthropic({ apiKey })` for `client = new Anthropic({ apiKey: YOUR_LOUIE_KEY, baseURL: 'https://simplylouie.com/api' })`.

What to build with this

  • Chat interfaces (obviously)
  • Code generation with live preview
  • Document summarization with progress indication
  • Any UI where the response is long enough that waiting feels broken

The streaming pattern is one of those things that seems optional until your users experience it — then it becomes non-negotiable.

What are you building with Claude streaming? Drop it in the comments.
