If you've ever used Claude or ChatGPT and watched text appear word-by-word, that's server-sent events (SSE) in action. Here's how to build exactly that — a streaming Claude API endpoint that pushes tokens to the browser in real-time.
No WebSockets. No polling. Just a clean 40-line Node.js implementation.
Why streaming matters
Without streaming, your users stare at a blank screen for 3-8 seconds waiting for the full response. With streaming, they see text appear immediately — perceived performance goes from "broken" to "fast".
The difference in user retention is measurable.
The server side (Node.js/Express)
const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');
const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
app.use(express.json());
app.use(express.static(__dirname)); // serve index.html and client.js from this directory
app.post('/chat/stream', async (req, res) => {
const { message } = req.body;
// Set SSE headers
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
res.setHeader('Access-Control-Allow-Origin', '*');
try {
const stream = await client.messages.stream({
model: 'claude-3-5-haiku-20241022',
max_tokens: 1024,
messages: [{ role: 'user', content: message }]
});
for await (const event of stream) {
if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
// Send each token as an SSE event
res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
}
}
// Signal completion
res.write('data: [DONE]\n\n');
res.end();
} catch (error) {
res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
res.end();
}
});
app.listen(3000, () => console.log('Server running on port 3000'));
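Those res.write calls are hand-building SSE frames, and the wire format is worth seeing on its own: each event is a data: line terminated by a blank line. A small helper keeps the formatting in one place — sseFrame is a name I'm introducing here, not something from the SDK:

```javascript
// Format a payload as a server-sent event frame: "data: <body>\n\n".
// The trailing blank line is what delimits events on the wire.
function sseFrame(payload) {
  const body = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${body}\n\n`;
}
```

With this, the handler writes res.write(sseFrame({ text: event.delta.text })) for tokens and res.write(sseFrame('[DONE]')) at the end.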
The browser side
async function streamMessage(message) {
const response = await fetch('/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message })
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop(); // Keep incomplete line in buffer
for (const line of lines) {
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') return;
try {
const parsed = JSON.parse(data);
if (parsed.text) {
// Append to your UI element
document.getElementById('response').textContent += parsed.text;
}
} catch (e) {
// Skip malformed events
}
}
}
}
}
// Usage
document.getElementById('send').addEventListener('click', () => {
document.getElementById('response').textContent = '';
streamMessage(document.getElementById('input').value);
});
The HTML (save as index.html)
<!DOCTYPE html>
<html>
<body>
<textarea id="input" placeholder="Ask anything..."></textarea>
<button id="send">Send</button>
<div id="response"></div>
<script src="client.js"></script>
</body>
</html>
Run it
npm install express @anthropic-ai/sdk
export ANTHROPIC_API_KEY=your_key_here
node server.js
Open http://localhost:3000 and watch tokens stream in real-time.
The gotchas
Buffer flushing: Some Node.js setups buffer SSE responses. If streaming feels chunky, add res.flushHeaders() right after setting headers.
NGINX reverse proxy: If you're behind NGINX, add this to your location block:
proxy_buffering off;
proxy_cache off;
Note that X-Accel-Buffering is a response header, not an NGINX directive — send it from Node with res.setHeader('X-Accel-Buffering', 'no') to disable proxy buffering on a per-response basis.
Error recovery: The SSE connection drops on network errors. Add a reconnect loop on the client:
function reconnectOnError(message) {
let attempts = 0;
const tryStream = async () => {
try {
await streamMessage(message);
} catch (e) {
if (attempts++ < 3) setTimeout(tryStream, 1000 * attempts);
}
};
tryStream();
}
Token cost: Streaming doesn't change your token count. You pay the same either way — it's purely a UX improvement.
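Since billing is per token either way, the cost estimate is identical for streamed and non-streamed calls. A back-of-envelope sketch — the rates here are illustrative placeholders, not current pricing, so check Anthropic's pricing page:

```javascript
// Rough per-call cost estimate in USD.
// Rates are placeholder values in USD per million tokens, not current pricing.
function estimateCostUSD(inputTokens, outputTokens, inRatePerM = 0.8, outRatePerM = 4.0) {
  return (inputTokens / 1e6) * inRatePerM + (outputTokens / 1e6) * outRatePerM;
}
```

A 200-token prompt with a 1,000-token reply costs the same whether the reply streams token-by-token or arrives in one chunk.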
Skipping the API key overhead
If you're building a prototype or just want to try this pattern without managing your own Anthropic API key, SimplyLouie exposes a flat-rate Claude endpoint at $2/month — same streaming API, no per-token billing surprises.
The endpoint is compatible with this exact code pattern. Swap client = new Anthropic({ apiKey }) for client = new Anthropic({ apiKey: YOUR_LOUIE_KEY, baseURL: 'https://simplylouie.com/api' }).
What to build with this
- Chat interfaces (obviously)
- Code generation with live preview
- Document summarization with progress indication
- Any UI where the response is long enough that waiting feels broken
The streaming pattern is one of those things that seems optional until your users experience it — then it becomes non-negotiable.
What are you building with Claude streaming? Drop it in the comments.