DEV Community

brian austin

How to build a Claude chatbot with streaming responses in under 50 lines of Node.js

Streaming is one of those features that sounds complicated but completely transforms the user experience. Instead of staring at a spinner for 3-5 seconds, users see the response appear word by word — like watching someone type.

Here's how to do it with Claude in Node.js. The whole thing is under 50 lines.

The full code

const Anthropic = require('@anthropic-ai/sdk');
const readline = require('readline');

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const history = []; // full conversation so far, resent on every turn

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat(userMessage) {
  history.push({ role: 'user', content: userMessage });

  process.stdout.write('\nClaude: ');
  let fullResponse = '';

  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages: history
  });

  // Print each text delta as it arrives and accumulate the full reply
  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      process.stdout.write(chunk.delta.text);
      fullResponse += chunk.delta.text;
    }
  }

  console.log('\n');
  history.push({ role: 'assistant', content: fullResponse });
}

function prompt() {
  rl.question('You: ', async (input) => {
    if (input.toLowerCase() === 'quit') return rl.close();
    await chat(input);
    prompt();
  });
}

prompt();

That's it. Run it with:

npm install @anthropic-ai/sdk
ANTHROPIC_API_KEY=your_key node chatbot.js

How streaming actually works

Claude's API uses Server-Sent Events (SSE). When you call messages.stream(), the connection stays open and the server pushes chunks as they're generated.
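On the wire, each SSE event looks roughly like this (illustrative and abbreviated, not captured output):

```
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}
```

The SDK parses these frames for you, so in Node you only ever see the decoded chunk objects.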

Each chunk has a type. The ones you care about:

  • content_block_start: Claude is about to start a text block
  • content_block_delta: here's the next piece of text
  • content_block_stop: that block is done
  • message_stop: the whole response is done

You only need content_block_delta with text_delta — that's where the actual words are.
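To make that filter concrete, here's how it behaves on a hand-written sequence of events shaped like the chunks above (these objects are mocked for illustration, not real API output):

```javascript
// Mocked events shaped like the streaming chunks described above
const chunks = [
  { type: 'content_block_start' },
  { type: 'content_block_delta', delta: { type: 'text_delta', text: 'Hello' } },
  { type: 'content_block_delta', delta: { type: 'text_delta', text: ', world' } },
  { type: 'content_block_stop' },
  { type: 'message_stop' }
];

// Same filter the chatbot loop uses: keep only the text deltas
function extractText(events) {
  return events
    .filter(c => c.type === 'content_block_delta' && c.delta.type === 'text_delta')
    .map(c => c.delta.text)
    .join('');
}

console.log(extractText(chunks)); // "Hello, world"
```

Everything else — block starts, stops, the final message_stop — can be ignored for a plain text chat.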

Adding it to an Express server

The terminal version is fine for testing. Here's how to expose it as an HTTP endpoint:

const express = require('express');
const Anthropic = require('@anthropic-ai/sdk');

const app = express();
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

app.use(express.json());

app.post('/chat', async (req, res) => {
  const { messages } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const stream = await client.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages
  });

  for await (const chunk of stream) {
    if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
      res.write(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`);
    }
  }

  res.write('data: [DONE]\n\n');
  res.end();
});

app.listen(3000);

Consuming the stream on the frontend

async function sendMessage(messages) {
  const response = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') return;
        const { text } = JSON.parse(data);
        // Append text to your UI element
        document.getElementById('response').textContent += text;
      }
    }
  }
}

Common mistakes

1. Not handling the buffer correctly

SSE chunks don't always align with JSON boundaries. The buffer pattern above handles cases where a chunk gets split mid-JSON.
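To see why the buffer matters, here's a self-contained sketch of the same pattern, fed a data line that arrives split across two network chunks (the helper name createSSEParser is mine, not part of any API):

```javascript
// Stateful parser: feed() accepts raw chunks, returns any complete
// `data:` payloads, and holds a trailing partial line in the buffer.
function createSSEParser() {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // incomplete line waits for the next chunk
    return lines
      .filter(line => line.startsWith('data: '))
      .map(line => line.slice(6));
  };
}

const feed = createSSEParser();
// A JSON payload split mid-string across two chunks:
console.log(feed('data: {"text":"Hel')); // [] — nothing complete yet
console.log(feed('lo"}\n'));             // [ '{"text":"Hello"}' ]
```

Without the buffer, the first chunk would hit JSON.parse as `{"text":"Hel` and throw.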

2. Forgetting to end the response

If you don't call res.end() after the stream finishes, the client will hang waiting for more data.

3. Not setting the right headers

Without Content-Type: text/event-stream, the browser won't treat it as SSE — it'll wait for the full response before rendering anything.

4. Streaming to multiple users without isolation

Each request needs its own stream instance. Don't share stream state between requests.

The cost question

Streaming doesn't change the token count — you pay for the same tokens whether you stream or not. What it changes is perceived latency: a response that streams feels far faster than one that appears all at once, because users see progress from the first word.

If you're building on a per-token API, that latency improvement is free. If you're building on a flat-rate API (like SimplyLouie at $2/month), there's no cost math to worry about at all — stream everything, always.

What to build next

Once streaming works, the natural next steps are:

  • Typing indicators: Show "Claude is thinking..." before the first chunk arrives
  • Stop button: Let users interrupt a long response mid-stream
  • Token counting: Show a live token counter as the response streams
  • Conversation export: Save the full streamed response to history after message_stop
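As a sketch of the stop-button idea: the consumer loop just needs an AbortSignal it checks between chunks. The stream here is mocked with a plain async generator; with the real SDK you'd also want to cancel the underlying request (for example via an AbortController on the fetch):

```javascript
// Consume any async iterable of text chunks until an AbortSignal fires
async function consume(iterable, signal, onText) {
  for await (const text of iterable) {
    if (signal.aborted) break; // user hit "stop"
    onText(text);
  }
}

// Mock stream standing in for the real Claude stream
async function* mockStream() {
  yield 'one '; yield 'two '; yield 'three ';
}

const controller = new AbortController();
let seen = '';
consume(mockStream(), controller.signal, text => {
  seen += text;
  if (seen.includes('two')) controller.abort(); // simulate the button click
}).then(() => console.log(seen)); // "one two "
```

The same signal can be wired to a button's click handler in the browser version.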

The full code above works as-is. Copy it, set your API key, run it. You'll have a streaming Claude chatbot in under 2 minutes.

Building on Claude's API? SimplyLouie offers flat-rate API access at $2/month — no token counting, no surprise bills.
