How I Fixed My AI Chatbot's Laggy Responses with Server-Sent Events

#ai #javascript #tutorial #webdev

I've been building a personal AI assistant for my developer blog – you know, one of those floating chat widgets that answers questions about my projects. The idea was simple: feed in my content, hook it up to an AI API, and let visitors chat with it. But my first implementation was a disaster. Visitors would type a question, see the spinner spin for ten seconds, and then get the entire response dumped at once. It felt like using dial-up. The problem wasn't the AI itself; it was how I was consuming the stream of tokens. Here's the story of how I went from clunky polling to the elegant world of Server-Sent Events (SSE).

The Initial Approach (and its failure)

Like many devs, I started with the most obvious solution: plain fetch. I sent a POST request to the AI endpoint with the user's message, and waited for the full response as JSON.

```javascript
// The naive way
async function askAI(userMessage) {
  const response = await fetch('https://api.your-ai-service.com/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: userMessage })
  });
  const data = await response.json();
  displayResponse(data.text);
}
```

This worked technically, but the delay was brutal. For long answers, the HTTP connection would hang for 15–30 seconds. Users stared at the spinner the whole time, and I saw a 50% bounce rate on the chat page. Even with a loading indicator, the experience felt broken.

I tried adding a timeout, but that just made things worse – the request would cancel before the AI finished thinking. Then I attempted to use a polling approach: after the initial request, the API returned a job ID, and I'd poll every second for the result. That at least showed progress, but it hammered my server with requests and the UX was still janky.

Enter WebSockets – a bridge too far

Next, I considered WebSockets. A persistent connection for bidirectional streaming sounded perfect. But the overhead was real: I'd need to manage connection state, handle reconnection logic, configure my server for WebSocket upgrades, and deal with fallbacks for restrictive proxies. For a simple chatbot widget, it felt like pulling out a flamethrower to light a candle. Plus, most AI APIs I looked at didn't expose a native WebSocket interface – they just returned a blob of text.

Then a colleague mentioned Server-Sent Events (SSE). He said, “It’s like a light-weight one-way WebSocket.” That sounded exactly right: the server pushes text tokens incrementally, and the client listens. No complex handshake, just a regular HTTP connection with a special content type.
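To make "a regular HTTP connection with a special content type" concrete, here is a minimal sketch of what one event looks like on the wire. The helper name formatSseEvent and its options are my own illustration, not part of any library; the format itself (optional event: and id: fields, one data: line per line of payload, and a blank line to end the event) is what the browser's SSE parser expects.

```javascript
// Hypothetical helper (not from any library): serialize one event into the
// text/event-stream wire format.
function formatSseEvent(data, { event, id } = {}) {
  // Objects are sent as JSON; strings are sent as-is.
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  let frame = '';
  if (event) frame += `event: ${event}\n`;
  if (id !== undefined) frame += `id: ${id}\n`;
  // A payload containing line breaks needs one "data:" line per line.
  for (const line of payload.split(/\r\n|\r|\n/)) {
    frame += `data: ${line}\n`;
  }
  // A blank line terminates the event.
  return frame + '\n';
}

// Usage inside an Express handler would look roughly like:
//   res.write(formatSseEvent({ token: 'Hel' }));
```

For a chatbot, a bare data: line carrying a small JSON object per token is usually all you need; the event: and id: fields only matter if you want named events or resumable streams.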

The SSE Solution (finally, real-time tokens)

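I switched to a streaming endpoint that sends the AI response as a sequence of data: lines. On the frontend, my first instinct was the built-in EventSource API, although I ended up reading the stream with fetch instead, for reasons explained in the frontend section. Here's the core code that now powers my chatbot.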

Backend (Node.js example)

My backend is a simple Express server that proxies to the AI service. The key: set Content-Type: text/event-stream and flush each token as it arrives.

```javascript
// server.js – Express route for SSE streaming
// Assumes Node 18+ (global fetch) and Express 4.16+ (express.json)
const express = require('express');

const app = express();
app.use(express.json()); // needed so req.body.message is populated

app.post('/chat-stream', async (req, res) => {
  const userMessage = req.body.message;

  // Set SSE headers
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  try {
    // Connect to the AI streaming endpoint
    // Example using ai.interwestinfo.com's streaming endpoint
    const aiResponse = await fetch('https://ai.interwestinfo.com/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ prompt: userMessage })
    });

    // Important: read the response body as a stream, chunk by chunk
    const reader = aiResponse.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact across chunk boundaries
      const chunk = decoder.decode(value, { stream: true });
      // Each chunk is part of the AI token stream
      res.write(`data: ${JSON.stringify({ token: chunk })}\n\n`);
    }
    res.write('data: [DONE]\n\n');
  } catch (err) {
    console.error('Upstream stream failed:', err);
  } finally {
    // Always close the response so the client is not left hanging
    res.end();
  }
});
```

Frontend – listening for tokens

On the browser side, I wanted to replace the old fetch call with an EventSource. The catch is that EventSource only issues GET requests and cannot send a request body, while my endpoint expects a POST. There are two ways around that. One is to make a POST first to initiate the stream and get a session ID, then open an EventSource on a GET endpoint with that ID. The other is to skip EventSource and read the stream yourself using the Fetch API with response.body.getReader(). I opted for the latter to keep it simple.

```javascript
// frontend.js – using Fetch + ReadableStream
async function askAIStream(userMessage) {
  const response = await fetch('/chat-stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: userMessage })
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const outputElement = document.getElementById('chat-output');

  outputElement.textContent = '';
  let buffer = ''; // holds an incomplete event until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Parse SSE format: events are separated by a blank line (data: {...}\n\n)
    const events = buffer.split('\n\n');
    buffer = events.pop(); // the last piece may be a partial event

    for (const event of events) {
      if (!event.startsWith('data: ')) continue;
      const data = event.slice(6);
      if (data === '[DONE]') {
        // Stop reading entirely, not just the inner loop
        await reader.cancel();
        return;
      }
      try {
        const { token } = JSON.parse(data);
        outputElement.textContent += token;
      } catch (e) {
        console.warn('Skipping malformed event:', data);
      }
    }
  }
}
```

Now the response appears character by character – actually token by token – as soon as the AI generates it. Users see a real-time typing effect, and there's no more spinner limbo.

Lessons Learned & Trade-offs

Switching to SSE wasn't all sunshine. Here are the real trade-offs I discovered:

Browser support: EventSource is well-supported in modern browsers, but older ones (IE) need a polyfill. Reading response.body as a ReadableStream also needs a reasonably recent browser, so older mobile browsers can fail at getReader(). I ended up using a small polyfill for legacy clients.
One-way only: SSE is server-to-client only. If your chatbot needs to send multiple messages without a new request (like a continuous conversation), you'll need to manage session state. For my simple Q&A, it's fine – each message triggers a new SSE connection.
No automatic reconnection for custom fetch streams: If you use the Fetch + Stream approach, you lose the built-in reconnection that EventSource gives you. I had to implement my own retry logic with exponential backoff when the connection drops.
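Server resource usage: Every streaming client holds a connection open for the length of the response, which can get expensive on your server. For a personal blog with low traffic, it's fine. For high-traffic apps, plan for the number of concurrent connections up front. You could also look at WebSockets or another streaming setup, but keep in mind that WebSockets hold one connection per client too, so switching protocols doesn't remove that cost by itself.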
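For the reconnection trade-off above, here is a minimal sketch of retry with exponential backoff. The helper name withBackoff and its options (retries, baseDelayMs, maxDelayMs, sleep) are hypothetical and not the exact code running on my blog; the injectable sleep is only there so the logic can be exercised without real timers.

```javascript
// Hypothetical helper: retry an async operation, doubling the wait each time.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withBackoff(
  operation,
  { retries = 4, baseDelayMs = 500, maxDelayMs = 8000, sleep = defaultSleep } = {}
) {
  let attempt = 0;
  while (true) {
    try {
      return await operation(attempt);
    } catch (err) {
      // Give up once the retry budget is spent and surface the last error.
      if (attempt >= retries) throw err;
      // 500ms, 1s, 2s, 4s, ... capped at maxDelayMs.
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(delay);
      attempt += 1;
    }
  }
}

// Usage with the streaming function from earlier in the post:
//   await withBackoff(() => askAIStream(userMessage));
```

One caveat with this approach: retrying re-sends the whole prompt, so the answer starts over rather than resuming mid-stream. Adding a little random jitter to the delay is also worth considering so that many clients don't retry in lockstep.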

What I'd Do Differently

If I were building this from scratch again, I'd probably use a library like event-source-polyfill to unify browser support, and I'd design the API to accept GET requests for SSE (using a unique conversation ID) rather than fighting with POST. That way I could use the native EventSource with its built-in reconnection.
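As a rough sketch of what that GET-based design could look like on the client: the route /chat-stream?conversationId=... is hypothetical (my current server only has the POST route), and the function name streamConversation and the EventSourceImpl option are illustrative. The payload shape ({ token }) and the [DONE] marker match what the server code in this post sends.

```javascript
// Hypothetical client for a GET-based SSE route keyed by conversation ID.
// EventSourceImpl is injectable so a polyfill (or a test double) can be passed in.
function streamConversation(
  conversationId,
  { onToken, onDone, EventSourceImpl = globalThis.EventSource } = {}
) {
  const url = `/chat-stream?conversationId=${encodeURIComponent(conversationId)}`;
  const source = new EventSourceImpl(url);

  source.onmessage = (event) => {
    if (event.data === '[DONE]') {
      // Close explicitly: otherwise EventSource treats the server ending the
      // response as a dropped connection and reconnects.
      source.close();
      if (onDone) onDone();
      return;
    }
    const { token } = JSON.parse(event.data);
    onToken(token);
  };

  return source;
}

// Usage in the browser:
//   const output = document.getElementById('chat-output');
//   streamConversation('abc123', { onToken: (t) => { output.textContent += t; } });
```

The explicit close() on [DONE] matters here: the same built-in reconnection that makes EventSource attractive would otherwise re-request the stream as soon as the answer finishes.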

Also, I'd add a buffer for the last few tokens so that if the user refreshes the page, they can resume the conversation without losing context. Something like storing the conversation history in IndexedDB.

Final Thoughts

The switch from fetch + polling to streaming via SSE turned my chatbot from a frustrating experience into something people actually enjoy using. It's not the most cutting-edge tech – SSE has been around for years – but it solved my problem without overcomplicating the stack. The next time you're building something that needs real-time data from a server (AI responses, live logs, notifications), ask yourself: Do I really need WebSockets, or can SSE do the job?

What's your go-to approach for streaming data from an API? I'd love to hear about your setup in the comments – especially if you've tackled the same chatbot problem with a different solution.