Why SSE for AI agents keeps breaking at 2am
Abhishek Chatterjee
Every team building AI agent UIs writes their own SSE client. And every team hits the same four bugs.

I know because we shipped 36 agent tools at Praxiom before we sat down and wrote a real protocol instead of patching the same streaming code for the fifteenth time. This is a post-mortem on the four bugs. At the end I'll show you what we extracted.

The setup
You're building a chat-style UI backed by an LLM agent. The agent calls tools, thinks for a few seconds, maybe runs multiple turns. You want the frontend to stream tokens in real-time, show "running web search..." while a tool is active, and display a progress bar for longer operations.

SSE seems like the obvious choice. It's simple. You've used it before. You write the server in an afternoon.

Then you go to production.

Bug #1: The chunk boundary
Here's the hand-rolled SSE parser most people write:

for await (const chunk of stream) {
  const text = decoder.decode(chunk);
  const lines = text.split('\n');
  let currentEvent = ''; // ❌ declared per chunk, so it resets on every read

  for (const line of lines) {
    if (line.startsWith('event: ')) {
      currentEvent = line.slice(7);
    } else if (line.startsWith('data: ')) {
      dispatch(currentEvent, JSON.parse(line.slice(6)));
      currentEvent = ''; // reset
    }
  }
}

This works in local dev. The event: and data: lines arrive in the same chunk because there's no network latency.

In production, under load, with a real network, a proxy, or nginx in the path — they don't.

Chunk 1 arrives: "event: token\n"

Chunk 2 arrives: "data: {\"text\":\"Hello\"}\n\n"

Your parser resets currentEvent after chunk 1. When chunk 2 arrives, currentEvent is "". The event is dropped silently. Your tokens disappear in production but never in staging.

The fix: currentEvent must survive across reader.read() calls. It's not a per-chunk variable — it's a per-stream variable. Reset it only after the data: line is dispatched, not at any chunk boundary.

// Outside the chunk loop — survives across reads
let currentEventType = '';

for await (const chunk of stream) {
  // ... decode chunk and split into lines as before ...
  for (const line of lines) {
    if (line.startsWith('event: ')) {
      currentEventType = line.slice(7);
    } else if (line.startsWith('data: ') && currentEventType) {
      dispatch(currentEventType, JSON.parse(line.slice(6)));
      currentEventType = ''; // reset HERE, not at chunk boundary
    }
  }
}
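The same pitfall applies one level down: a single `data:` line can itself be split mid-chunk. A minimal sketch that carries both the partial-line buffer and the event type across reads (`dispatch` is your own handler, and `createSseParser` is an illustrative name, not the library's API):

```typescript
// Sketch: incremental SSE parsing where BOTH the partial-line buffer and
// the current event type survive across chunks.
type Dispatch = (eventType: string, data: unknown) => void;

function createSseParser(dispatch: Dispatch) {
  let lineBuffer = '';       // carries a partial line across chunks
  let currentEventType = ''; // carries the event type across chunks

  return function feed(chunkText: string): void {
    lineBuffer += chunkText;
    const lines = lineBuffer.split('\n');
    lineBuffer = lines.pop() ?? ''; // last element may be an incomplete line

    for (const line of lines) {
      if (line.startsWith('event: ')) {
        currentEventType = line.slice(7);
      } else if (line.startsWith('data: ') && currentEventType) {
        dispatch(currentEventType, JSON.parse(line.slice(6)));
        currentEventType = ''; // reset only after dispatch
      }
    }
  };
}
```

Feed it the output of `decoder.decode(chunk, { stream: true })` so multi-byte characters split across chunks are handled too.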

Bug #2: 30 React renders per second
Claude 3.5 Sonnet emits roughly 25–35 tokens per second. Without any batching, each token event directly updates state:

onToken: (e) => setText(prev => prev + e.text)

That's 30 setState calls per second. React batches some of these in concurrent mode, but not reliably under high frequency. What you get is visible jank — the text renders choppy, other UI elements freeze, and on slower devices the whole component tree starts missing frames.

The fix isn't complicated. Accumulate tokens into a buffer and flush on an interval:

let buffer = '';
let lastFlush = Date.now();
const INTERVAL_MS = 50;

onToken: (e) => {
  buffer += e.text;
  const now = Date.now();
  if (now - lastFlush >= INTERVAL_MS) {
    setText(prev => prev + buffer);
    buffer = '';
    lastFlush = now;
  }
}

// On stream end, flush remainder
onDone: () => {
  if (buffer) setText(prev => prev + buffer);
}

A 50ms interval gives you 20 renders per second — smooth to the eye, at a fraction of the CPU cost. The only subtlety: make sure you flush the remainder on stream end, or the last few tokens never appear.
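Packaged as a tiny helper, the pattern looks roughly like this. The injectable clock exists only to make the sketch testable; `createTokenBatcher` is an illustrative name, not part of any library:

```typescript
// Sketch: time-based token batching. flush() receives the accumulated text
// at most once per intervalMs; finish() flushes whatever is left.
function createTokenBatcher(
  flush: (text: string) => void,
  intervalMs = 50,
  now: () => number = Date.now, // injectable clock, for testing
) {
  let buffer = '';
  let lastFlush = now();

  return {
    push(text: string) {
      buffer += text;
      if (now() - lastFlush >= intervalMs) {
        flush(buffer);
        buffer = '';
        lastFlush = now();
      }
    },
    finish() {
      if (buffer) flush(buffer); // don't drop the trailing tokens
      buffer = '';
    },
  };
}
```

Wire `push` to the token callback and `finish` to the done callback, and the flush-on-end subtlety is handled in one place.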

Bug #3: The loading state that never resolves
Your server looks like this:

async def stream_agent(request):
    async def generate():
        async for event in agent.run():
            yield emitter.token(event.text)
        yield emitter.done()  # <-- this line

    return StreamingResponse(generate())

That done event is what tells the frontend to set isStreaming = false. But what happens when the server crashes mid-stream? An unhandled exception in your agent loop. A memory error. An upstream API timeout that your error handling missed.

The done event is never emitted. The SSE connection closes. Your frontend detects the closure... and does nothing, because "connection closed" and "stream finished" look the same from the client side.

The spinner keeps spinning. The user stares at it. Eventually they reload.

The fix: synthesize a done event client-side when the connection closes without one:

// After the read loop exits, normally or via error
if (!receivedDone) {
  callbacks.onDone?.({ synthetic: true });
  setState(prev => ({ ...prev, isStreaming: false, isDone: true }));
}

The UI recovers cleanly. You log the synthetic done event server-side as a signal that something went wrong upstream.
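One way to sketch that guarantee is to wrap the read loop in `try/finally`, so the synthetic done fires no matter how the stream ends. All names here are illustrative, not the library's API:

```typescript
// Sketch: ensure onDone fires exactly once, even when the server drops
// the connection before emitting its own done event.
interface Callbacks {
  onEvent?: (type: string, data: unknown) => void;
  onDone?: (info: { synthetic: boolean }) => void;
}

async function consumeStream(
  events: AsyncIterable<{ type: string; data: unknown }>,
  callbacks: Callbacks,
): Promise<void> {
  let receivedDone = false;
  try {
    for await (const ev of events) {
      if (ev.type === 'done') {
        receivedDone = true;
        callbacks.onDone?.({ synthetic: false });
      } else {
        callbacks.onEvent?.(ev.type, ev.data);
      }
    }
  } finally {
    // Connection closed (cleanly or not) without a done event:
    // synthesize one so the UI never spins forever.
    if (!receivedDone) callbacks.onDone?.({ synthetic: true });
  }
}
```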

Bug #4: Retry logic that makes things worse
The standard reconnect implementation retries on any connection failure. But there are two very different kinds of failures:

HTTP errors (4xx/5xx): The request reached your server. The server said no — bad auth token, rate limit, your request body was malformed, the endpoint changed. Retrying the exact same request will get the exact same error. You're just hammering your own server.

Network drops: TCP connection closed mid-stream. The client never got a response, or got a partial one. This should retry — it's likely transient (user's wifi dropped, proxy timeout, load balancer cycle).

Most hand-rolled retry logic doesn't distinguish between them:

// ❌ Wrong — retries on 403, hammers server, wastes tokens
catch (error) {
  setTimeout(retry, 1000);
}

The correct split:

const response = await fetch(endpoint, options);

if (!response.ok) {
  // HTTP error — throw immediately, no retry
  throw new HttpError(response.status, await response.text());
}

// Past this point: we have a 200 and are reading the stream.
// Any failure here is a network drop → retry with backoff
try {
  await readStream(response.body);
} catch (networkError) {
  if (attempt < MAX_RETRIES) {
    await sleep(Math.pow(2, attempt) * 1000); // 1s, 2s, 4s
    return retry(attempt + 1);
  }
  throw networkError; // out of retries — surface it, don't swallow it
}

HTTP errors surface immediately to the user. Network drops retry silently up to 3 times. Your error handling for a 403 Forbidden is fundamentally different from your handling for a dropped connection.
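The split can be factored into a small driver. This is a hedged sketch, not the library's retry code; `HttpError` and `streamWithRetry` are illustrative names:

```typescript
// Sketch: classify failures before deciding to retry. HttpError means the
// server answered and retrying the same request is pointless; anything else
// thrown by connect() is treated as a transient network drop.
class HttpError extends Error {
  constructor(public status: number, body: string) {
    super(`HTTP ${status}: ${body}`);
  }
}

async function streamWithRetry(
  connect: () => Promise<void>, // fetch + read loop; may throw HttpError or a network error
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();
    } catch (err) {
      // HTTP errors surface immediately; network drops retry with backoff.
      if (err instanceof HttpError || attempt >= maxRetries) throw err;
      await sleep(Math.pow(2, attempt) * 1000); // 1s, 2s, 4s
    }
  }
}
```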

The same five events, every time
After shipping 36 agent tools at Praxiom, we noticed something. Every tool needed to emit:

Tokens accumulating into the response text
Tool calls and their status (running → done / error)
Thinking blocks (for extended thinking models)
Progress for multi-step pipelines
A clean end signal with metadata
And every frontend needed to consume them with the same state shape: text, isStreaming, activeTools, progress, error, isDone.
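Sketched as TypeScript types (the names and fields here are assumptions for illustration, a subset of the spec rather than the published schema), plus a tiny reducer showing how events fold into that state:

```typescript
// Assumed event and state shapes — illustrative, not the published schema.
type AgentEvent =
  | { type: 'token'; text: string }
  | { type: 'tool_use'; name: string; toolId: string; label: string }
  | { type: 'tool_result'; name: string; toolId: string; output: string; durationMs: number }
  | { type: 'progress'; step: number; total: number }
  | { type: 'error'; message: string }
  | { type: 'done'; numTurns: number; toolCount: number; durationMs: number };

interface AgentStreamState {
  text: string;
  isStreaming: boolean;
  activeTools: Array<{ toolId: string; name: string }>;
  progress: { step: number; total: number } | null;
  error: string | null;
  isDone: boolean;
}

// Every frontend ends up with some version of this reducer.
function reduce(state: AgentStreamState, ev: AgentEvent): AgentStreamState {
  switch (ev.type) {
    case 'token':
      return { ...state, text: state.text + ev.text };
    case 'tool_use':
      return { ...state, activeTools: [...state.activeTools, { toolId: ev.toolId, name: ev.name }] };
    case 'tool_result':
      return { ...state, activeTools: state.activeTools.filter(t => t.toolId !== ev.toolId) };
    case 'progress':
      return { ...state, progress: { step: ev.step, total: ev.total } };
    case 'error':
      return { ...state, error: ev.message };
    case 'done':
      return { ...state, isStreaming: false, isDone: true };
    default:
      return state;
  }
}
```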

We were rediscovering the same edge cases on every new tool. The token batching tweak happened three separate times before someone documented it. The chunk boundary bug was fixed in four different files.

So we extracted it.

agent-stream
A typed SSE event protocol for AI agents. Nine event types. Python emitter. React hook. JSON Schema spec.

pip install agent-event-stream
npm install @agent-stream/react

Python — emit from any async generator:

from agent_stream import AgentStreamEmitter
from agent_stream.fastapi import agent_stream_response

emitter = AgentStreamEmitter()

async def run_agent(message: str):
    async for chunk in anthropic_client.stream(message):
        yield emitter.token(chunk.text)

    yield emitter.tool_use("web_search", tool_id, "searching...")
    # ... run tool ...
    yield emitter.tool_result("web_search", tool_id, "found 3 results", duration_ms=850)

    yield emitter.done(num_turns=2, tool_count=1, duration_ms=3200)

@app.post("/chat")
async def chat(req: ChatRequest):
    return agent_stream_response(run_agent(req.message))

React — full state from one hook (the presentational components below stand in for your own):

const { text, isStreaming, activeTools, progress, error, isDone, startStream } =
  useAgentStream();

return (
  <div>
    <p>
      {text}
      {isStreaming && <Cursor />}
    </p>
    {activeTools.map(tool => <ToolBadge key={tool.id} tool={tool} />)}
    {progress && <ProgressBar value={progress} />}
    <button onClick={() => startStream('/chat', { message })}>
      Send
    </button>
  </div>
);

All four bugs above are handled in the library. Cross-chunk parsing is correct by construction. Token batching is on by default (50ms). Synthetic done fires when the server drops the connection. Retry logic distinguishes HTTP errors from network drops.

The JSON Schema spec (spec/events.schema.json) means you can implement the protocol in any language. It's not a React-only thing — we have a FastAPI server and the client is a plain TypeScript class that works in any framework.

What's next
We're building more of these extracts out of Praxiom's infrastructure — the parts that turn out to be the same across every AI product. agent-stream is the first.

If you're hitting these bugs, or if you've hit others we haven't documented — open an issue. The hard-won production details are the most valuable thing we can contribute.

→ github.com/abhichat85/agent-stream

Extracted from Praxiom - www.praxiomai.xyz
