If your AI application waits for the full response before rendering, you are hurting your UX.
Streaming responses in real-time is one of the simplest ways to improve perceived performance.
I implemented this for my project:
https://mindstashhq.space
Let's break it down.
What We Are Building
A streaming AI response system where:
- Tokens arrive in real time
- UI updates instantly
- Tool calls are visible to users
Backend Implementation (FastAPI)
We use Server-Sent Events (SSE).
Why SSE?
- Simpler than WebSockets
- Native browser support
- Perfect for server → client streaming
Example structure:
- Response type: StreamingResponse
- Content-Type: text/event-stream
Each event looks like:
event: text_delta
data: "Hello"
Event types:
- text_delta
- tool_start
- tool_result
- error
- done
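The wire format is simple enough to generate by hand. A minimal helper (the function name is mine, not from the post):

```python
import json

def sse_event(event: str, data) -> str:
    # One SSE frame: an event line, a data line, and a blank-line terminator.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# Produces exactly the frame shown above:
print(sse_event("text_delta", "Hello"))
```

Any of the five event types can be sent through the same helper; the blank line at the end is what tells the browser the frame is complete.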
The backend streams tokens directly from the AI provider and forwards them.
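A minimal sketch of that forwarding loop as an async generator (names like stream_answer and fake_provider are illustrative, not from the post):

```python
import asyncio
import json

def sse_event(event: str, data) -> str:
    # One SSE frame (defined here so the snippet runs standalone).
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

async def stream_answer(provider_tokens):
    # Forward each provider token as a text_delta event the moment it arrives,
    # then emit done so the client knows to close the connection.
    try:
        async for token in provider_tokens:
            yield sse_event("text_delta", token)
    except Exception as exc:
        # Surface mid-stream failures as an error event instead of dropping the stream.
        yield sse_event("error", {"message": str(exc)})
    yield sse_event("done", {})

# In FastAPI this generator plugs straight into the response:
#   return StreamingResponse(stream_answer(tokens), media_type="text/event-stream")

async def fake_provider():
    # Stand-in for the AI provider's token stream.
    for token in ("Hel", "lo"):
        yield token

async def collect(gen):
    return [frame async for frame in gen]

frames = asyncio.run(collect(stream_answer(fake_provider())))
```

Because the generator yields per token, the first frame goes out before the provider has finished its answer, which is the whole point of streaming.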
Frontend Implementation (React)
Use EventSource:
- Open connection in useEffect
- Listen for events
- Update state incrementally

Example behaviors:
- Append text on text_delta
- Show loading UI on tool_start
- Update data on tool_result
- Close connection on done
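The dispatch itself is framework-agnostic. Here is a minimal sketch of the same state updates in Python, matching the backend snippets (the class and field names are illustrative; in React each branch would be a setState call inside an EventSource listener):

```python
class StreamState:
    # Mirrors the UI state the post describes: accumulated text,
    # a tool-loading flag, tool results, and a finished flag.
    def __init__(self):
        self.text = ""
        self.tool_running = False
        self.tool_results = []
        self.done = False

def handle_event(state: StreamState, event: str, data) -> None:
    if event == "text_delta":
        state.text += data               # append text
    elif event == "tool_start":
        state.tool_running = True        # show loading UI
    elif event == "tool_result":
        state.tool_running = False
        state.tool_results.append(data)  # update data
    elif event == "done":
        state.done = True                # close the connection
```

Because every event mutates one piece of state, the UI can re-render after each frame instead of waiting for the full response.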
Handling Errors Properly
Important rule:
Never discard partial responses.
If an error occurs mid-stream:
- Keep existing text
- Show error indicator
- Allow retry if needed
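A sketch of that rule (field names are illustrative): on error, the accumulated text survives untouched, and only an error message plus a retry flag are added.

```python
def apply_error(state: dict, message: str) -> dict:
    # Keep everything streamed so far; only attach the error and a retry flag.
    # (Field names are illustrative, not from the post.)
    return {**state, "error": message, "can_retry": True}

partial = {"text": "The capital of France is", "error": None, "can_retry": False}
failed = apply_error(partial, "provider timeout")
# failed["text"] is unchanged: the partial response is still on screen.
```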
This significantly improves UX.
SSE vs WebSockets
For this use case, SSE wins:
- Less complexity
- No connection management overhead
- Easier to debug
Use WebSockets only if you need true bidirectional communication.
Conclusion
Streaming is not optional anymore. It is expected.
If your AI app feels slow, the issue might not be your model.
It is your delivery mechanism.