jaydeep sureliya

How to Stream AI Responses in Real-Time Using FastAPI and SSE

If your AI application waits for the full response before rendering, you are hurting your UX.

Streaming responses in real time is one of the simplest ways to improve perceived performance.

I implemented this for my project:
👉 https://mindstashhq.space

Let's break it down.


What We Are Building

A streaming AI response system where:

  • Tokens arrive in real time
  • UI updates instantly
  • Tool calls are visible to users

Backend Implementation (FastAPI)

We use Server-Sent Events (SSE).

Why SSE?

  • Simpler than WebSockets
  • Native browser support
  • Perfect for server β†’ client streaming

Example structure:

  • Response type: StreamingResponse
  • Content-Type: text/event-stream

Each event looks like:

event: text_delta
data: "Hello"
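A tiny helper can produce events in this wire format (a sketch; the name `format_sse` is illustrative, not from the post):

```python
def format_sse(event: str, data: str) -> str:
    """Format one Server-Sent Event: an `event:` line, a `data:` line,
    and the blank line that terminates the event.

    Assumes a single-line payload; data spanning multiple lines would
    need one `data:` line per line.
    """
    return f"event: {event}\ndata: {data}\n\n"
```

Calling `format_sse("text_delta", '"Hello"')` yields exactly the event shown above.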

Event types:

  • text_delta
  • tool_start
  • tool_result
  • error
  • done

The backend streams tokens directly from the AI provider and forwards them to the client as SSE events.


Frontend Implementation (React)

Use EventSource:

  • Open connection in useEffect
  • Listen for events
  • Update state incrementally

Example behaviors:

  • Append text on text_delta
  • Show loading UI on tool_start
  • Update data on tool_result
  • Close connection on done
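In code, those behaviors might look roughly like this (a sketch; the endpoint URL and names are illustrative). Factoring the state transition into a plain `applyEvent` function keeps it easy to plug into React's `setState`:

```javascript
// Pure state transition: given the current UI state and one SSE event,
// return the next state. Usable inside setState or a reducer.
function applyEvent(state, type, data) {
  switch (type) {
    case "text_delta":
      return { ...state, text: state.text + data };
    case "tool_start":
      return { ...state, toolRunning: true };
    case "tool_result":
      return { ...state, toolRunning: false, toolData: data };
    case "error":
      // Keep the partial text; only surface the error.
      return { ...state, error: data };
    case "done":
      return { ...state, done: true };
    default:
      return state;
  }
}

// Wiring inside a React component (sketch):
//
// useEffect(() => {
//   const source = new EventSource("/chat/stream");
//   const on = (type) => (e) =>
//     setState((s) => applyEvent(s, type, JSON.parse(e.data)));
//   ["text_delta", "tool_start", "tool_result", "error"].forEach((t) =>
//     source.addEventListener(t, on(t))
//   );
//   source.addEventListener("done", () => source.close());
//   return () => source.close();
// }, []);
```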

Handling Errors Properly

Important rule:

Never discard partial responses.

If an error occurs mid-stream:

  • Keep existing text
  • Show error indicator
  • Allow retry if needed

This significantly improves UX.
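On the server side, one way to honor this rule is to catch mid-stream failures and turn them into an `error` event instead of dropping the connection, so the client keeps everything streamed so far (a sketch; `safe_stream` is an illustrative name):

```python
import json


async def safe_stream(token_source):
    """Wrap a token stream; on failure, emit an `error` event rather than
    silently closing. Tokens already sent stay with the client."""
    try:
        async for token in token_source:
            yield f"event: text_delta\ndata: {json.dumps(token)}\n\n"
    except Exception as exc:  # report any provider failure to the client
        yield f"event: error\ndata: {json.dumps(str(exc))}\n\n"
    finally:
        # Always signal completion so the client knows to stop waiting.
        yield "event: done\ndata: {}\n\n"
```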


SSE vs WebSockets

For this use case, SSE wins:

  • Less complexity
  • No connection management overhead
  • Easier to debug

Use WebSockets only if you need true bidirectional communication.


Conclusion

Streaming is not optional anymore. It is expected.

If your AI app feels slow, the issue might not be your model.
It might be your delivery mechanism.
