
Lav Kumar Dixit


Stop Making Your AI Chatbot Slower: Streaming Responses with Spring AI and Server-Sent Events


**The Wrong Approach**

Most applications follow this flow:

User Query → LLM Request → Wait 5-10 Seconds → Return Full Response

**The Better Architecture**

Use Spring AI's streaming support combined with Server-Sent Events (SSE).

User Query → Spring AI → Streaming Tokens → SSE Endpoint → Browser Updates UI Instantly


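Spring AI Streaming Example

```java
import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequiredArgsConstructor
public class ChatController {

    // Spring AI auto-configures a ChatClient.Builder; this assumes a ChatClient
    // bean built from it is defined elsewhere in the application.
    private final ChatClient chatClient;

    // text/event-stream tells Spring to send each Flux element as an SSE event.
    @GetMapping(value = "/chat/stream",
            produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamResponse(
            @RequestParam String message) {

        return chatClient.prompt()
                .user(message)
                .stream()    // ask the model for a streamed completion
                .content();  // Flux<String> of text chunks as they arrive
    }
}
```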
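To make it clearer what travels over the wire, here is a minimal, JDK-only sketch of how one text chunk is framed as an event according to the SSE specification. It is an illustration of the wire format, not Spring's internal writer; the class name `SseFraming` and the method `toEvent` are hypothetical, and the sketch assumes chunks use `\n` as their only line break.

```java
// Hypothetical helper illustrating SSE framing; not part of Spring or Spring AI.
final class SseFraming {

    // Frames one chunk as a single SSE event: every line of the chunk gets its
    // own "data: " field, and an empty line terminates the event.
    static String toEvent(String chunk) {
        StringBuilder sb = new StringBuilder();
        // limit -1 keeps trailing empty strings, so a chunk ending in "\n" is preserved
        for (String line : chunk.split("\n", -1)) {
            // Clients strip exactly one space after the colon, so writing
            // "data: " keeps any leading space that belongs to the chunk itself.
            sb.append("data: ").append(line).append('\n');
        }
        sb.append('\n'); // blank line = end of event
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed example chunks, as a model might emit them
        for (String chunk : new String[] {"Spring", " AI", " streams\ntokens"}) {
            System.out.print(toEvent(chunk));
        }
    }
}
```

A chunk that contains a line break becomes several `data:` lines inside one event, and the browser joins them back together with `\n` before handing the result to `onmessage`.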
Frontend Integration

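```javascript
// Encode the query so spaces and special characters survive the URL.
const eventSource = new EventSource(
    "/chat/stream?message=" + encodeURIComponent("Explain Spring AI")
);

eventSource.onmessage = (event) => {
    // textContent avoids interpreting model output as HTML.
    document.getElementById("output").textContent += event.data;
};

// EventSource reconnects automatically when the connection ends, which would
// resend the prompt once the stream completes; close it instead.
eventSource.onerror = () => eventSource.close();
```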
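For readers who want to see what the browser does before `onmessage` fires, the following JDK-only sketch turns raw SSE text into the `data` payload of each event. It is a simplified reading of the SSE parsing rules (only `data:` fields, `\n` line endings) and the names `SseParsing` and `parseData` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified SSE parser; real clients also handle event:, id:,
// retry:, comments, and \r\n line endings.
final class SseParsing {

    // Returns the data payload of every complete event in the raw stream text.
    static List<String> parseData(String raw) {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line : raw.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the event collected so far, if any.
                if (hasData) {
                    events.add(data.toString());
                }
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                // Exactly one leading space after the colon is dropped.
                if (value.startsWith(" ")) {
                    value = value.substring(1);
                }
                // Multiple data lines in one event are joined with "\n".
                if (hasData) {
                    data.append('\n');
                }
                data.append(value);
                hasData = true;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Assumed sample stream: two events carrying "Spring" and " AI"
        String raw = "data: Spring\n\ndata:  AI\n\n";
        System.out.println(String.join("", parseData(raw)));
    }
}
```

Each element of the returned list corresponds to one `event.data` value, which is why appending them in order rebuilds the model's answer on the page.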

Performance Benefits

Faster Perceived Response Time

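Suppose the model takes 8 seconds to generate a complete answer:

Without streaming → nothing is shown until the full response arrives, after about 8s

With streaming → the first token is shown after roughly 200-500ms (the exact figure depends on the model and provider)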

The total generation time remains the same, but users perceive the application as significantly faster.
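The arithmetic behind that comparison can be sketched in a few lines. The numbers here are assumptions chosen to match the 8-second example (300 ms to the first token, 100 ms per token after that, 78 tokens); they are not measurements, and the class name `PerceivedLatency` is hypothetical.

```java
// Hypothetical back-of-the-envelope model of time to first visible output.
final class PerceivedLatency {

    // Blocking: the user sees nothing until the last token has been generated.
    static long blockingFirstOutputMs(long firstTokenMs, long perTokenMs, int tokens) {
        return firstTokenMs + perTokenMs * (tokens - 1);
    }

    // Streaming: the user sees output as soon as the first token arrives.
    static long streamingFirstOutputMs(long firstTokenMs) {
        return firstTokenMs;
    }

    public static void main(String[] args) {
        long firstTokenMs = 300; // assumed time to first token
        long perTokenMs = 100;   // assumed delay between subsequent tokens
        int tokens = 78;         // assumed response length

        // 300 + 100 * 77 = 8000
        System.out.println("blocking:  "
                + blockingFirstOutputMs(firstTokenMs, perTokenMs, tokens) + " ms");
        System.out.println("streaming: "
                + streamingFirstOutputMs(firstTokenMs) + " ms");
    }
}
```

Under these assumptions both approaches finish at the same moment; only the time until the first visible output differs.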

Reduced Bounce Rate

Users are less likely to leave while waiting because they can see progress immediately.

Better AI UX

Streaming makes even locally hosted models, such as those run through Ollama, feel responsive.
