Reactive is Dead: Build Low-Latency Voice Agents with OpenAI Realtime and JDK WebSockets
Building voice agents with OpenAI's Realtime API shouldn't require dragging the full complexity of Spring WebFlux into your codebase. Thanks to JDK virtual threads, we can drop reactive streams and build low-latency, bi-directional audio pipelines on java.net.http.WebSocket, writing plain blocking code that reads synchronously yet scales to a large number of concurrent sessions.
Why Most Developers Get This Wrong
- Over-engineering with WebFlux: Devs default to Project Reactor (Flux/Mono) for audio streaming, resulting in unmaintainable stack traces and brutal debugging sessions when tracking frame drops.
- Ignoring Thread-per-Connection Reality: They forget that the OpenAI Realtime API requires persistent, stateful, bi-directional WebSockets where audio chunks must be processed in strict chronological order.
- Ignoring Backpressure on Audio: Reactive libraries often mask underlying buffer bloat, causing massive latency spikes in voice turn-taking.
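To make the buffer-bloat point concrete, here is a minimal sketch of a bounded audio buffer. The class name BoundedAudioBuffer and the drop-oldest policy are assumptions for illustration, not something the Realtime API prescribes; blocking the producer instead is an equally valid choice. With 100 ms chunks and a capacity of 5, at most about 500 ms of audio can ever be waiting in the queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical buffer that caps queued audio instead of letting it grow unbounded. */
class BoundedAudioBuffer {
    private final BlockingQueue<byte[]> queue;
    private long dropped; // number of chunks discarded because the buffer was full

    BoundedAudioBuffer(int maxChunks) {
        this.queue = new ArrayBlockingQueue<>(maxChunks);
    }

    /** Enqueues a chunk; when the buffer is full, discards the oldest chunk first. */
    synchronized void put(byte[] chunk) {
        while (!queue.offer(chunk)) {
            // The consumer may have drained the queue in the meantime, so poll() can return null.
            if (queue.poll() != null) {
                dropped++;
            }
        }
    }

    /** Blocks the calling (virtual) thread until a chunk is available. */
    byte[] take() throws InterruptedException {
        return queue.take();
    }

    int depth() {
        return queue.size();
    }

    synchronized long dropped() {
        return dropped;
    }
}
```

Because the depth and the drop counter are plain fields, a latency spike shows up as a number you can log rather than as a mystery inside an operator chain.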
The Right Way
The modern way to handle OpenAI's gpt-4o-realtime-preview is pairing native JDK WebSockets with Structured Concurrency and Virtual Threads.
- One Virtual Thread per WebSocket Loop: Use java.net.http.HttpClient to open a WebSocket connection and let virtual threads block while waiting for incoming audio frames. (Avoid the word "pin" here: in virtual-thread terms, pinning is a problem, not a goal.)
- Structured Concurrency (StructuredTaskScope): Run the audio input (mic to OpenAI) and audio output (OpenAI to speaker) in a clean, scope-bound parent-child relationship.
- Plain Old Blocking Queues: Use standard LinkedBlockingQueue for thread-safe audio chunk handoffs, with no complex reactive operators needed.
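The code in this post calls listener.audioQueue() without showing the listener, so here is one way it could look. This is a sketch under assumptions: the class name AudioQueueListener is made up, and it assumes the server sends JSON text events of type response.audio.delta carrying base64 audio in a delta field. The regex stands in for a real JSON parser, which production code should use instead.

```java
import java.net.http.WebSocket;
import java.util.Base64;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical listener: reassembles text frames and queues decoded audio chunks. */
class AudioQueueListener implements WebSocket.Listener {
    // Naive extraction of the base64 "delta" field (assumed event shape, not a full JSON parse).
    private static final Pattern DELTA =
            Pattern.compile("\"delta\"\\s*:\\s*\"([A-Za-z0-9+/=]*)\"");

    private final BlockingQueue<byte[]> audioQueue = new LinkedBlockingQueue<>();
    private final StringBuilder partial = new StringBuilder();

    BlockingQueue<byte[]> audioQueue() {
        return audioQueue;
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        partial.append(data);           // one message may arrive as several fragments
        if (last) {
            String event = partial.toString();
            partial.setLength(0);
            handleEvent(event);
        }
        ws.request(1);                  // overriding onText means we must ask for the next frame
        return null;
    }

    /** Queues the decoded audio if the event is an audio delta; ignores everything else. */
    void handleEvent(String event) {
        if (!event.contains("\"response.audio.delta\"")) {
            return;
        }
        Matcher m = DELTA.matcher(event);
        if (m.find()) {
            audioQueue.add(Base64.getDecoder().decode(m.group(1)));
        }
    }
}
```

The listener stays callback-based because that is how java.net.http.WebSocket delivers data; the blocking happens on the consumer side, where a virtual thread calls take() on the queue.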
Show Me The Code
Here is how you orchestrate the bi-directional audio loops cleanly using Java's structured concurrency:
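import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.StructuredTaskScope;

// StructuredTaskScope.ShutdownOnFailure is a preview API on JDK 21-24 (compile and run with --enable-preview).
// API_KEY, listener, and the two stream* helpers are assumed to be defined elsewhere.
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
            .header("Authorization", "Bearer " + API_KEY)
            .buildAsync(URI.create("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"), listener)
            .join(); // block until the WebSocket handshake completes

    // Each fork runs on its own virtual thread, so the two audio loops run concurrently
    scope.fork(() -> { streamMicToWebSocket(ws); return null; });
    scope.fork(() -> { streamWebSocketToSpeaker(listener.audioQueue()); return null; });
    scope.join().throwIfFailed(); // wait for both loops; if one fails, cancel the other and rethrow
}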
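The two helpers are left undefined above, so here is a rough sketch of what their bodies might look like. Everything here is an assumption made for illustration: the AudioLoops class, the extra parameters (a plain InputStream and OutputStream stand in for the mic and speaker, and a Consumer<String> stands in for the socket so the loops can run without a network), the END_OF_STREAM marker, and the 24 kHz 16-bit mono PCM chunk size. The input_audio_buffer.append event with a base64 audio field is how the Realtime API accepts audio at the time of writing; check the current docs before relying on it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Base64;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical bodies for the two audio loops; names and signatures are illustrative. */
class AudioLoops {
    /** Marker chunk that tells the playback loop to stop (compared by identity). */
    static final byte[] END_OF_STREAM = new byte[0];

    /** Reads PCM chunks until EOF and sends each one as an input_audio_buffer.append event. */
    static void streamMicToWebSocket(InputStream mic, Consumer<String> sendText) throws IOException {
        byte[] buf = new byte[4800]; // about 100 ms of 24 kHz 16-bit mono PCM (assumed format)
        int n;
        while ((n = mic.read(buf)) > 0) {
            String b64 = Base64.getEncoder().encodeToString(Arrays.copyOf(buf, n));
            sendText.accept("{\"type\":\"input_audio_buffer.append\",\"audio\":\"" + b64 + "\"}");
        }
    }

    /** Blocks on the queue and writes chunks in arrival order until END_OF_STREAM shows up. */
    static void streamWebSocketToSpeaker(BlockingQueue<byte[]> audioQueue, OutputStream speaker)
            throws IOException, InterruptedException {
        while (true) {
            byte[] chunk = audioQueue.take();
            if (chunk == END_OF_STREAM) {
                return;
            }
            speaker.write(chunk);
        }
    }
}
```

Against a real connection, the sender could be text -> ws.sendText(text, true).join(). Joining each send before issuing the next one keeps the frames in order and avoids overlapping sends, which java.net.http.WebSocket may reject with an IllegalStateException.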
Key Takeaways
- Reactive is a legacy paradigm for I/O-bound streaming; with virtual threads, plain blocking code now scales to comparable connection counts while staying far easier to read and debug.
- OpenAI's Realtime API demands strict sequential audio frame delivery, which is trivial to guarantee with standard Java blocking queues and loops.
- Keep your dependencies clean by relying on java.net.http.WebSocket instead of pulling in heavy, third-party Netty wrappers.
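The first takeaway leans on blocking being cheap on virtual threads. A small sketch to try that yourself, assuming JDK 21 or later: the class BlockingAtScale is made up for this example, and the 50 ms sleep is only a stand-in for a blocking read on a voice session.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: many tasks that each block, one virtual thread per task. */
class BlockingAtScale {
    /** Runs the given number of blocking tasks and returns how many completed. */
    static int runBlockingSessions(int sessions) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < sessions; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(50)); // stand-in for a blocking socket read
                    return completed.incrementAndGet();
                });
            }
        } // close() waits for every submitted task to finish
        return completed.get();
    }
}
```

Calling BlockingAtScale.runBlockingSessions(10_000) starts ten thousand blocking tasks without a thread pool to tune. It says nothing about audio quality or network limits, only that the thread-per-task model itself is no longer the bottleneck.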