Reactive is Dead: Build Low-Latency Voice Agents with OpenAI Realtime and JDK WebSockets

#java #concurrency #ai #llm

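Building voice agents with OpenAI's Realtime API shouldn't require dragging the complexity of Spring WebFlux into your codebase. With JDK virtual threads (final since JDK 21), we can skip reactive streams and build low-latency, bidirectional audio pipelines on java.net.http.WebSocket. The API itself is asynchronous (sends return a CompletableFuture and frames arrive through a Listener), but on a virtual thread you can simply block on those futures and on a queue, so the code reads as synchronous while blocked threads stay cheap enough to run one per connection.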

Why Most Developers Get This Wrong

Over-engineering with WebFlux: Devs default to Project Reactor (Flux/Mono) for audio streaming, which produces stack traces full of operator internals and makes dropped frames painful to trace.
Ignoring Thread-per-Connection Reality: They forget that the OpenAI Realtime API runs over a persistent, stateful, bidirectional WebSocket on which audio chunks must be processed in strict chronological order.
Ignoring Backpressure on Audio: Prefetch and overflow buffers inside a reactive pipeline can hide a growing backlog of audio, and that backlog shows up as latency spikes in voice turn-taking.
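
To make the buffer-bloat point concrete, here is a rough sketch of how queued audio turns into added latency. It assumes 16-bit mono PCM at 24 kHz, which is an assumption to check against your own session settings; AudioLatency and bufferedMillis are hypothetical names used only for illustration.

```java
/** Back-of-the-envelope latency math for queued PCM audio (hypothetical helper). */
final class AudioLatency {
    // Assumed format: 16-bit mono PCM at 24 kHz; adjust to match your session settings.
    static final int SAMPLE_RATE_HZ = 24_000;
    static final int BYTES_PER_SAMPLE = 2;

    private AudioLatency() {}

    /** Milliseconds of audio represented by the given number of buffered bytes. */
    static long bufferedMillis(long bufferedBytes) {
        long bytesPerSecond = (long) SAMPLE_RATE_HZ * BYTES_PER_SAMPLE; // 48,000 bytes/s
        return bufferedBytes * 1000 / bytesPerSecond;
    }
}
```

Under those assumptions a 4,800-byte chunk is 100 ms of audio, so 50 chunks sitting unnoticed in a buffer put the listener 5 seconds behind the conversation.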

The Right Way

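The modern way to handle OpenAI's gpt-4o-realtime-preview is to pair native JDK WebSockets with virtual threads and structured concurrency. One caveat: StructuredTaskScope is still a preview API. The ShutdownOnFailure form used below matches JDK 21 through 24 and needs --enable-preview.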

Dedicate a Thread to the WebSocket: Use java.net.http.HttpClient to open the WebSocket connection, have the Listener push incoming audio frames onto a queue, and let a virtual thread block on that queue. (Avoid calling this "pinning"; in virtual-thread terms that word means something else.)
Structured Concurrency (StructuredTaskScope): Run the audio input (mic to OpenAI) and audio output (OpenAI to speaker) as sibling tasks in one scope, so a failure in either loop cancels the other.
Plain Old Blocking Queues: Use a standard LinkedBlockingQueue for thread-safe audio chunk handoffs, and give it a capacity so it is bounded. No complex reactive operators needed.
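
The handoff described above could look roughly like this. It is a minimal sketch, not a definitive implementation: QueueingListener, its END sentinel, and drain are hypothetical names, and it handles only text frames, on the assumption that the server sends its events as JSON text messages.

```java
import java.net.http.WebSocket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical listener: assembles text frames and hands whole messages to a bounded queue. */
final class QueueingListener implements WebSocket.Listener {
    static final String END = "__END__"; // sentinel telling the consumer to stop

    private final BlockingQueue<String> messages;
    private final StringBuilder partial = new StringBuilder();

    QueueingListener(int capacity) {
        // Bounded on purpose: a full queue blocks the producer instead of hiding latency.
        this.messages = new LinkedBlockingQueue<>(capacity);
    }

    BlockingQueue<String> messages() { return messages; }

    /** Appends a fragment; enqueues the message once its final fragment arrives. */
    void accept(CharSequence data, boolean last) throws InterruptedException {
        partial.append(data);
        if (last) {
            messages.put(partial.toString()); // blocks while the queue is full
            partial.setLength(0);
        }
    }

    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
        try {
            accept(data, last);
            ws.request(1); // ask for the next frame only after this one is handed off
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop requesting frames when interrupted
        }
        return null;
    }

    @Override
    public CompletionStage<?> onClose(WebSocket ws, int statusCode, String reason) {
        messages.offer(END); // best effort; a real version would guarantee delivery
        return null;
    }

    @Override
    public void onError(WebSocket ws, Throwable error) {
        messages.offer(END); // wake the consumer so it does not wait forever
    }

    /** Blocking consumer loop: processes messages strictly in arrival order. */
    static void drain(BlockingQueue<String> queue, Consumer<String> handler)
            throws InterruptedException {
        for (String msg = queue.take(); !msg.equals(END); msg = queue.take()) {
            handler.accept(msg);
        }
    }
}
```

The consumer side is then a plain loop you can start with Thread.startVirtualThread or fork inside a scope. Because onText may block on a full queue, it is probably worth giving the HttpClient a virtual-thread executor (Executors.newVirtualThreadPerTaskExecutor()) so a stalled callback does not tie up a platform thread.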

Show Me The Code

Here is how you orchestrate the bi-directional audio loops cleanly using Java's structured concurrency:

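// Imports: java.net.URI, java.net.http.HttpClient, java.net.http.WebSocket,
//          java.util.concurrent.StructuredTaskScope (preview API on JDK 21-24, run with --enable-preview)
// API_KEY, listener, streamMicToWebSocket and streamWebSocketToSpeaker are defined elsewhere in your app.
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
    WebSocket ws = HttpClient.newHttpClient().newWebSocketBuilder()
        .header("Authorization", "Bearer " + API_KEY)
        .header("OpenAI-Beta", "realtime=v1") // expected by the preview endpoint; check the current docs
        .buildAsync(URI.create("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"), listener)
        .join(); // block until the handshake completes (or throw if it fails)

    // Each fork runs on its own virtual thread, so the two audio loops run concurrently
    scope.fork(() -> { streamMicToWebSocket(ws); return null; });
    scope.fork(() -> { streamWebSocketToSpeaker(listener.audioQueue()); return null; });

    // Wait for both loops; if either fails, the other is interrupted and the error is rethrown
    scope.join().throwIfFailed();
}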
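
The snippet leaves streamMicToWebSocket undefined, so here is one way the sending side might look. This is a sketch under assumptions: MicSender is a hypothetical class, the extra micChunks queue parameter and the empty-array end marker are inventions for illustration, and the event shape (an input_audio_buffer.append message carrying base64 audio) should be verified against the current Realtime API reference.

```java
import java.net.http.WebSocket;
import java.util.Base64;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sender loop for microphone audio. */
final class MicSender {
    private MicSender() {}

    /** Wraps raw PCM bytes in an input_audio_buffer.append event with base64 audio. */
    static String appendEvent(byte[] pcm) {
        String audio = Base64.getEncoder().encodeToString(pcm);
        // Base64 output needs no JSON escaping, so plain concatenation is safe here.
        return "{\"type\":\"input_audio_buffer.append\",\"audio\":\"" + audio + "\"}";
    }

    /** Sends chunks in order until an empty chunk (assumed end marker) is taken. */
    static void streamMicToWebSocket(WebSocket ws, BlockingQueue<byte[]> micChunks)
            throws InterruptedException {
        for (byte[] chunk = micChunks.take(); chunk.length > 0; chunk = micChunks.take()) {
            // join() waits for the send to finish, so at most one send is pending
            // and chunks leave in exactly the order they were captured.
            ws.sendText(appendEvent(chunk), true).join();
        }
    }
}
```

Blocking on join() is not just a style choice here: the JDK WebSocket rejects a new send while a previous one is still pending, so waiting for each send is what keeps the loop both correct and ordered.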

Key Takeaways

Reactive is no longer the default answer for I/O-bound streaming: with virtual threads, plain blocking code can keep up on this kind of workload while being far easier to read and debug.
OpenAI's Realtime API demands strict sequential audio frame delivery, which is trivial to guarantee with standard Java blocking queues and loops.
Keep your dependencies clean by relying on java.net.http.WebSocket instead of pulling in heavy, third-party Netty wrappers.