Originally published on AI Tech Connect.
What changed, in plain terms

Voice has been the awkward middle child of the generative-AI stack. Text is cheap and well understood, and images and video grab the headlines. But spoken, real-time, two-way audio has stayed fiddly and expensive. That is the kind of audio that powers a phone agent, a live caption feed, or an interpreter sitting between two people who do not share a language. OpenAI's 7 May announcement is aimed squarely at that gap.

The headline is that the Realtime API is now generally available: out of beta and intended for production. Alongside that, OpenAI shipped three new models. Each is tuned for a distinct job rather than sharing one general-purpose "voice" endpoint:

GPT-Realtime-2: the company's first voice model with what OpenAI describes as GPT-5-class reasoning. The context window jumps…