Originally published on AI Tech Connect.
What changed, and why it matters now For two years, the honest answer to "can we ship a voice agent?" was "you can ship a demo." The pieces existed — streaming transcription, a language model, text-to-speech — but stitching them together produced an agent that was slow, talked over people, lost the thread when a caller switched languages mid-sentence, and felt unmistakably robotic. In May 2026, OpenAI released a set of three purpose-built real-time audio models through its Realtime API that close most of that gap at once. This is, for builders, less a single model launch and more a permission slip: voice agents are now a thing you put in front of paying customers, not just a thing you show investors. GPT-Realtime-2 — a speech-to-speech voice agent model that, per OpenAI, is built on…
Top comments (0)