OpenAI Voice Intelligence APIs: Real-Time Audio for Developers

#product #modelrelease #ai #machinelearning

Originally published on AI Tech Connect.

The short version: three models, three jobs

On approximately 8 May 2026, OpenAI expanded its Realtime API with three dedicated audio intelligence models. These are not incremental patches to the existing gpt-4o-realtime-preview offering; they supersede it with purpose-built models for distinct voice workloads. The trio covers conversational reasoning, live cross-lingual translation, and streaming transcription, and all three share the same WebSocket-based session architecture.

| Model | Primary job | Input | Output | When to use |
| --- | --- | --- | --- | --- |
| gpt-realtime-2 | Voice conversation with reasoning | Audio + text | Audio + text | Conversational agents, voice UX, tool calling over voice |
| gpt-realtime-translate | Live speech-to-speech translation | Audio (70+ languages) | Audio (13 languages) | Multilingual call centres, cross-language… |
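Because all of the models are described as sharing one WebSocket-based session architecture, choosing a workload could in principle come down to choosing a model name. The sketch below is a minimal illustration of that idea, not the documented API. It assumes the new models are reached through the same `wss://api.openai.com/v1/realtime?model=...` endpoint used by the earlier Realtime API, which the excerpt does not confirm. `MODEL_FOR_WORKLOAD` and `realtime_url` are hypothetical names, and only the two models named in the excerpt are listed.

```python
from urllib.parse import urlencode

# Assumption: same endpoint as the earlier Realtime API; not confirmed for the new models.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"

# Hypothetical lookup table. Only the two model names visible in the excerpt appear here;
# the transcription model's name is cut off in the source, so it is deliberately omitted.
MODEL_FOR_WORKLOAD = {
    "conversation": "gpt-realtime-2",
    "translation": "gpt-realtime-translate",
}


def realtime_url(workload: str) -> str:
    """Build the WebSocket URL for a workload, assuming one shared endpoint."""
    try:
        model = MODEL_FOR_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no model listed for workload {workload!r}") from None
    # The model is passed as a query parameter, as in the earlier Realtime API.
    return f"{REALTIME_ENDPOINT}?{urlencode({'model': model})}"


if __name__ == "__main__":
    for name in MODEL_FOR_WORKLOAD:
        print(name, "->", realtime_url(name))
```

Keeping the model choice in a single table means the connection and session-handling code would not need to change between workloads, if the shared-architecture claim holds as described.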

Read the full article on AI Tech Connect →
