OpenAI launches GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper voice models in Realtime API with expanded context and reasoning controls

Latent Space, May 8, 2026

3 Key Points

OpenAI released three streaming audio models: GPT-Realtime-2, a native speech-to-speech model for voice agents; GPT-Realtime-Translate, for live speech translation from 70+ input languages into 13 output languages; and GPT-Realtime-Whisper, for low-latency streaming transcription. All three are available now in the Realtime API.
GPT-Realtime-2 expands the context window from 32K to 128K tokens and supports adjustable reasoning effort (minimal, low, medium, high, xhigh; the default is low). It also adds audible tool transparency, where the agent announces what it is doing (e.g., 'checking your calendar'), and recovery behavior when something fails (e.g., 'I'm having trouble with that right now'). Time-to-first-audio ranges from 1.12s at minimal reasoning to 2.33s at high reasoning.
On Scale AI's Audio MultiChallenge S2S leaderboard, GPT-Realtime-2 placed #1, with instruction retention rising from 36.7% to 70.8% APR compared with GPT-Realtime-1.5. Artificial Analysis reported a score of 96.6% on Big Bench Audio (speech-to-speech reasoning) and pricing of $1.15 per hour of audio input and $4.61 per hour of audio output, unchanged from the prior model.
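
To make the per-hour figures concrete, here is a minimal sketch that estimates audio cost for a session from minutes of input and output audio. It assumes cost scales linearly with audio duration at the rates Artificial Analysis reported; actual API billing may be metered differently (for example per token) and excludes text tokens, caching, and tool calls. The function name estimate_audio_cost is hypothetical.

```python
# Reported per-hour audio rates for GPT-Realtime-2 (USD), per Artificial Analysis.
# Assumption: cost is linear in audio duration; real billing may be metered differently.
INPUT_USD_PER_HOUR = 1.15
OUTPUT_USD_PER_HOUR = 4.61


def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Hypothetical helper: estimate audio-only cost in USD for one session.

    Ignores text tokens, caching, and tool calls.
    """
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 10-minute call where the user speaks 6 minutes and the agent speaks 4.
    print(f"${estimate_audio_cost(6, 4):.4f}")
```

Under these assumptions the example call comes to roughly $0.42, most of it from the agent's spoken output, since output audio costs about four times as much per hour as input audio.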
