
OpenAI released three streaming audio models: GPT-Realtime-2 (a native speech-to-speech model for voice agents), GPT-Realtime-Translate (supporting live speech translation from 70+ input languages into 13 output languages), and GPT-Realtime-Whisper (providing low-latency streaming transcription). All are available in the Realtime API now.
GPT-Realtime-2 expands context from 32K to 128K tokens, supports adjustable reasoning effort levels (minimal, low, medium, high, xhigh; default low), and includes features like audible tool transparency (e.g., 'checking your calendar') and recovery behavior (e.g., 'I'm having trouble with that right now'). Time-to-first-audio ranges from 1.12s at minimal reasoning to 2.33s at high reasoning.
On Scale AI's Audio MultiChallenge S2S leaderboard, GPT-Realtime-2 placed #1 with instruction retention improving from 36.7% to 70.8% APR versus GPT-Realtime-1.5. Artificial Analysis reported 96.6% on Big Bench Audio speech-to-speech reasoning and pricing of $1.15/hour audio input and $4.61/hour audio output (unchanged versus prior model).
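To put the reported per-hour prices in concrete terms, here is a minimal sketch of the arithmetic for a single voice session. It assumes billing scales linearly with audio duration, which the summary does not confirm, and `estimate_session_cost` is a hypothetical helper, not part of any OpenAI SDK.

```python
# Per-hour audio prices as reported by Artificial Analysis (from the summary above).
INPUT_USD_PER_HOUR = 1.15
OUTPUT_USD_PER_HOUR = 4.61


def estimate_session_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate the audio cost in USD for one session.

    Hypothetical helper: assumes cost is strictly proportional to
    audio duration, which is an illustrative simplification.
    """
    input_cost = INPUT_USD_PER_HOUR * input_minutes / 60
    output_cost = OUTPUT_USD_PER_HOUR * output_minutes / 60
    return round(input_cost + output_cost, 4)


# Example: 10 minutes of user speech and 5 minutes of model speech.
print(estimate_session_cost(10, 5))  # prints 0.5758
```

Under these assumptions, a call with 10 minutes of input audio and 5 minutes of output audio comes to roughly $0.58, with output audio accounting for about two thirds of the total.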