AIToday

OpenAI launches GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper voice models in Realtime API with expanded context and reasoning controls

Latent Space · May 8, 2026 · 2 min read

3 Key Points

  1. OpenAI released three streaming audio models: GPT-Realtime-2 (a native speech-to-speech model for voice agents), GPT-Realtime-Translate (live speech translation from 70+ input languages into 13 output languages), and GPT-Realtime-Whisper (low-latency streaming transcription). All three are available in the Realtime API now.

  2. GPT-Realtime-2 expands the context window from 32K to 128K tokens and supports adjustable reasoning effort levels (minimal, low, medium, high, xhigh; default low). It also adds audible tool transparency (e.g., 'checking your calendar') and recovery behavior (e.g., 'I'm having trouble with that right now'). Time-to-first-audio ranges from 1.12s at minimal reasoning to 2.33s at high reasoning.

  3. On Scale AI's Audio MultiChallenge S2S leaderboard, GPT-Realtime-2 placed #1, with instruction retention improving from 36.7% to 70.8% APR versus GPT-Realtime-1.5. Artificial Analysis reported 96.6% on Big Bench Audio speech-to-speech reasoning and pricing of $1.15/hour for audio input and $4.61/hour for audio output (unchanged from the prior model).
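To make the per-hour pricing above concrete, here is a minimal Python sketch that estimates the audio cost of a single call. It assumes cost scales linearly with audio duration and ignores anything else that might be billed (text tokens, tool calls, caching); the function name `estimate_audio_cost` is hypothetical and is not part of any OpenAI SDK. The two rates are the Artificial Analysis figures quoted in the summary.

```python
# Rates quoted in the summary (USD per hour of audio), as reported by Artificial Analysis.
INPUT_RATE_PER_HOUR = 1.15
OUTPUT_RATE_PER_HOUR = 4.61


def estimate_audio_cost(input_seconds: float, output_seconds: float) -> float:
    """Hypothetical helper: linear cost estimate from audio durations in seconds.

    Assumption: billing is proportional to audio time only.
    """
    if input_seconds < 0 or output_seconds < 0:
        raise ValueError("durations must be non-negative")
    input_cost = input_seconds / 3600 * INPUT_RATE_PER_HOUR
    output_cost = output_seconds / 3600 * OUTPUT_RATE_PER_HOUR
    # Round to a hundredth of a cent to keep the estimate readable.
    return round(input_cost + output_cost, 4)


if __name__ == "__main__":
    # A 10-minute call: the user speaks for 6 minutes, the agent for 4.
    print(estimate_audio_cost(6 * 60, 4 * 60))
```

Under these assumptions, output audio dominates the bill because its hourly rate is roughly four times the input rate, so how much the agent talks may matter more for cost than how long the caller speaks.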
