AIToday

OpenAI releases GPT-Realtime-2 voice model with reasoning capabilities, plus live translation and transcription models

THE DECODER · May 7, 2026 · 2 min read

3 Key Points

  1. OpenAI shipped three new voice models: GPT-Realtime-2 (for reasoning and real-time conversation), GPT-Realtime-Translate (covering 70+ input languages and 13 output languages), and GPT-Realtime-Whisper (for low-latency streaming transcription). All are available now through the Realtime API.

  2. GPT-Realtime-2 expands the context window from 32,000 to 128,000 tokens to support longer conversations, allows developers to dial reasoning intensity across five levels (minimal, low, medium, high, xhigh), and uses verbal stalling techniques like 'one moment' to signal the system is working. On benchmarks, it reaches 96.6 percent accuracy on Big Bench Audio at the 'high' setting, up from 81.4 percent on its predecessor.

  3. Pricing is token-based for GPT-Realtime-2 ($32 per million audio input tokens and $64 per million audio output tokens) and minute-based for the other two models ($0.034 per minute for translation, $0.017 per minute for transcription). The Realtime API supports EU data residency.
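
To make the two pricing schemes concrete, here is a rough cost sketch in Python. It uses only the rates quoted in the points above; the function names and the sample usage figures are hypothetical, and the sketch ignores any other charges (for example text tokens or cached input) that the summary does not mention.

```python
from decimal import Decimal

# Rates in USD, as quoted in the summary above.
AUDIO_INPUT_PER_MILLION = Decimal("32")    # GPT-Realtime-2, audio input tokens
AUDIO_OUTPUT_PER_MILLION = Decimal("64")   # GPT-Realtime-2, audio output tokens
TRANSLATE_PER_MINUTE = Decimal("0.034")    # GPT-Realtime-Translate
TRANSCRIBE_PER_MINUTE = Decimal("0.017")   # GPT-Realtime-Whisper


def realtime2_audio_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Token-based cost: tokens times the per-million rate, divided by one million."""
    total = (input_tokens * AUDIO_INPUT_PER_MILLION
             + output_tokens * AUDIO_OUTPUT_PER_MILLION)
    return total / Decimal(1_000_000)


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Minute-based cost: minutes of audio times the per-minute rate."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical usage figures, chosen only for illustration.
    print("GPT-Realtime-2:", realtime2_audio_cost(500_000, 250_000))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Transcribe, 60 min:", per_minute_cost(60, TRANSCRIBE_PER_MINUTE))
```

At these rates, output audio costs twice as much per token as input audio, and an hour of translation costs twice as much as an hour of transcription.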
