OpenAI releases GPT-Realtime-2 voice model with reasoning capabilities, plus live translation and transcription models

THE DECODER, May 7, 2026


3 Key Points

OpenAI shipped three new voice models: GPT-Realtime-2 (for reasoning and real-time conversation), GPT-Realtime-Translate (covering 70+ input languages and 13 output languages), and GPT-Realtime-Whisper (for low-latency streaming transcription). All are available now through the Realtime API.
GPT-Realtime-2 expands the context window from 32,000 to 128,000 tokens to support longer conversations, lets developers set reasoning intensity at one of five levels (minimal, low, medium, high, xhigh), and uses spoken filler phrases such as 'one moment' to signal that it is still working. On the Big Bench Audio benchmark, it reaches 96.6 percent accuracy at the 'high' setting, up from 81.4 percent for its predecessor.
Pricing is token-based for GPT-Realtime-2 ($32 per million audio input tokens and $64 per million audio output tokens) and minute-based for the other two models ($0.034 per minute for translation, $0.017 per minute for transcription). The Realtime API supports EU data residency.
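
To make the two pricing schemes concrete, here is a minimal Python sketch of the arithmetic, using only the rates quoted above. The function names and the session sizes (50,000 input tokens, 20,000 output tokens, 60 minutes) are hypothetical values chosen for illustration; the summary lists only audio token prices for GPT-Realtime-2, so any other charges are not modeled here.

```python
# Rates as quoted in the summary above (USD). Everything else in this
# sketch (function names, example session sizes) is an illustrative assumption.
REALTIME_2_INPUT_PER_MILLION = 32.0    # audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # audio output tokens
TRANSLATE_PER_MINUTE = 0.034           # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017             # GPT-Realtime-Whisper


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Token-based cost of GPT-Realtime-2 audio, in USD."""
    total = (input_tokens * REALTIME_2_INPUT_PER_MILLION
             + output_tokens * REALTIME_2_OUTPUT_PER_MILLION)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Minute-based cost for the translation or transcription model, in USD."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical session: 50,000 audio input tokens, 20,000 audio output tokens.
    print(f"GPT-Realtime-2: ${realtime_2_cost(50_000, 20_000):.2f}")
    # Hypothetical one-hour sessions for the two minute-priced models.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

Under these assumed session sizes, the token-priced model costs $2.88, while an hour of translation or transcription costs $2.04 or $1.02 respectively. How many audio tokens a minute of speech actually consumes is not stated in the summary, so the two schemes cannot be compared directly from these figures alone.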
