AIToday

Researchers develop three-billion-parameter model that listens continuously to audio and decides every 0.4 seconds whether to speak, handling translation, transcription, and sound recognition simultaneously

THE DECODER · 1d ago · 3 min read

4 Key Points

  1. Audio-Interaction, created by researchers from China, Hong Kong, and Singapore, processes a continuous audio stream and outputs either a <silent> or a <response> token after each 0.4-second chunk, so the model can stay quiet or begin speaking depending on context. A single model handles translation, transcription, dialog, and reactions to everyday noises.

  2. The team built StreamAudio-2M, a training dataset with 2.6 million units and about 302,000 hours of audio across seven skill areas and 28 subtasks. A language model designed realistic scenarios; matching clips were sourced from a database, or missing sounds were generated with audio models like AudioX or ElevenLabs; the recordings were then smoothed for naturalness.

  3. Audio-Interaction scored 58.15 points on the audio benchmark MMAU, narrowly beating its base model Qwen2.5-Omni-3B and coming close to much larger 7B models. On the ProactiveSound Bench with 644 human-curated events, the model outperforms Gemini 3 Flash, Kimi-Audio-Instruct, and Step-Audio 2.

  4. Code and weights are available on GitHub under the Apache 2.0 license with no restrictions on commercial use; the full training dataset is set to follow later.
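
To make the per-chunk decision in the first key point concrete, here is a minimal Python sketch of a loop that cuts a stream into 0.4-second chunks and records a <silent> or <response> decision after each one. It is an illustration only, not the researchers' code: the 16 kHz sample rate, the function names (chunk_stream, toy_policy, run_stream), and the loudness threshold standing in for the model are all assumptions.

```python
from typing import Callable, Iterator, List, Sequence

CHUNK_SECONDS = 0.4      # from the article: one decision per 0.4-second chunk
SAMPLE_RATE = 16_000     # assumption: the article does not state a sample rate
SILENT, RESPONSE = "<silent>", "<response>"


def chunk_stream(samples: Sequence[float], sample_rate: int = SAMPLE_RATE,
                 chunk_seconds: float = CHUNK_SECONDS) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks; a trailing partial chunk is dropped."""
    size = int(round(sample_rate * chunk_seconds))
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def toy_policy(context: List[Sequence[float]]) -> str:
    """Hypothetical stand-in for the model: respond when the newest chunk is loud."""
    latest = context[-1]
    energy = sum(x * x for x in latest) / len(latest)
    return RESPONSE if energy > 0.01 else SILENT


def run_stream(samples: Sequence[float],
               policy: Callable[[List[Sequence[float]]], str] = toy_policy) -> List[str]:
    """Feed chunks one at a time and record the decision made after each chunk."""
    context: List[Sequence[float]] = []
    decisions: List[str] = []
    for chunk in chunk_stream(samples):
        context.append(chunk)              # the policy sees all audio heard so far
        decisions.append(policy(context))
    return decisions


if __name__ == "__main__":
    quiet = [0.0] * 6400                   # 0.4 s of silence at 16 kHz
    loud = [0.5] * 6400                    # 0.4 s of constant loud signal
    print(run_stream(quiet + loud + quiet))
```

In the real system the decision would come from the model itself rather than a loudness threshold; the sketch only shows the control flow of deciding once per chunk while keeping the earlier audio as context.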
