Nvidia releases Nemotron 3 Nano Omni, a 30-billion-parameter open multimodal model trained on data from competing AI labs including Qwen, OpenAI, and DeepSeek.

THE DECODER · April 29, 2026

AI Summary

  • Nemotron 3 Nano Omni is an open-source multimodal model that processes text, images, video, and audio. It uses a Mamba-Transformer hybrid with Mixture-of-Experts, activating about three billion parameters per query, and supports a context window of up to 256,000 tokens.
  • On the OSWorld benchmark for GUI agents (a type of AI that performs computer tasks autonomously), accuracy jumps from 11.1 to 47.4 points compared to the previous version. Nvidia says throughput at the same interactivity level is up to nine times higher than Qwen3-Omni.
  • Synthetic training data comes from competing models: Qwen models, OpenAI's gpt-oss-120b, Kimi-K2.5, and DeepSeek-OCR generated the captions and reasoning traces. Nvidia processed roughly 717 billion tokens across seven training stages. The model ships under the NVIDIA Open Model Agreement, which allows commercial use, and Nvidia is releasing the training data and training pipelines alongside the model weights.
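The "about three billion active parameters per query" figure follows from Mixture-of-Experts routing: each token is sent to only a few experts, so most of the 30 billion parameters sit idle on any given forward pass. A toy sketch of top-k routing, with all sizes purely illustrative (not Nemotron's actual configuration):

```python
# Toy sketch of Mixture-of-Experts top-k routing.
# Expert counts and parameter sizes below are illustrative only,
# NOT Nemotron 3 Nano Omni's real configuration.

def top_k_experts(gate_scores, k=2):
    """Pick the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

def active_expert_params(params_per_expert, k):
    """Expert parameters actually exercised per token under top-k routing;
    the remaining experts contribute storage cost but no compute."""
    return k * params_per_expert

# One token's router scores over 4 hypothetical experts:
scores = [0.10, 0.70, 0.05, 0.90]
print(top_k_experts(scores, k=2))          # -> [3, 1]
print(active_expert_params(1_000_000, 2))  # 2M of the experts' params are active
```

This is why a 30B-parameter MoE model can run with the latency profile of a much smaller dense model: compute per token scales with the active parameters, while total parameters mainly determine memory footprint.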
