
NVIDIA releases Cosmos 3, a unified omni-model combining world generation, physical reasoning, and action generation in a single architecture

Hugging Face Blog · 2d ago · 2 min read

3 Key Points

  1. NVIDIA released Cosmos 3 on Hugging Face in two sizes: Cosmos 3 Nano (8B parameters), optimized for efficient inference on workstation-grade hardware such as the RTX PRO 6000 GPU, and Cosmos 3 Super (32B parameters), designed for large-scale synthetic data generation and research on NVIDIA Hopper and Blackwell GPUs.

  2. Cosmos 3 is built on a Mixture-of-Transformers (MoT) architecture that processes text, image, video, audio, and action within a single unified model. It replaces the previous workflow, in which developers had to combine separate models for world generation, controlled generation, scene understanding, and policy generation.

  3. The model supports multiple input-output combinations:
     - text/image/video to video (generation)
     - text/video to text (vision-language tasks)
     - action/image/text to video (forward dynamics)
     - text/video to action (inverse dynamics)
     - image/text to video and action (policy model)

     The release includes Diffusers integration, post-training scripts on GitHub, and open synthetic data generation datasets for physical AI.
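As a rough illustration of why the two sizes target different hardware, the weights-only memory footprint can be estimated from the parameter count. This is a sketch under stated assumptions: it uses the nominal 8B and 32B figures, and the bytes-per-parameter table (including the fp8 entry) is illustrative, not a statement about which precisions the released checkpoints actually ship in. Activations, caches, and any auxiliary encoders or tokenizers are ignored.

```python
# Rough weights-only memory estimate for the two Cosmos 3 sizes.
# Assumptions (not from the release notes): nominal parameter counts,
# and an illustrative bytes-per-parameter table. Activations, caches,
# and auxiliary encoders/decoders are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("Cosmos 3 Nano", 8e9), ("Cosmos 3 Super", 32e9)]:
    sizes = {dt: f"{weight_memory_gb(params, dt):.0f} GB" for dt in BYTES_PER_PARAM}
    print(name, sizes)
```

At bf16 this works out to roughly 16 GB of weights for Nano and 64 GB for Super, before any runtime overhead.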
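The summary does not describe Cosmos 3's layer internals. For orientation, the following sketch illustrates the general Mixture-of-Transformers idea from the research literature: each modality gets its own non-embedding weights (attention projections and feed-forward network), while self-attention runs globally over the full mixed-modality sequence. Everything here is a hypothetical simplification (single attention head, no masking, toy sizes, random weights, helper names invented for illustration), not NVIDIA's implementation.

```python
import numpy as np

# Hypothetical modality list; the index of each name is its modality id.
MODALITIES = ["text", "image", "video", "audio", "action"]


def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)


def init_params(d_model, d_ff, rng):
    """One separate set of non-embedding weights per modality (toy init)."""
    def w(*shape):
        return rng.standard_normal(shape) / np.sqrt(shape[0])
    return {m: {"q": w(d_model, d_model), "k": w(d_model, d_model),
                "v": w(d_model, d_model), "o": w(d_model, d_model),
                "ff1": w(d_model, d_ff), "ff2": w(d_ff, d_model)}
            for m in MODALITIES}


def mot_layer(x, modality_ids, params):
    """x: (seq, d_model); modality_ids: (seq,) ints indexing MODALITIES."""
    d = x.shape[1]
    h = layer_norm(x)
    q, k, v = np.empty_like(x), np.empty_like(x), np.empty_like(x)
    # Modality-specific Q/K/V projections: each token uses its own modality's weights.
    for i, m in enumerate(MODALITIES):
        mask = modality_ids == i
        if mask.any():
            q[mask] = h[mask] @ params[m]["q"]
            k[mask] = h[mask] @ params[m]["k"]
            v[mask] = h[mask] @ params[m]["v"]
    # Global self-attention: every token attends to every token, across modalities.
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    out, ff = np.empty_like(x), np.empty_like(x)
    for i, m in enumerate(MODALITIES):
        mask = modality_ids == i
        if mask.any():
            out[mask] = attn[mask] @ params[m]["o"]
    x = x + out  # residual connection
    h = layer_norm(x)
    # Modality-specific feed-forward network (ReLU for simplicity).
    for i, m in enumerate(MODALITIES):
        mask = modality_ids == i
        if mask.any():
            ff[mask] = np.maximum(h[mask] @ params[m]["ff1"], 0) @ params[m]["ff2"]
    return x + ff  # residual connection


rng = np.random.default_rng(0)
params = init_params(16, 32, rng)
modality_ids = np.array([0, 0, 0, 2, 2, 2, 2, 4])  # text, video, action tokens
x = rng.standard_normal((8, 16))
y = mot_layer(x, modality_ids, params)
print(y.shape)  # (8, 16)
```

In this sketch each token only touches one modality's weights, so per-token compute matches a dense layer of the same width, while the shared attention step is what lets text, video, and action tokens condition on one another in a single forward pass.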
