
NVIDIA released Cosmos 3 on Hugging Face with two model sizes: Cosmos 3 Nano (8B parameter model) optimized for efficient inference on workstation-grade compute like the RTX PRO 6000 GPU, and Cosmos 3 Super (32B parameter model) designed for large-scale synthetic data generation and research on NVIDIA Hopper and Blackwell GPUs.
Cosmos 3 is built on a Mixture-of-Transformers (MoT) architecture that processes text, image, video, audio, and action in a single unified model. It replaces the previous approach, in which developers had to work with separate models for world generation, controlled generation, scene understanding, and policy generation.
The model supports multiple input-output combinations: text/image/video-to-video generation, text/video-to-text output (for vision language tasks), action/image/text-to-video (forward dynamics), text/video-to-action (inverse dynamics), and image/text-to-video-and-action (policy model). The release includes Diffusers integration, post-training scripts on GitHub, and open synthetic data generation datasets for physical AI.
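The input-output combinations listed above can be easier to scan as a lookup table. The following is a minimal sketch that only restates the summary's list as data. The names `Mode`, `MODES`, and `modes_producing` are hypothetical and are not part of the Cosmos 3 release or the Diffusers API. The summary also does not say whether the listed inputs must be supplied together or are alternatives, so the sets below record which modalities are accepted, nothing more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One input/output combination named in the summary (hypothetical type)."""
    name: str
    inputs: frozenset
    outputs: frozenset


# Transcribed from the summary's list; labels follow its parenthetical names.
MODES = [
    Mode("video generation", frozenset({"text", "image", "video"}), frozenset({"video"})),
    Mode("vision language", frozenset({"text", "video"}), frozenset({"text"})),
    Mode("forward dynamics", frozenset({"action", "image", "text"}), frozenset({"video"})),
    Mode("inverse dynamics", frozenset({"text", "video"}), frozenset({"action"})),
    Mode("policy model", frozenset({"image", "text"}), frozenset({"video", "action"})),
]


def modes_producing(output: str) -> list:
    """Return the names of the modes whose outputs include the given modality."""
    return [m.name for m in MODES if output in m.outputs]


if __name__ == "__main__":
    for modality in ("video", "text", "action"):
        print(modality, "<-", modes_producing(modality))
```

Read this way, the policy model is the only listed mode with two output modalities (video and action), while action on its own comes from the inverse dynamics mode.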