
NVIDIA Nemotron 3 Ultra, an open large language model with 550 billion total parameters and 55 billion active parameters, is now available for day-zero deployment on Amazon SageMaker JumpStart. The model uses a hybrid Transformer-Mamba Mixture-of-Experts (MoE) architecture and supports a context length of up to 1M tokens.
The model is reported to deliver 5x faster inference and up to 30% lower cost for agentic workloads (AI systems that autonomously plan, call tools, and iterate across many steps). Its MoE architecture activates only 55B of its 550B parameters per forward pass, so each token needs far less compute than a dense model of the same size would. This is what lets the model sustain multi-step reasoning across hundreds of turns while maintaining coherence.
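To make the sparsity claim concrete, here is a back-of-envelope sketch using only the two parameter counts above. It assumes the common approximation of about 2 forward-pass FLOPs per active parameter per token and ignores attention and Mamba state overheads. The constant names, the helper functions and the "dense 550B" comparison model are hypothetical illustrations, not part of any NVIDIA or AWS tooling.

```python
# Back-of-envelope arithmetic for a sparse Mixture-of-Experts model.
# Parameter counts come from the announcement; "2 FLOPs per active parameter
# per token" is a common rough approximation for a forward pass (assumption).

TOTAL_PARAMS = 550e9   # all parameters stored in the model
ACTIVE_PARAMS = 55e9   # parameters actually used for any single token


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters touched per forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass FLOPs per token (about 2 per active parameter)."""
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    # Hypothetical dense model of the same total size, for comparison only.
    dense = forward_flops_per_token(TOTAL_PARAMS)
    print(f"active fraction: {frac:.0%}")
    print(f"FLOPs/token (MoE, 55B active): {sparse:.2e}")
    print(f"FLOPs/token (dense 550B):      {dense:.2e}")
    print(f"compute ratio dense/MoE: {dense / sparse:.0f}x")
```

Only about 10% of the weights participate in each token, which is roughly a 10x reduction in per-token compute versus a dense 550B model. This does not shrink memory, because all 550B parameters must still be resident. The reported 5x end-to-end speedup depends on more than this ratio (memory bandwidth, batching, the hybrid Mamba layers), so the sketch should not be read as deriving that figure.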
Users can deploy Nemotron 3 Ultra through SageMaker Studio or the SageMaker Python SDK without managing the underlying infrastructure, choosing from the supported GPU instance types: ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge. The model is optimized for NVFP4, NVIDIA's 4-bit floating-point format.
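One reason the NVFP4 optimization matters is weight memory. The sketch below estimates storage for 550B parameters under NVFP4 versus BF16. It assumes the published NVFP4 layout (4-bit E2M1 values with one FP8 E4M3 scale shared by each block of 16 values, plus a negligible per-tensor FP32 scale that is omitted here). It also assumes that every weight is quantized, which a real checkpoint may not do. KV cache, Mamba state, activations and runtime overhead are ignored. The constants and helper names are hypothetical.

```python
# Rough weight-memory estimate for a 550B-parameter model in NVFP4 vs BF16.
# Assumptions: all weights quantized; KV cache, Mamba state, activations and
# runtime overhead ignored; per-tensor FP32 scale omitted as negligible.

TOTAL_PARAMS = 550e9

NVFP4_VALUE_BITS = 4    # E2M1 value per weight
NVFP4_BLOCK_SIZE = 16   # weights sharing one block scale
NVFP4_SCALE_BITS = 8    # FP8 E4M3 scale per block
BF16_BITS = 16


def nvfp4_bits_per_weight() -> float:
    """Effective storage per weight, including the amortized block scale."""
    return NVFP4_VALUE_BITS + NVFP4_SCALE_BITS / NVFP4_BLOCK_SIZE


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    fp4 = weight_gb(TOTAL_PARAMS, nvfp4_bits_per_weight())
    bf16 = weight_gb(TOTAL_PARAMS, BF16_BITS)
    print(f"NVFP4 effective bits/weight: {nvfp4_bits_per_weight()}")
    print(f"NVFP4 weights: ~{fp4:.0f} GB")
    print(f"BF16 weights:  ~{bf16:.0f} GB")
```

Under these assumptions the weights take about 309 GB in NVFP4 against about 1,100 GB in BF16, a bit over 3.5x smaller. Actual footprint on a given instance type will differ once cache and runtime memory are included.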