NVIDIA Nemotron 3 Ultra: 550B Open-Weight MoE You Can Host

#infra #opensource #ai #machinelearning

Originally published on AI Tech Connect.

What builders need to know

It is a real open release. NVIDIA published weights, training data and recipes for a 550B-parameter model under the Linux Foundation's permissive OpenMDW-1.1 licence, not a research-only or gated one.

The architecture is the story. A hybrid Mamba-2 plus Transformer MoE, 550B total with just 55B active per token, giving high throughput at a 1M-token context window.

Speed, not top-of-the-table intelligence. On the Artificial Analysis Intelligence Index it scores roughly 48: the best US open-weight model, but Kimi K2.6 still leads open weights at about 54.

You can host it, but it is a cluster. The full weights are around 1.1TB, so realistically a multi-GPU node with vLLM; the NVFP4 quantised checkpoint is the sane starting point.

Dual-market relevance. In-region…
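The hosting figures above follow from simple arithmetic, sketched here as a rough back-of-envelope check. The assumptions are mine, not the article's: that the ~1.1TB figure corresponds to 16-bit weights, that NVFP4 costs a nominal 4 bits per parameter (block scale factors add some overhead that is ignored here), and that each GPU has 80 GB of memory, which is a hypothetical value chosen only for illustration. KV cache and activation memory are not counted, so real deployments need more headroom.

```python
import math

def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

def min_gpus_for_weights(footprint_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(footprint_gb / gpu_memory_gb)

TOTAL_PARAMS_B = 550    # total parameters, from the article
ACTIVE_PARAMS_B = 55    # active parameters per token, from the article
GPU_MEMORY_GB = 80      # ASSUMPTION: hypothetical per-GPU memory

bf16_gb = weight_footprint_gb(TOTAL_PARAMS_B, 16)  # assumes 16-bit weights
fp4_gb = weight_footprint_gb(TOTAL_PARAMS_B, 4)    # nominal 4-bit, scales ignored
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"16-bit weights: ~{bf16_gb:.0f} GB")
print(f"4-bit weights:  ~{fp4_gb:.0f} GB")
print(f"active per token: {active_fraction:.0%} of parameters")
print(f"GPUs for 16-bit weights alone: {min_gpus_for_weights(bf16_gb, GPU_MEMORY_GB)}")
print(f"GPUs for 4-bit weights alone:  {min_gpus_for_weights(fp4_gb, GPU_MEMORY_GB)}")
```

Under these assumptions the 16-bit estimate matches the article's "around 1.1TB", and the 4-bit estimate is about a quarter of that, which is why the quantised checkpoint is the more practical starting point.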

Read the full article on AI Tech Connect →
