Build 2026 & Cosmos 3: Microsoft and NVIDIA Drop Major AI Models This Week

#ai #machinelearning #opensource #microsoft

The first week of June 2026 was packed. The open-weight mega-drop (25+ models) stole the headlines, but two announcements stand out as platform-shifting: NVIDIA's Cosmos 3 and Microsoft's new MAI model family at Build 2026.

🌌 NVIDIA Cosmos 3 — The First Open Omnimodal World Model

NVIDIA dropped Cosmos 3 on June 1, and it's hard to overstate its ambition. Built on a mixture-of-transformers architecture, Cosmos 3 isn't just another LLM or image generator — it's an open world foundation model for physical AI that moves fluidly across text, images, video, audio, and actions.

Key highlights:

Open-source on Hugging Face under a permissive license
Ranked #1 open-source Text-to-Image and #1 Image-to-Video model by Artificial Analysis at the time of writing
Top policy model on RoboArena for robotics tasks
Built for physical AI — connecting understanding, generation, simulation, and action

Beyond the model itself, the release reads as a blueprint for how future AI systems could perceive and interact with the physical world.

🏗️ Microsoft Build 2026: MAI Models Go Multimodal

At Microsoft Build 2026 (June 2–4, San Francisco), Microsoft unveiled a major expansion of its MAI (Microsoft AI) model family across four modalities. The releases covered here span image editing, speech synthesis, and transcription:

MAI-Image-2.5 & MAI-Image-2.5-Flash

#2 on the Arena leaderboard for image editing
Precise, controllable image editing (not just generation)
Available in Microsoft Foundry for production workflows
Flash variant for low-latency use cases

MAI-Voice-2

15+ languages with expanded emotional expression
Significant leap in natural speech synthesis
Built for Copilot and real-time voice interactions

MAI-Transcribe-1.5

43 languages supported
Mixture-of-Experts (MoE) architecture for efficiency
Enterprise-grade speech-to-text accuracy

All models are available now via Microsoft Foundry (formerly Azure AI Foundry), Fireworks AI, Baseten, and OpenRouter.
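
If you want to wire the lineup above into an app, one way to think about it is as a small lookup table keyed by task. The sketch below is only an illustration: the model names come from this post, but the task labels (`image-editing`, `text-to-speech`, `speech-to-text`), the `ModelEntry` type, and the `pick_model` helper are hypothetical, and the names are display names rather than confirmed API identifiers for any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str                  # display name as given in this post, not a verified API ID
    task: str                  # hypothetical task label, chosen for this sketch
    low_latency: bool = False  # True only for the Flash variant


# Lineup as described in the post; nothing here is read from a real catalog.
CATALOG = [
    ModelEntry("MAI-Image-2.5", "image-editing"),
    ModelEntry("MAI-Image-2.5-Flash", "image-editing", low_latency=True),
    ModelEntry("MAI-Voice-2", "text-to-speech"),
    ModelEntry("MAI-Transcribe-1.5", "speech-to-text"),
]


def pick_model(task: str, prefer_low_latency: bool = False) -> str:
    """Return the name of a listed model for the task, honoring the latency preference when possible."""
    candidates = [m for m in CATALOG if m.task == task]
    if not candidates:
        raise ValueError(f"no model listed for task {task!r}")
    # First try an entry whose latency flag matches the request.
    for m in candidates:
        if m.low_latency == prefer_low_latency:
            return m.name
    # Otherwise fall back to the first listed model for that task.
    return candidates[0].name


if __name__ == "__main__":
    print(pick_model("image-editing", prefer_low_latency=True))
    print(pick_model("speech-to-text"))
```

The actual model identifiers, endpoints, and request formats differ per provider, so check each provider's catalog before swapping these names into real calls.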

🔮 Why This Matters

Both releases point in the same direction: multimodality is the new normal.

NVIDIA is betting that unifying every modality (including robotics actions) into one open model will unlock physical AI. Microsoft is betting that developers need modality-specific, production-ready models they can deploy today.

Whether you're building the next robotics startup or adding voice/image capabilities to your app — this was the week the toolbox got a whole lot bigger.

What are you building with these? Drop a comment below!

First published June 11, 2026
