Nvidia Unveils Physical AI Agent Skills, 32B VLA Model at CVPR

#ai #programming #tech #product

Nvidia launched physical AI agent skills and a 32B VLA model at CVPR to automate AV and robotics workflows, addressing the fragmented tooling bottleneck.

At CVPR, Nvidia launched agent skills for physical AI alongside Alpamayo 2 Super, a 32-billion-parameter vision language action (VLA) model. Both target the fragmented workflows that slow autonomous vehicle (AV) and robotics research.

Key facts

Alpamayo 2 Super: 32B-parameter open VLA model for AV.
Cosmos 3: billed by Nvidia as the first full omnimodel for physical AI.
InstantNuRec enables fast 3D Gaussian scene reconstruction.
AlpaGym scales RL policy rollouts across thousands of GPUs.
OmniDreams generates photorealistic camera frames in real time.

Nvidia's CVPR announcement tackles a structural problem in physical AI: the gap between what models can do and the production workflows needed to put them to use. The company rolled out a suite of AI agent skills designed to automate scene reconstruction, synthetic data generation, and policy evaluation, steps that currently require stitching together disparate tools.

The Workflow Problem

The core challenge in physical AI research isn't simply developing stronger models. It's building a full workflow around them: reconstructing real-world scenes, generating edge-case scenarios, training policies, evaluating behavior and iterating rapidly. Today these steps are fragmented across separate tools, which slows experimentation as researchers struggle to piece them together, according to Nvidia's blog post.

Alpamayo 2 Super and Cosmos 3

Nvidia describes Alpamayo 2 Super as an open, 32-billion-parameter reasoning vision language action (VLA) model that reasons, plans and acts, and calls it the company's most powerful open driving foundation model to date. Earlier this week, Nvidia also announced Cosmos 3, an open frontier model for physical AI that it bills as the world's first full omnimodel, unifying vision reasoning with world and action generation. According to Nvidia, Cosmos 3 leads the open-model public leaderboards central to physical AI.

Agent Skills for AV and Robotics

For AV researchers, the problem is the "long tail" of driving: rare interactions, unusual road geometry, lighting changes and edge-case behaviors. Neural Reconstruction skills help AI agents turn fleet-captured data into editable 3D scenes for simulation and synthetic data generation, while tools including Nvidia Omniverse NuRec, InstantNuRec, Harmonizer and the HiGS accelerated renderer speed up reconstruction. InstantNuRec enables fast 3D Gaussian road-scene reconstruction from images without per-scene optimization.
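For readers new to 3D Gaussian scene representations, the sketch below shows, under simplified assumptions, how one pixel is rendered from such a scene: Gaussians already projected onto the image are sorted by depth and alpha-composited front to back. It is the generic Gaussian-splatting rendering step, not InstantNuRec, NuRec or HiGS code, and it says nothing about how those tools reconstruct the Gaussians. The `ProjectedGaussian` type, the isotropic 2-D footprint, and the alpha cap are illustrative assumptions.

```python
"""Front-to-back alpha compositing of depth-sorted Gaussians for one pixel.

Illustrative sketch only: ProjectedGaussian and composite_pixel are
hypothetical names, not part of any Nvidia tool. Each Gaussian is assumed
to be already projected to the image plane with an isotropic footprint.
"""
import math
from dataclasses import dataclass


@dataclass
class ProjectedGaussian:
    mean_xy: tuple[float, float]        # projected center, in pixels
    sigma: float                        # isotropic 2-D std dev, in pixels (simplification)
    opacity: float                      # base opacity in [0, 1]
    color: tuple[float, float, float]   # RGB in [0, 1]
    depth: float                        # distance from camera; smaller is closer


def composite_pixel(px, py, gaussians, t_min=1e-4):
    """Blend front to back: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for g in sorted(gaussians, key=lambda g: g.depth):
        dx, dy = px - g.mean_xy[0], py - g.mean_xy[1]
        # Gaussian falloff of this splat's footprint at the pixel.
        weight = math.exp(-(dx * dx + dy * dy) / (2.0 * g.sigma ** 2))
        # Cap alpha below 1 so transmittance never collapses to exactly zero.
        alpha = min(0.99, g.opacity * weight)
        for k in range(3):
            color[k] += transmittance * alpha * g.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # stop once the pixel is effectively opaque
            break
    return tuple(color), transmittance


if __name__ == "__main__":
    red_front = ProjectedGaussian((0.0, 0.0), 1.0, 0.5, (1.0, 0.0, 0.0), 1.0)
    green_back = ProjectedGaussian((0.0, 0.0), 1.0, 1.0, (0.0, 1.0, 0.0), 2.0)
    print(composite_pixel(0.0, 0.0, [green_back, red_front]))
```

Because the input is sorted by depth inside the function, callers can pass Gaussians in any order; the closer red splat still occludes half of the green one behind it.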

Nvidia AlpaGym, an open-source, closed-loop reinforcement learning framework, extends that approach by connecting policy rollouts and high-fidelity simulation with agent skills, scaling across thousands of GPUs. Nvidia OmniDreams, an action-conditioned generative world model, adds photorealistic rendering to the simulation loop, generating camera frames that respond directly to policy actions in real time.
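The closed loop described above can be sketched in a few lines, assuming toy stand-ins for both components: a policy picks an action from the current observation, an action-conditioned world model produces the next observation from that action, and the loop repeats while rewards are recorded. `ToyPolicy`, `ToyWorldModel` and `rollout` are hypothetical names, not AlpaGym or OmniDreams APIs, and the "frame" is reduced to a single lane-offset number.

```python
"""Toy closed-loop rollout: policy acts, world model responds, repeat.

Hypothetical stand-ins only; not AlpaGym or OmniDreams code. A real system
would render camera frames and run many such episodes in parallel on GPUs.
"""
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical action-conditioned world model over a 1-D lane offset."""
    offset: float = 1.0  # lateral distance from lane center, in meters

    def step(self, steer: float) -> dict:
        # The next observation depends directly on the action just taken.
        self.offset += steer
        return {"offset": self.offset}


@dataclass
class ToyPolicy:
    """Hypothetical proportional controller standing in for a driving policy."""
    gain: float = 0.5

    def act(self, obs: dict) -> float:
        # Steer back toward the lane center in proportion to the error.
        return -self.gain * obs["offset"]


def rollout(policy, world, steps: int) -> list:
    """Run one closed-loop episode and return the per-step reward trace."""
    obs = {"offset": world.offset}
    rewards = []
    for _ in range(steps):
        action = policy.act(obs)              # policy sees the current observation
        obs = world.step(action)              # world model renders the consequence
        rewards.append(-abs(obs["offset"]))   # reward: stay near the lane center
    return rewards


if __name__ == "__main__":
    print(rollout(ToyPolicy(), ToyWorldModel(), steps=5))
```

This covers only the rollout and evaluation half of reinforcement learning; a training framework would feed the reward traces into a policy update, and scaling out amounts to running many independent episodes like this one concurrently.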

Broader Context

The announcement follows Nvidia's release of Nemotron 3 Ultra, a 550B-parameter open-weight model, just days earlier. The company is also shipping its first Vera Rubin NVL72 rack to CoreWeave, according to Dell. The physical AI push aligns with predictions from industry leaders, reported previously, that 2026 will be a breakthrough year for AI agents across domains.

What to watch

Watch for adoption metrics for Alpamayo 2 Super and Cosmos 3 on the Open Physical AI Leaderboard, and whether Nvidia's agent skills cut time-to-simulation for AV startups by the promised order of magnitude. Also track whether competitors such as Waymo or Tesla adopt the open models.

Source: blogs.nvidia.com

Originally published on gentic.news