Physical AI has become one of the most talked-about trends of the past year. At CES (Consumer Electronics Show), NVIDIA's Jensen Huang declared 2026 the "ChatGPT moment for physical AI." Georgetown's Center for Security and Emerging Technology published a policy brief ranking it alongside ImageNet (2012) and ChatGPT (2022) as a genuine inflection point. Deloitte and BCG both published major reports on it in 2026.
But some call it just a buzzword - a marketing rebrand of robotics work that's been happening for years.
So is there something real here, or is it just hype?
What physical AI actually means
Physical AI refers to AI systems that perceive real environments through sensors, reason about them, and take physical action. The key difference from traditional robotics is generalization - these systems adapt to novel situations rather than repeating pre-programmed routines.
The key components:
- Vision-language-action (VLA) models - the core architectural shift. Traditional robotics uses separate pipelines for perception, planning, and control. VLAs fuse all three into a single end-to-end model that takes camera input and language instructions and directly outputs motor commands. Google DeepMind's RT-2 established the paradigm in 2023, and it's since been adopted by Gemini Robotics, NVIDIA's GR00T N1, Figure AI's Helix, and Physical Intelligence's pi0 (see the sketch after this list)
- World models - neural networks trained on millions of hours of real-world video that understand physics, spatial relationships, and cause-and-effect
- Sim-to-real transfer - training robots in physics-accurate simulations (digital twins), then deploying those skills in the real world (a domain-randomization sketch also follows this list)
- Multimodal perception - processing cameras, lidar, force sensors, and language simultaneously
- Edge and cloud inference - models can run on-device for latency-sensitive control or in the cloud for higher-level reasoning and planning. For example, Google's Gemini Robotics offers both: a cloud API for embodied reasoning and an on-device model for local execution
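To make the "single end-to-end model" idea concrete, here's a minimal PyTorch sketch of the VLA pattern: one network that maps a camera frame plus an instruction directly to motor commands. Everything here is a toy stand-in - the layer sizes, vocabulary, and 7-DoF action space are illustrative assumptions, not the architecture of RT-2 or any production system.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy vision-language-action model: image + instruction -> motor command.
    All layer sizes are illustrative, not from any published architecture."""

    def __init__(self, vocab_size=1000, embed_dim=64, action_dim=7):
        super().__init__()
        # Vision encoder: a tiny CNN standing in for a pretrained backbone
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Language encoder: embedding + mean pooling over instruction tokens
        self.language = nn.Embedding(vocab_size, embed_dim)
        # Fused head maps the joint representation directly to motor commands
        # (e.g., a 7-DoF arm: 6 joint deltas + 1 gripper value)
        self.policy = nn.Sequential(
            nn.Linear(embed_dim * 2, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image, instruction_tokens):
        v = self.vision(image)                         # (B, embed_dim)
        l = self.language(instruction_tokens).mean(1)  # (B, embed_dim)
        return self.policy(torch.cat([v, l], dim=-1))  # (B, action_dim)

model = ToyVLA()
image = torch.randn(1, 3, 224, 224)       # one RGB camera frame
tokens = torch.randint(0, 1000, (1, 12))  # a tokenized instruction
action = model(image, tokens)
print(action.shape)  # torch.Size([1, 7])
```

The point of the pattern is that there is no hand-written planner between perception and actuation - the same gradient flows from motor commands back through the vision and language encoders.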
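Sim-to-real transfer, in turn, is commonly paired with domain randomization: varying the simulator's physics during training so the real world looks like just another sample from the training distribution. A minimal sketch of the idea, with made-up parameter ranges:

```python
import random

def randomized_sim_params():
    """Sample physics parameters for one simulated training episode.
    Ranges are illustrative assumptions, not from any real simulator."""
    return {
        "friction": random.uniform(0.4, 1.2),        # surface friction coefficient
        "object_mass_kg": random.uniform(0.1, 2.0),  # payload variation
        "motor_latency_ms": random.uniform(5, 40),   # actuation delay
        "camera_noise_std": random.uniform(0.0, 0.05),
    }

# Training across thousands of randomized worlds forces the policy to
# tolerate the inevitable mismatch between simulation and reality.
for episode in range(3):
    print(f"episode {episode}: {randomized_sim_params()}")
```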
It's worth noting this extends beyond robots. NVIDIA's definition includes autonomous cameras and smart spaces, and Honeywell frames AI-assisted control rooms and smart buildings as physical AI too.
The case that it's real
The money is serious
The physical AI market hit roughly $5 billion in 2025 and is projected to reach $68-84 billion by 2034-35. Barclays projects the humanoid robot market alone could reach $40 billion by 2035 - or $200 billion in an optimistic scenario.
Real deployments, not just demos
The strongest evidence that physical AI is real comes from production numbers, not press releases:
- Waymo has completed over 10 million paid robotaxi rides
- Amazon deployed its millionth robot, reporting a 10% fleet efficiency improvement
- Figure AI's robots loaded over 90,000 parts into 30,000 X3 vehicles during 10-hour shifts at BMW's Spartanburg plant, running Figure's Helix VLA
- Honeywell has AI-assisted control rooms running at TotalEnergies' Port Arthur Refinery
Cost curves are plummeting
Robot unit costs dropped 30x over the past decade - from roughly $3 million to around $100,000. Bank of America projects humanoid robot costs will fall further, from $35,000 in 2025 to between $13,000 and $17,000 per unit in the next decade. At those prices, the math starts working for a lot of industries that couldn't justify automation before.
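As a rough illustration of why those price points matter, here's a back-of-the-envelope payback calculation. The labor, utilization, and maintenance figures are assumptions for illustration only - only the unit prices come from the projections above.

```python
# Back-of-the-envelope payback period for a humanoid robot.
# Only unit_cost tracks the article's figures; the rest are assumptions.
unit_cost = 15_000               # midpoint of the projected $13k-17k range
annual_maintenance = 3_000       # assumed upkeep, software, and power
hourly_labor_cost = 25           # assumed fully loaded human labor cost
hours_replaced_per_year = 2_000  # assumed one-shift-equivalent workload

annual_savings = hourly_labor_cost * hours_replaced_per_year - annual_maintenance
print(f"Payback: {unit_cost / annual_savings:.1f} years")        # ~0.3 years
print(f"At $3M/unit: {3_000_000 / annual_savings:.0f} years")    # ~64 years
```

Under these (generous) assumptions, the same workload that takes decades to pay back at 2015-era pricing pays back in months at projected pricing - which is what "the math starts working" means in practice.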
The case that it's more hype than substance
The term itself is a rebrand
Many of the underlying technologies - reinforcement learning, computer vision, sim-to-real transfer, sensor fusion - have existed for years. Some argue that "physical AI" is a new label on existing work.
That said, VLA models are genuinely new. The idea of fusing perception, planning, and control into a single end-to-end model only emerged in 2023 with RT-2, and the field has accelerated rapidly since - VLA submissions at ICLR (International Conference on Learning Representations, one of the top ML conferences) went from 1 in 2024 to 164 in 2026. This isn't just relabeled reinforcement learning - it's a real architectural shift.
The demo-to-production gap is massive
This is the most important counterargument. Figure AI's BMW deployment is a good example - the robot started at only 25% of human speed and improved significantly over 11 months, but still required a hardware redesign. Automotive industry insiders say deeply integrated AI won't ship in vehicles until 2030-2032. The gap between an impressive CES demo and a system that runs reliably for 10-hour shifts, 365 days a year, is enormous.
Physics doesn't forgive hallucinations
When ChatGPT hallucinates, you get a wrong answer. When a surgical robot hallucinates, someone gets hurt. When an autonomous vehicle hallucinates, someone can die.
Real-time physical operation demands near-zero latency with near-zero tolerance for errors. The bar is fundamentally higher than for software AI.
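To put "near-zero latency" in perspective, here's the budget arithmetic for a hypothetical control loop. The 100 Hz rate and the stage timings are illustrative assumptions; real systems vary widely.

```python
# Latency budget for a hypothetical 100 Hz robot control loop.
# All stage timings below are illustrative assumptions.
control_rate_hz = 100
budget_ms = 1000 / control_rate_hz  # 10 ms per cycle, end to end

stages_ms = {
    "sensor capture + transfer": 2.0,
    "model inference (on-device)": 5.0,
    "safety checks + command dispatch": 1.5,
}
used = sum(stages_ms.values())
print(f"Budget: {budget_ms:.1f} ms, used: {used:.1f} ms, "
      f"headroom: {budget_ms - used:.1f} ms")
```

A single cloud round trip can eat tens of milliseconds on its own, which is why latency-sensitive control tends to run on-device while higher-level planning goes to the cloud - the edge/cloud split described earlier.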
What the experts say
Deloitte sees it as real enough that 80% of surveyed business leaders plan adoption within two years, but cautions about simulation-to-reality gaps: "Visual images in simulated environments are pretty good, but the real world has nuances that look different."
Georgetown CSET treats it as strategically significant enough for a dedicated policy brief, framing it as a competitive race between the US and China that warrants immediate policymaker attention.
Honeywell calls it "a quiet revolution" - not emerging technology, but present-day impact in industrial settings, solving real problems like workforce shortages and operational reliability.
The verdict
Physical AI is more than just hype. Like LLMs before it, it will bring a mix of real impact and overhype.
The investment, the deployments, and the cost curves all point to a real and accelerating trend. Companies like Waymo and Amazon aren't running pilot programs for PR - they're building production infrastructure.
If you're a developer or engineer, the underlying skills - VLA architectures, reinforcement learning, computer vision, simulation, edge computing, sensor fusion - are the real signal beneath the marketing noise. Those will matter regardless of what we end up calling this trend.
Learn more
If you want to learn more about physical AI, feel free to check out our new newsletter. If you're a machine learning engineer getting started with physical AI, we're building a tool for multimodal model training called MultiBase.
