Shawn

Posted on Jun 19

FutureX · Physical AI Daily — Issue 33 (06/20)

#ai #robotics #machinelearning #research

Today's Highlights

· Hyundai Motor plans to acquire SoftBank's remaining ~9.65% stake in Boston Dynamics for approximately $325 million, achieving full ownership; SoftBank is exercising a put option to exit, clearing the path for Atlas mass production and a U.S. IPO (board review expected this week, per media reports).

· Galaxy General-Purpose Robotics (Chinese humanoid robotics company) released AstraBrain-WBC 0.5, its general-purpose "cerebellum" (whole-body motion control) foundation model, trained on approximately 2 billion frames of human motion data. The company describes the result as the first GPT-style scaling law reported for whole-body real-time robot control (success rate 83.3%→92.6%) and claims to surpass Nvidia's SONIC (vendor-reported figures).

· Autonomous driving company Momenta (Chinese AV startup) received approval from China's securities regulator for an overseas listing, pivoting to the Hong Kong Stock Exchange after a setback in U.S. markets; it plans to issue no more than approximately 43.75 million shares to raise roughly $1 billion.

· Paper: Richard Sutton, John Carmack, and others introduce Physical Atari, bringing real-time reinforcement learning out of simulation and onto a physical robot that operates real Atari game controllers.

· World models get sober scrutiny while fundraising continues: the new benchmark WRBench finds that current world models lack a "persistent state core"; meanwhile General Intuition (backed by Bezos and Schmidt) is in talks to raise approximately $300 million at a valuation exceeding $2 billion.

I. Research Papers

Physical Atari: Bringing Real-Time Reinforcement Learning Back to the Physical World · benchmark

This is a hardcore attempt to "return reinforcement learning to the real world" — the team forgoes simulation and instead builds a real robot to manipulate real Atari controllers, reading the screen with a camera, forcing algorithms to contend with the latency, wear, and noise of the physical world. The author lineup includes RL pioneer Sutton and id Software co-founder Carmack.

Khurram Javed et al. (incl. Richard S. Sutton, John Carmack) · arXiv 2606.19357 source

The system consists of Robotroller (a robot operating a CX40+ controller), an Atari Devbox rendering game visuals and reward signals on screen, an off-the-shelf camera, and a desktop PC — recreating the real-time interaction loop of the Arcade Learning Environment. For robustness, all Robotroller movements are transmitted via bearings to reduce wear, with high-frequency servo monitoring and limit interventions when needed. The paper focuses on making this real-time RL platform durable and reproducible, migrating the arcade RL benchmark from pure software to a physical embodiment.
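
To make the latency point concrete, here is a minimal Python sketch (not taken from the paper) of how a fixed actuation delay changes which action is actually in effect at each frame. The function name `apply_with_delay`, the no-op value, and the two-frame delay are illustrative assumptions; the platform's measured latencies are not stated in this summary.

```python
from collections import deque
from typing import List


def apply_with_delay(actions: List[int], delay_frames: int, noop: int = 0) -> List[int]:
    """Return the action in effect at each frame when every command
    needs `delay_frames` frames to reach the game (hypothetical model)."""
    # Pre-fill the pipeline with no-ops: nothing has arrived yet.
    pipeline = deque([noop] * delay_frames)
    applied = []
    for action in actions:
        pipeline.append(action)             # agent issues a command now
        applied.append(pipeline.popleft())  # oldest pending command takes effect
    return applied


if __name__ == "__main__":
    chosen = [1, 2, 3, 4]
    print(apply_with_delay(chosen, 0))  # simulator-like: commands land immediately
    print(apply_with_delay(chosen, 2))  # physical-like: commands land two frames late
```

With zero delay the applied sequence equals the chosen one, as in an emulator; with any positive delay the agent is always acting on a world that has already moved on, which is the kind of effect a physical benchmark exposes.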

Current World Models Lack a "Persistent State Core" (WRBench) · world-model

At a moment when world models are heralded as a key step toward AGI, this paper targets their collective blind spot — existing benchmarks reward "visually appealing outputs and controllable camera motion" but never ask whether the world continues to evolve coherently once the camera looks away.

Jinpeng Lu et al. · arXiv 2606.20545 source · HF 6↑

The authors argue that a true world model requires an internal state that evolves continuously, decoupled from observation, so that objects persist and events unfold even when unobserved. They introduce WRBench, the first systematic diagnostic benchmark to treat camera motion as "an intervention on observability," testing whether a generated world maintains state consistency after leaving the field of view. Results show that current mainstream world models broadly lack this persistent state core.
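
The idea of a state that keeps evolving while unobserved can be illustrated with a toy Python sketch. This is not WRBench's protocol: the one-dimensional `Ball` world, the camera windows, and the `revisit_error` measure are all hypothetical stand-ins chosen to show the difference between a persistent state and one that only updates while in view.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ball:
    x: float  # position
    v: float  # velocity per step


def step(ball: Ball) -> Ball:
    """Advance the world by one step, whether or not anyone is looking."""
    return Ball(ball.x + ball.v, ball.v)


def observe(ball: Ball, window: Tuple[float, float]) -> Optional[float]:
    """Return the ball position if it lies inside the camera window, else None."""
    lo, hi = window
    return ball.x if lo <= ball.x <= hi else None


def persistent_rollout(ball: Ball, windows: List[Tuple[float, float]]) -> List[Optional[float]]:
    """State evolves every step, decoupled from observation."""
    obs = []
    for window in windows:
        ball = step(ball)
        obs.append(observe(ball, window))
    return obs


def frozen_rollout(ball: Ball, windows: List[Tuple[float, float]]) -> List[Optional[float]]:
    """Failure mode: the state only advances while the object is in view."""
    obs = []
    for window in windows:
        candidate = step(ball)
        seen = observe(candidate, window)
        if seen is not None:
            ball = candidate  # update is tied to being observed
        obs.append(seen)
    return obs


def revisit_error(obs: List[Optional[float]], x0: float, v: float) -> float:
    """Gap between the final observation and where the ball should be by then."""
    return abs(obs[-1] - (x0 + v * len(obs)))


if __name__ == "__main__":
    # Camera watches [0, 10], looks away to [20, 30] for three steps, then returns.
    windows = [(0, 10), (0, 10), (20, 30), (20, 30), (20, 30), (0, 10)]
    for rollout in (persistent_rollout, frozen_rollout):
        obs = rollout(Ball(0.0, 1.0), windows)
        print(rollout.__name__, obs, revisit_error(obs, 0.0, 1.0))
```

Both rollouts look identical while the camera stays on the ball; they only diverge after the camera returns, which is why a test that never looks away cannot tell them apart.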

ImageWAM: Does a World-Action Model Really Need Video Generation? · world-model

This paper throws cold water on world-action models (WAMs) while pointing to a shortcut: rather than using expensive video generation to predict multi-frame futures, reframe the problem as "image editing."

Yuyang Zhang et al. · arXiv 2606.19531 source · HF 7↑

The authors identify three compounding costs in video-based WAMs: multi-frame future token inference is expensive; capacity is wasted on temporal and appearance details unrelated to action; and long-horizon imagination errors can mislead action prediction. ImageWAM instead repurposes pretrained image editing models for robot action prediction, modeling only the transformation from "current frame → target frame," providing a prior better aligned with action.

World Engine: Autonomous Driving Enters the "Post-Training" Era · autonomy

The most acute shortage in end-to-end autonomous driving is "long-tail hazard scenarios" that are nearly impossible to collect from real data; this paper shifts the training focus from "feeding more logs" to "post-training on synthesized hazards."

Tianyu Li, Li Chen et al. · arXiv 2606.19836 source

World Engine reconstructs high-fidelity interactive environments from real driving logs, then systematically extrapolates realistic safety-critical variants (rare, high-interaction scenarios), using these synthetic hazards to post-train pretrained driving models. The authors argue that long-tail interactions define the actual safety boundaries of learned policies, and since they cannot be collected at scale in the real world, synthetic post-training is the natural remedy.

HumanScale: Egocentric Human Video Can Outperform Real Robot Data · vla

This paper provides a key controlled comparison for the approach of using egocentric human video in place of real robot data — and finds that human video is not only cheaper but can actually outperform teleoperated robot data for embodied pretraining.

Juncheng Ma et al. · arXiv 2606.20521 source · HF 3↑

Teleoperated robot trajectories have long been the primary source for embodied pretraining, owing to precise action supervision and good embodiment alignment, but collection costs are high and behavioral and environmental diversity is low. The authors systematically compare egocentric human video against teleoperated robot data as pretraining sources, finding that human video — more scalable, cheaper, and more diverse — can match or exceed real robot data as a pretraining source under equivalent conditions.

SWAP: Equivariant Symmetric World Model Sets New Quadruped Parkour Record · locomotion

Welding the geometric prior of "symmetry" directly into the world model and policy network eliminates redundant learning of bilaterally symmetric interactions, enabling quadruped parkour to reach new real-world records.

Kaixin Lan et al. · arXiv 2606.19928 source

Purely data-driven latent world models redundantly encode bilaterally symmetric interactions as independent patterns, increasing the learning burden and weakening the capture of geometric regularity. SWAP proposes an end-to-end equivariant symmetric world model, embedding symmetry into both the world model and the actor-critic network. In real-hardware tests, the robot crossed a 2.13-meter gap, climbed a 1.63-meter platform, and demonstrated zero-shot generalization to unseen mirrored terrains.
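
As a rough illustration of what "embedding symmetry" buys, the Python sketch below makes an arbitrary policy mirror-equivariant by averaging it with its mirrored counterpart. This is a generic symmetrization trick, not SWAP's architecture; the three-component state, the two-component action, and the weights in `raw_policy` are hypothetical.

```python
import math
from typing import Callable, Tuple

State = Tuple[float, float, float]  # (lateral offset, left-leg signal, right-leg signal)
Action = Tuple[float, float]        # (left-leg command, right-leg command)


def mirror_state(s: State) -> State:
    """Reflect left/right: flip the lateral offset and swap the leg signals."""
    y, left, right = s
    return (-y, right, left)


def mirror_action(a: Action) -> Action:
    """Reflect left/right: swap the two leg commands."""
    left, right = a
    return (right, left)


def raw_policy(s: State) -> Action:
    """An arbitrary, non-symmetric policy with made-up weights."""
    y, left, right = s
    return (math.tanh(0.8 * y + 0.5 * left - 0.2 * right),
            math.tanh(0.1 * y - 0.3 * left + 0.9 * right))


def symmetrize(policy: Callable[[State], Action]) -> Callable[[State], Action]:
    """Average the policy with its mirrored version so that
    sym(mirror_state(s)) == mirror_action(sym(s)) for every state s."""
    def symmetric_policy(s: State) -> Action:
        direct = policy(s)
        reflected = mirror_action(policy(mirror_state(s)))
        return (0.5 * (direct[0] + reflected[0]),
                0.5 * (direct[1] + reflected[1]))
    return symmetric_policy


if __name__ == "__main__":
    sym = symmetrize(raw_policy)
    s = (0.3, -1.0, 0.5)
    print(sym(mirror_state(s)))
    print(mirror_action(sym(s)))  # same pair as the line above
```

Because the symmetrized policy responds to a mirrored scene with the mirrored action by construction, it never has to learn left-handed and right-handed versions of the same interaction separately, which is the redundancy the paper targets.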

"Generating" Robot Hands Directly from Human Demonstrations · manipulation

Robotics excels at learning control but rarely "learns the body"; this paper uses over 4 million frames of human hand motion to optimize the morphology of a dexterous hand itself.

Sha Yi, Carmelo Sferrazza, Michael T. Tolley et al. (UC San Diego / UC Berkeley) · arXiv 2606.20549 source

Co-optimizing design and control is a massive combinatorial challenge. Rather than learning a complex controller for each candidate design, the authors evaluate designs using a simple post-fabrication policy (inverse kinematics matching fingertip positions), leveraging 4 million frames of everyday human fingertip motion to optimize a tree-structured dexterous hand to reproduce target actions — yielding multiple designs including a general-purpose 6-DOF hand.
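
The appeal of scoring designs with a fixed, simple policy can be seen in a toy Python sketch. It assumes a planar two-link finger with unlimited joint range, for which the inverse-kinematics residual has a closed form; the candidate link lengths and fingertip targets are invented for illustration, and the paper's tree-structured hands and 4-million-frame dataset are far richer.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Design = Tuple[float, float]  # (proximal link length, distal link length)


def fingertip_error(design: Design, target: Point) -> float:
    """Smallest distance between the target and any fingertip position the
    two-link finger can reach (its workspace is a ring around the base)."""
    l1, l2 = design
    r = math.hypot(target[0], target[1])
    reach_min, reach_max = abs(l1 - l2), l1 + l2
    return max(0.0, r - reach_max, reach_min - r)


def score(design: Design, targets: List[Point]) -> float:
    """Mean fingertip-matching error of one design over all target positions."""
    return sum(fingertip_error(design, t) for t in targets) / len(targets)


def best_design(candidates: List[Design], targets: List[Point]) -> Design:
    """Pick the candidate whose fixed IK policy tracks the targets best."""
    return min(candidates, key=lambda d: score(d, targets))


if __name__ == "__main__":
    targets = [(1.0, 0.0), (0.0, 3.0), (3.0, 4.0), (0.0, 2.0)]  # hypothetical fingertip positions
    candidates = [(2.0, 2.0), (3.0, 2.0), (4.0, 1.0)]           # hypothetical link lengths
    for d in candidates:
        print(d, score(d, targets))
    print("best:", best_design(candidates, targets))
```

Each candidate is evaluated with one cheap geometric computation per target rather than a freshly trained controller, so the search cost scales with the number of designs and frames, not with policy training.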

Playful Agentic Robot Learning: "Play" First, Then Task · manipulation

Allowing an embodied coding agent to freely "play" and accumulate skills before formal tasks arrive pushes robot learning from "instruction-driven" toward "autonomous exploration."

Junyi Zhang et al. · arXiv 2606.19419 source · HF 31↑

Existing agentic robots can write executable Code-as-Policy and iteratively correct through trial and error, but remain task-driven and require explicit instructions to acquire skills. The authors propose using autonomous play for continual skill learning prior to downstream tasks: RATs (Robot Agent Teams) propose novel and learnable exploration tasks during play, plan and execute code-based policies, self-assess progress, diagnose failures with dense step-level feedback and retry, distilling successful executions into a persistent code skill library for reuse at test time.
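
The play loop described above (propose, execute, assess, retry with feedback, store) can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `toy_coder` replaces the code-writing agent, the two task names and their integer "success values" replace real robot tasks, and the library is a plain dictionary rather than the paper's skill store.

```python
from typing import Callable, Dict, List, Optional, Tuple

Policy = Callable[[], int]

# Hypothetical play tasks: each maps to the value a correct policy must return.
TARGETS: Dict[str, int] = {"stack_two_blocks": 2, "open_drawer": 1}


def toy_coder(task: str, feedback: Optional[str]) -> Policy:
    """Stand-in for an agent writing a code policy: the first draft is wrong,
    and the revision written after receiving feedback is correct."""
    target = TARGETS[task]
    if feedback is None:
        return lambda: target + 1
    return lambda: target


def play(tasks: List[str], max_attempts: int = 3) -> Tuple[Dict[str, Policy], Dict[str, int]]:
    """Try each task with retries; keep only policies that pass self-assessment."""
    library: Dict[str, Policy] = {}
    attempts_used: Dict[str, int] = {}
    for task in tasks:
        feedback: Optional[str] = None
        for attempt in range(1, max_attempts + 1):
            policy = toy_coder(task, feedback)
            result = policy()                    # execute the code policy
            if result == TARGETS[task]:          # self-assess the outcome
                library[task] = policy           # distill into the skill library
                attempts_used[task] = attempt
                break
            feedback = f"returned {result}, expected {TARGETS[task]}"  # diagnose the failure
    return library, attempts_used


if __name__ == "__main__":
    library, attempts_used = play(list(TARGETS))
    print(sorted(library), attempts_used)
    print(library["open_drawer"]())  # reuse a stored skill at test time
```

The point of the structure is that the library is filled before any downstream instruction arrives, so a later task can call a stored skill instead of rediscovering it.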

Other papers today: Scaling Self-Play for End-to-End Driving (Gigapixel high-throughput simulator, pure-pixel self-play training for end-to-end driving, arXiv 2606.19641 source); Sensorimotor World Models (Schölkopf et al., JEPA-style latent world model + inverse dynamics regularization against representation collapse, arXiv 2606.20104 source); EquiVLA (first general SO(2)-equivariant VLA framework, arXiv 2606.19784 source); MemoryWAM (efficient world-action model with persistent memory, arXiv 2606.20562 source); Finetuning VLA Requires Fewer Layers (training-free compression of π0/GR00T-N1.5, arXiv 2606.20246 source); One Demo is Worth a Thousand Trajectories (Toyota Research, action-viewpoint augmentation, arXiv 2606.19586 source); ENPIRE agentic policy self-improvement paper released (HF↑7, previously reported).

Open Source · Tools · Benchmarks

· WorkBenchMark: A LEGO Duplo assembly benchmark inspired by the RoboCup Smart Manufacturing League, with 400 tasks across four difficulty tiers, open-vocabulary perception, and a "disassemble-to-infer-assembly" baseline; the authors report this planning-based baseline outperforms modern VLAs across all tiers; benchmark, simulation environment, and baseline will be open-sourced (arXiv 2606.19358 source).

· CRAX: A safe RL benchmark based on MuJoCo XLA (MJX), with vectorization and hardware acceleration offering roughly 100× speedup over CPU baselines; includes six environments and three agent task types (arXiv 2606.20376 source).

· ForEnt: A multimodal quadruped "entrapment" dataset collected with a low-cost Unitree Go2 across eight woodland sites in the UK, covering approximately 1.7 km across 11 sequences, specifically capturing instability and failure modes such as vine entanglement (arXiv 2606.19675 source).

II. Funding & Deals

Hyundai Motor × Boston Dynamics ｜ Acquisition (Full Ownership) ｜ ~$325M ｜ SoftBank Exit · humanoid ⚠️ Media reports

According to reports including Meiri Jingji Xinwen, Hyundai Motor Group plans to acquire SoftBank's remaining ~9.65% stake in Boston Dynamics for approximately $325 million, making it a wholly-owned subsidiary. The Hyundai group (including Chung Euisun and Hyundai Motor, Kia, Mobis, and Glovis) already held over 90% of shares; SoftBank is exercising a put option agreed upon at the time of the 2020 sale. The board is expected to review the deal around June 22. Boston Dynamics' Atlas humanoid robot is planned for mass production in 2026, with initial units going to Hyundai's own factories and Google DeepMind; the deal is widely interpreted as clearing a share structure obstacle ahead of a U.S. IPO. Source: TipRanks et al.

General Intuition ｜ New Round (in talks) ｜ ~$300M ｜ Valuation exceeds $2B · world-model ⚠️ Per reports

According to TechCrunch and SiliconANGLE, General Intuition, a New York company spun out of gaming clip platform Medal roughly eight months ago, is in talks to raise approximately $300 million in a new round, at a valuation roughly four times its $134 million seed round from six months ago. The company trains embodied AI and world models on approximately 2 billion first-person interactive gaming clips from its platform's tens of millions of monthly active users; the first-person, interactive nature of the data is its core value. The new round reportedly involves Jeff Bezos, Eric Schmidt, and existing investors Khosla Ventures and General Catalyst, with plans to expand compute and launch products by late summer. Source: TechCrunch

AGILINK ｜ New Round ｜ Hundreds of millions of RMB ｜ Valuation exceeds $1B · hardware

AGILINK, a Chinese dexterous hand startup spun out of AgiBot (Chinese humanoid robotics company), has raised hundreds of millions of RMB in a new round led by a major Chinese internet conglomerate, with BV Baidu Ventures, Yunfeng Capital, and existing investors Lanchi Ventures and Hillhouse Capital participating, the latter two oversubscribing. Founded roughly six months ago, the company has already reached unicorn status. Unlike many peers stuck at the demo stage, AGILINK has multiple dexterous hand models in mass production with quarterly shipments in the thousands, a sign that dexterous hands are moving from "demonstration components" to real shipment volumes. Source: 36Kr

Momenta ｜ Hong Kong IPO (CSRC filing approved) ｜ Plans to raise ~$1B · autonomy

Autonomous driving company Momenta has received approval from China's securities regulator for an overseas listing, planning to issue no more than approximately 43.75 million shares on the Hong Kong Stock Exchange, a key step in its pivot to Hong Kong after a setback in U.S. markets. Reports indicate its urban NOA (Navigate on Autopilot) commands approximately 65% third-party market share, backed by industrial shareholders including Toyota, Mercedes-Benz, SAIC Motor, and General Motors. Source: Sina Finance

BioGeometry ｜ Strategic Financing ｜ Hundreds of millions of RMB · world-model

BioGeometry (Chinese life-science AI company), which focuses on "microscale world models" for life sciences, has completed a strategic financing round of hundreds of millions of RMB, extending the "world model" paradigm from physical space into molecular and life science domains. Source: 36Kr

III. Commercial Deployment

Volvo Autonomous Trucks Begin Hauling Freight for AVI-SPL in Texas · autonomy

Volvo Autonomous Solutions' self-driving trucks have begun hauling real freight for AVI-SPL in Texas, marking the transition of long-haul autonomous logistics from testing to live commercial operations. Source: Truck News

Fourier Intelligence Lands Multimillion-Dollar Rehabilitation Robot Order in Southeast Asia · embodied

Fourier Intelligence (Chinese rehabilitation robotics company) has secured a multimillion-dollar robot order in Southeast Asia, expanding into the global high-end rehabilitation robotics market. This is a confirmed order rather than a letter of intent, and it comes in medical rehabilitation, one of the more mature deployment scenarios for embodied AI. Source: Sina Finance

GM Adds ~50 Robots at Factory That Recently Cut Over 1,000 Workers · industrial

According to Carscoops, General Motors has added approximately 50 robots at a factory that recently laid off more than 1,000 workers, putting the tradeoff between automation and jobs in plain view. The scale itself is modest, but the news value lies in the direct juxtaposition of headcount reductions and robot additions. Source: Carscoops

Mars Auto × LX Pantos Establish ~7,000 km Round-Trip Autonomous Freight Route Across the Americas · autonomy

South Korean autonomous driving company Mars Auto, in partnership with logistics provider LX Pantos, has established a round-trip autonomous freight route spanning approximately 7,000 km across the Americas, targeting cross-border long-haul logistics. Source: 벤처스퀘어

Robot Achieves 99.5% Accuracy Plugging Moving Cables in Factory Tests · industrial ⚠️ Test-reported figures

Interesting Engineering reports that a robot achieved 99.5% accuracy when plugging in moving cables during factory tests. This is a capability demonstration on a difficult manipulation task; independent verification at production scale and under stable manufacturing-line conditions remains to be seen. Source: Interesting Engineering

IV. Industry News

Galaxy General-Purpose Robotics Releases AstraBrain-WBC 0.5 Foundation Model · world-model ⚠️ Vendor-reported figures

Galaxy General-Purpose Robotics (Chinese humanoid robotics company) released AstraBrain-WBC 0.5, its foundation model for whole-body real-time humanoid robot control, with approximately 80.4 million parameters, trained on what the company claims is the world's largest human motion dataset: 20,000 hours (approximately 2 billion frames). The company's headline result: scaling training data from 2 million to 2 billion frames raised zero-shot whole-body motion tracking success rate from 83.26% to 92.58%, which it claims is the first demonstration of GPT-style scaling laws in robot control. Compared to Nvidia's earlier SONIC (MLP architecture, ~100 million frames), AstraBrain switches to a Transformer architecture and increases the data scale by roughly an order of magnitude, claiming superiority in tracking precision and generalization. Claims such as "world first" and "surpasses SONIC" are vendor-reported and have not been independently replicated; current evidence consists primarily of demos and internal metrics, with no open weights released. Source: Pedaily
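
A quick back-of-the-envelope check of the vendor's numbers, as a hedged Python sketch. It uses only the figures quoted above; the assumption that failure rate follows a power law in data size is ours, made purely to express the two reported endpoints as a single exponent, and two points cannot confirm that such a law actually holds.

```python
import math

HOURS = 20_000
FRAMES_LARGE = 2_000_000_000
FRAMES_SMALL = 2_000_000
SUCCESS_SMALL = 0.8326  # reported success rate at 2 million frames
SUCCESS_LARGE = 0.9258  # reported success rate at 2 billion frames


def implied_fps() -> float:
    """Frame rate implied by '20,000 hours is about 2 billion frames'."""
    return FRAMES_LARGE / (HOURS * 3600)


def implied_exponent() -> float:
    """Log-log slope of failure rate versus data size between the two endpoints
    (assumes failure = c * frames**slope, which the report does not establish)."""
    fail_small = 1.0 - SUCCESS_SMALL
    fail_large = 1.0 - SUCCESS_LARGE
    return math.log(fail_large / fail_small) / math.log(FRAMES_LARGE / FRAMES_SMALL)


if __name__ == "__main__":
    print(f"implied frame rate: {implied_fps():.1f} fps")
    print(f"implied failure-rate exponent: {implied_exponent():.3f}")
```

The dataset size is internally consistent with footage at just under 28 frames per second. On the scaling side, a 1,000-fold increase in data cut the failure rate from about 16.7% to about 7.4%, which corresponds to an exponent near -0.12 under the power-law assumption; intermediate data points would be needed to judge whether the trend is really a scaling law.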

South Korea Pushes "Physical AI": LG CNS and Doosan Form Broad Alliance; Physical AI Alliance Launches Phase 2 · adjacent ⚠️ Strategic statements

LG CNS and Doosan have signed a broad technology alliance covering AI, robotics, and data centers. On the same day, South Korea officially launched Phase 2 of its "Physical AI Alliance"; the deputy prime minister stated that South Korea "must win in the global physical AI competition," and the focus is shifting to execution and building a full domestic stack. These are primarily strategic and cooperative framework-level statements; specific milestones have yet to materialize. Source: 조선일보

ABB Robotics Partners with PSYONIC to Advance AI-Powered Dexterous Manipulation · embodied

ABB Robotics has announced a partnership with bionic prosthetics and dexterous hand company PSYONIC to advance AI-driven robotic dexterous manipulation using "human-inspired" data, extending the reach of an industrial robot arm manufacturer further into dexterous end-effectors. Source: Machine Maker

RLWRLD Launches Data-Driven Evaluation Platform for Dexterous Hands · embodied

South Korean company RLWRLD has released a data-driven evaluation platform for dexterous robotic hands, providing a cross-product benchmarking tool for the rapidly growing and spec-diverse landscape of dexterous hand products. Source: TipRanks

Hardware · Supply Chain

· Dexterous Hand Shipments: Industry analysts project dexterous hand shipments may reach 70,200 units in 2026, with demand rising alongside expectations of humanoid robot mass production (source: WeChat, CN).

· Miniature Joint Modules: A miniature joint module company has completed a financing round of tens of millions of RMB led by Shenzhen Capital Group (Chinese state-backed VC), betting on small-form-factor joint modules for humanoid robots source.

· Shipment Structure: Industry analysis indicates that nearly 90% of humanoid robot shipments are in China, but the most expensive components, reducers and six-axis force sensors, are still primarily sourced from overseas (source: WeChat, CN).

· Bionic Dexterous Hand: A team has released what it claims is the world's highest degree-of-freedom bionic dexterous hand at 38 DOF (vendor-reported), a DOF count the team says matches or exceeds Tesla's solution (source: WeChat, CN).