Shawn

Posted on Jun 23

FutureX · Physical AI Daily — Issue 37 (06/24)

#ai #robotics #machinelearning #research

Today's Highlights

· Kunlunxing Robotics (Chinese humanoid startup), co-founded by former Alibaba Group VP Ren Geng and former Li Auto (Chinese EV maker) autonomous driving chief Lang Xianpeng, closed 3 funding rounds within 90 days of incorporation, raising billions of RMB in total and surpassing a US$1 billion valuation. That sets a new record for the fastest 0-to-unicorn trajectory among Chinese embodied AI startups.

· Momenta (Chinese autonomous driving company) passed its Hong Kong Stock Exchange listing hearing and aims to raise at least US$1 billion in what could become the first "Physical AI" IPO; its prospectus shows license revenue growing more than 42x in three years (¥23M → ¥968M).

· Zhiyuan Robotics (Chinese humanoid maker) launched a 6-day factory livestream for its G2 robot, claiming that G2 is the world's first humanoid to cover an entire 3C tablet quality-inspection line end-to-end (⚠️ livestream demonstration, not mass production).

· Real orders and deliveries landed on the same day: Galaxy General Robotics (Chinese embodied AI company) won a ¥236M tender for 500 embodied robots in Yibin; 120 autonomous mining trucks equipped with Yuchai (Chinese engine maker) flywheel range-extender systems were delivered to a coal mine in Xinjiang.

· Investment in embodied AI data infrastructure continues unabated: Lightwheel AI (Chinese physical AI data company) raised another ¥1 billion, bringing its total to roughly ¥2 billion across two rounds in two weeks, to accelerate construction of physical AI data and evaluation infrastructure.

I. Research Papers

Vesta: A General-Purpose Embodied Reasoning Model · vla

Packing localization, spatial reasoning, navigation, and long-horizon planning into a single foundation model targets the longstanding problem of "stacked specialist models," which are expensive to deploy and prone to compounding errors. This is NVIDIA's latest bet on "one model does everything," following GR00T.

Johan Bjorck et al. (NVIDIA) · arXiv 2606.20905 https://arxiv.org/abs/2606.20905

The approach has two components: a scaled, carefully constructed corpus to induce spatial grounding, and a lightweight multimodal memory harness supporting reasoning across extended time horizons. The authors report that Vesta outperforms individual single-task state-of-the-art baselines by an average of over 20% across multiple benchmarks, and exceeds an ensemble of per-category best models by more than 10%, arguing that a single generalist model can match or surpass combinations of specialized models.

OpenHLM: A Recipe for Whole-Body Humanoid Loco-Manipulation · locomotion

Most existing humanoid systems split upper and lower body into two separate controllers, producing behavior that degrades into "a walking dual-arm cart." This paper asks directly: what does it take to build a whole-body-native VLA that maps language and pixels to all degrees of freedom simultaneously?

Yingdong Hu et al. · arXiv 2606.22174 https://arxiv.org/abs/2606.22174

The study is organized as a "change one variable at a time" experimental roadmap spanning three stages: whole-body teleoperation, VLA model design, and heterogeneous data co-training. One finding is that a joint-space whole-body teleoperation interface outperforms other teleoperation schemes, providing a reproducible engineering baseline for data collection and training of whole-body policies.

SafeDojo: Safe Reinforcement Learning for VLAs via Interactive World Models · vla

Prior safe RL approaches rely either on expensive real-world trial and error or on hand-crafted safety functions, and neither scales to VLAs operating in open physical environments. SafeDojo claims to be the first model-based safe RL framework targeting VLA policies.

Kai Tang et al. · arXiv 2606.20698 https://arxiv.org/abs/2606.20698

The approach performs online reinforcement learning on top of an interactive video world model: the world model generates action-conditioned future predictions, and a dedicated ResNet success classifier estimates success/risk at each step, allowing the policy to learn safe behaviors in "imagination" without repeatedly attempting dangerous actions on a physical robot.
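To make the "learn in imagination" loop concrete, here is a toy sketch under loudly stated assumptions: the "world model" is a hand-written 1-D gripper-to-object distance update, the "success classifier" is a threshold rule, and the policy is a state-independent softmax trained with REINFORCE. None of these names, thresholds, or the risk weight come from the SafeDojo paper; the point is only the control flow, in which every rollout is imagined and predicted risk is penalized at each step, so no dangerous action is ever executed on hardware.

```python
import numpy as np

# Hypothetical stand-ins for SafeDojo's learned components (NOT the paper's code):
# the "world" is a 1-D gripper-to-object distance, so every rollout is pure imagination.
ACTIONS = np.array([0.5, 1.0, 2.0])  # candidate approach step sizes (assumed)


def world_model(dist, step):
    """Stand-in for an action-conditioned video world model: predict the next distance."""
    return dist - step


def success_risk(dist):
    """Stand-in for the success classifier: success near contact, risk after overshooting."""
    success = float(abs(dist) < 0.25)
    risk = float(dist < -0.25)
    return success, risk


def imagined_return(step_indices, start=3.0, risk_weight=5.0):
    """Score one imagined episode; returns (return, number of steps actually taken)."""
    dist, total = start, 0.0
    for t, idx in enumerate(step_indices):
        dist = world_model(dist, ACTIONS[idx])
        success, risk = success_risk(dist)
        total += success - risk_weight * risk  # reward success, penalize predicted risk
        if success or risk:
            return total, t + 1  # imagined episode ends on success or predicted violation
    return total, len(step_indices)


def train_in_imagination(iters=2000, horizon=6, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline, using only imagined rollouts."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(ACTIONS))
    baseline = 0.0
    for _ in range(iters):
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        episode = rng.choice(len(ACTIONS), size=horizon, p=probs)
        ret, used = imagined_return(episode)
        grad = np.zeros_like(logits)
        for idx in episode[:used]:
            grad[idx] += 1.0
            grad -= probs  # gradient of log softmax(logits)[idx]
        logits += lr * (ret - baseline) * grad
        baseline = 0.9 * baseline + 0.1 * ret
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


print(imagined_return([1, 1, 1]))  # three 1.0 steps reach contact: (1.0, 3)
print(imagined_return([2, 2]))     # overshoots into the object: (-5.0, 2)
print(train_in_imagination())
```

In a real system the two stand-in functions would be neural networks and the policy would condition on observations; the structure of "predict, classify, penalize, update" is what carries over.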

MemoryVAM: Equipping Video Action Models with Episodic Memory · manipulation

Video world model policies observe only a short window; once the correct action depends on events that have scrolled out of frame, long-horizon manipulation degrades into a non-Markovian problem. This paper adds the ability to "remember what just happened" to such policies.

Yuxin Jiang et al. · arXiv 2606.20679 https://arxiv.org/abs/2606.20679

The core is a Recap-Cue module: a Perceiver-based Recap Compressor compresses per-frame CLIP embeddings into compact memory tokens, while a lightweight Cue Gate combines memory and language to estimate task completion; these tokens are injected into both the video backbone and action decoder, aligning the policy's "imagination" to task progress and conditioning actions on history.
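A Perceiver-style compressor is, at its core, cross-attention from a small set of learned latent queries to a variable-length input, which is why it yields a fixed-size memory regardless of how much history has scrolled by. The sketch below illustrates that idea only; it is not the paper's module. The random weights, single attention head, mean pooling, and logistic form of `cue_gate` are our assumptions, and the injection of memory tokens into the video backbone and action decoder is not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class RecapCompressorSketch:
    """Perceiver-style cross-attention: M learned latent queries attend over T frame embeddings."""

    def __init__(self, dim, num_tokens, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for trained parameters (assumption).
        self.latents = rng.normal(scale=0.02, size=(num_tokens, dim))  # learned queries
        self.w_k = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.w_v = rng.normal(scale=dim ** -0.5, size=(dim, dim))

    def __call__(self, frame_embs):
        """frame_embs: (T, dim), e.g. per-frame CLIP features; returns (num_tokens, dim)."""
        keys = frame_embs @ self.w_k
        values = frame_embs @ self.w_v
        scores = self.latents @ keys.T / np.sqrt(keys.shape[-1])  # (num_tokens, T)
        return softmax(scores, axis=-1) @ values  # fixed size no matter how long T grows


def cue_gate(memory_tokens, lang_emb, w, b=0.0):
    """Hypothetical gate: pooled memory + language embedding -> task-completion probability."""
    features = np.concatenate([memory_tokens.mean(axis=0), lang_emb])
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


dim = 512
compressor = RecapCompressorSketch(dim, num_tokens=8)
rng = np.random.default_rng(1)
short_mem = compressor(rng.normal(size=(30, dim)))
long_mem = compressor(rng.normal(size=(600, dim)))
print(short_mem.shape, long_mem.shape)  # (8, 512) (8, 512)
print(cue_gate(long_mem, np.zeros(dim), w=np.zeros(2 * dim)))  # zero weights -> 0.5
```

Thirty frames and six hundred frames both compress to the same eight tokens, so the downstream policy's input size stays constant as the episode grows.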

Geometric Entropy: When Trajectory Diversity Helps — and Hurts — Imitation Learning · manipulation

"More diverse demonstrations are better" is a common intuition in imitation learning; this paper partly refutes it with a quantifiable metric: diversity has an optimal range, and too much of it hurts performance.

Qian Luo et al. · arXiv 2606.20871 https://arxiv.org/abs/2606.20871

The authors propose Geometric Entropy (H_G), a task-agnostic metric that quantifies intrinsic trajectory shape diversity after aligning to target poses and workspace scale. Across multiple imitation learning architectures and both simulated and real contact-rich tasks, success rate shows a consistent inverted-U relationship with H_G: when diversity is low, adding variety helps, but once diversity is high enough to cause "policy ambiguity," performance drops. As data grows and tasks become more practiced, the optimal entropy shifts toward lower values.
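The summary does not give H_G's definition, so the following is only an illustrative proxy built on our own assumptions: align by translating each trajectory to its target position (ignoring orientation), normalize by a scalar workspace scale, resample by arc length, and average a per-waypoint Gaussian entropy across demonstrations. It shows the kind of quantity such a metric measures (how spread out demonstration shapes are once task-irrelevant offsets are removed), not how the authors compute it.

```python
import numpy as np


def resample(traj, n=32):
    """Resample a (T, 3) trajectory to n points evenly spaced in arc length."""
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    if arc[-1] == 0.0:
        return np.repeat(traj[:1], n, axis=0)
    s = np.linspace(0.0, arc[-1], n)
    return np.stack([np.interp(s, arc, traj[:, d]) for d in range(traj.shape[1])], axis=1)


def shape_entropy_proxy(trajs, goals, workspace_scale, n=32, eps=1e-6):
    """Illustrative stand-in for H_G (NOT the paper's formula).

    Each trajectory is expressed relative to its target position and divided by the
    workspace scale, resampled by arc length, and the Gaussian differential entropy of
    the demos' spread at each resampled waypoint is averaged along the path.
    """
    shapes = np.stack([resample((t - g) / workspace_scale, n) for t, g in zip(trajs, goals)])
    dim = shapes.shape[2]
    entropies = []
    for k in range(n):
        cov = np.cov(shapes[:, k, :], rowvar=False) + eps * np.eye(dim)
        _, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
        entropies.append(0.5 * logdet)
    return float(np.mean(entropies))


rng = np.random.default_rng(0)
line = np.linspace([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 50)  # straight approach to the goal
goals = [np.zeros(3)] * 10
narrow = [line + 0.002 * rng.normal(size=line.shape) for _ in range(10)]
wide = [line + 0.05 * rng.normal(size=line.shape) for _ in range(10)]
print(shape_entropy_proxy(narrow, goals, 1.0) < shape_entropy_proxy(wide, goals, 1.0))  # True
```

A metric like this only measures diversity; the inverted-U is an empirical relationship between such a measure and policy success, which no formula reproduces by itself.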

Inverting the Bellman Equation: Reading World Models Out of Q-Values · world-model

Model-based and model-free RL have long been treated as two separate paths; this paper theoretically unifies them, proving that a value-based agent trained on a sufficiently rich set of rewards implicitly encodes a unique and accurate world model.

Alistair Letcher et al. (incl. Jakob Foerster) · arXiv 2606.21173 https://arxiv.org/abs/2606.21173

The authors propose P-learning — the "inverse operation" of Q-learning — which decodes an agent's internal environment model by sampling its Q-values, policy, and rewards; they also derive sufficient conditions on reward type and quantity under which an agent encodes the true transition kernel P. This provides a formal answer to the question of how much environmental knowledge is hidden inside a value function.
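The core identity can be seen in a small tabular example. Under our simplifying assumptions (a finite MDP, a known fixed policy π, and exact Q-values for several reward functions), rearranging the Bellman equation Q_k(s,a) = r_k(s,a) + γ Σ_{s'} P(s'|s,a) V_k(s') gives, for each state-action pair, a linear system in the unknown row P(·|s,a). With as many rewards as states and linearly independent value vectors, that system has a unique solution. This is our illustrative linear-algebra reading, loosely analogous to the paper's conditions on reward type and quantity, not its P-learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9

# Ground-truth tabular MDP that the decoder never sees directly.
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
pi = rng.dirichlet(np.ones(A), size=S)      # the agent's (fixed, stochastic) policy


def evaluate_q(reward):
    """Exact Q^pi for reward r(s, a): Q = r + gamma * P V, with V = sum_a pi * Q."""
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state kernel under pi
    r_pi = (pi * reward).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return reward + gamma * P @ V            # (S, A, S) @ (S,) -> (S, A)


# The "value-based agent": Q-values for K different reward functions.
K = S  # with independent value vectors, S rewards pin down each row P[s, a]
rewards = rng.normal(size=(K, S, A))
Qs = np.stack([evaluate_q(r) for r in rewards])  # (K, S, A)
Vs = (pi[None] * Qs).sum(axis=2)                 # V_k(s) read off Q_k and pi, (K, S)

# Invert the Bellman equation: for every (s, a), Vs @ P[s, a] = (Q_k - r_k)(s, a) / gamma.
targets = (Qs - rewards) / gamma
P_hat = np.empty_like(P)
for s in range(S):
    for a in range(A):
        P_hat[s, a] = np.linalg.lstsq(Vs, targets[:, s, a], rcond=None)[0]

print(np.abs(P_hat - P).max())  # close to 0: recovered up to floating-point error
```

With fewer than S rewards the system is underdetermined and many kernels fit the same Q-values, which is the intuition behind needing a "sufficiently rich" reward set.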

MAGNIFIED: RL Fine-Tuning of Multimodal Large Models for Autonomous Driving Motion Planning · autonomy

Multimodal large models excel at semantic understanding, but the next-token prediction objective used in pretraining and supervised fine-tuning only rewards token-by-token imitation of text. It often ignores multi-step consequences and the space other road users need, leaving the model misaligned with planning objectives.

Letian Chen et al. · arXiv 2606.20641 https://arxiv.org/abs/2606.20641

MAGNIFIED proposes a reinforcement learning fine-tuning (RLFT) scheme that directly aligns multimodal-model-based driving decisions to planning objectives rather than remaining at token-level imitation, making the model more consistent in intent and multi-step safety outcomes.
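The difference between imitation and RL fine-tuning can be shown on a toy "plan vocabulary." Everything below is hypothetical (the four plan tokens, the progress and collision-risk numbers, the penalty weight of 2.0, and the imitation prior), and the update is plain exact-gradient policy ascent on a one-step bandit rather than MAGNIFIED's actual RLFT recipe. It shows the mechanism only: a prior that imitates the most common demonstration gets pushed toward the plan with the best planning score.

```python
import numpy as np

# Hypothetical plan vocabulary and planning objective (not from the MAGNIFIED paper).
PLANS = ["keep_lane_fast", "keep_lane_slow", "nudge_left_yield", "hard_brake"]
PROGRESS = np.array([1.0, 0.6, 0.8, 0.0])
COLLISION_RISK = np.array([0.9, 0.1, 0.0, 0.0])  # stand-in for multi-step safety outcomes
REWARD = PROGRESS - 2.0 * COLLISION_RISK           # [-0.8, 0.4, 0.8, 0.0]


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def rl_finetune(logits, steps=2000, lr=1.0):
    """Exact policy-gradient ascent on expected planning reward for a softmax policy."""
    logits = np.asarray(logits, dtype=float).copy()
    for _ in range(steps):
        probs = softmax(logits)
        expected = probs @ REWARD
        logits += lr * probs * (REWARD - expected)  # dE[R]/dlogit_a = p_a * (R_a - E[R])
    return logits


sft_logits = np.array([2.0, 1.0, 0.0, 0.0])  # imitation prior favoring the most common demo
rl_logits = rl_finetune(sft_logits)
print(PLANS[int(np.argmax(softmax(sft_logits)))], "->", PLANS[int(np.argmax(softmax(rl_logits)))])
# keep_lane_fast -> nudge_left_yield
```

Cross-entropy against demonstrations would keep reinforcing `keep_lane_fast` because it is frequent; the reward-driven update instead moves probability mass according to the planning objective.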

Tactile Genesis: Large-Scale Exploration of Tactile Sensors for Learning Dexterous Tasks · perception

Tactile sensing is critical for contact-rich dexterous manipulation, but "which tactile abstraction a policy actually needs, and when richer tactile fields justify hardware cost" is nearly impossible to study empirically — swapping sensors is roughly equivalent to swapping robots, and no lab can replicate the same learning experiments across all sensors.

Trinity Chung et al. · arXiv 2606.22332 https://arxiv.org/abs/2606.22332

This is a GPU-parallel tactile sensor simulation platform that exposes, through a unified interface, binary contact, contact depth, per-taxel force/torque, elastomer marker displacement, geometric proximity, contact audio, and a voxelized temperature field (the first in a robot learning physics simulation platform) — with configurable layouts, resolutions, and realistic noise models including drift and hysteresis — enabling systematic comparison of "which tactile modality is worth building" for the first time.
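Drift and hysteresis are what make simulated tactile readings depart from ground truth in a history-dependent way. The sketch below is a generic illustration, not Tactile Genesis's interface: the class name, parameters, and the choice of a linear zero-point drift plus a play (backlash) operator for hysteresis are our assumptions.

```python
import numpy as np


class NoisyTaxelSketch:
    """Hypothetical single-taxel force channel with Gaussian noise, drift, and hysteresis."""

    def __init__(self, noise_std=0.01, drift_per_step=1e-4, backlash=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_std = noise_std
        self.drift_per_step = drift_per_step
        self.backlash = backlash  # half-width of the hysteresis (play) band, in force units
        self.offset = 0.0         # accumulated zero-point drift
        self.state = 0.0          # output of the play operator

    def read(self, true_force):
        # Play operator: the output only follows once the input leaves the band around it,
        # so loading and unloading trace different curves.
        if true_force > self.state + self.backlash:
            self.state = true_force - self.backlash
        elif true_force < self.state - self.backlash:
            self.state = true_force + self.backlash
        self.offset += self.drift_per_step  # linear drift accumulates per reading
        return self.state + self.offset + self.rng.normal(0.0, self.noise_std)


taxel = NoisyTaxelSketch(noise_std=0.0, drift_per_step=0.0)
loading = [taxel.read(f) for f in (0.0, 0.5, 1.0)]
unloading = [taxel.read(f) for f in (1.0, 0.5, 0.0)]
print(round(loading[1], 3), round(unloading[1], 3))  # 0.45 0.55
```

With noise and drift disabled, the same true force of 0.5 reads 0.45 while loading and 0.55 while unloading, and the sensor does not return to zero after the load is removed; a policy trained only on ideal readings never sees either effect.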

Other papers today: World Action Models: A Survey (HF↑33, clarifying the boundaries between world models, video generation, action-grounded video world models, VLAs, and "world-action models," with a unified taxonomy); MV-WAM (manifold-aware world-action model + value augmentation for improved out-of-distribution manipulation generalization); Wh0 (using a generative world model to produce a 50,000-clip first-person human hand manipulation video dataset WM-H); Foresight (HF↑8, using action-conditioned world model latents for long-horizon manipulation failure detection); large-scale parallel sampling MPC deployed on real hardware (JAX + MuJoCo MJX, closing the real-sim-real loop on a Franka with Push-T); PolicyTrim (HF↑4, improving VLA intrinsic policy efficiency beyond reducing per-step latency); PoLAR (HF↑7, introducing a polar-coordinate radius-direction structure to latent actions).
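For readers unfamiliar with sampling-based MPC, the sketch below shows the basic loop using MPPI (model predictive path integral control), one common member of that family; we do not know which sampler the paper uses. It swaps JAX and MuJoCo MJX for NumPy and a 2-D point mass, both assumptions made to keep the example self-contained. The batched rollout over all samples is the step a GPU simulator such as MJX would parallelize.

```python
import numpy as np

DT, HORIZON, SAMPLES = 0.1, 20, 256  # assumed timestep (s), plan length, samples per step


def rollout_cost(x0, u_seqs, goal):
    """Roll out all (SAMPLES, HORIZON, 2) acceleration sequences of a 2-D point mass at once."""
    pos = np.repeat(x0[None, :2], len(u_seqs), axis=0)
    vel = np.repeat(x0[None, 2:], len(u_seqs), axis=0)
    cost = np.zeros(len(u_seqs))
    for t in range(u_seqs.shape[1]):
        vel = vel + DT * u_seqs[:, t]
        pos = pos + DT * vel
        cost += np.sum((pos - goal) ** 2, axis=1) + 1e-3 * np.sum(u_seqs[:, t] ** 2, axis=1)
    return cost


def mppi_update(x0, nominal, goal, rng, sigma=1.0, temperature=1.0):
    """Perturb the nominal plan, then average the samples weighted by exp(-cost / temperature)."""
    candidates = nominal[None] + rng.normal(0.0, sigma, size=(SAMPLES,) + nominal.shape)
    costs = rollout_cost(x0, candidates, goal)
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)  # (HORIZON, 2)


def run(steps=60, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(4)                  # [px, py, vx, vy]
    goal = np.array([1.0, 1.0])
    plan = np.zeros((HORIZON, 2))
    for _ in range(steps):
        plan = mppi_update(x, plan, goal, rng)
        x[2:] += DT * plan[0]        # apply only the first action (receding horizon)
        x[:2] += DT * x[2:]
        plan = np.vstack([plan[1:], np.zeros((1, 2))])  # warm start: shift the plan forward
    return x, goal


state, goal = run()
print(np.linalg.norm(state[:2] - goal))  # should end close to the goal
```

On real hardware the same loop runs with a physics simulator as the rollout model, which is where massive parallelism pays off.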

Open Source · Tools · Benchmarks

· R2HandoverSim: A simulation benchmark for robot-to-human object handover, systematically comparing 4 baselines with a 30-person user study and proposing 5 complementary metrics (reachability, grasp stability, safety, etc.) claimed to better reflect user perception than single success rate alone. Includes code and project website. arXiv 2606.21011 https://arxiv.org/abs/2606.21011

· LIBERO-Safety: A comprehensive physical and semantic safety evaluation benchmark for VLAs. arXiv 2606.23686 https://arxiv.org/abs/2606.23686

· Humanoid-OmniOcc: A stereo omnidirectional occupancy dataset for embodied AI. arXiv 2606.22971 https://arxiv.org/abs/2606.22971

· AutoDex: An automated real-world dexterous grasping data collection system. arXiv 2606.23689 https://arxiv.org/abs/2606.23689

II. Funding & Deals

Kunlunxing Robotics ｜ 3 Rounds in 90 Days ｜ Cumulative Billions of RMB ｜ Valuation Exceeds $1B USD · humanoid

Founded by former Alibaba Group VP and Alibaba Cloud China president Ren Geng, with former Li Auto autonomous driving lead Lang Xianpeng as co-founder, the company was incorporated on March 16, 2026 and reached unicorn status in under 90 days across 3 rounds. Investors include Gaorong Capital, Hillhouse Venture, CASSTAR, Zhongding Capital, Sinovation Ventures, Xin Capital, and Jianfa Capital, with all first-round investors doubling down in subsequent rounds. The company benchmarks itself against Tesla Optimus and pursues a dual-track "hardware body + AI brain" strategy. This is another extreme example of the "star team + top-tier institutions" playbook in this year's embodied AI space, and its 0-to-1 speed record pushes FOMO in primary markets to new highs. Source: 36Kr

Momenta ｜ Hong Kong IPO (Listing Hearing Passed) ｜ Targeting ≥$1B USD ｜ Valuation Expected to Exceed ¥100B · autonomy

Momenta passed the Hong Kong Stock Exchange listing hearing and published its post-hearing information pack on June 23, with CICC and Deutsche Bank as joint sponsors, positioning itself to become the first "Physical AI" public listing. The prospectus shows revenue growing from ¥743M in 2023 to ¥2.413B in 2025, more than tripling at a compound annual growth rate above 80%, while license revenue surged from ¥23M to ¥968M, a more than 42x increase. The company claims approximately 65% market share in third-party urban NOA (Navigate on Autopilot) sales volume and has signed agreements with nine of the world's top ten automakers. Its R7 world model entered mass production in April 2026 and serves as the technical anchor for its "Physical AI" narrative. Source: Yicai
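The growth figures above can be checked with a few lines of arithmetic. Fiscal 2023 to fiscal 2025 spans two yearly compounding steps, which is why a roughly 3.25x increase corresponds to a CAGR just above 80%.

```python
def cagr(start, end, periods):
    """Compound annual growth rate over `periods` yearly intervals."""
    return (end / start) ** (1.0 / periods) - 1.0


revenue_2023, revenue_2025 = 743e6, 2413e6  # RMB, from the prospectus figures above
license_2023, license_2025 = 23e6, 968e6

print(f"total revenue multiple: {revenue_2025 / revenue_2023:.2f}x")            # 3.25x
print(f"revenue CAGR (2 intervals): {cagr(revenue_2023, revenue_2025, 2):.1%}")  # 80.2%
print(f"license revenue multiple: {license_2025 / license_2023:.1f}x")          # 42.1x
```

Spreading the same multiple over three compounding steps would give only about 48% a year, so the "above 80%" figure depends on counting two intervals.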

Lightwheel AI ｜ New Round ｜ ¥1 Billion ｜ Cumulative ~¥2 Billion in Two Weeks · adjacent

China Renaissance acted as exclusive financial advisor, and Giant Network participated in the round; proceeds will fund construction of physical AI data and evaluation infrastructure. Following a string of closings at "data is the moat" companies such as Inverse Matrix and Mifeng, Lightwheel's two rounds totaling ~¥2 billion in two weeks further establish "embodied data / evaluation infrastructure" as a standalone, high-valuation segment. Source: China Renaissance

Zhengxing Innovation ｜ Angel Round ｜ ~$100M USD · embodied

Multiple listed companies including Charoen Pokphand Group, Huaqin Technology, and Nine Medical jointly invested; the company focuses on physical intelligence scenarios such as retail shelving and advocates a "co-evolution" technical path toward physical intelligence. Raising nearly $100M at angel stage, backed by multiple industrial listed companies, reflects growing interest from scenario operators in embodied solutions capable of operating in real retail and manufacturing environments. Source: Securities Daily http://m.zqrb.cn/gscy/qiyexinxi/2026-06-23/A1782211019099.html

Krafton ｜ Strategic Investment ｜ ~$33M USD · hardware

South Korean gaming giant Krafton has bet on an AI chip startup as part of its "Physical AI" investment strategy. A gaming company entering physical AI via compute and chips is another signal of Asian tech companies crossing into the embodied space. Source: KED Global https://www.kedglobal.com/korean-games/newsView/ked202606230004

Oversonic Robotics ｜ Strategic Investment · humanoid

STMicroelectronics, Fondazione ENEA Tech Biomedical, and SpotInvest jointly took a stake in this Italian humanoid robotics company, with a major semiconductor manufacturer binding itself to a robot hardware maker through equity. Source: PR Newswire https://www.prnewswire.com/news-releases/oversonic-robotics-stmicroelectronics-fondazione-enea-tech-biomedical-and-spotinvest-acquire-a-stake-in-the-company-302806639.html

Other funding: Shihang Intelligence closed what it claims is the "world's largest marine robotics financing" (marine robots, amount not explicitly disclosed, source); a focused ultrasound ablation surgical robotics company raised a new round at a ¥25.4B valuation (source); AI kitchen robotics brand "Lishang" (Chinese kitchen robot startup) raised tens of millions of RMB in a Series A (source); magnesium alloy materials company "Yimeihua" raised tens of millions of RMB at angel stage targeting robot lightweighting (https://m.sohu.com/a/1040545217_114778?scm=10001.325_13-325_13.0.0-0-0-0-0.5_1334).

III. Commercial Deployment

Zhiyuan G2 Launches 6-Day Factory Livestream, Claims Full Coverage of 3C Tablet Quality-Inspection Line · industrial ⚠️ Livestream demonstration scope

Zhiyuan Robotics announced that its G2 humanoid robot has entered a mass-production line for a continuous 6-day transparent livestream covering every process step on a 3C tablet quality-inspection line, emphasizing long-term stability, full-process compatibility, and low-cost replication as "deployment-state" metrics, and claiming that G2 is the world's first humanoid to cover an entire 3C quality-inspection line end-to-end. Note that this is a 6-day on-site cluster test and livestream, a capability demonstration rather than scaled mass-production data; true "deployment-state" quality will depend on utilization and yield rates under routine production conditions. On the same day, Zhiyuan took a stake in Xinqi Robotics (Chinese robot actuator startup), a specialist in core actuation components, continuing to build out its upstream supply chain. Source: Jushenpai

Galaxy General Robotics Wins ¥236M Tender for 500 Embodied Robots in Yibin · embodied

Galaxy General Robotics (Chinese embodied AI company) was named first-ranked candidate with a bid of approximately ¥235.92M, the largest single-tender candidate amount for embodied robots this year. The buyer is a state-owned joint venture platform in Yibin (65% held by Yibin Urban Investment, 35% by Wuliangye Group's new energy platform), and the tender covers 500 units plus related support: 380 wheeled robots, 80 heavy-load robots, 10 robot dogs, and 30 robotic retail pods. This is a rare genuine bulk order from "state capital + industry player" buyers; both the scale and the buyer profile carry more commercial weight than vendor demos. Source: Zhidongxi https://finance.sina.com.cn/wm/2026-06-23/doc-iniekqut6490185.shtml
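The tender's line items add up to the stated 500 units, and dividing the bid by the unit count gives a rough average. Because the bid also covers support, this figure is an upper bound on the blended per-robot price, not a quoted unit price.

```python
units = {"wheeled robots": 380, "heavy-load robots": 80, "robot dogs": 10, "robotic retail pods": 30}
bid_rmb = 235.92e6  # first-ranked candidate amount

total_units = sum(units.values())
print(total_units)                          # 500
print(f"{bid_rmb / total_units:,.0f} RMB")  # 471,840 RMB per unit, support included
```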

120 Autonomous Mining Trucks Delivered to Xinjiang Coal Mine · autonomy

120 autonomous mining trucks equipped with Yuchai flywheel range-extender systems were delivered to a coal mine in Xinjiang, China, entering actual commercial operation. Mining is one of the earliest closed-loop environments where autonomous driving has proven commercial viability; a fleet delivery of 100+ units represents genuine scaled deployment, not a pilot. Source: Sina https://k.sina.com.cn/article_5952915720_162d24908067047y36.html?from=auto

Unitree G1 Enters Japanese Market via GMO AIR · humanoid

Unitree Robotics (Chinese humanoid maker) has partnered with Japan's GMO AIR to sell its G1 and other humanoid robots through GMO's distribution channel in Japan. This marks a significant step toward scaled entry into the Japanese market via a local distributor, closer to real orders than standalone overseas showcases. Source: Chanye Shendu

Robust.AI's Third-Generation Carter Robot Adopts Aptiv Perception Solution · industrial

Warehouse logistics robotics company Robust.AI has selected Aptiv's PULSE-based perception system for its third-generation Carter robot, bringing automotive-grade perception supply chains into logistics robot hardware. Source: Business Wire https://www.businesswire.com/news/home/20260623719695/en/Robust.AI-Selects-Aptiv-Perception-Powered-by-PULSE-for-its-Gen-3-Carter-Robot

LEM Surgical Humanoid Surgical Robot Receives Second FDA 510(k) Clearance · embodied

LEM Surgical announced that its next-generation humanoid surgical robot system has received a second FDA 510(k) clearance. Regulatory approval is the hard threshold for medical robots to enter clinical use; obtaining two clearances signals that its product line has crossed the compliance threshold for commercialization. Source: MassDevice https://www.massdevice.com/lem-surgical-new-fda-clearance-surgical-robot/

Other deployments: Amazon Zoox begins robotaxi testing in Dallas this month (https://www.audacy.com/jackontheweb/latest/zoox-amazons-robotaxi-unit-to-begin-testing-in-dallas-this-month); BHP pilots electric autonomous haul trucks at its Jimblebar mine in the Pilbara region of Western Australia (https://discoveryalert.com.au/bhp-electric-haul-truck-trial-jimblebar-pilbara-2026/); WeRide (Chinese autonomous driving company) partners with Geely and Kwoon Chung to bring robotaxi to Hong Kong's right-hand-drive market (source).

IV. Industry Developments

Morgan Stanley Sharply Raises China Humanoid Robot Shipment Forecast, Calls Industry "Early Commercialization" · humanoid

Morgan Stanley has raised its shipment forecasts for humanoid robots in China, judging that the industry has entered an "early commercialization" phase. Combined with Zhiyuan's production line livestream and Galaxy General Robotics' bulk tender win covered in this edition, sell-side analysis and industry events are converging for the first time at the "from demo to shipment" inflection point. Source: Sohu https://m.sohu.com/a/1040611611_130887?scm=10001.325_13-325_13.0.0-0-0-0-0.5_1334

NVIDIA Halos Robot Safety System Goes Live with BlackBerry QNX as Safety Layer · world-model

Following up on yesterday's report: further details have emerged on the ecosystem around NVIDIA's full-stack physical AI safety architecture Halos — the safety layer is underpinned by BlackBerry QNX, with Agility Robotics as the first integrator; its Outside-In blueprint feeds factory perimeter camera data into the robot's decision loop to dynamically adjust behavior. This systematically extends "functional safety" from the automotive domain to humanoid robots entering factory environments. Source: blockchain.news https://blockchain.news/flashnews/blackberry-qnx-powers-nvidia-robot-safety-layer
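An "outside-in" rule of this kind can be pictured as a speed limit driven by fixed cameras rather than the robot's own sensors, in the spirit of speed-and-separation monitoring. The sketch below is purely illustrative: `Detection`, `speed_limit`, and every threshold are hypothetical and are not part of NVIDIA Halos, BlackBerry QNX, or any published blueprint API.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """A person seen by a perimeter camera, already projected to floor coordinates (m)."""
    x: float
    y: float


def speed_limit(robot_xy, detections, v_max=1.5, stop_dist=0.5, slow_dist=2.5):
    """Scale the allowed robot speed (m/s) by distance to the nearest detected person."""
    if not detections:
        return v_max
    nearest = min(math.hypot(d.x - robot_xy[0], d.y - robot_xy[1]) for d in detections)
    if nearest <= stop_dist:
        return 0.0  # protective stop
    if nearest >= slow_dist:
        return v_max
    return v_max * (nearest - stop_dist) / (slow_dist - stop_dist)  # linear ramp


print(speed_limit((0.0, 0.0), []))                                      # 1.5
print(speed_limit((0.0, 0.0), [Detection(1.5, 0.0), Detection(4, 4)]))  # 0.75
print(speed_limit((0.0, 0.0), [Detection(0.3, 0.0)]))                   # 0.0
```

The value of perimeter cameras in such a design is that they can see a person before the robot's onboard sensors do, for example around a shelf corner.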

Competition for "Robot Cerebellum" Heats Up: Naver and a Beijing Company Each Release Motion Foundation Models · world-model

South Korea's Naver released a "robot brain" emphasizing light weight and speed, while a Beijing-based company claims to have released "the world's first general-purpose humanoid robot motion cerebellum GPT foundation model." The motion-control "general cerebellum" layer is becoming the next foundation-model battleground after VLAs and world models, though claims such as "world's first" are vendor-issued and actual generalization capabilities still require third-party validation. Source: Seoul Economic Daily https://en.sedaily.com/technology/2026/06/23/naver-unveils-lightweight-fast-robot-brain · Beijing Municipal Commission of Economy and Information Technology

Honda Demonstrates AI-Controlled Hand-Like Robot · embodied

According to NHK, Honda has developed an AI-controlled hand-like robot. The continued in-house investment by a Japanese automaker in dexterous manipulation hardware is another move by a legacy manufacturing giant into the embodied AI space. Source: nhk.or.jp https://www3.nhk.or.jp/nhkworld/en/news/20260623_B2/

Geek+ Plans Share Buyback of Up to HK$2 Billion · industrial

Warehouse robotics company Geek+ announced plans to repurchase up to HK$2 billion of its shares, sending a clear signal on post-listing share price and capital allocation. Source: Beijing News https://www.bjnews.com.cn/detail/1782198761129122.html

SERES (Chinese automaker) Humanoid Robot Makes Public Debut · humanoid

SERES unveiled its first humanoid robot and stated it will be used in its "super factory" producing AITO-brand vehicles. An automaker deploying self-developed humanoids on its own production line continues the trend of OEMs entering the hardware space directly, though this remains a product unveiling with no mass-production or deployment data yet. Source: 5plus2 Robotics

Hardware · Supply Chain

· Dexterous Hand Price War: Multiple reports converge on the same narrative — overseas complete-hand prices remain in the millions of RMB range, while Chinese-made units have been driven down to roughly ¥50,000, with vendors targeting ¥500 within three years; at the same time, financial results among listed Chinese dexterous hand companies are diverging sharply. The low pricing reflects the advantages of China's vertically integrated supply chain, though the "millions → ¥50K → ¥500" trajectory largely represents vendor and media narrative; actual producible cost and yield remain to be verified by financial disclosures. (source / https://news.china.com/socialgd/10000169/20260623/49565160.html)

· Harmonic Reducers: Chinese harmonic reducer manufacturers including Lände Harmonic are reported to be "trapped in a price war," with upstream core component margins being steadily squeezed as robot system prices fall. (https://robot.ofweek.com/2026-06/ART-8321201-8420-30691736.html)

· New Materials: Zhonglan Chenguang (Chinese materials company) launched a new material solution rated for 250 kg load targeting dexterous hands, reported on China Central Television's evening news broadcast; Yimeihua is entering robot lightweighting with high-thermal-conductivity, high-strength magnesium alloys. (source)

· Core Motors: Jingrui Chang (Chinese motor maker) released a φ10 (10 mm diameter) high-power-density coreless ("hollow-cup") motor designed specifically for dexterous hands, targeting demand for miniaturized, high-density motors in multi-fingered hand applications. (source)
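The dexterous-hand target in the list above (roughly ¥50,000 today, ¥500 within three years) implies a steep compounding decline. Assuming, purely for illustration, a constant yearly rate:

```python
today_rmb, target_rmb, years = 50_000, 500, 3

annual_factor = (target_rmb / today_rmb) ** (1 / years)  # constant yearly multiplier (assumed)
print(f"yearly price multiplier: {annual_factor:.3f}")      # 0.215
print(f"implied yearly price cut: {1 - annual_factor:.1%}")  # 78.5%
```

In other words, the price would have to fall to roughly a fifth of the previous year's level every year for three years running, which is why the trajectory is best read as narrative until cost and yield disclosures confirm it.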