Shawn

Posted on Jun 15

FutureX · Physical AI Daily — Issue 29 (06/16)

#ai #robotics #machinelearning #research

Today's Highlights

· Marine embodied intelligence becomes a new capital frontier: Shihang Intelligent closes a Series A of more than 1 billion RMB, which the company claims is a global single-round funding record for marine robotics, with Zhu Xiaohu investing for a fifth consecutive round and Temasek among the new backers.

· World models continue to attract major funding: SenseTime-affiliated Daxiao Robotics has raised hundreds of millions of dollars cumulatively in the first half of the year and released the home world model Kairos-HomeWorld, while GigaAI's new 1 billion RMB round brings its three-month total to approximately 3.5 billion RMB.

· Zhiyuan Yuanzheng A3 claims "fully autonomous" table tennis play against humans — no remote control, no scripting, no human intervention — a high-difficulty dynamic closed-loop capability demonstration (vendor claim).

· Humanoid robots debut across Chinese industry: SERES humanoid "Xiao Sai" makes its first appearance at a super factory; Songyan Dynamics releases its first open-source HarmonyOS consumer humanoid N2; Huawei puts humanoids on the HarmonyOS ecosystem.

· On the research side, "world model / world-action model" papers cluster: μ₀ replaces pixel prediction with 3D trajectory forecasting; Tencent Robotics X open-sources the full-stack VLA Hy-Embodied-0.5-VLA.

I. Research Papers

μ₀: A Scalable 3D Interactive Trajectory World Model · world-model

Current world models follow two main paths, each with inefficiencies: pixel-space video models spend compute on dense appearance reconstruction, while direct action models require embodiment-specific action labels and are hard to scale. μ₀ offers a third route — predicting only the motion of the few points where interaction will occur.

Seungjae Lee et al. · arXiv 2606.13769 source · Commentary: SourceMind source (WeChat, CN)

μ₀ predicts neither dense pixels nor direct actions. Instead, it forecasts smooth 3D trajectories for key interaction points — objects, tools, hands, contact regions — forming a compact, embodiment-agnostic motion interface. The accompanying TraceExtract system automatically selects keypoints from diverse video sources and constructs 3D supervision signals, enabling training on heterogeneous video without action labels before transferring to specific robots.

Hunyuan Hy-Embodied-0.5-VLA: A Full-Stack System from VLA Model to Real-Robot Learning · vla

Rather than another benchmark-chasing VLA, this is Tencent Robotics X open-sourcing the entire pipeline — data collection, model, pretraining/fine-tuning, RL post-training, and real-robot deployment — making its engineering value greater than any single metric.

He Zhang et al. (Tencent Robotics X) · arXiv 2606.14409 source · Commentary: Jiqizhixin source (WeChat, CN) · HF↑6

The paper covers every stage of the full robot learning stack. On the data side, it uses sub-millimeter fingertip UMI interfaces for collection, eliminating heavy leader-follower teleoperation. On the post-training side, it is the first to systematically introduce Proximalized Preference Optimization (PRO) into flow-matching-based VLA reinforcement training, directly leveraging real robot failure data, and claims near-100% success rates on real-robot tasks. The model and methods are open-sourced.

EQRL: Elastic Execution Scheduling for VLAs Based on Task Difficulty · vla

Existing VLAs apply fixed denoising steps and replanning cadences regardless of whether the current state involves free-space translation or contact alignment — spreading compute evenly across states of unequal difficulty. EQRL makes "how long to compute" a learnable decision.

Ge Wang et al. (Ising AI & CUHK-Shenzhen) · arXiv 2606.14375 source · Commentary: Embodied Intelligence Chat source (WeChat, CN)

EQRL uses a lightweight latent-schedule adapter to jointly select latent inputs, denoising budgets, and action chunk lengths without fine-tuning the underlying VLA. A trained critic gives the scheduler difficulty awareness — hard or contact-dense states get more compute and more frequent feedback; easy states use less inference and longer open-loop execution. The commentary reports an approximately 32% reduction in inference cost.
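The scheduling idea can be illustrated with a minimal sketch. It is not EQRL's learned adapter: the linear rule, the `Schedule` type, the `pick_schedule` function, and all numeric ranges are hypothetical stand-ins, and the latent-input selection described above is omitted. It only shows the direction of the trade-off, where a higher difficulty score buys more denoising steps and shorter open-loop chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    denoise_steps: int  # denoising iterations spent on this state
    chunk_len: int      # actions executed open-loop before replanning


def pick_schedule(difficulty: float,
                  min_steps: int = 2, max_steps: int = 10,
                  min_chunk: int = 4, max_chunk: int = 32) -> Schedule:
    """Map a difficulty score in [0, 1] to a compute budget.

    Hypothetical linear rule (an assumption, not the paper's method):
    harder states get more denoising steps and shorter chunks, so the
    policy replans more often; easier states get the opposite.
    """
    d = min(1.0, max(0.0, difficulty))  # clamp out-of-range scores
    steps = round(min_steps + d * (max_steps - min_steps))
    chunk = round(max_chunk - d * (max_chunk - min_chunk))
    return Schedule(denoise_steps=steps, chunk_len=chunk)


easy = pick_schedule(0.0)  # e.g. free-space translation
hard = pick_schedule(1.0)  # e.g. contact alignment
```

In the system described above the difficulty signal comes from a trained critic and the mapping is learned; here the score is simply passed in by hand.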

WAM4D: Fast 4D World-Action Model via Spatial Register Tokens · world-model

Most world-action models operate in 2D video or latent space. Predictions that "look plausible" lack 3D spatial constraints and the contact geometry of occluded regions, making them insufficient for precise manipulation. Yet forcing models to decode dense 4D geometry slows causal action generation. WAM4D aims to have it both ways.

Ying Li et al. · arXiv 2606.14048 source

WAM4D introduces lightweight "spatial register tokens" as training-time readout points for future depth, transferring 3D priors from pretrained geometric foundation models into a causal video-action model. This allows action predictions to carry 3D and contact geometry constraints without expensive dense geometric decoding, while maintaining fast inference.

ContactWorld: Key Ingredients for Visuo-Tactile World Models in Contact-Rich Manipulation · world-model

What representations actually support long-horizon planning in contact-rich tasks has lacked a systematic answer. This paper uses a new benchmark to empirically settle the question of which representation to choose.

Zhiyuan Zhang et al. · arXiv 2606.13877 source

The authors build a benchmark covering 12 categories of contact-rich tasks including insertion, disassembly, screw-tightening, and exploratory interaction, and systematically compare visuo-tactile world models. The conclusion: representations that are both "spatially structured" and "temporally continuous" plan most reliably. Point cloud observations raise average planning success rates significantly above wrist-camera views (20.7% and 22.0%), highlighting the value of structured geometric information for contact reasoning.

Output-Layer Regularization Eliminates the "Random-Seed Lottery" in Single-GPU VLA Fine-Tuning · vla

Same code, same data, only the random seed changes: run 13 times, 12 land stably at 91–94%, one quietly drops to 65.2% — a 29-percentage-point collapse with no errors and no warnings. This paper names the problem, localizes the cause, and offers a cheap fix that is highly practical for practitioners.

Jeffrin Sam, Dzmitry Tsetserukou (Skoltech) · arXiv 2606.13856 source

The authors call this phenomenon the "seed lottery" and trace its root cause to "output collapse": the action predictor learns to ignore its input and produce nearly identical actions. Weight-space methods such as L2 and EWC fail structurally here — they penalize weight changes, but collapse occurs along directions where weights barely shift. Switching to output-layer regularization eliminates the collapse.
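One way to picture "output collapse" is to measure how much predicted actions vary across different inputs. The sketch below is an illustration under assumptions, not the paper's regularizer: `action_spread`, `is_collapsed`, `spread_penalty`, and both thresholds are hypothetical names and values. It shows why a check on outputs can catch a failure that a penalty on weight changes cannot see.

```python
from statistics import pvariance


def action_spread(actions: list[list[float]]) -> float:
    """Mean per-dimension variance of predicted actions across a batch."""
    dims = list(zip(*actions))  # transpose: one tuple per action dimension
    return sum(pvariance(d) for d in dims) / len(dims)


def is_collapsed(actions: list[list[float]], threshold: float = 1e-4) -> bool:
    """Flag a batch whose actions are nearly identical for different inputs."""
    return action_spread(actions) < threshold


def spread_penalty(actions: list[list[float]], target: float = 0.01) -> float:
    """Hypothetical hinge penalty: positive only when spread falls below target."""
    return max(0.0, target - action_spread(actions))


# Predictions for three different observations (made-up numbers).
healthy = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.6]]
collapsed = [[0.2, 0.1], [0.2001, 0.1], [0.2, 0.1001]]
```

Running `is_collapsed` on held-out observations after each fine-tuning run is a cheap way to spot a silently degraded seed before evaluating task success.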

DiPOD: Preventing Diffusion Policy "Drift" During RL Post-Training · manipulation

RL post-training is increasingly critical for improving diffusion policies, but existing diffusion policy gradient methods are often unstable and unreliable. This paper from Berkeley identifies the mechanism behind the instability and provides a simple, practical fix.

Haozhe Jiang et al. (UC Berkeley) · arXiv 2606.13795 source

The authors identify "dual drift": optimizing the variational surrogate objective causes the ELBO to diverge from the true log-likelihood, which in turn causes the surrogate policy gradient to deviate from the true reward policy gradient. DiPOD alternates between self-distillation and policy improvement updates during training, equivalent to adding an on-policy ELBO regularization term to each diffusion policy gradient step, maintaining tight bounds and stable improvement throughout.

RT-VLA: Real-Time Autonomous Driving VLA via Knowledge Distillation · autonomy

VLA end-to-end joint modeling of perception, language reasoning, and action is promising, but the inference latency of large vision-language backbones makes real-world deployment impractical. This paper uses distillation to compress the capability into a model that can run in real time.

Xiangyu Huang et al. (CMU) · arXiv 2606.14010 source

RT-VLA uses multi-level supervised distillation to transfer the driving and reasoning capabilities of the state-of-the-art driving model SimLingo into a compact student model. Post-hoc language analysis of safety-critical moments is performed offline to preserve interpretability without adding real-time control latency.

Other papers today: Multi-Agent Embodied Autonomous Driving (survey, unifying V2X cooperative driving under "shared world models," covering 380+ references); PhysVLA (physics-constraint plugin at inference time, wrapping any frozen VLA with <1ms per-step overhead); Spatially Conditioned Diffusion Policy (precise and robust manipulation from a single RGB camera); Universal Manipulation Exoskeleton (upper-limb teleoperation collection with real-time torque feedback); EgoGuide (first-person demonstration collection without a robot); GAIT (legged robot proprioceptive state estimation with inertial-leg token attention); Robust Fall Recovery (force-guided fall recovery for armless bipedal wheeled robots); Semidefinite Relaxations for Collision-Free Motion Planning (Russ Tedrake et al., theoretical analysis of semidefinite relaxations for collision-free motion planning); ReactVLA (lightweight low-latency reactive manipulation with improved Mean Flow).

Open source · tools · benchmarks: ORCA (open-source dexterous hand research full stack integrating control/simulation/teleoperation/retargeting into the robot learning ecosystem, arXiv 2606.14561 source); Kine2Go (Unitree Go2 multi-gait kinematics dataset, arXiv 2606.14433 source); ContactWorld (the visuo-tactile world model benchmark above, 12 categories of contact-rich tasks).

II. Funding & Deals

Shihang Intelligent | Series A | Over 1 billion RMB · embodied

A Suzhou-based marine embodied intelligence company focused on complex underwater environments, with a fully in-house stack spanning six systems: propulsion, control, sensing, navigation, sealing, and deployment. New investors include Moore Threads and Kunlun Chip-backed Shanghe Momentum Fund, Temasek's Vertex Growth, CITIC Group's agricultural industry fund, Yuzun Capital, and listed company Dayang Electric, with existing investor GGV Capital and others following on. Zhu Xiaohu has now invested in the company for five consecutive rounds. The company claims this is the largest single-round raise in global marine robotics and that first-half orders exceeded 1 billion RMB. The ocean is a "long tail of the physical world" that capital has largely overlooked; this round opens a new front beyond the crowded land-based humanoid space. Source: IPO Early Notice source (WeChat, CN)

Daxiao Robotics | Angel+ Round | Hundreds of millions USD (H1 cumulative) · world-model

An embodied intelligence company under SenseTime, led by co-founder Wang Xiaogang, positioned as a supplier of robot "brains" and intelligent core components. Its flagship product is the Kairos world model 3.0; in collaboration with CUHK and the Shenzhen River Loop Academy, it has released Kairos-HomeWorld, a world model framework for whole-home generation and full object interaction. This round brings in Ant Group, Geely Capital, Dachen Caizhi, Shenzhen Capital Group, Qiming Venture Partners, SenseTime Guoxiang, Moore Threads, and Lenovo Ventures, among other industrial and financial backers. Founded in July 2025, the company has raised hundreds of millions of dollars in cumulative H1 funding and is described as one of the embodied AI sector's "fastest unicorns." Source: ZhangTong Society source (WeChat, CN)

GigaAI | New Round | 1 billion RMB (approx. 3.5 billion RMB cumulative over three months) · world-model

Founder Huang Guan argues that "physical AGI will act directly on the real physical world"; the company enters through world models and video generation. Three consecutive funding rounds over three months total approximately 3.5 billion RMB, with top-tier global capital concentrating its bets. It is another data point showing that world model momentum is flowing from papers and conferences into the primary market. Source: Zhidongxi source

Xuanji Dynamics | Strategic Investment | Amount undisclosed · industrial ⚠️ Single-party claim

An embodied intelligence robotics company that has received strategic investment from SAIC Group, Dongfang Precision, and others. Following several automakers' cross-sector moves, OEM and industrial capital continue to take positions in the embodied supply chain through equity investments. The amount and ownership percentages have not been fully disclosed. Source: Guandian.cn source

Noitom Robotics | Pre-A++ Round | Amount undisclosed · adjacent

Positioned as "a robotics company that doesn't build robots," Noitom provides data infrastructure for embodied intelligence and humanoid robots; ModalityNet has gone live and the company continues advancing embodied data industrialization. It says its next round will open soon. Data collection and labeling are becoming the next capital-intensive layer after hardware. Source: Jinxiu Science Park source (WeChat, CN)

MW (Japan) | Seed Round | $21 million · adjacent

Japanese startup MW has closed a $21 million seed round to build "homes designed for physical AI," embedding robot accessibility, sensing, and interaction capabilities into residential spaces from the ground up. Modifying the home before deploying the robot is an uncommon but pragmatic approach to the household scenario. Source: AI Insider source

III. Commercial Deployment

Middle East's First Hydrogen-Powered Autonomous Heavy Truck Enters Operation · autonomy

A hydrogen-powered autonomous heavy truck developed with the participation of Refire Energy has entered actual operation in the Middle East, combining autonomous driving and hydrogen propulsion for highway and port freight. The Middle East has recently become a significant destination for Chinese autonomous heavy trucks and Robotaxi services expanding internationally. Source: China Hydrogen Energy Industry Promotion Association source

Chinese Greenhouse Tomato Harvesting Robot Enters European Trial · embodied

A Chinese greenhouse tomato harvesting robot has secured a European trial, advancing visual grasping from laboratory demonstration to field validation in a real cultivation environment. Agricultural harvesting is one of the few manipulation robot segments where customers are already willing to pay at scale; an overseas trial is a key step toward commercialization. Source: Hortidaily source

Marine AI Welding Robot Completes 30-Tonne Component Operations · industrial

Xinhua reports that an AI robot in China completed autonomous welding of 30-tonne large components in an offshore engineering project, supporting a range of inshore-to-offshore underwater and surface operations. This echoes today's Shihang Intelligent funding news: marine manufacturing and marine embodied intelligence are simultaneously attracting industrial and capital attention. Source: Xinhua source

JD.com Partners with Two Platforms in One Month, Doubling Down on Robot Leasing · adjacent

JD.com has partnered with two robot platforms within a single month, expanding its robot leasing business. At a stage when hardware prices remain high and corporate procurement is cautious, leasing and as-a-service models are a key commercial lever for lowering deployment barriers and accelerating real-world adoption at scale. Source: Sohu source

IV. Industry Developments

Zhiyuan Yuanzheng A3 Claims "Fully Autonomous" Table Tennis Against Humans · embodied ⚠️ Vendor claim

Zhiyuan Robotics says its full-size bipedal humanoid Yuanzheng A3 has played table tennis against a human with no remote control, no scripting, and no human intervention, self-reporting a hit rate of approximately 91% and claiming to be "the world's first" full-size bipedal humanoid to achieve this. Table tennis demands near-human real-time performance across the perception-decision-execution loop, making it a meaningful demonstration of capability in dynamic unstructured environments. However, "world's first" and "91%" are unverified vendor claims from a single demonstration, and demonstration capability is separate from production readiness or scalable deployment. Sony's earlier AI table tennis robot serves as a comparable prior example. Source: Jiemian News source (WeChat, CN)

SERES In-House Humanoid "Xiao Sai" Debuts at Super Factory · humanoid

SERES Group Vice President Kang Bo released a video unveiling the company's in-house humanoid robot "Xiao Sai," which performed visual recognition and voice interaction as a tour guide inside a super factory (accompanying actor Huang Bo on a visit). The company also disclosed that multiple logistics and quality-inspection robots for factory scenarios are already deployed on production lines, with plans to release additional bipedal, quadruped, and other embodied robots later this year. Automakers' existing supply chains and ready-made deployment venues give them a natural advantage in entering the humanoid space; SERES's entry further enlarges the field of cross-sector automotive players. Source: Embodied Intelligence HQ source (WeChat, CN)

Songyan Dynamics Releases First Open-Source HarmonyOS Consumer Humanoid N2, Launches "100 People, 100 Robots" Program · humanoid

Songyan Dynamics has released N2, described as the industry's first consumer humanoid robot integrated with open-source HarmonyOS, alongside a "100 People, 100 Robots" developer co-creation program that selects 100 developers to receive free robots. Open-source HarmonyOS integration means voice interaction can connect with air conditioners and other smart devices across multiple platforms. Pairing a consumer humanoid with an open operating system ecosystem is a move to lower development barriers and compete for developer mindshare. Source: Phoenix Online source

Huawei HDC2026: Humanoid Robots Run on HarmonyOS, Cross-Platform Device Integration Demonstrated · embodied

At Huawei Developer Conference 2026, robots connected to open-source HarmonyOS, including robot dogs and humanoids, appeared on stage and were shown acting as entry points for controlling home and office devices. Huawei's strategy is to hold the "foundation layer" of embodied intelligence through its operating system and ecosystem position, rather than building complete robots itself. Source: Sina Finance source

Li Auto Livis Day: Mach VLA Targets Tesla FSD V14 Parity in Q4 · autonomy ⚠️ Vendor claim

Li Auto held its Livis Day launch event, announcing comprehensive upgrades across software and embodied AI. The head of autonomous driving said the in-house Mach VLA continues to evolve, with a stated goal of matching Tesla FSD V14 in Q4. "Matching FSD V14" is a public self-benchmark; no third-party evaluation on the same criteria exists, and real-world performance remains to be validated by Q4 vehicle testing. Source: Sina Finance source

Parallel Systems Advances "World's First Autonomous Freight Rail System" · autonomy

Parallel Systems, founded by former SpaceX engineer Matt Soule, is building electric autonomous freight cars with no locomotive and no on-board driver, compatible with existing freight and train control software and capable of autonomous coupling and decoupling. The company has raised over $100 million in total, holds 300+ vehicle orders, and has received Federal Railroad Administration (FRA) approval to conduct the first autonomous freight rail system test, targeting initial commercial operations in 2026. Shifting short-haul freight from road to automated rail is a distinct route in freight autonomy, separate from the trucking path. Source: Robotics & Automation News source

"DeepMind Partner" Dishwashing Humanoid Video Admitted by Founder to Be Fake · humanoid

A video appearing to show a humanoid robot doing kitchen chores went viral before being confirmed as an AI-generated promotional film. Fabian Kerj, founder of Qualia, which is part of Google DeepMind's European robotics partnership program, admitted it was not a real robot, saying "we build training infrastructure, not hardware," and adding "but it got your attention." The incident once again puts the authenticity of humanoid robot demonstrations and marketing honesty under scrutiny; earlier cases of humans posing as robots were widely cited in response. Source: The Cool Down source

Yinghe Robotics Reported Near Collapse; Panda Capital Applies Public Pressure · industrial ⚠️ Single-party claim

Yinghe Robotics, which has raised approximately 600 million RMB, is reported to be in serious operational difficulty, with investor Panda Capital publicly directing blame at parties affiliated with Midea. Specific operational details and the dispute are based on a single party's account and await responses from other parties. Even as primary-market capital floods in, exit and governance risks for early-stage projects are beginning to surface, the other side of a sector experiencing simultaneous boom and stress. Source: Sina Finance source

Hardware & supply chain: Competition to mass-produce dexterous hands is intensifying. Yinshi Robotics claims it delivered over 10,000 dexterous hands in 2025, calling itself the first company globally to exceed 10,000 annual units with over 60% market share (vendor claim); LinkerBot says its single-month peak output exceeds 4,000 units across tendon-driven, linkage, and direct-drive product lines; ICRA 2026 saw a surge of new dexterous hand entrants and products, with component supply becoming more predictable than the hardware body itself. Domestic substitution of upstream precision reducers in China is entering a "golden window," with performance approaching Japanese leaders at significantly lower cost; reducers are widely viewed as one of the key bottlenecks to cost reduction at humanoid scale.
