Shawn

Posted on Jul 1

FutureX · Physical AI Daily — Issue 45 (07/02)

#ai #robotics #machinelearning #research

Today's Highlights

· Japan establishes a national strategy for sovereign physical AI: up to ¥1 trillion (~$6.1 billion) in R&D funding over five years for a homegrown "physical AI" foundation model, targeting the deployment of 10 million robots across 18 industries including food service and healthcare by 2040, led by the SoftBank- and Sony-backed alliance Noetra.

· China State Taxation Administration data: sales revenue growth for embodied intelligence companies rose to 22.4% in the first five months of this year, while industrial enterprises' spending on embodied robot purchases increased 2.3x year-on-year, signaling the industry's move into scaled volume growth.

· The largest single angel round yet in the consumer embodied robotics space: DISCOVER Robotics (Qiuzhi Technology) — founded by Zhou Guyue, a former core member of DJI — raised over $100 million in its first funding round, with Legend Capital and Lenovo Capital among the investors.

· Nvidia's GEAR lab open-sources ASPIRE, a self-improving skill discovery system that uses an "execute–diagnose–repair–accumulate" closed loop to raise dual-arm handover success rates from 20% to 92%, while building up a reusable skill library.

· AVATR (Chinese automaker) obtains a China L3 autonomous driving test license, launching multi-scenario road tests on highways and expressways in Chongqing — another step forward for China's advanced autonomous driving governance framework.

I. Research Progress

AdaJEPA: Letting World Models Keep Evolving During Action · world-model

Latent-space world models are typically frozen after training, so once test-time distribution shifts degrade their predictions, planning fails along with them. AdaJEPA embeds test-time adaptation of the world model directly into an MPC control loop, requiring just one gradient step per replanning cycle to significantly boost planning success rate — a representative work bringing "world models that keep learning while in use" into closed-loop control. Authors include Yann LeCun and Mengye Ren.

Ying Wang et al. (NYU, et al.) · arXiv 2606.32026 source · Commentary: AIer Notes source (WeChat, CN)
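
As a rough illustration of the idea described above (adapting the world model inside the MPC loop with one gradient step per replanning cycle), here is a toy sketch. It is not the authors' implementation: the scalar latent state, the linear model `z' = a*z + b*u`, the random-shooting planner, and every name and constant below are assumptions chosen only to keep the example small.

```python
import random

def plan(model, z, goal, rng, horizon=5, candidates=64):
    """Random-shooting MPC: sample action sequences, roll them out through the
    current world model, and return the first action of the best sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        pred = z
        for u in actions:
            pred = model["a"] * pred + model["b"] * u
        cost = (pred - goal) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first

def adapt(model, z, u, z_next, lr=0.05):
    """Exactly one gradient step on the squared one-step prediction error."""
    err = model["a"] * z + model["b"] * u - z_next
    model["a"] -= lr * 2.0 * err * z
    model["b"] -= lr * 2.0 * err * u
    return err ** 2

def run(steps=200, seed=0):
    rng = random.Random(seed)
    true_a, true_b = 0.9, 0.5       # test-time dynamics (assumed, shifted)
    model = {"a": 1.0, "b": 1.0}    # stale parameters from "training" (assumed)
    z, goal, errors = 0.0, 1.0, []
    for _ in range(steps):
        u = plan(model, z, goal, rng)              # replan with the current model
        z_next = true_a * z + true_b * u           # act in the real environment
        errors.append(adapt(model, z, u, z_next))  # one update per replanning cycle
        z = z_next
    return model, errors

model, errors = run()
print(model)
```

In this toy setting the model parameters drift toward the true dynamics while the controller keeps running, which is the behavior a frozen model cannot show.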

MemLearner: Giving Video World Models "Contextual Memory" · world-model

Video world models lack memory, causing scene drift and inconsistency during long-horizon generation. MemLearner teaches the model to actively query contextual memory, directly reusing pretrained visual priors without training extra modules from scratch — ranking among today's top trending papers in the community (HF↑18).

Jiwen Yu et al. · arXiv 2606.31734 source

Position Paper: VLA's "Physical Reasoning" Cannot Be Verified · vla

A position paper aimed at cooling the VLA hype: the industry commonly interprets gains on manipulation benchmarks as evidence that "internet-scale semantic representations have transferred to physical execution generalization," but the authors argue this assumption has never been independently verified and cannot be tested under current evaluation protocols, calling for a shift toward a verifiable research paradigm. A pointed take that cuts at a real methodological weakness in current VLA evaluation.

Taozhao Chen et al. (University of Sydney) · arXiv 2606.30686 source

From Grasps to Dexterity: Large-Scale Grasp Data Can Also Train "Functional Dexterous Manipulation" · manipulation

Large-scale grasp data has previously stopped at grasp generation and pick-and-place. This paper pushes it further into functional dexterous manipulation that requires operating an object's internal mechanism — such as aiming a spray bottle to water plants or operating a glue gun. The method uses hierarchical imitation learning (a high-level module predicting hand sub-goals plus a low-level goal-conditioned controller), demonstrating that grasp data can serve as a scalable pretraining resource for contact-rich dexterous manipulation. From CMU's David Held group.

Ying Yuan et al. (CMU) · arXiv 2606.30749 source
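
The two-level structure described above (a high-level module proposing hand sub-goals, a low-level controller conditioned on the current sub-goal) can be sketched as follows. This is only a structural illustration under strong assumptions: both levels are hand-written stand-ins rather than learned networks, the "hand" is a 2D point, and the sub-goal list is hypothetical.

```python
# Hypothetical hand sub-goals, e.g. pre-grasp, grasp, actuate the mechanism.
SUBGOALS = [(0.0, 0.5), (0.3, 0.5), (0.3, 0.2)]

def high_level(stage):
    """Stand-in for the learned sub-goal predictor."""
    return SUBGOALS[stage]

def low_level(pos, goal, max_step=0.05):
    """Stand-in for the goal-conditioned controller: a clipped step toward the goal."""
    return tuple(max(-max_step, min(max_step, g - p)) for p, g in zip(pos, goal))

def rollout(start=(0.0, 0.0), tol=1e-3, max_steps=200):
    pos, stage, reached = start, 0, []
    for _ in range(max_steps):
        if stage == len(SUBGOALS):
            break                                  # all sub-goals completed
        goal = high_level(stage)                   # high level picks the target
        action = low_level(pos, goal)              # low level tracks it
        pos = tuple(p + a for p, a in zip(pos, action))
        if all(abs(g - p) <= tol for p, g in zip(pos, goal)):
            reached.append(goal)
            stage += 1                             # move on to the next sub-goal
    return pos, reached

final_pos, reached = rollout()
print(final_pos, reached)
```

The point of the split is that the low-level controller never needs to know the task, only the next sub-goal, which is what would let generic grasp data pretrain it.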

World-Action Model Achieves First Sim-to-Real Transfer from Pure Synthetic Priors · world-model

Replacing costly real-robot demonstrations with scalable synthetic data has long been appealing, but world-action models had never before been shown to bridge the sim-to-real gap. This paper is the first to train a world-action model purely on synthetic priors and deploy it zero-shot to real-robot manipulation, opening a path for synthetic-data-driven manipulation learning.

Zixing Wang et al. · arXiv 2606.31101 source

SARL: Fine-Tuning Generalist Robot Policies via Semantic Reinforcement Learning · vla

Generalist robot policies make good priors for downstream RL, but fine-tuning them directly with RL often destroys their generalization. SARL instead optimizes a semantic prompt space through online interaction, using the generalist policy as a steerable "skill prior," and significantly outperforms existing methods at improving deployed behavior. From UC Berkeley's Sergey Levine group.

Jagdeep Singh Bhatia et al. (UC Berkeley) · arXiv 2606.31958 source

Z-1: Efficient Reinforcement Learning for VLA from Its Own Failures · vla

Most VLAs rely on behavior cloning or supervised fine-tuning, with few chances to improve from their own failures. Z-1 uses systematic GRPO post-training to substantially improve flow-based VLA policies without needing additional private demonstrations, surpassing published SOTA models.

Lang Cao et al. · arXiv 2606.31846 source
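
For readers unfamiliar with GRPO (Group Relative Policy Optimization), its core step is to score each sampled rollout against the other rollouts in the same group instead of against a learned critic. The sketch below shows only that generic advantage computation, not Z-1's training recipe. The binary success rewards, the group size, and the use of the population standard deviation are assumptions; implementations differ on the last point.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward by the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of 8 rollouts from the same initial state: 2 successes, 6 failures.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Successful rollouts receive positive advantages and failed ones negative advantages, so the policy's own failures supply a training signal without extra demonstrations. A group in which every rollout gets the same reward yields all-zero advantages and contributes no gradient.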

Human-as-Humanoid: Turning Human Video Directly into Motion Supervision for Humanoids · vla

Collecting real-robot data at scale for high-DOF humanoids is extremely difficult. This framework jointly aligns robot embodiment, sensor setup, and action-label interfaces, converting first- or third-person human video in near real time into observation-action supervision usable by humanoids, which makes "learning humanoid control from human video" feasible in a zero-shot setting.

Xiaopeng Lin et al. · arXiv 2606.32009 source

Freeform Preference Learning: Learning Manipulation Policies from Freeform Human Preferences · manipulation

In long-horizon manipulation, sparse success labels are too weak a signal, while binary preferences collapse multiple quality dimensions into one blurry signal. FPL instead lets robots learn from freeform human preferences, improves on sparse-reward and binary-preference baselines by 38 percentage points, and can steer policies toward different behaviors at test time without retraining. From Stanford's Chelsea Finn group.

Marcel Torne et al. (Stanford) · arXiv 2606.32027 source

Other papers today:

· DVG-WM: an efficient embodied world model with decoupled video generation, achieving better image quality on LIBERO and real robots with up to 3.97x speedup (arXiv 2606.32028 source)

· CoDex: zero-demonstration dexterous functional manipulation, using a VLM to infer semantic constraints and then refining with RL (arXiv 2606.31909 source)

· World-Model Collapse as a Phase Transition: the implicit world model in long-horizon language agents collapses abruptly at a critical point, with world fidelity failing before action validity (arXiv 2606.31399 source)

· Delta-JEPA: latent difference-action decoding improves world models' sensitivity to actions (arXiv 2606.31232 source)

· RL control for humanoid roller-skating: from Marco Hutter's group; energy use drops about 50% versus normal gait when equipped with consumer roller skates (arXiv 2606.31807 source)

· Robustness of Robotic Manipulation (survey): a unified conceptual and mechanistic framework for manipulation robustness (arXiv 2606.31494 source)

· Revisiting Parameter Redundancy in VLA: parameter evolution patterns and efficient pruning in VLM→VLA adaptation (arXiv 2606.31382 source)

· ChronoFlow-Policy: a diffusion visuomotor policy unifying past-present-future interaction flow (arXiv 2606.31493 source)

· One Video One World / OVOW: training-free reconstruction of a single monocular video into a simulatable, instance-level 4D mesh scene (arXiv 2606.31388 source)

· ForgeDrive: bidirectional cross-conditioning between vision and action, unifying simulation, planning, and visual odometry for autonomous driving (arXiv 2606.31226 source)

Open Source · Tools · Benchmarks

· NVIDIA ASPIRE (GEAR lab): a self-improving skill discovery system that automatically diagnoses and repairs code-as-policy programs from execution feedback, then accumulates verified fixes into a reusable skill library, raising dual-arm handover success from 20% to 92%; released alongside a batch of open Nvidia models and simulation libraries source

· XiaomiOneVL: an autonomous driving model officially released and fully open-sourced by Xiaomi, claimed to exceed explicit chain-of-thought approaches in accuracy with inference latency as low as 0.24 seconds source

· FluxVLA Engine: an embodied intelligence foundation stack open-sourced jointly by Alibaba Cloud and AgiBot (Chinese embodied robotics startup, formerly Zhiyuan) source

· StarVLA: an open-source framework for building VLAs in a modular, "Lego-block" fashion, covering training on multiple benchmarks through to real-robot deployment on a Franka arm source (WeChat, CN)

· WorldRoamBench: an open-world benchmark for the long-horizon stability of interactive world models, evaluating action following, memory, and interaction physics across four dimensions (arXiv 2606.31672 source)
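
The "execute, diagnose, repair, accumulate" loop described for ASPIRE in the list above can be illustrated with a toy sketch. Nothing here reflects ASPIRE's actual code or API: the handover "program" is reduced to a single hypothetical parameter, and `execute`, `diagnose`, `repair`, and `improve` are made-up stand-ins for what would in practice be simulation runs and model-driven program edits.

```python
def execute(params):
    """Toy stand-in for running a handover program; returns (success, feedback)."""
    width = params["grip_width_mm"]
    if width > 60:
        return False, "object slipped"
    if width < 40:
        return False, "fingers collided with object"
    return True, "ok"

def diagnose(feedback):
    """Map raw execution feedback to a named fault (hypothetical taxonomy)."""
    faults = {
        "object slipped": "grip_too_wide",
        "fingers collided with object": "grip_too_narrow",
    }
    return faults.get(feedback, "unknown")

def repair(params, fault, step_mm=10):
    """Return a patched copy of the program parameters for the diagnosed fault."""
    fixed = dict(params)
    if fault == "grip_too_wide":
        fixed["grip_width_mm"] -= step_mm
    elif fault == "grip_too_narrow":
        fixed["grip_width_mm"] += step_mm
    return fixed

def improve(name, params, library, max_rounds=10):
    """Run the loop; store the skill in the library only once it is verified."""
    for _ in range(max_rounds):
        ok, feedback = execute(params)
        if ok:
            library[name] = params      # accumulate the verified fix
            return True
        params = repair(params, diagnose(feedback))
    return False

library = {}
improve("handover", {"grip_width_mm": 90}, library)
print(library)  # {'handover': {'grip_width_mm': 60}}
```

Only fixes that pass execution enter the library, so later tasks can start from skills that are already known to work.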

II. Funding and Deals

DISCOVER Robotics (Qiuzhi Technology) | Angel Round | Over $100 Million · embodied

Backed by Legend Capital, Xinchan Investment, Lenovo Capital, and others. The company was founded by Zhou Guyue, a former core member of DJI, and incubated by Tsinghua University's Institute for Intelligent Industry; it develops fully self-developed, consumer-grade embodied robots. This round sets a record for the largest single angel round in China's consumer embodied robotics space, and reflects the day's broader capital enthusiasm for "building the brain upstream, deploying the body downstream." Source: Robot Outlook source (WeChat, CN)

Sinovation Motong | Series C | Nearly ¥200 Million · industrial

Led by Kingsway Investment, with Jiuxuan Capital as financial advisor; focused on smart equipment and production-line integration. Source: Sohu source

Ganzhi Jiyuan | Angel Round | Tens of Millions of RMB · hardware

Led by Sino-Rock (Songhe) Capital, with funds mainly going toward pilot-scale production line construction and product iteration. The company makes multimodal tactile electronic skin, entering the humanoid "tactile sensing" sub-sector that has recently attracted concentrated investor interest. Source: Robot Outlook source (WeChat, CN)

Variable-Stiffness Joint Startup Spun Out of Beihang University | Angel Round | Nearly ¥100 Million · hardware

Founded by a team from the Robotics Institute at Beihang University (Beijing University of Aeronautics and Astronautics), focused on intelligent variable-stiffness joints, a core upstream component for humanoid robots. Source: 36Kr source

Zhitianxia | Angel Round | Amount Undisclosed · world-model

A world-model startup describing itself as aiming to build "China's World Labs," riding the recent wave of investor interest in world models (previously reported). Source: LeiPhone source

Xijing Technology | ChiNext IPO Application Accepted · autonomy

A provider of autonomous driving solutions for commercial vehicles; its prospectus claims its related revenue scale ranks first in the industry. A wave of robotics and autonomous driving companies in China are rushing to file for listings on Hong Kong and mainland A-share exchanges. Source: Sina Finance source

III. Commercialization and Deployment

Amazon Deploys 5,000 Robots at New Superwarehouse in Poland · industrial

Amazon's newly built giant warehouse in Poland has put roughly 5,000 robots into operation, another large-scale expansion of its warehouse automation in Europe. Source: TVP World source

JD.com Scales Up Both Warehousing and Food-Service Deployments · industrial

At JD.com's "Asia No. 1" smart logistics park in Harbin, 84 "Geek+" warehouse robots are now in routine operation. JD.com is also accelerating the scaled rollout of embodied robots in food-service settings, expanding from warehouses into storefronts it owns. Source: NetEase/Sina Finance source

RideFlux and Hanjin Launch Commercial Autonomous Freight in South Korea · autonomy

South Korean autonomous driving company RideFlux has partnered with logistics leader Hanjin to launch commercial autonomous freight services, moving South Korean trunk-line logistics autonomy from testing into commercial operation. Source: thelec.net source

Neolix L4 Autonomous Delivery Vehicles Obtain Test Permit in Malaysia · autonomy ⚠️ Test permit

Neolix (Chinese autonomous delivery vehicle maker) has obtained an autonomous delivery test permit in Malaysia, taking its L4 logistics vehicles into Southeast Asia; this is a pilot-access permit rather than a scaled deployment. Source: cheshi.com source

AgiBot's 15,000th Unit Delivered to Longqi Factory (Previously Reported) · humanoid

The 15,000th embodied robot (Lingjing G2) that rolled off AgiBot's line on June 28 was delivered the same day directly to a Longqi Technology factory, entering front-line smart manufacturing use. The production milestone and deployment progress were detailed in earlier coverage, noted here briefly. Source: AGI Research Society source (WeChat, CN)

IV. Industry Developments

Japan Bets on "Physical AI" as a National Priority: ¥1 Trillion Over Five Years, 10 Million Robots by 2040 · world-model

Japan has formally established a national strategy for sovereign physical AI: the Ministry of Economy, Trade and Industry and its innovation agency have commissioned the Noetra alliance — led by SoftBank and Sony — along with the national research institute AIST, to develop a homegrown "physical AI" foundation model over fiscal years 2026–2030, with up to ¥1 trillion (~$6.1 billion) in funding over five years. The strategy's goal is to deploy 10 million robots across 18 industries including food service, food manufacturing, and healthcare by 2040, easing the effects of an aging population and labor shortages; Minister of Economy, Trade and Industry Ryosei Akazawa said the ministry would "go all-in on driving social implementation." This makes Japan the latest major economy, following China and South Korea, to elevate embodied/physical AI to the level of a national foundation-model priority. Source: Khaleej Times source

China's MIIT and SASAC Launch 2026 Special Action on "Real-World Field Training" for Humanoid Robots · humanoid

China's Ministry of Industry and Information Technology and the State-owned Assets Supervision and Administration Commission jointly issued a notice launching a 2026 special action on real-world field training for humanoid robots and embodied intelligence, specifying that by the end of 2026, core humanoid robot products should move into routine, real-world operational use across multiple typical scenarios. Combined with the industry standard YD/T 6770-2026 ("Benchmarking Methods for Embodied Intelligence"), which took effect June 1, China's embodied intelligence sector is entering a policy phase of "standards to follow, field training to execute." Source: Daji Shi source (WeChat, CN)

AVATR Obtains China L3 Autonomous Driving Test License · autonomy

AVATR (Chinese automaker) has obtained a China L3-level autonomous driving test license, launching multi-scenario road tests on highways and expressways in Chongqing. Following the UN's release of a global technical regulation on automated driving and China's push to submit mandatory national standards for L3/L4, China's L3 autonomous driving governance framework is progressively coming together, marking another step toward mass-production deployment. Source: The Beijing News (via DoNews) source

South Korea Releases Physical AI R&D Blueprint, Boosts National Fund · world-model

Following South Korea's elevation of physical AI to a national strategy (previously reported), the Ministry of Science and ICT has released an R&D blueprint proposing to overcome physical AI's data bottleneck through domestically developed world models. The national growth fund is simultaneously increasing related investment, and the government is accelerating the establishment of a safety certification system for humanoid robots. Source: The Korea Herald source

MEIL and Analog Plan $500 Million Push to Bring Physical Intelligence to India · adjacent ⚠️ Plan stage

Indian infrastructure conglomerate MEIL and UAE-backed Analog plan to invest approximately $500 million to deploy next-generation physical intelligence and AI infrastructure in India. Source: Business Standard source

World Model Momentum Continues, Academia Moves to "Set the Record Straight" · world-model

World models remain the day's dominant theme (previously reported): the 2026 Beijing Academy of Artificial Intelligence Conference sought to clarify the muddled definition of "world models," Fei-Fei Li wrote that world models are the gateway to physical intelligence, and both capital and academia continue to build momentum around the idea of a "robot brain." Source: Shake Network Technology News source (WeChat, CN)

Nvidia Expands Robotics Teams Across Three Locations in China · adjacent

Nvidia is expanding its robotics teams in three locations in China, positioning itself as providing "not a complete robot, but an Android-like operating system for robots" — i.e., supplying the platform and foundation software rather than the hardware itself, reinforcing its foothold in the embodied AI ecosystem's lower layers. Source: Sina Finance source

Hardware · Supply Chain

· Dexterous hands: Digitimes reports that price competition in China has cut the cost of humanoid robot hands by roughly half, though precision components such as six-axis force sensors remain difficult to bring down in cost at the same pace; GGII data projects China's dexterous hand sales will reach 70,000 units in 2026, about 3.6 times the 2025 figure source

· Joyson Electronics (Chinese Tier-1 auto parts supplier): to launch robotic dexterous hands and semi-solid-state battery solutions, as Tier-1 auto parts makers move in bulk into the embodied robotics supply chain source

· LiDAR: rising physical AI demand drove Ouster and Aeva shares up about 14% and 11% respectively in a single day source
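
As a quick sanity check on the GGII dexterous-hand projection in the list above, the implied 2025 sales figure can be backed out from the two quoted numbers. Because "about 3.6 times" is a rounded multiple, the result is an approximation derived here, not a reported statistic.

```python
projected_2026_units = 70_000   # GGII projection quoted above
multiple_vs_2025 = 3.6          # "about 3.6 times", a rounded figure

implied_2025_units = projected_2026_units / multiple_vs_2025
print(f"Implied 2025 sales: about {implied_2025_units:,.0f} units")
# prints: Implied 2025 sales: about 19,444 units
```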