Shawn

Posted on Jul 4

FutureX · Physical AI Daily — Issue 48 (07/05)

#ai #robotics #machinelearning #research

Today's Highlights

· The China Humanoid Robot 100 Council and the China Machinery Industry Federation jointly issued an initiative to regulate the development and marketing of "emotional companion" humanoid robots, urging strict ethical and privacy safeguards — widely read in the market as aimed squarely at models like UBTech's U1, a bionic companion robot priced near RMB 1 million.

· Tesla brought unsupervised Robotaxi service to Florida for the first time, launching in Miami with the Model Y and expanding its operating footprint to roughly 5 cities, as Q2 deliveries bucked the trend and rose roughly 25%.

· Fei-Fei Li and NVIDIA GEAR's SimFoundry: a single real-world video is enough to automatically generate an interactive simulation plus "digital cousins," boosting real-robot task success rates by up to 40% and reaching a 0.911 correlation between sim evaluation and real-robot performance.

· LeCun's team's AdaJEPA lets world models "learn on the job" via test-time adaptation, nearly doubling out-of-distribution planning success rates while adding only 0.01–0.03 seconds of extra latency.

· Guangxiang Technology, a startup incubated at Tsinghua, closed an angel round of hundreds of millions of yuan, championing a "physically-native foundation model" approach; its industrial robots have already run 21.5 hours straight on an auto production line with zero errors.

I. Research Progress

SimFoundry: Automatically Generating Trainable, Evaluable Simulated Worlds from a Single Real Video · benchmark

Collecting real-robot data is expensive and hard to scale, and Real2Sim approaches try to build simulations backward from real video — but past solutions often solved only one piece of the puzzle, either reconstructing 3D scenes or enabling policy evaluation alone. SimFoundry chains together scene reconstruction, data generation, policy evaluation, and policy training into a complete real-to-sim-to-real pipeline: an ordinary RGB video automatically generates an interactive "digital twin," from which "digital cousins" (with swapped objects, altered layouts, or new tasks) are batch-derived while preserving object function and affordances, yielding near-limitless training data. After introducing three categories of digital cousins, real-robot task success rates rose by 17%, 21%, and 40% respectively, sim-to-real Pearson correlation reached 0.911, and policies trained solely on this generated data could deploy zero-shot to real robots.

Nadun Ranawaka Arachchige et al. (NVIDIA GEAR / Georgia Tech / Stanford / UT Austin / University of Toronto, including Fei-Fei Li, Jim Fan, and Yuke Zhu) · arXiv 2606.28276 source · Coverage: QbitAI source (WeChat, CN)
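The sim-to-real correlation reported above is a Pearson coefficient between a policy's score in simulation and its score on the real robot. The sketch below shows how such a number could be computed; the function name `pearson_r` and the five success rates are illustrative assumptions, not data or code from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical success rates for five policy checkpoints (NOT from the paper).
sim_success = [0.20, 0.35, 0.50, 0.70, 0.85]
real_success = [0.15, 0.40, 0.45, 0.65, 0.90]

r = pearson_r(sim_success, real_success)
print(f"sim-vs-real Pearson r = {r:.3f}")
```

A value near 1 means that ranking policies in simulation tends to agree with ranking them on hardware, which is what makes simulated evaluation useful as a proxy.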

AdaJEPA: Letting World Models Keep Learning After Deployment · world-model

Past JEPA-family latent-space world models have generally frozen their parameters after training, which makes them prone to failure when the test-time distribution shifts, while short-horizon MPC rollouts can further amplify single-step errors. NYU and LeCun's startup AMI propose AdaJEPA, bringing test-time adaptation (TTA) into world models: after executing only the first segment of an MPC-planned action, the next real observed frame is used as a self-supervised signal to lightly update the final layers of the encoder and predictor (just one gradient step per timestep). On out-of-distribution tasks like PushObj and PointMaze, planning success rates nearly doubled (PointMaze GD rose from 53.3% to 78.7%), while added latency was only 0.01–0.03 seconds — more like giving a frozen model a "self-calibration at deployment" mechanism.

Ying Wang et al. (NYU CILVR Lab / AMI, advised by Mengye Ren and Yann LeCun) · arXiv 2606.32026 source · Coverage: QbitAI source (WeChat, CN)
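The plan, act, observe, update loop described above can be illustrated with a deliberately tiny stand-in. The sketch below uses a scalar linear "world model" rather than a JEPA encoder and predictor; the dynamics, the values `a_true` and `a_hat`, the learning rate, and the function `run_episode` are all assumptions made for illustration. It only shows the shape of the idea: one gradient step per timestep on the error between the predicted and the actually observed next state.

```python
def run_episode(adapt, steps=20, lr=0.5):
    """Drive a scalar state toward a goal using a possibly mis-specified model."""
    a_true = 0.9   # dynamics at test time (shifted; hypothetical value)
    a_hat = 0.5    # model parameter learned before deployment (hypothetical value)
    goal, z = 1.0, 0.0
    for _ in range(steps):
        # Plan with the current model: pick u so the predicted next state is the goal.
        u = goal - a_hat * z
        predicted = a_hat * z + u
        # Execute the action and observe the real next state.
        z_next = a_true * z + u
        if adapt:
            # One gradient step on 0.5 * (predicted - z_next)**2 with respect to a_hat.
            grad = (predicted - z_next) * z
            a_hat -= lr * grad
        z = z_next
    return abs(z - goal), a_hat

frozen_error, _ = run_episode(adapt=False)
adapted_error, adapted_a = run_episode(adapt=True)
print(f"frozen model:  final distance to goal = {frozen_error:.3f}")
print(f"adapted model: final distance to goal = {adapted_error:.6f}, a_hat = {adapted_a:.3f}")
```

In this toy setting the frozen model settles at a persistent offset from the goal, while the adapted one recovers the shifted dynamics within a few steps. The real method updates only the final layers of a learned encoder and predictor, which this sketch does not attempt to reproduce.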

EgoTSR: Teaching VLMs to Judge Whether a Task Is Actually Progressing · perception

Suppose an arm fumbles a grasp and the cup falls back onto the table: that frame comes later in the sequence, yet the task has actually reverted to its starting point. VLMs often get this backwards, relying on the shortcut that "a later-appearing frame looks more complete." EgoTSR, from a five-university team including Zhejiang University, targets exactly this "temporal order bias": feeding the model both the forward and reversed order of the same image pair exposes the shortcut (InternVL-8B scores nearly 99% accuracy forward but collapses to roughly 2% in reverse). The team built a 46-million-sample dataset and trained with a three-stage "explain first, then internalize, then plan" curriculum, adding a subtask planner for long-horizon progress reasoning. The result is 92.4% average accuracy on long-horizon tasks with only a 0.1-percentage-point gap between forward and reverse order, and the model can output a continuous task-completion curve for long-video monitoring.

Zhejiang University / Tianjin University / Shanghai Jiao Tong University / National University of Singapore et al. (ICML 2026) · arXiv 2604.10517 source · Code: Collab-Gen/EgoTSR · Coverage: Jiqizhixin (Synced) source (WeChat, CN)
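The forward/reverse probe described above can be sketched as a small evaluation harness. Everything here is a toy assumption: frames are dictionaries carrying a hypothetical ground-truth `progress` label, and the two judges stand in for a biased and an order-invariant model. It is not the EgoTSR benchmark code.

```python
def shortcut_judge(first, second):
    """Toy judge with temporal order bias: always picks whichever frame is shown second."""
    return "second"

def progress_judge(first, second):
    """Toy order-invariant judge that compares the (hypothetical) progress labels."""
    return "second" if second["progress"] > first["progress"] else "first"

def order_accuracy(judge, pairs):
    """Return (forward_accuracy, reverse_accuracy) over (less_done, more_done) pairs."""
    # Forward: the more complete frame is shown second, so "second" is correct.
    forward = sum(judge(less, more) == "second" for less, more in pairs)
    # Reverse: the more complete frame is shown first, so "first" is correct.
    reverse = sum(judge(more, less) == "first" for less, more in pairs)
    return forward / len(pairs), reverse / len(pairs)

# Hypothetical frame pairs, each ordered as (less complete, more complete).
pairs = [
    ({"progress": 0.1}, {"progress": 0.6}),
    ({"progress": 0.3}, {"progress": 0.9}),
    ({"progress": 0.0}, {"progress": 0.4}),
]

print("shortcut judge:", order_accuracy(shortcut_judge, pairs))
print("progress judge:", order_accuracy(progress_judge, pairs))
```

The shortcut judge scores (1.0, 0.0) and the progress judge scores (1.0, 1.0). Forward accuracy alone cannot tell the two apart, which is why the reversed presentation is the informative half of the test.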

GEM: Giving VLA "Depth Vision" to Fill the Spatial Gap · vla

Most current VLAs are "semantic giants, spatial dwarfs": they can recognize a "red cup" but can't accurately judge how far away it is. Tencent's GEM team brings depth-map generation into the VLM pretraining stage: a diffusion-based depth generation head is attached alongside a Qwen3-VL backbone and conditioned on its visual tokens, forcing the visual representation to encode 3D structure. On the VSI-Bench spatial understanding benchmark, scores rose from 57.9 to 70.6 (+12.7 points), surpassing Gemini-3-Pro, and the model averaged 96.1% across four LIBERO suites. On real-world UR5 long-horizon tasks (Table Bussing), average progress improved 67% over π₀.₅, and removing depth supervision caused a significant performance drop, confirming it functions as core infrastructure rather than an add-on.

Tencent · arXiv 2605.28548 source · Coverage: Embodied AI Manufacturing source (WeChat, CN)
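The VSI-Bench change from 57.9 to 70.6 can be read either as an absolute gain in benchmark points or as a relative improvement, and the two are easy to confuse. A quick check, using only the two scores quoted above (the helper name `score_gain` is illustrative):

```python
def score_gain(before, after):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative

absolute, relative = score_gain(57.9, 70.6)
print(f"absolute: +{absolute:.1f} points, relative: +{relative:.1f}%")
```

This prints an absolute gain of 12.7 points and a relative gain of about 21.9%.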

Drop-Then-Recovery: Cutting Half the Language Layers Makes VLA Stronger · vla

Is a multi-billion-parameter "language brain" really necessary for VLA? University of Maryland and Cisco Research's Drop-Then-Recovery offers a counterintuitive answer: physically removing half the language backbone and then fine-tuning to recover actually raised success rates rather than lowering them — OpenVLA-OFT on LIBERO went from 95.0% to 98.3%, and π0.5 went from 91.7% to 94.0%; meanwhile the vision and action pathways collapsed after removing even a tiny fraction of parameters, revealing a clear asymmetry of "redundant language, untouchable action." The authors propose a GateProbe virtual-gate metric to predict "which layers can be safely removed and recovered," and the finding also serves as a warning that current manipulation benchmarks' tests of language grounding may be too weak.

University of Maryland / Cisco Research · Coverage: Paper Digest Hall source (WeChat, CN)

OmniContact: Linking Up Long-Horizon Humanoid Manipulation with "Contact Flow" · locomotion

The hard part of long-horizon humanoid loco-manipulation is often the "seams between actions": if a box shifts, a suitcase gets stuck, or the previous segment wasn't executed precisely, can the next segment still pick up where it left off? Noitom Robotics and HKUST et al. propose OmniContact, which uses "contact flow" (who touches what, when, and how the body moves before and after contact) as a sparse intermediate interface: a high-level CF-Gen generates the contact flow, while a low-level CF-Track uses reinforcement learning to track it into full-body motion. With online replanning added, box-carrying correction success reached 99.7%, box-pushing rose to 94.5%, and the system can connect to a VLM to break down semantic tasks (such as arranging scattered boxes into a heart shape) into object-level goals.

Noitom Robotics / Hong Kong University of Science and Technology / Wuhan University / University of Hong Kong · Project page: omnicontact.github.io · Coverage: Embodied AI Research Lab source (WeChat, CN)
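One way to picture "contact flow" as a sparse interface is as a short list of contact events that a low-level tracker consumes and that can be patched when the scene drifts. The sketch below is purely illustrative: the `ContactEvent` fields, the planar positions, the 5 cm tolerance, and the `replan_if_drifted` helper are assumptions, and the real CF-Gen regenerates the flow rather than editing a single entry.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ContactEvent:
    time_s: float     # when the contact should happen
    kind: str         # e.g. "grasp" or "release"
    body_part: str    # which part of the robot makes contact
    target: str       # which object is touched
    position: tuple   # expected (x, y) of the object at contact time, in metres

def drift(event, observed):
    """Planar distance between the expected and the observed object position."""
    dx = observed[0] - event.position[0]
    dy = observed[1] - event.position[1]
    return (dx * dx + dy * dy) ** 0.5

def replan_if_drifted(flow, index, observed, tolerance_m=0.05):
    """Return a flow whose pending event uses the observed position if drift is too large."""
    if drift(flow[index], observed) <= tolerance_m:
        return flow  # close enough: keep tracking the original plan
    updated = replace(flow[index], position=observed)
    return flow[:index] + [updated] + flow[index + 1:]

# Hypothetical two-step box-carrying flow.
flow = [
    ContactEvent(1.0, "grasp", "both_hands", "box", (0.50, 0.00)),
    ContactEvent(4.0, "release", "both_hands", "box", (1.20, 0.30)),
]

# The box was nudged about 8 cm before the grasp, so the pending event is updated.
new_flow = replan_if_drifted(flow, 0, (0.58, 0.02))
print(new_flow[0].position)
```

Because the interface is only a handful of events, checking and patching it online is cheap compared with regenerating a full-body trajectory.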

Other papers today: world models remain the day's most active research direction — WM-AMT injects "predict-future-states" capability before post-training, letting agents perform what-if reasoning first (claiming roughly +9.8% reasoning accuracy); LoopWM lets a world model "reconsider" the same step repeatedly before deciding; and there's also an efficient reinforcement learning framework that connects a world model to a real robot to mitigate "dream-like" visual hallucinations and contact-dynamics artifacts.

Open Source · Tools · Benchmarks

· Xiaomi open-sources its latest autonomous driving model: led by its core technical team, focused on handling complex driving scenarios with a small model source

II. Funding & Deals

Guangxiang Technology (Tsinghua-incubated startup) ｜ Angel round (cumulative) ｜ Hundreds of millions of yuan · world-model

This round included financial and industrial investors such as Zhuhai Science & Technology Industry Group, Xingzheng Capital, Songhe Capital, Shunxi Fund, and SeeFund, along with listed company Xingyun Technology, plus additional investment from existing shareholders Nova Capital and L2F Guangyuan Fund. Guangxiang Technology was founded in April 2025, incubated jointly by Tsinghua University's School of Vehicle and Mobility and School of Artificial Intelligence. Its CEO, Zhang Tao, is the former head of Amap's spatial perception engine, and co-founder Li Shengbo is an expert in autonomous driving reinforcement learning. Technically, the company has chosen a "physically-native foundation model" path distinct from mainstream VLA and video-prediction world models — letting the model spontaneously develop an understanding of mass, friction, and causality through physical interaction, supported by its Phi-RL Matrix algorithm, Phi-Space data assets, and Phi-Arch platform. Its industrial robot, Phi-Bot X1, has already completed 21.5 continuous hours of welding pickup and placement on an auto production line with zero errors, improved mobile quality-inspection efficiency by 25–45% over manual work, and formed partnerships with several leading automakers, targeting the "30% digitalization gap" that robotic arms and PLCs can't handle.

Source: Hard Krypton source (WeChat, CN), 36Kr source

Huiguang Innovation ｜ Seed + angel round ｜ Tens of millions of yuan · hardware

Robot tactile sensor and tactile-data solution provider Huiguang Innovation (a team of Tsinghua graduates born after 2000) has closed two consecutive funding rounds, led by Vertex Ventures and a dual-currency financial investor, with follow-on participation from Poqoo Robotics, Infinity Fund SEE Fund, and the Shuimu Tsinghua Alumni Seed Fund. Tactile sensing is a key bottleneck for dexterous manipulation and force control, and this round continues capital's bet on "embodied perception layer" components. Source: The Beauty of Algorithms and Mathematics source (WeChat, CN)

Ouster ｜ Follow-on offering ｜ Roughly $200 million · hardware

Lidar maker Ouster raised roughly $200 million to bolster cash reserves and scale up production, adding fresh capacity to the robotics/autonomous-driving perception supply chain. Source: Pluang source

RoboParty ｜ Pre-A round ｜ Amount undisclosed · embodied

RoboParty closed a Pre-A funding round, with investment from battery giant CATL — another signal of industrial capital extending along the "battery-to-robotics" chain into embodied AI. Source: Sina Finance source

Overland AI ｜ Military contract ｜ Nearly $20 million · autonomy

Autonomous driving company Overland AI won a roughly $19.7 million contract from the U.S. Marine Corps to build unmanned autonomous military vehicles — a defense-sector deal offering a practical monetization path for off-road autonomous driving. Source: GeekWire source

Xiaoyu Zhizao (Xiaomi-affiliated) ｜ Series B+ ｜ Hundreds of millions of yuan · industrial ⚠️ Single-source claim

Industrial embodied AI platform Xiaoyu Zhizao reportedly closed a Series B+ round of hundreds of millions of yuan, pursuing a "platform" approach in an attempt to sidestep the crowded competition in full humanoid systems. Details on the exact amount and round terms remain unconfirmed by authoritative sources and are based on the company's own account. Source: Selected Business Plans source (WeChat, CN)

III. Commercialization & Deployment

Tesla's Unsupervised Robotaxi Arrives in Miami, Operating Footprint Grows to 5 Cities · autonomy

Tesla brought Robotaxi to Florida for the first time, launching in Miami using the Model Y and describing the service as "fully unsupervised." Its operating area has now expanded to roughly 5 cities, including Miami, Dallas, and Houston (unsupervised) and Austin (hybrid mode, with safety monitors in some vehicles), putting it in direct competition with Waymo. The Miami launch coincided with Tesla's disclosure that Q2 deliveries bucked the trend and rose roughly 25%, giving a short-term boost to market confidence in its autonomous-driving narrative. That said, "unsupervised" operation is still bound by geofencing (a restricted operating area) and remote support, and the scale and real-world safety performance still require longer-term data to verify. Source: blockchain.news source, Refresh Miami source

Waymo Opens Fully Driverless Rides in Nashville · autonomy

Waymo officially opened fully autonomous ride-hailing service in Nashville, continuing the steady expansion of its Robotaxi network across U.S. cities. Source: Mashable source

XPeng's First GX Robotaxi Production Unit Rolls Off the Line · autonomy

XPeng's GX Robotaxi has produced its first mass-production unit, marking a milestone from prototype to production; separately, XPeng said its VLA 2.0 will achieve "map-free" autonomous driving for international markets by 2027. Source: autohome.com.cn source

GM Cuts 1,000 Jobs at Detroit EV Plant, Deploys 50 AI Collaborative Robots · industrial

General Motors cut roughly 1,000 jobs at its Detroit EV hub while deploying 50 AI collaborative robots, drawing strong pushback from the union. This is a stark example of the "machines replacing workers" tension arising as embodied automation advances, and a reminder that deployment pace is being pulled back by employment and social pressures. Source: finance.biggo.com source, Futurism source

Household Embodied Robots Start Entering Homes via "Pay-Per-Use" Rental · humanoid

Household embodied robots are exploring the consumer market through "pay-per-use" rental, with one service offering chore trials for around RMB 74 per 3 hours. Zhiyuan Robotics (Chinese humanoid startup) has also registered the trademark "RoboShare" for its open rental platform "Qingtian Zu" ("Rent-a-Sky"), attempting to turn humanoid robots into a rentable service. The real scale of usage, reliability, and repeat-rental rates for this model remain to be verified. Source: Pandaily source, Trademark Supermarket source (WeChat, CN)

IV. Industry Developments

Two Major Industry Bodies Jointly Issue Guidelines on "Emotional Companion" Humanoid Robots · humanoid

The China Humanoid Robot 100 Council and the China Machinery Industry Federation jointly issued an initiative stating that the industry should preserve humans' prerogative to set preferences for service, follow safety and ethical standards in product design and advertising, strengthen personal information protection, and reinforce quality management to guard against privacy breaches and physical harm to users, while also calling for breakthroughs in core technologies and real-world application. This marks the first time an authoritative Chinese industry body has set explicit ethical and safety boundaries for "companion/emotional" humanoid robots, timed just as UBTech's U1 (top configuration priced at RMB 990,000 for the male version and RMB 880,000 for the female version, marketed for long-term emotional companionship) saw pre-orders exceed 13,000 units, with its appearance and positioning continuing to draw controversy. It's worth noting that this is industry-level guidance and self-regulation, not a regulatory ban; some secondary-market commentary framing it as "invalidating the commercialization logic" overreaches, but it does draw compliance boundaries around this most imaginative — and most sensitive — segment of the industry. Source: China News Service source

UN Approves Global Autonomous Driving Regulations; XPeng Plans Map-Free Overseas Rollout by 2027 · autonomy

The United Nations has approved global regulations related to autonomous driving, paving the way for cross-border deployment of advanced autonomous driving. XPeng used the occasion to state that its VLA 2.0 will achieve "map-free" autonomous driving for overseas/international markets by 2027; CEO He Xiaopeng separately said L4-to-L5 autonomy could arrive within 3 to 5 years. The regulatory framework progress is factual, but the specific timing and capability of mass production and overseas rollout remain forward-looking plans from the manufacturer. Source: netscn.com (Wangtongshe) source

Unitree IPO Aftermath Continues; Founder Wang Xingxing Discusses the "ChatGPT Moment for Embodied AI" · humanoid ⚠️ Company statement

Following the approval of its IPO registration on the STAR Market (previously reported), the A-share robotics sector rally driven by Unitree continued today, and Unitree Robotics was selected for an on-site regulatory inspection. Founder Wang Xingxing publicly stated that embodied AI's "ChatGPT moment" requires reaching "two 80% thresholds," and predicted the company will lead a new wave of consumer products by 2030; this is a stated vision, not an established fact. Source: cls.cn (Cailian Press) source, Sina Finance source

12 New Occupations, Including "Embodied AI Robot Application Technician," Announced for Public Comment · adjacent

12 new occupations were announced for public comment, including "Embodied AI Robot Application Technician" — a recognition at the occupational-classification level that reflects the industry's shift from technology validation toward large-scale employment and standardized job roles. Source: Sina Finance source

Shanghai International Embodied AI Expo Closes with Over 120 Partnership Agreements · adjacent

China's first independent professional exhibition dedicated to "embodied AI," CIEI 2026, has closed, with nearly 200 companies — including Unitree, Zhiyuan Robotics, UBTech, LimX Dynamics, Galbot, and Fourier Intelligence — exhibiting everything from full robotic systems to core components, and reaching more than 120 partnership agreements on-site. Source: MiaoZhuo Talks AI source (WeChat, CN)

Hardware · Supply Chain

· Dexterous Hand Multimodal Dedicated Chip: The Zhongke group (Zhongke Alpha / Zhongke Semiconductor) has launched a multimodal sensing and control chip dedicated to next-generation dexterous hands, presenting three core chip solutions using a 22-DOF dexterous hand as an example, emphasizing high precision, low latency, and small form factor source (WeChat, CN)

· Yuequan Bionics Y-Hand M1: This new dexterous hand returns to "human hand biology," balancing rigidity and compliance to combine grip strength with gentle, damage-free handling of fragile or irregularly shaped objects source (WeChat, CN)

· Li Auto Mach M100 Chip: Li Auto's self-developed autonomous-driving chip, the Mach M100, has debuted, marking further progress in in-house vehicle compute development source

· Harmonic Reducer Supply Gap: A brokerage research report claims humanoid robot harmonic reducers face a roughly 20% supply gap, forming the core thesis behind upstream capacity-flexibility investment plays (⚠️ analyst report claim, not measured data) source (WeChat, CN)