Humanoid Robots Learn to Manipulate Objects While Walking

#research #machinelearning

New training method enables continuous dexterous manipulation during locomotion, moving beyond stop-and-grasp limitations.

Researchers have developed a training approach that lets humanoid robots perform complex hand manipulation without pausing their movement, addressing a fundamental limitation in combining locomotion with grasping. According to a paper posted on arXiv, the method, called CoorDex, enables robots equipped with high-degree-of-freedom hands to execute intricate manipulation while moving continuously.

Traditional humanoid robotics relies on a sequential workflow: the robot walks to a target location, stops completely, performs the manipulation, then resumes walking. A person would move this way only under severe cognitive constraints; in everyday life, humans naturally coordinate arm and hand movements while walking or running. CoorDex attempts to close this gap by training robots to integrate locomotion and fine-grained finger control into a single coordinated behavior.

Addressing the Coordination Challenge

The system works by mapping the high-dimensional control signals of both the robot's body and its dexterous hand into a single unified latent action space. Rather than treating locomotion and manipulation as separate tasks with independent controllers, the researchers first trained privileged motion-tracking teachers for each component (privileged meaning they can use simulator state that the real robot cannot observe), then distilled them into shared latent representations conditioned on the robot's proprioceptive sensing of its own positions and velocities.
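The two ideas in the paragraph above, a unified latent action space and teacher-to-student distillation, can be made concrete with a minimal plain-Python sketch that assumes linear maps throughout. A single latent vector is decoded into both body and hand joint targets, and a student that sees only proprioception is fit by regression to the latents produced by a teacher that also sees privileged state. Everything here is hypothetical: the dimensions, the names `decode`, `distill`, and `avg_loss`, the random linear "teacher" standing in for the per-component motion-tracking teachers, and the use of plain SGD on a squared error. It is not CoorDex's implementation.

```python
import random

# All sizes are hypothetical and tiny for illustration; the article does not
# give CoorDex's real dimensions.
PROPRIO_DIM = 6            # student input: proprioception (positions/velocities)
PRIV_DIM = 3               # extra privileged state only the teacher sees (e.g. object pose)
LATENT_DIM = 4             # one shared latent action for body and hand together
BODY_DOF, HAND_DOF = 5, 3  # stand-ins for body joints and hand joints


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def decode(z, W_dec):
    """Decode ONE shared latent vector into body and hand joint targets."""
    joints = matvec(W_dec, z)
    return joints[:BODY_DOF], joints[BODY_DOF:]


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def avg_loss(student_W, teacher_W, samples):
    """Mean squared gap between student and teacher latents over a dataset."""
    return sum(
        mse(matvec(student_W, p), matvec(teacher_W, p + q)) for p, q in samples
    ) / len(samples)


def distill(teacher_W, samples, lr=0.05, epochs=100, seed=0):
    """Fit a linear student (proprioception -> latent) to the teacher's latents by SGD."""
    rng = random.Random(seed)
    student_W = rand_matrix(LATENT_DIM, PROPRIO_DIM, rng)
    initial = avg_loss(student_W, teacher_W, samples)
    for _ in range(epochs):
        for proprio, priv in samples:
            target = matvec(teacher_W, proprio + priv)  # teacher sees privileged state
            pred = matvec(student_W, proprio)           # student sees proprioception only
            for i in range(LATENT_DIM):
                # d(mse)/dW[i][j] = 2 * (pred_i - target_i) * proprio_j / LATENT_DIM
                grad_scale = 2.0 * (pred[i] - target[i]) / LATENT_DIM
                for j in range(PROPRIO_DIM):
                    student_W[i][j] -= lr * grad_scale * proprio[j]
    return student_W, initial


rng = random.Random(42)
teacher_W = rand_matrix(LATENT_DIM, PROPRIO_DIM + PRIV_DIM, rng)  # hypothetical teacher
W_dec = rand_matrix(BODY_DOF + HAND_DOF, LATENT_DIM, rng)         # shared decoder
samples = [
    ([rng.uniform(-1, 1) for _ in range(PROPRIO_DIM)],
     [rng.uniform(-1, 1) for _ in range(PRIV_DIM)])
    for _ in range(64)
]

student_W, loss_before = distill(teacher_W, samples)
loss_after = avg_loss(student_W, teacher_W, samples)
body_targets, hand_targets = decode(matvec(student_W, samples[0][0]), W_dec)
print(f"distillation loss: {loss_before:.3f} -> {loss_after:.3f}")
print(len(body_targets), "body targets,", len(hand_targets), "hand targets")
```

In this sketch the student cannot drive the loss to zero because it never sees the privileged inputs; distillation only asks it to recover as much of the teacher's behavior as proprioception allows.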

The coordinated residual policy architecture uses a shared task context layer combined with separate refinement heads for body and hand movements. This design preserves natural whole-body motion patterns while improving the reliability of finger-to-object contact during dynamic movement.
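The layout described above, a shared context layer feeding separate body and hand refinement heads, can be sketched as a small network whose trunk feeds two heads, each adding a bounded correction to a prior action. This is a hypothetical illustration, not the paper's architecture: the layer sizes, the class name `CoordinatedResidualPolicy`, the `tanh` bound on each correction, and the assumption that residuals are added to the latent prior's body and hand outputs are all choices made here for clarity.

```python
import math
import random

# Hypothetical sizes; the article does not specify the real architecture.
OBS_DIM = 6       # e.g. proprioception plus a task command
CONTEXT_DIM = 4   # width of the shared task-context layer
BODY_LATENT = 3   # body part of the latent action
HAND_LATENT = 2   # hand part of the latent action


def linear(W, b, x):
    """Affine map W @ x + b using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(out_dim, in_dim, rng):
    W = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim


class CoordinatedResidualPolicy:
    """Shared context trunk with separate body and hand residual heads (hypothetical)."""

    def __init__(self, seed=0, residual_scale=0.1):
        rng = random.Random(seed)
        self.residual_scale = residual_scale
        self.ctx = init_layer(CONTEXT_DIM, OBS_DIM, rng)  # shared by both heads
        self.body_head = init_layer(BODY_LATENT, CONTEXT_DIM, rng)
        self.hand_head = init_layer(HAND_LATENT, CONTEXT_DIM, rng)

    def _residual(self, head, context):
        # tanh keeps each correction strictly within +/- residual_scale of the prior
        return [self.residual_scale * math.tanh(v) for v in linear(*head, context)]

    def __call__(self, obs, prior_body, prior_hand):
        context = [math.tanh(v) for v in linear(*self.ctx, obs)]
        body = [p + r for p, r in zip(prior_body, self._residual(self.body_head, context))]
        hand = [p + r for p, r in zip(prior_hand, self._residual(self.hand_head, context))]
        return body, hand


policy = CoordinatedResidualPolicy(seed=1)
obs = [0.2, -0.1, 0.0, 0.5, 0.3, -0.4]
prior_body = [0.5, -0.2, 0.0]  # hypothetical latent-prior output for the body
prior_hand = [1.0, 0.3]        # hypothetical latent-prior output for the hand
body, hand = policy(obs, prior_body, prior_hand)
print("body:", [round(v, 3) for v in body])
print("hand:", [round(v, 3) for v in hand])
```

Bounding the residual is one simple way to keep the refined action close to the prior, which matches the intuition of preserving natural whole-body motion while the heads fine-tune contact; the real system may enforce this differently.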

Real-World Validation

Researchers validated the approach on a Unitree G1 humanoid equipped with a 20-degree-of-freedom WUJI hand. Demonstrated tasks included:

Grasping and carrying a bottle without interrupting locomotion
Opening a refrigerator door while walking
Picking up and rotating a cube in motion

Ablation studies showed that alternative designs failed under identical computational budgets. Joint-space reinforcement learning, a separately controlled hand, and monolithic latent prediction all proved inadequate for continuous manipulation. The latent-prior interface and the coordinated residual structure emerged as the components essential to making contact-rich loco-manipulation trainable.

Implications for Robotics

This work represents a meaningful advancement in embodied AI, where robots must coordinate multiple subsystems to perform complex tasks. Rather than achieving dexterity or mobility in isolation, the research demonstrates that careful architectural choices can enable both simultaneously.

The approach has implications beyond simple manipulation demonstrations. As humanoid robots move toward practical deployment in unstructured environments, the ability to perform tasks without coming to a complete stop could significantly improve efficiency in warehousing, manufacturing, and service applications.

The researchers have released project documentation and implementation details, enabling the broader robotics community to build upon this foundation. Future work may explore how similar coordination principles apply to multi-arm systems or how the approach scales to more complex manipulation sequences involving tool use or multi-object interactions.

This article was originally published on AI Glimpse.