HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Humanoid, Reinforcement Learning, Whole-body Control
Abstract:

Humanoid robots, capable of assuming human roles in various workplaces, have become essential to the advancement of embodied intelligence. However, because humanoids are robots with complex physical structures, learning a control model that operates robustly across diverse environments remains inherently challenging, particularly under discrepancies between training and deployment environments. In this study, we propose HWC-Loco, a robust whole-body control algorithm tailored for humanoid locomotion tasks. By reformulating policy learning as a robust optimization problem, HWC-Loco explicitly learns to recover from safety-critical scenarios. However, prioritizing safety guarantees can induce overly conservative behavior that compromises the robot's ability to complete the given tasks. To tackle this challenge, HWC-Loco leverages a hierarchical policy for robust control. This policy dynamically resolves the trade-off between goal tracking and safety recovery, guided by human behavior norms and dynamic constraints. To evaluate HWC-Loco, we conduct extensive comparisons against state-of-the-art humanoid control models, demonstrating its superior performance across diverse terrains, robot structures, and locomotion tasks in both simulated and real-world environments.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes HWC-Loco, a hierarchical whole-body control algorithm for robust humanoid locomotion that frames policy learning as a robust optimization problem. It resides in the Hierarchical Policy Architectures leaf, which contains five papers total (including this one). This leaf sits within the broader Hybrid and Hierarchical Learning-Based Control branch, indicating a moderately populated research direction focused on multi-level policy structures. The taxonomy shows that hierarchical approaches represent one of several active strategies for humanoid control, alongside end-to-end RL, model-based methods, and specialized task controllers.

The Hierarchical Policy Architectures leaf is adjacent to Hybrid RL with Model-Based Components (three papers) and sits under the same parent as End-to-End Reinforcement Learning for Locomotion (nine papers across three sub-leaves). The taxonomy structure reveals that learning-based approaches dominate the field, with model-based methods forming a parallel major branch. HWC-Loco's emphasis on robust optimization and safety-critical recovery connects it conceptually to the Robustness and Disturbance Rejection branch (two papers), though its core methodology remains hierarchical learning. The scope notes clarify that purely end-to-end methods without hierarchical decomposition belong elsewhere, positioning this work at the intersection of structured control and learned policies.

Across the thirty candidates examined (ten per contribution), the hierarchical policy architecture contribution has two refutable candidates, while the robust optimization formulation and the overall HWC-Loco framework each have zero refutations. The presence of two overlapping works for the hierarchical switching mechanism suggests that dynamic coordination between high-level planning and low-level execution has received prior attention, though the specific integration with robust optimization may differentiate this approach. The robust optimization formulation appears less directly addressed in the examined literature, indicating potential novelty in how safety-critical scenarios are explicitly incorporated into the learning objective. The limited search scope means these findings reflect top semantic matches rather than exhaustive coverage.

Based on the top-thirty semantic search results, HWC-Loco appears to occupy a moderately explored area within hierarchical humanoid control, with its robust optimization framing potentially offering a distinguishing angle. The analysis does not cover the full breadth of humanoid locomotion research (fifty papers in taxonomy, thirty examined here), and the refutation statistics reflect only the most semantically similar prior work. The contribution-level breakdown suggests that while hierarchical architectures are established, the specific combination of safety-aware robust optimization with dynamic policy switching may represent an incremental advance over existing hierarchical methods.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: robust humanoid locomotion control. The field organizes itself around several complementary perspectives. Learning-Based Control Approaches encompass end-to-end reinforcement learning methods, hybrid architectures that blend learned and classical components, and hierarchical policy designs that decompose locomotion into multiple decision layers. Model-Based and Optimization-Driven Control focuses on trajectory optimization, model-predictive control, and reduced-order modeling to exploit known dynamics. Specialized Locomotion Tasks and Capabilities address particular challenges such as loco-manipulation, terrain adaptation, and dynamic maneuvers, while Robustness and Disturbance Rejection emphasizes stability under external perturbations. Gait Planning and Phase Control and Sensor-Based Reactive Control provide complementary views on rhythm generation and real-time feedback, and the taxonomy also includes branches on Surveys and Comparative Studies as well as Mechanical Design and Hardware Platforms. Representative works like Real-world Humanoid RL[1] and Lipschitz Humanoid[2] illustrate the diversity of learning strategies, whereas Versatile Bipedal RL[3] and World Model Locomotion[4] highlight different ways to achieve generalization and sample efficiency.

Within the learning-based branch, hierarchical policy architectures have attracted considerable attention for their ability to manage complexity by separating high-level planning from low-level tracking. HWC-Loco[0] sits squarely in this cluster, proposing a hierarchical design that coordinates whole-body control with learned locomotion policies. Nearby works such as Gentle Humanoid[5] and Hold My Beer[10] also adopt hierarchical or modular structures but differ in their emphasis on contact-aware reasoning or task-specific constraints.
Adversarial Motion Imitation[27] and Hierarchical Bipedal Control[28] further illustrate the range of hierarchical strategies, from imitation-driven skill acquisition to explicit decomposition of gait phases. A central trade-off across these methods is the balance between end-to-end learning flexibility and the interpretability or safety guarantees offered by structured hierarchies. Open questions remain about how best to transfer such policies to real hardware under model mismatch and how to scale hierarchical designs to more complex terrains and manipulation tasks.

Claimed Contributions

HWC-Loco: A hierarchical whole-body control algorithm for robust humanoid locomotion

The authors introduce HWC-Loco, a hierarchical policy framework that dynamically switches between a goal-tracking policy for task execution and a safety-recovery policy for stability under disturbances. This approach addresses the trade-off between task performance and safety guarantees in humanoid locomotion.

Retrieved papers compared: 10
Robust optimization formulation for humanoid locomotion under mismatched dynamics

The authors formulate the humanoid locomotion problem as a robust constrained reinforcement learning objective that ensures worst-case feasibility constraints across mismatched training and deployment environments while maximizing task rewards in the learning environment.

Retrieved papers compared: 10
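The robust constrained objective described above can be written schematically. The following is an illustrative formalization in standard robust constrained RL notation, not the paper's exact equation; the uncertainty set $\mathcal{U}$ over deployment dynamics, the cost functions $c_i$, and the budgets $d_i$ are assumed placeholders:

```latex
\max_{\pi} \; \mathbb{E}_{P_{\text{train}},\,\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\max_{P \in \mathcal{U}} \; \mathbb{E}_{P,\,\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i,
\quad i = 1, \dots, m.
```

Under this reading, task reward is maximized in the training dynamics $P_{\text{train}}$ while each safety constraint must hold under the worst-case dynamics in $\mathcal{U}$, which is what "worst-case feasibility across mismatched training and deployment environments" amounts to.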
Hierarchical policy architecture with dynamic switching mechanism

The authors design a two-level policy structure where a high-level planner coordinates between a goal-tracking policy (for efficient task execution with human-like motion) and a safety-recovery policy (for handling safety-critical events), guided by human behavior norms and dynamic constraints.

Retrieved papers compared: 10
Refutation status: can refute
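The two-level structure described above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only: the placeholder policies, the tilt-based safety margin, and the threshold are hypothetical and are not the authors' implementation.

```python
import math

def goal_tracking_policy(obs):
    # Placeholder low-level policy for efficient task execution.
    return [math.tanh(x) for x in obs[:2]]

def safety_recovery_policy(obs):
    # Placeholder low-level policy that damps motion to stabilize.
    return [-0.5 * x for x in obs[:2]]

def safety_margin(obs, tilt_limit=0.3):
    # Hypothetical criterion: distance of the torso tilt (obs[2])
    # from a safe limit; positive means the state is safe.
    return tilt_limit - abs(obs[2])

def high_level_planner(obs):
    # Switch to the recovery policy when the safety margin is violated,
    # otherwise keep tracking the goal.
    if safety_margin(obs) < 0.0:
        return safety_recovery_policy(obs)
    return goal_tracking_policy(obs)

# A safe state uses the tracking policy; an unsafe one triggers recovery.
safe_action = high_level_planner([0.1, 0.2, 0.05])
unsafe_action = high_level_planner([0.1, 0.2, 0.6])
```

In HWC-Loco itself the switching decision is reportedly guided by human behavior norms and dynamic constraints rather than a single scalar threshold; the sketch only illustrates how a high-level planner arbitrates between the two low-level policies.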

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: HWC-Loco: A hierarchical whole-body control algorithm for robust humanoid locomotion

Contribution 2: Robust optimization formulation for humanoid locomotion under mismatched dynamics

Contribution 3: Hierarchical policy architecture with dynamic switching mechanism