Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

1Shanghai Artificial Intelligence Laboratory, 2Tsinghua University, 3Shanghai Jiao Tong University, 4Tencent Robotics X, 5Northwestern Polytechnical University *Corresponding Author
ICRA 2024

Abstract

The robustness of legged locomotion is crucial for quadrupedal robots operating in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we adopt a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of the environment, and perform risk-averse policy learning by optimizing the worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation and on a real Aliengo robot demonstrate that our method efficiently handles various external disturbances, and the resulting policy exhibits improved robustness in harsh and uncertain situations.

Video

Method

Overall framework of our method. The critic network estimates the value distribution, and the risk-averse policy is obtained by optimizing the Conditional Value-at-Risk (CVaR) objective, so the policy is trained to perform well even under worst-case scenarios.
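To make the two ingredients above concrete, here is a minimal NumPy sketch of (a) the quantile-regression (quantile Huber) loss used to learn a distributional value estimate, and (b) a CVaR objective that averages the worst alpha-fraction of the learned quantiles. Function names, shapes, and hyperparameters are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile regression loss between predicted value quantiles `pred`
    (one per fraction in `taus`) and sampled TD targets `target`.

    pred:   (n_pred,) predicted quantile values
    target: (n_target,) target samples (e.g., bootstrapped returns)
    taus:   (n_pred,) quantile fractions, e.g., midpoints (i + 0.5) / n
    """
    # Pairwise TD errors, shape (n_target, n_pred), via broadcasting.
    td = target[:, None] - pred[None, :]
    # Huber penalty keeps gradients bounded for large errors.
    huber = np.where(np.abs(td) <= kappa,
                     0.5 * td ** 2,
                     kappa * (np.abs(td) - 0.5 * kappa))
    # Asymmetric weighting pushes each output toward its quantile tau.
    weight = np.abs(taus[None, :] - (td < 0).astype(float))
    return float((weight * huber / kappa).mean())

def cvar(quantiles, alpha=0.25):
    """Conditional Value-at-Risk: the mean of the worst alpha-fraction
    of quantile values. Maximizing this (instead of the plain mean)
    yields a risk-averse policy that focuses on worst-case outcomes."""
    k = max(1, int(alpha * len(quantiles)))
    return float(np.sort(quantiles)[:k].mean())
```

With alpha = 1 the CVaR reduces to the ordinary expected value, so alpha acts as a knob between risk-neutral and strictly worst-case behavior; a risk-averse actor would be updated to maximize `cvar` of the critic's predicted quantiles rather than their mean.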


Experiments

We train the policy in the Isaac Gym simulator and deploy it on a real Unitree Aliengo robot. The policy generalizes to scenarios it has never seen before, such as strong pushes and heavy loads.

External Push

We hit the robot with a 5 kg ball and kick it from various directions.

Missing a Step

The robot can recover from missing a step, even when stepping down from a 30 cm platform.

Carrying Static Loads

The resulting policy is able to carry a 4 kg robot arm without any modification to the training process.

Carrying Dynamic Loads

We have the robot carry a box containing a 3 kg iron ball, which strikes the box walls as the robot moves, posing a greater challenge.

Pulling a Leg

When one of the robot's legs is suddenly pulled, it quickly recovers its balance.

Wild Terrain

We also conducted outdoor experiments. The robot can traverse a soil slope and thick vegetation.

BibTeX

@inproceedings{shi2023robust,
      title={Robust Quadrupedal Locomotion via Risk-Averse Policy Learning}, 
      author={Shi, Jiyuan and Bai, Chenjia and He, Haoran and Han, Lei and Wang, Dong and Zhao, Bin and Zhao, Mingguo and Li, Xiu and Li, Xuelong},
      booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
      year={2024},
      organization={IEEE}
}