External Push
We hit the robot with a 5 kg ball and kick it from various directions.
Overall framework of our method. The critic network estimates the value distribution, and the risk-averse policy is obtained by optimizing the Conditional Value-at-Risk (CVaR) objective. The resulting policy is intended to perform well in worst-case scenarios.
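To make the CVaR objective concrete, the sketch below estimates a lower-tail CVaR from a set of equally weighted quantile estimates of the return, which is one common way a distributional critic represents the value distribution. This is an illustrative assumption, not the implementation used in this work: the function name `cvar_from_quantiles`, the quantile representation, and the risk level `alpha` are all hypothetical.

```python
import math


def cvar_from_quantiles(quantiles, alpha):
    """Approximate lower-tail CVaR at risk level alpha.

    Assumes `quantiles` are equally weighted quantile estimates of the
    return distribution (a hypothetical critic output). CVaR_alpha is
    approximated as the mean of the worst alpha-fraction of them.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not quantiles:
        raise ValueError("quantiles must be non-empty")

    ordered = sorted(quantiles)  # ascending: worst returns first
    # Number of tail samples; the small epsilon guards against
    # floating-point error in alpha * N (e.g. 0.3 * 10).
    k = max(1, math.ceil(alpha * len(ordered) - 1e-9))
    tail = ordered[:k]
    return sum(tail) / k


if __name__ == "__main__":
    # Made-up quantile values, for illustration only.
    q = [4.0, -2.0, 1.0, 0.0, 3.0, 2.0, 5.0, -1.0]
    print(cvar_from_quantiles(q, 0.25))  # mean of the two worst values
    print(cvar_from_quantiles(q, 1.0))   # alpha = 1 recovers the plain mean
```

With `alpha = 1` the estimate reduces to the ordinary expected value, while smaller `alpha` focuses the objective on the worst outcomes, which is the sense in which a policy optimized against it is risk-averse.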
We train the policy in the Isaac Gym simulator and deploy it on a real Unitree Aliengo robot. The policy generalizes to scenarios it has never seen during training, such as strong pushes and heavy loads.
The robot recovers successfully after missing a step, even when walking down from a 30 cm platform.
The resulting policy is able to carry a 4 kg robot arm without any modification to the training process.
We have the robot carry a box containing a 3 kg iron ball; the ball hits the box as the robot moves, posing a greater challenge.
When one of the robot's legs is suddenly pulled, the robot quickly recovers its balance.
We also conducted outdoor experiments. The robot can navigate soil slopes and thick vegetation.
@inproceedings{shi2023robust,
title={Robust Quadrupedal Locomotion via Risk-Averse Policy Learning},
author={Shi, Jiyuan and Bai, Chenjia and He, Haoran and Han, Lei and Wang, Dong and Zhao, Bin and Zhao, Mingguo and Li, Xiu and Li, Xuelong},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
year={2024},
organization={IEEE}
}