Paper title
Learning Agile Locomotion via Adversarial Training
Paper authors
Paper abstract
Developing controllers for agile locomotion is a long-standing challenge for legged robots. Reinforcement learning (RL) and Evolution Strategy (ES) hold the promise of automating the design process of such controllers. However, dedicated and careful human effort is required to design training environments to promote agility. In this paper, we present a multi-agent learning system, in which a quadruped robot (protagonist) learns to chase another robot (adversary) while the latter learns to escape. We find that this adversarial training process not only encourages agile behaviors but also effectively alleviates the laborious environment design effort. In contrast to prior works that used only one adversary, we find that training an ensemble of adversaries, each of which specializes in a different escaping strategy, is essential for the protagonist to master agility. Through extensive experiments, we show that the locomotion controller learned with adversarial training significantly outperforms carefully designed baselines.
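The training scheme summarized in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: the 2-D point-mass chase game, the two-parameter policies, the speeds, the ES hyperparameters, and the function names `rollout`, `es_step`, and `train` are all assumptions made for illustration. The abstract does not say how the adversaries are made to specialize, so here they differ only in their random initialization.

```python
import math
import random


def rollout(chaser, evader, steps=40):
    """Toy chase game (hypothetical stand-in for the robot simulation).

    Each policy is [heading_offset, weave_amplitude], applied relative to the
    line of sight from chaser to evader. Returns the final distance.
    """
    cx, cy, ex, ey = 0.0, 0.0, 1.0, 0.0
    for t in range(steps):
        los = math.atan2(ey - cy, ex - cx)  # line-of-sight angle
        ch = los + chaser[0] + chaser[1] * math.sin(0.5 * t)
        ev = los + evader[0] + evader[1] * math.sin(0.5 * t)
        # Assumed speeds: the chaser is slightly faster than the evader.
        cx += 0.12 * math.cos(ch)
        cy += 0.12 * math.sin(ch)
        ex += 0.10 * math.cos(ev)
        ey += 0.10 * math.sin(ev)
    return math.hypot(ex - cx, ey - cy)


def es_step(params, fitness, rng, sigma=0.1, lr=0.05, pop=8):
    """One ascent step of a basic antithetic evolution strategy."""
    grad = [0.0] * len(params)
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in params]
        plus = [p + sigma * e for p, e in zip(params, eps)]
        minus = [p - sigma * e for p, e in zip(params, eps)]
        diff = fitness(plus) - fitness(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e
    return [p + lr * g / (2.0 * sigma * pop) for p, g in zip(params, grad)]


def train(iterations=20, n_adversaries=3, seed=0):
    """Alternate ES updates between one protagonist and an adversary ensemble."""
    rng = random.Random(seed)
    protagonist = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    adversaries = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
                   for _ in range(n_adversaries)]
    for _ in range(iterations):
        # Protagonist: minimize the mean final distance over the whole ensemble.
        protagonist = es_step(
            protagonist,
            lambda p: -sum(rollout(p, a) for a in adversaries) / len(adversaries),
            rng,
        )
        # Each adversary: maximize its own final distance from the protagonist.
        adversaries = [
            es_step(a, lambda q: rollout(protagonist, q), rng)
            for a in adversaries
        ]
    return protagonist, adversaries
```

Calling `train()` returns the protagonist's parameters and the list of adversary parameters. Averaging the protagonist's fitness over the ensemble is one simple way to expose it to several escape behaviors at once; the paper itself may combine the adversaries differently.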