PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

Paper Title

PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

Authors

Kuang-Huei Lee, Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, Jie Tan, Wenhao Yu

Abstract

Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its poor scalability to large-capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advances in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks in which a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance than the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example achieving a 100% success rate on the real-world stepping stone environment, a dramatic improvement over the prior result of 40% success.
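The division of labor described in the abstract, where a learned representation shrinks the space that the gradient-free search has to explore, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a frozen random projection (`encode`) as a stand-in for an encoder trained with the Predictive Information objective, a toy imitation-style reward in place of robot rollouts, and the basic ARS update without observation normalization. All names and hyperparameters here are hypothetical.

```python
import numpy as np


def ars_step(theta, reward_fn, rng, n_dirs=8, top_b=4, step_size=0.02, noise_std=0.03):
    """One basic ARS update on a flat parameter vector (no state normalization)."""
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([reward_fn(theta + noise_std * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise_std * d) for d in deltas])
    # Keep the top_b directions, ranked by the better of their two evaluations.
    top = np.argsort(np.maximum(r_plus, r_minus))[-top_b:]
    # Scale the step by the standard deviation of the rewards that were kept.
    sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    update = (r_plus[top] - r_minus[top]) @ deltas[top] / top_b
    return theta + step_size / sigma_r * update


def run_demo(seed=0, iters=200, obs_dim=64, z_dim=4, act_dim=2):
    rng = np.random.default_rng(seed)
    # ASSUMPTION: a frozen random projection stands in for the PI-trained encoder.
    w_enc = rng.standard_normal((z_dim, obs_dim)) / np.sqrt(obs_dim)

    def encode(obs):
        return np.tanh(obs @ w_enc.T)

    # ASSUMPTION: toy reward, the negative squared error to a hidden target head.
    observations = rng.standard_normal((32, obs_dim))
    latents = encode(observations)
    target_head = rng.standard_normal((act_dim, z_dim))

    def reward_fn(theta):
        head = theta.reshape(act_dim, z_dim)
        error = latents @ (head - target_head).T
        return -float(np.mean(np.sum(error ** 2, axis=1)))

    # ARS searches only the small policy head: act_dim * z_dim parameters.
    theta = np.zeros(act_dim * z_dim)
    initial_reward = reward_fn(theta)
    for _ in range(iters):
        theta = ars_step(theta, reward_fn, rng)
    return initial_reward, reward_fn(theta)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"reward before: {before:.3f}, after: {after:.3f}")
```

In this sketch ARS perturbs only the 8 parameters of the policy head (`act_dim * z_dim`) instead of a policy acting directly on the 64-dimensional observation, which is the kind of search-space reduction the abstract attributes to the learned representation.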
