Hierarchical Quality-Diversity for Online Damage Recovery

Paper Title

Hierarchical Quality-Diversity for Online Damage Recovery

Authors

Maxime Allard, Simón C. Smith, Konstantinos Chatzilygeroudis, Antoine Cully

Abstract

Adaptation capabilities, like damage recovery, are crucial for the deployment of robots in complex environments. Several works have demonstrated that using repertoires of pre-trained skills can enable robots to adapt to unforeseen mechanical damage in a few minutes. These adaptation capabilities are directly linked to the behavioural diversity in the repertoire. The more alternatives the robot has to execute a skill, the better the chances that it can adapt to a new situation. However, solving complex tasks, like maze navigation, usually requires multiple different skills. Finding a large behavioural diversity for these multiple skills often leads to an intractable exponential growth in the number of required solutions. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot more adaptive to different situations. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. The experiments with a hexapod robot show that our method solves maze navigation tasks with 20% fewer actions in the most challenging scenarios than the best baseline while having 57% fewer complete failures.
