One After Another: Learning Incremental Skills for a Changing World

Title

One After Another: Learning Incremental Skills for a Changing World

Authors

Nur Muhammad Shafiullah, Lerrel Pinto

Abstract

Reward-free, unsupervised discovery of skills is an attractive alternative to the bottleneck of hand-designing rewards in environments where task supervision is scarce or expensive. However, current skill pre-training methods, like many RL techniques, make a fundamental assumption: that the environment is stationary during training. Traditional methods learn all their skills simultaneously, which makes it difficult for them both to adapt quickly to changes in the environment and to retain earlier skills after such adaptation. In an evolving or expanding environment, however, skill learning must adapt quickly to new situations without forgetting previously learned skills. These two requirements make it hard for classic skill discovery methods to perform well in an evolving environment. In this work, we propose a new framework for skill discovery in which skills are learned one after another, in an incremental fashion. This framework allows newly learned skills to adapt to new environment or agent dynamics, while keeping old skills fixed ensures that the agent does not forget them. We demonstrate experimentally that in both evolving and static environments, incremental skills significantly outperform current state-of-the-art skill discovery methods, both in skill quality and in the ability to solve downstream tasks. Videos of the learned skills and the code are publicly available at https://notmahi.github.io/disk
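The abstract describes the incremental scheme only at a high level. As a hedged illustration, and not the authors' actual algorithm, the toy sketch below assumes a hypothetical 2-D point agent whose skills are fixed headings, uses random search in place of RL, and swaps in a simple nearest-neighbour diversity reward. Skills are added one at a time while earlier ones stay frozen, and a change in the `speed` parameter stands in for an evolving environment. The names `rollout`, `diversity_reward`, and `add_skill` are illustrative only.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float]


def rollout(theta: float, speed: float, steps: int = 10) -> List[State]:
    """Roll out a hypothetical fixed-heading skill from the origin."""
    x, y = 0.0, 0.0
    states = []
    for _ in range(steps):
        # `speed` stands in for the (possibly changing) environment dynamics.
        x += speed * math.cos(theta)
        y += speed * math.sin(theta)
        states.append((x, y))
    return states


def diversity_reward(states: List[State], frozen: List[State]) -> float:
    """Mean distance from each new state to the nearest state of any frozen skill."""
    if not frozen:
        # Nothing frozen yet: every candidate scores 0, so the first sample is kept.
        return 0.0
    return sum(min(math.dist(s, f) for f in frozen) for s in states) / len(states)


def add_skill(skills: List[float], speed: float, rng: random.Random,
              candidates: int = 64) -> List[float]:
    """Learn one new skill by random search; existing skills are never modified."""
    # Re-roll the frozen skills under the *current* dynamics.
    frozen = [s for theta in skills for s in rollout(theta, speed)]
    best_theta, best_r = 0.0, -math.inf
    for _ in range(candidates):
        theta = rng.uniform(-math.pi, math.pi)
        r = diversity_reward(rollout(theta, speed), frozen)
        if r > best_r:
            best_theta, best_r = theta, r
    return skills + [best_theta]  # old entries are copied unchanged


if __name__ == "__main__":
    rng = random.Random(0)
    skills: List[float] = []
    # Hypothetical evolving environment: the agent slows down halfway through.
    for speed in [1.0, 1.0, 0.5, 0.5]:
        skills = add_skill(skills, speed, rng)
        print([round(t, 2) for t in skills])
```

Each call re-evaluates the frozen skills under the current dynamics but never changes their parameters, which mirrors the abstract's point that fixed old skills guard against forgetting while the newest skill adapts. A realistic version would train a neural policy with an RL algorithm rather than searching over a single angle.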
