Paper Title
Scalable Multi-Task Imitation Learning with Autonomous Improvement
Paper Authors
Paper Abstract
While robot learning has demonstrated promising results for enabling robots to automatically acquire new skills, a critical challenge in deploying learning-based systems is scale: acquiring enough data for the robot to effectively generalize broadly. Imitation learning, in particular, has remained a stable and powerful approach for robot learning, but critically relies on expert operators for data collection. In this work, we target this challenge, aiming to build an imitation learning system that can continuously improve through autonomous data collection, while simultaneously avoiding the explicit use of reinforcement learning, to maintain the stability, simplicity, and scalability of supervised imitation. To accomplish this, we cast the problem of imitation with autonomous improvement into a multi-task setting. We utilize the insight that, in a multi-task setting, a failed attempt at one task might represent a successful attempt at another task. This allows us to leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted. Using an initial dataset of multi-task demonstration data, the robot autonomously collects trials which are only sparsely labeled with a binary indication of whether the trial accomplished any useful task or not. We then embed the trials into a learned latent space of tasks, trained using only the initial demonstration dataset, to draw similarities between various trials, enabling the robot to achieve one-shot generalization to new tasks. In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement, and in contrast to reinforcement learning algorithms, our method can effectively improve from sparse, task-agnostic reward signals.
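To make the hindsight relabeling idea concrete, here is a minimal Python sketch of how a trial could be reassigned to the task it actually accomplished. It assumes a trajectory encoder `embed_trial` that maps a trial into the learned latent task space and a table `demo_task_embeddings` of per-task embeddings computed from the initial demonstrations; these names, and the nearest-neighbor assignment itself, are illustrative assumptions rather than the paper's published interface.

```python
import numpy as np

def relabel_trials(trials, embed_trial, demo_task_embeddings):
    """Relabel autonomously collected trials in hindsight.

    trials: list of (trajectory, did_something_useful) pairs, where the
        flag is the sparse binary success label described in the abstract.
    embed_trial: hypothetical encoder mapping a trajectory to a vector in
        the latent task space (trained only on the initial demonstrations).
    demo_task_embeddings: dict mapping task id -> latent embedding.

    Returns (task_id, trajectory) pairs usable as new demonstrations.
    """
    relabeled = []
    for trajectory, did_something_useful in trials:
        if not did_something_useful:
            continue  # discard trials that accomplished no useful task
        z = embed_trial(trajectory)
        # A failed attempt at one task may succeed at another: assign the
        # trial to the nearest task in the learned latent space.
        task_id = min(
            demo_task_embeddings,
            key=lambda t: np.linalg.norm(demo_task_embeddings[t] - z),
        )
        relabeled.append((task_id, trajectory))
    return relabeled
```

Under this sketch, the relabeled (task, trajectory) pairs would simply be appended to the demonstration dataset and the multi-task policy retrained with ordinary supervised imitation, which is how the approach can keep improving from sparse, task-agnostic success labels without invoking reinforcement learning.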