Paper Title

Choreographer: Learning and Adapting Skills in Imagination

Authors

Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar

Abstract

Unsupervised skill learning aims to learn a rich repertoire of behaviors without external supervision, providing artificial agents with the ability to control and influence the environment. However, without appropriate knowledge and exploration, skills may provide control over only a restricted area of the environment, limiting their applicability. Furthermore, it is unclear how to leverage the learned skill behaviors to adapt to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples the exploration and skill learning processes and can discover skills in the latent state space of the model. During adaptation, the agent uses a meta-controller to evaluate and adapt the learned skills efficiently by deploying them in parallel in imagination. Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy. The skills can be used to adapt effectively to downstream tasks, as we show on the URL benchmark, where we outperform previous approaches from both pixel and state inputs. The learned skills also explore the environment thoroughly, finding sparse rewards more frequently, as shown in goal-reaching tasks from the DMC Suite and Meta-World. Website and code: https://skillchoreographer.github.io/
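
To make the idea of evaluating skills in imagination more concrete, the toy sketch below rolls out a handful of fixed skills inside a stand-in world model and picks the one with the highest imagined return. It is not Choreographer's implementation: the abstract gives no implementation details, so the one-dimensional state, `toy_world_model`, `make_skill`, `imagined_return`, and `select_skill` are all hypothetical names invented for illustration, and the greedy selection stands in for a meta-controller only in the loosest sense.

```python
from typing import Callable, Dict, Tuple

State = float
Action = float
Skill = Callable[[State], Action]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a world model: the state shifts by the action."""
    return state + action


def make_skill(step: Action) -> Skill:
    """Hypothetical skill policy that always moves by a fixed step."""
    return lambda state: step


def imagined_return(state: State, skill: Skill,
                    reward_fn: Callable[[State], float], horizon: int) -> float:
    """Sum rewards along a rollout that never touches the real environment."""
    total = 0.0
    for _ in range(horizon):
        state = toy_world_model(state, skill(state))
        total += reward_fn(state)
    return total


def select_skill(state: State, skills: Dict[str, Skill],
                 reward_fn: Callable[[State], float],
                 horizon: int) -> Tuple[str, Dict[str, float]]:
    """Score every skill from the same start state and return the best one.

    The skills are scored one after another here; a batched model could
    score them in parallel.
    """
    returns = {name: imagined_return(state, skill, reward_fn, horizon)
               for name, skill in skills.items()}
    best = max(returns, key=returns.get)
    return best, returns


# Assumed toy task: reach position 3.0 starting from 0.0.
skills = {"left": make_skill(-1.0), "stay": make_skill(0.0), "right": make_skill(1.0)}
goal = 3.0
best, returns = select_skill(0.0, skills, lambda s: -abs(s - goal), horizon=3)
print(best, returns)
```

Because every rollout happens inside the model, comparing the skills costs no environment interaction, which is the property the abstract points to when it describes adaptation as data-efficient.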
