Learning Invariant Representations for Reinforcement Learning without Reconstruction

Paper title

Learning Invariant Representations for Reinforcement Learning without Reconstruction

Authors

Amy Zhang, Rowan McAllister, Roberto Calandra, Yarin Gal, Sergey Levine

Abstract

We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction. Our goal is to learn representations that both provide for effective downstream control and invariance to task-irrelevant details. Bisimulation metrics quantify behavioral similarity between states in continuous MDPs, which we propose using to learn robust latent representations which encode only the task-relevant information from observations. Our method trains encoders such that distances in latent space equal bisimulation distances in state space. We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks, where the background is replaced with moving distractors and natural videos, while achieving SOTA performance. We also test a first-person highway driving task where our method learns invariance to clouds, weather, and time of day. Finally, we provide generalization results drawn from properties of bisimulation metrics, and links to causal inference.
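
The training objective described in the abstract, making distances in latent space match bisimulation distances, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an L1 distance between latent codes, a target built from the reward difference plus a discounted 2-Wasserstein distance between diagonal-Gaussian predictions of the next latent state, and a squared-error penalty. The function names, parameters, and the default value of gamma are hypothetical and do not appear in the abstract.

```python
import math


def bisimulation_target(r_i, r_j, mu_i, sigma_i, mu_j, sigma_j, gamma=0.99):
    """Bisimulation-style distance target for a pair of states (assumed form).

    r_i, r_j         : scalar rewards observed in the two states
    mu_*, sigma_*    : mean and standard deviation vectors of an assumed
                       diagonal-Gaussian prediction of the next latent state
    gamma            : discount factor (hypothetical default)
    """
    reward_gap = abs(r_i - r_j)
    # Closed-form 2-Wasserstein distance between two diagonal Gaussians.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_i, sigma_j))
    w2 = math.sqrt(mean_term + std_term)
    return reward_gap + gamma * w2


def encoder_loss(z_i, z_j, target):
    """Squared gap between the L1 latent distance and the target distance."""
    latent_dist = sum(abs(a - b) for a, b in zip(z_i, z_j))
    return (latent_dist - target) ** 2


# Usage: two states with different rewards and different predicted next latents.
target = bisimulation_target(
    r_i=1.0, r_j=0.0,
    mu_i=[3.0, 0.0], sigma_i=[1.0, 1.0],
    mu_j=[0.0, 4.0], sigma_j=[1.0, 1.0],
    gamma=0.5,
)
loss = encoder_loss(z_i=[1.0, 0.0], z_j=[0.0, 0.0], target=target)
print(target, loss)
```

In an actual training loop the loss would be averaged over sampled pairs and minimized with respect to the encoder parameters; the sketch only shows the per-pair quantity. Two states that yield the same reward and the same predicted next-latent distribution get a target of zero, which is how task-irrelevant differences such as background changes would be pushed to the same latent code.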
