Paper Title

State Representation Learning for Goal-Conditioned Reinforcement Learning

Paper Authors

Lorenzo Steccanella, Anders Jonsson

Paper Abstract

This paper presents a novel state representation for reward-free Markov decision processes. The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them. Compared to previous methods, our approach does not require any domain knowledge, learning from offline and unlabeled data. We show how this representation can be leveraged to learn goal-conditioned policies, providing a notion of similarity between states and goals and a useful heuristic distance to guide planning and reinforcement learning algorithms. Finally, we empirically validate our method in classic control domains and multi-goal environments, demonstrating that our method can successfully learn representations in large and/or continuous domains.
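
The abstract describes embedding states so that distances between embedded states approximate the minimum number of actions needed to move between them, learned self-supervised from offline, unlabeled trajectories. The sketch below is one plausible way to instantiate that idea and is not the authors' exact objective: the network sizes, the hinge-style loss terms, and the `train_embedding` helper are illustrative assumptions.

```python
# Illustrative sketch (assumed, not the paper's exact loss): learn an embedding phi
# such that consecutive states in offline trajectories end up roughly one unit apart,
# while randomly sampled state pairs are pushed apart to avoid a collapsed embedding.
import torch
import torch.nn as nn

class Embedding(nn.Module):
    def __init__(self, state_dim, embed_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, s):
        return self.net(s)

def train_embedding(trajectories, state_dim, epochs=100, lr=1e-3):
    """trajectories: list of tensors of shape (T_i, state_dim) from unlabeled offline data."""
    phi = Embedding(state_dim)
    opt = torch.optim.Adam(phi.parameters(), lr=lr)
    states = torch.cat(trajectories, dim=0)
    # Consecutive (s_t, s_{t+1}) pairs: exactly one action apart in the data.
    pos = torch.cat([torch.stack([tau[:-1], tau[1:]], dim=1) for tau in trajectories], dim=0)
    for _ in range(epochs):
        z1, z2 = phi(pos[:, 0]), phi(pos[:, 1])
        d_pos = (z1 - z2).norm(dim=-1)
        # Pull consecutive states to embedding distance at most 1 (their action distance is 1).
        loss_pos = torch.clamp(d_pos - 1.0, min=0.0).pow(2).mean()
        # Push random state pairs apart so the embedding does not collapse to a point.
        idx = torch.randint(0, len(states), (len(pos), 2))
        d_neg = (phi(states[idx[:, 0]]) - phi(states[idx[:, 1]])).norm(dim=-1)
        loss_neg = torch.clamp(1.0 - d_neg, min=0.0).mean()
        loss = loss_pos + loss_neg
        opt.zero_grad()
        loss.backward()
        opt.step()
    return phi
```

Once such an embedding is trained, the distance ||phi(s) - phi(g)|| provides the kind of similarity measure between states and goals mentioned in the abstract, and can be used as a heuristic to shape rewards or guide planning in a goal-conditioned setting.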
