Paper Title
Matching options to tasks using Option-Indexed Hierarchical Reinforcement Learning
Authors
Abstract
The options framework in Hierarchical Reinforcement Learning breaks down overall goals into a combination of options or simpler tasks and associated policies, allowing for abstraction in the action space. Ideally, these options can be reused across different higher-level goals; indeed, such reuse is necessary to realize the vision of a continual learning agent that can effectively leverage its prior experience. Previous approaches have only proposed limited forms of transfer of prelearned options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and the items present in the environment. This allows us to effectively reuse a large library of pretrained options, in zero-shot generalization at test time, by restricting goal-directed learning to only those options relevant to the task at hand. We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems, by incorporating feedback about the relevance of retrieved options to the higher-level goal. We evaluate OI-HRL in two simulated settings - the CraftWorld and AI2THOR environments - and show that we achieve performance competitive with oracular baselines, and substantial gains over a baseline that has the entire option pool available for learning the hierarchical policy.
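The test-time retrieval step described in the abstract can be illustrated with a minimal sketch (our own illustration under assumed details, not the authors' code): we assume learned embeddings for each pretrained option and for each item observed in the environment, score every (option, item) pair with a bilinear affinity function, and restrict the hierarchical policy to options whose best score clears a threshold. All names here (`affinity`, `retrieve_options`, the embedding dimension, the bilinear parameterization) are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned embeddings; in OI-HRL these would come from the meta-training loop.
num_options, num_items, dim = 50, 6, 32
option_emb = rng.normal(size=(num_options, dim))  # one vector per pretrained option
item_emb = rng.normal(size=(num_items, dim))      # one vector per item in the current environment
W = rng.normal(size=(dim, dim))                   # parameters of an assumed bilinear affinity

def affinity(option_emb, item_emb, W):
    """Score each (option, item) pair with a bilinear form o^T W i."""
    return option_emb @ W @ item_emb.T            # shape: (num_options, num_items)

def retrieve_options(option_emb, item_emb, W, threshold=0.0):
    """Keep only options whose best affinity to any observed item exceeds the threshold."""
    scores = affinity(option_emb, item_emb, W)    # (num_options, num_items)
    best = scores.max(axis=1)                     # best-matching item per option
    return np.flatnonzero(best > threshold)       # indices of the retrieved options

relevant = retrieve_options(option_emb, item_emb, W, threshold=2.0)
print(f"Restricting hierarchical policy learning to {len(relevant)} of {num_options} options")
```

In the full method, the option and environment representations (and any affinity parameters) would be trained over a series of HRL problems using feedback about whether retrieved options were actually relevant to the higher-level goal; the sketch only shows how a pretrained library could be pruned to task-relevant options at test time.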