Paper Title
Hierarchical Reinforcement Learning with AI Planning Models
Paper Authors
Paper Abstract
Two common approaches to sequential decision-making are AI planning (AIP) and reinforcement learning (RL). Each has strengths and weaknesses. AIP is interpretable, easy to integrate with symbolic knowledge, and often efficient, but it requires an up-front logical domain specification and is sensitive to noise; RL requires only a reward specification and is robust to noise, but it is sample-inefficient and cannot easily be supplied with external knowledge. We propose an integrative approach that combines high-level planning with RL, retaining interpretability, transfer, and efficiency while allowing robust learning of the lower-level plan actions. Our approach defines options in hierarchical reinforcement learning (HRL) from AIP operators by establishing a correspondence between the state transition model of an AI planning problem and the abstract state transition system of a Markov Decision Process (MDP). Options are learned by adding intrinsic rewards that encourage consistency between the MDP and AIP transition models. We demonstrate the benefit of our integrated approach by comparing the performance of RL and HRL algorithms in both MiniGrid and N-rooms environments, showing the advantage of our method over existing ones.
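To make the core construction concrete, here is a minimal sketch (not the paper's implementation) of how a STRIPS-style planning operator could induce an HRL option: the operator's precondition gives the initiation set, its predicted effect gives the termination condition, and the intrinsic reward pays a bonus when the MDP transition, viewed through a state abstraction, matches the AIP transition model. The identifiers `Operator`, `OperatorOption`, and the `abstract` function are hypothetical, assumed only for illustration.

```python
# Illustrative sketch: one option per planning operator, with an intrinsic
# reward for consistency between abstracted MDP transitions and the AIP model.
from dataclasses import dataclass
from typing import Callable, FrozenSet

Fluent = str  # a ground symbolic fact, e.g. "in(room2)"


@dataclass(frozen=True)
class Operator:
    """A STRIPS-style planning operator (hypothetical representation)."""
    name: str
    pre: FrozenSet[Fluent]
    add: FrozenSet[Fluent]
    delete: FrozenSet[Fluent]

    def successor(self, s: FrozenSet[Fluent]) -> FrozenSet[Fluent]:
        """Abstract state the AIP transition model predicts after applying self."""
        return (s - self.delete) | self.add


class OperatorOption:
    """An HRL option derived from a single planning operator."""

    def __init__(self, op: Operator,
                 abstract: Callable[[object], FrozenSet[Fluent]],
                 bonus: float = 1.0):
        self.op = op
        self.abstract = abstract  # maps raw MDP states to abstract states
        self.bonus = bonus        # intrinsic reward for a consistent transition

    def can_initiate(self, mdp_state) -> bool:
        # Initiation set: states where the operator's precondition holds.
        return self.op.pre <= self.abstract(mdp_state)

    def is_terminal(self, start_state, mdp_state) -> bool:
        # Terminate once the abstract state matches the planned successor.
        return self.abstract(mdp_state) == self.op.successor(self.abstract(start_state))

    def intrinsic_reward(self, start_state, mdp_state) -> float:
        # Pay the bonus only when the MDP transition, seen through the
        # abstraction, agrees with the AIP transition model.
        return self.bonus if self.is_terminal(start_state, mdp_state) else 0.0


if __name__ == "__main__":
    # Toy two-room example: a "move(room1, room2)" operator becomes an option.
    move = Operator(
        name="move(room1, room2)",
        pre=frozenset({"in(room1)"}),
        add=frozenset({"in(room2)"}),
        delete=frozenset({"in(room1)"}),
    )
    # Hypothetical abstraction: the raw state is just the room label here.
    option = OperatorOption(move, abstract=lambda s: frozenset({f"in({s})"}))
    assert option.can_initiate("room1")
    assert option.intrinsic_reward("room1", "room2") == 1.0
```

Under this reading, the option's low-level policy would be trained with ordinary RL on the intrinsic reward, so that executing the option reliably realizes the operator's symbolic effect even in noisy environments.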