Paper Title

GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling

Authors

Rohan Chitnis, Tom Silver, Joshua Tenenbaum, Leslie Pack Kaelbling, Tomas Lozano-Perez

Abstract

We address the problem of efficient exploration for transition model learning in the relational model-based reinforcement learning setting without extrinsic goals or rewards. Inspired by human curiosity, we propose goal-literal babbling (GLIB), a simple and general method for exploration in such problems. GLIB samples relational conjunctive goals that can be understood as specific, targeted effects that the agent would like to achieve in the world, and plans to achieve these goals using the transition model being learned. We provide theoretical guarantees showing that exploration with GLIB will converge almost surely to the ground truth model. Experimentally, we find GLIB to strongly outperform existing methods in both prediction and planning on a range of tasks, encompassing standard PDDL and PPDDL planning benchmarks and a robotic manipulation task implemented in the PyBullet physics simulator. Video: https://youtu.be/F6lmrPT6TOY Code: https://git.io/JIsTB
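The exploration loop described in the abstract (sample a conjunctive goal, plan toward it with the model learned so far, and learn from the resulting experience) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its released code: the two-light toy domain, the tabular transition model, the breadth-first planner, the pairing of each goal with a final action, and the "untried pairs" novelty bookkeeping.

```python
import itertools
import random
from collections import deque

# Hypothetical toy world: two lights; each action toggles one of them.
LITERALS = ["on(a)", "on(b)"]
ACTIONS = ["toggle(a)", "toggle(b)"]


def true_step(state, action):
    """Ground-truth transition; the agent only sees it through experience."""
    literal = "on(" + action[len("toggle("):-1] + ")"
    return state ^ frozenset([literal])  # flip exactly one literal


def plan(model, start, goal):
    """Breadth-first search over learned transitions; returns actions or None."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:  # every goal literal holds in this state
            return path
        for action in ACTIONS:
            nxt = model.get((state, action))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


def babble(steps=500, max_goal_size=2, seed=0):
    """Explore by babbling (goal, action) pairs; returns the learned model."""
    rng = random.Random(seed)
    goals = [frozenset(c) for k in range(1, max_goal_size + 1)
             for c in itertools.combinations(LITERALS, k)]
    untried = [(g, a) for g in goals for a in ACTIONS]  # assumed novelty filter
    model, state, target = {}, frozenset(), None
    for _ in range(steps):
        if target is None and untried:
            target = rng.choice(untried)  # sample a novel conjunctive goal
        if target is None:
            action = rng.choice(ACTIONS)  # nothing novel left: act randomly
        else:
            goal, final_action = target
            path = plan(model, state, goal)
            if path is None:
                action = rng.choice(ACTIONS)  # goal unreachable in the model
            elif path:
                action = path[0]  # follow the plan one step
            else:
                action = final_action  # goal holds: try the paired action
                untried.remove(target)
                target = None
        nxt = true_step(state, action)
        model[(state, action)] = nxt  # learn from the observed transition
        state = nxt
    return model
```

Calling `babble()` returns a dictionary that maps `(state, action)` pairs to observed next states. The random fallback keeps the agent gathering data whenever the learned model cannot yet reach the sampled goal or no untried pair remains.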
