Graph Neural Induction of Value Iteration

Paper title

Graph neural induction of value iteration

Authors

Andreea Deac, Pierre-Luc Bacon, Jian Tang

Abstract

Many reinforcement learning tasks can benefit from explicit planning based on an internal model of the environment. Previously, such planning components have been incorporated through a neural network that partially aligns with the computational graph of value iteration. Such networks have so far been focused on restrictive environments (e.g. grid-worlds), and modelled the planning procedure only indirectly. We relax these constraints, proposing a graph neural network (GNN) that executes the value iteration (VI) algorithm, across arbitrary environment models, with direct supervision on the intermediate steps of VI. The results indicate that GNNs are able to model value iteration accurately, recovering favourable metrics and policies across a variety of out-of-distribution tests. This suggests that GNN executors with strong supervision are a viable component within deep reinforcement learning systems.
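To make "direct supervision on the intermediate steps of VI" concrete, here is a minimal sketch, assuming a small tabular MDP encoded as a graph whose edges `(s, a) -> [(prob, next_state, reward), ...]` carry transition probabilities and rewards. It computes the exact value estimates after every Bellman step, which are the kind of per-step targets a GNN executor could be trained to reproduce, and writes each step in message-passing form: a sum over next-state messages per action, then a max over actions. The names `bellman_step`, `vi_trajectory` and `intermediate_step_loss` are hypothetical, and this is not the authors' model; the abstract does not specify the GNN architecture or loss.

```python
from collections import defaultdict

# Hypothetical MDP encoding: edges[(state, action)] = [(prob, next_state, reward), ...]
Edges = dict[tuple[int, int], list[tuple[float, int, float]]]


def bellman_step(values: list[float], edges: Edges, gamma: float) -> list[float]:
    """One value-iteration step written as message passing on the state graph."""
    # Sum-aggregate messages p * (r + gamma * V[s']) over next states for each (s, a).
    q: defaultdict[tuple[int, int], float] = defaultdict(float)
    for (s, a), outcomes in edges.items():
        for p, s_next, r in outcomes:
            q[(s, a)] += p * (r + gamma * values[s_next])
    # Max-aggregate over actions; states with no outgoing edges keep value 0.
    best: dict[int, float] = {}
    for (s, _a), v in q.items():
        best[s] = max(best.get(s, float("-inf")), v)
    new_values = [0.0] * len(values)
    for s, v in best.items():
        new_values[s] = v
    return new_values


def vi_trajectory(num_states: int, edges: Edges, gamma: float = 0.9,
                  steps: int = 10) -> list[list[float]]:
    """Return [V_0, V_1, ..., V_steps]: the per-step supervision targets."""
    values = [0.0] * num_states
    trajectory = [values]
    for _ in range(steps):
        values = bellman_step(values, edges, gamma)
        trajectory.append(values)
    return trajectory


def intermediate_step_loss(predicted: list[list[float]],
                           targets: list[list[float]]) -> float:
    """Mean squared error averaged over every VI step and every state."""
    assert len(predicted) == len(targets), "one prediction per VI step"
    total, count = 0.0, 0
    for pred_v, true_v in zip(predicted, targets):
        for p, t in zip(pred_v, true_v):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy 2-state MDP: from state 0, action 1 moves to absorbing state 1 with reward 1.
toy_edges: Edges = {
    (0, 0): [(1.0, 0, 0.0)],  # state 0, action 0: stay, reward 0
    (0, 1): [(1.0, 1, 1.0)],  # state 0, action 1: move to state 1, reward 1
    (1, 0): [(1.0, 1, 0.0)],  # state 1: absorbing, reward 0
}
targets = vi_trajectory(2, toy_edges, gamma=0.9, steps=3)
print(targets[-1])  # [1.0, 0.0]
```

The sum-then-max structure is one way to see why a GNN can align with VI: if a learned network's per-step outputs are compared against every entry of `targets` rather than only the final values, each message-passing layer is pushed toward imitating one Bellman backup.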
