Paper Title

Simultaneously Updating All Persistence Values in Reinforcement Learning

Paper Authors

Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli, Marcello Restelli

Paper Abstract

In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization. Agents acting at high frequencies have the best control opportunities, along with some drawbacks, such as possibly inefficient exploration and vanishing action advantages. The repetition of actions, i.e., action persistence, helps here, as it allows the agent to visit wider regions of the state space and improves the estimation of the action effects. In this work, we derive a novel All-Persistence Bellman Operator, which allows an effective use of both low-persistence experience, by decomposition into sub-transitions, and high-persistence experience, thanks to the introduction of a suitable bootstrap procedure. In this way, we employ transitions collected at any time scale to simultaneously update the action values of the considered persistence set. We prove the contraction property of the All-Persistence Bellman Operator and, based on it, we extend classic Q-learning and DQN. After providing a study on the effects of persistence, we experimentally evaluate our approach in both tabular environments and more challenging frameworks, including some Atari games.
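To make the abstract's idea concrete, below is a minimal tabular sketch of how a single segment in which one action is repeated for n primitive steps could update the action values of every persistence in a given set: persistences shorter than the segment are updated from its sub-transitions, while longer persistences bootstrap on the value of continuing the same action. All names (PersistentQLearner, update_from_segment, greedy_value) and the exact bootstrap rule are illustrative assumptions; this is not the authors' implementation of the All-Persistence Bellman Operator.

# Illustrative sketch only: a tabular Q-learner that maintains one Q-table per
# persistence value and updates all of them from a single action-repetition segment.
import numpy as np


class PersistentQLearner:
    def __init__(self, n_states, n_actions, persistences=(1, 2, 4), gamma=0.99, alpha=0.1):
        self.persistences = tuple(sorted(persistences))  # persistence set K (assumption)
        self.gamma = gamma
        self.alpha = alpha
        # One Q-table per persistence value: Q[k][state, action]
        self.Q = {k: np.zeros((n_states, n_actions)) for k in self.persistences}

    def greedy_value(self, state):
        # Bootstrap target: best value over all actions and all persistence values.
        return max(self.Q[k][state].max() for k in self.persistences)

    def update_from_segment(self, states, action, rewards, done):
        """Update every persistence value from one segment in which `action`
        was repeated len(rewards) times.

        states  : visited states, length len(rewards) + 1
        rewards : per-step rewards observed while repeating `action`
        """
        n = len(rewards)
        s0 = states[0]
        for k in self.persistences:
            if k <= n:
                # Sub-transition: use the first k rewards, then bootstrap
                # greedily at the state reached after k steps.
                ret = sum(self.gamma ** i * rewards[i] for i in range(k))
                terminal = done and k == n
                target = ret if terminal else ret + self.gamma ** k * self.greedy_value(states[k])
            else:
                # Persistence longer than the observed segment: use all n rewards
                # and bootstrap with the value of continuing the same action for
                # the remaining k - n steps (a simplification of the paper's
                # bootstrap procedure; falls back to the greedy value if the
                # remaining persistence is not in the set).
                ret = sum(self.gamma ** i * rewards[i] for i in range(n))
                if done:
                    target = ret
                else:
                    rem = k - n
                    tail = (self.Q[rem][states[n], action]
                            if rem in self.persistences
                            else self.greedy_value(states[n]))
                    target = ret + self.gamma ** n * tail
            q = self.Q[k][s0, action]
            self.Q[k][s0, action] = q + self.alpha * (target - q)

Under these assumptions, one call to update_from_segment after repeating a single action for n primitive steps refreshes Q_k(s0, a) for every k in the persistence set, which is the "simultaneous update" referred to in the title.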
