Fitted Q-Learning for Relational Domains

Paper Title

Fitted Q-Learning for Relational Domains

Authors

Srijita Das, Sriraam Natarajan, Kaushik Roy, Ronald Parr, Kristian Kersting

Abstract

We consider the problem of approximate dynamic programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals. When we fit the Q-functions, we show how the two steps of the Bellman operator, application and projection, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without using domain models and while using fewer training trajectories.
