Paper Title

Reductive MDPs: A Perspective Beyond Temporal Horizons

Paper Authors

Spooner, Thomas, Silva, Rui, Lockhart, Joshua, Long, Jason, Glukhov, Vacslav

Paper Abstract

Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well-known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) for general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies can be recovered in polynomial time for reductive SSPs -- via an extension of backwards induction -- with an efficient analogue in reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification is provided on a canonical optimal liquidation problem.
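The tractability of finite-horizon MDPs that the abstract contrasts against comes from backward induction: sweeping from the horizon back to the initial step, each state's value depends only on already-computed later-step values. A minimal tabular sketch in Python/NumPy of this baseline procedure (not the paper's generalisation to reductive SSPs; the array shapes and reward convention here are illustrative assumptions):

```python
import numpy as np

def backward_induction(P, R, T):
    """Finite-horizon backward induction on a small tabular MDP.

    P: (A, S, S) array, P[a, s, s'] = Pr(next state s' | state s, action a)
    R: (A, S) array of immediate rewards for taking action a in state s
    T: horizon length (number of decision steps)
    Returns V[t, s] (optimal values) and pi[t, s] (optimal actions).
    """
    A, S, _ = P.shape
    V = np.zeros((T + 1, S))           # terminal values V[T, .] = 0
    pi = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):     # sweep backwards from the horizon
        # Q[a, s] = R[a, s] + sum_{s'} P[a, s, s'] * V[t+1, s']
        Q = R + P @ V[t + 1]
        V[t] = Q.max(axis=0)           # Bellman optimality at step t
        pi[t] = Q.argmax(axis=0)
    return V, pi
```

Each of the T sweeps costs O(|A| |S|^2), giving the polynomial-time bound; the paper's reductivity condition plays the role of the decreasing time index here, ordering states by decreasing reachability instead of by remaining steps.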
