Paper Title

Reductive MDPs: A Perspective Beyond Temporal Horizons

Paper Authors

Spooner, Thomas, Silva, Rui, Lockhart, Joshua, Long, Jason, Glukhov, Vacslav

Paper Abstract

Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well-known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems (SSPs) for general state-action spaces whose dynamics satisfy a particular drift condition. This construction generalises the traditional, temporal notion of a horizon via decreasing reachability: a property called reductivity. It is shown that optimal policies can be recovered in polynomial time for reductive SSPs -- via an extension of backwards induction -- with an efficient analogue in reductive MDPs. The practical considerations of the proposed approach are discussed, and numerical verification is provided on a canonical optimal liquidation problem.
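The tractability of finite-horizon MDPs that the abstract contrasts against comes from backward induction: sweeping from the horizon back to the initial step, each state's value depends only on already-computed later-step values. A minimal tabular sketch in Python/NumPy of this baseline procedure (not the paper's generalisation to reductive SSPs; the array shapes and reward convention here are illustrative assumptions):

```python
import numpy as np

def backward_induction(P, R, T):
    """Finite-horizon backward induction on a small tabular MDP.

    P: (A, S, S) array, P[a, s, s'] = Pr(next state s' | state s, action a)
    R: (A, S) array of immediate rewards for taking action a in state s
    T: horizon length (number of decision steps)
    Returns V[t, s] (optimal values) and pi[t, s] (optimal actions).
    """
    A, S, _ = P.shape
    V = np.zeros((T + 1, S))           # terminal values V[T, .] = 0
    pi = np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):     # sweep backwards from the horizon
        # Q[a, s] = R[a, s] + sum_{s'} P[a, s, s'] * V[t+1, s']
        Q = R + P @ V[t + 1]
        V[t] = Q.max(axis=0)           # Bellman optimality at step t
        pi[t] = Q.argmax(axis=0)
    return V, pi
```

Each of the T sweeps costs O(|A| |S|^2), giving the polynomial-time bound; the paper's reductivity condition plays the role of the decreasing time index here, ordering states by decreasing reachability instead of by remaining steps.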
