任何时间保证马尔可夫决策过程中的可达到性

论文标题

任何时间保证马尔可夫决策过程中的可达到性

Anytime Guarantees for Reachability in Uncountable Markov Decision Processes

论文作者

Grover, Kush, Křetínský, Jan, Meggendorfer, Tobias, Weininger, Maximilian

论文摘要

我们考虑了近似Markov决策过程（MDP）中具有不可数（连续）状态和动作空间的可及性概率的问题。尽管有一些算法，对于此类MDP的特殊类别，提供了一系列近似值，将近似值融合到限制中的真实值，但我们的目的是获得具有近似精度保证的算法。由于这个问题通常是不确定的，因此需要对MDP的假设。我们的主要贡献是确定足够的假设，这些假设尽可能弱，从而接近可以正确和可靠地分析系统的“边界”。为此，我们还争辩说，为什么我们的每个假设对于基于处理有限的许多观察结果来说都是必要的。我们提出两个解决方案变体。与以前有关保证的典型作品相比，第一个在较弱的假设下提供了融合的下限。然后，第二个利用更强的假设来提供收敛的上限。总的来说，我们获得任何时间算法，即产生一系列具有已知且迭代改善精度的近似序列，并在极限中收敛到真实值。此外，由于我们的假设的普遍性，我们的算法是非常通用的模板，很容易允许文献中的各种启发式方法，例如，特定的离散化算法。因此，我们的理论贡献为未来的实际改进铺平了道路，而无需牺牲正确性保证。

We consider the problem of approximating the reachability probabilities in Markov decision processes (MDP) with uncountable (continuous) state and action spaces. While there are algorithms that, for special classes of such MDP, provide a sequence of approximations converging to the true value in the limit, our aim is to obtain an algorithm with guarantees on the precision of the approximation. As this problem is undecidable in general, assumptions on the MDP are necessary. Our main contribution is to identify sufficient assumptions that are as weak as possible, thus approaching the "boundary" of which systems can be correctly and reliably analyzed. To this end, we also argue why each of our assumptions is necessary for algorithms based on processing finitely many observations. We present two solution variants. The first one provides converging lower bounds under weaker assumptions than typical ones from previous works concerned with guarantees. The second one then utilizes stronger assumptions to additionally provide converging upper bounds. Altogether, we obtain an anytime algorithm, i.e. yielding a sequence of approximants with known and iteratively improving precision, converging to the true value in the limit. Besides, due to the generality of our assumptions, our algorithms are very general templates, readily allowing for various heuristics from literature in contrast to, e.g., a specific discretization algorithm. Our theoretical contribution thus paves the way for future practical improvements without sacrificing correctness guarantees.

下载PDF全文

下载文献需遵守相关版权规定

论文标题