Paper Title

Approximating Euclidean by Imprecise Markov Decision Processes

Authors

Manfred Jaeger, Giorgio Bacci, Giovanni Bacci, Kim Guldstrand Larsen, Peter Gjøl Jensen

Abstract

Euclidean Markov decision processes are a powerful tool for modeling control problems under uncertainty over continuous domains. Finite-state imprecise Markov decision processes can be used to approximate the behavior of these infinite models. In this paper we address two questions: first, we investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite-state approximations induced by increasingly fine partitions of the continuous state space. We show that for cost functions over finite time horizons the approximations become arbitrarily precise. Second, we use imprecise Markov decision process approximations as a tool to analyse and validate cost functions and strategies obtained by reinforcement learning. We find that, on the one hand, our new theoretical results validate basic design choices of a previously proposed reinforcement learning approach. On the other hand, the imprecise Markov decision process approximations reveal some inaccuracies in the learned cost functions.
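To make the abstract's idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of interval value iteration over a finite partition: each partition cell carries lower and upper transition-probability bounds, and lower/upper expected-cost bounds over a finite horizon are obtained by resolving those intervals pessimistically and optimistically. All names (`interval_value_iteration`, `lp_bound`) and the greedy mass-assignment scheme are illustrative assumptions.

```python
# Hypothetical sketch: interval value iteration for a finite-state imprecise
# MDP obtained by partitioning a continuous state space. Not the authors' code.
import numpy as np

def lp_bound(lo, hi, v, minimize):
    """Greedy solution of min/max_p p.v s.t. lo <= p <= hi, sum(p) = 1.

    Starts from the lower bounds and pours the remaining probability mass
    into coordinates with the smallest (resp. largest) successor values.
    """
    order = np.argsort(v) if minimize else np.argsort(-v)
    p = lo.astype(float).copy()
    slack = 1.0 - p.sum()
    for j in order:
        add = min(hi[j] - lo[j], slack)
        p[j] += add
        slack -= add
    return p @ v

def interval_value_iteration(P_low, P_high, cost, horizon):
    """Lower/upper bounds on expected total cost over a finite horizon.

    P_low, P_high: (n, n) elementwise bounds on the transition matrix of
                   the partition-induced imprecise MDP.
    cost:          (n,) per-step cost assigned to each partition cell.
    """
    n = len(cost)
    V_low = np.zeros(n)
    V_high = np.zeros(n)
    for _ in range(horizon):
        V_low = cost + np.array(
            [lp_bound(P_low[i], P_high[i], V_low, minimize=True) for i in range(n)])
        V_high = cost + np.array(
            [lp_bound(P_low[i], P_high[i], V_high, minimize=False) for i in range(n)])
    return V_low, V_high
```

As the partition is refined, the transition bounds tighten and the gap `V_high - V_low` shrinks, which is the intuition behind the approximation guarantee stated in the abstract.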
