Title
Average Cost Optimality Inequality for Markov Decision Processes with Borel Spaces and Universally Measurable Policies
Authors
Abstract
We consider average-cost Markov decision processes (MDPs) with Borel state and action spaces and universally measurable policies. For the nonnegative cost model and an unbounded cost model with a Lyapunov-type stability character, we introduce a set of new conditions under which we prove the average cost optimality inequality (ACOI) via the vanishing discount factor approach. Unlike most existing results on the ACOI, our result does not require any compactness or continuity conditions on the MDPs. Instead, the main idea is to use the almost-uniform-convergence property of a pointwise convergent sequence of measurable functions, as asserted in Egoroff's theorem. Our conditions are formulated in order to exploit this property. Among other conditions, we require that for each state, on selected subsets of actions at that state, the state transition stochastic kernel is majorized by finite measures. We combine this majorization property of the transition kernel with Egoroff's theorem to prove the ACOI.
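The vanishing discount factor approach can be illustrated, under strong simplifying assumptions, on a finite MDP. The sketch below is hypothetical and is not the paper's method. It uses a made-up 2-state, 2-action model, and `COST`, `TRANS` and all function names are illustrative. In this finite setting none of the paper's measurability, majorization or Egoroff-type arguments are needed. Writing q(dy | x, a) for the transition kernel (our notation), the ACOI asks for a constant ρ and a function h with ρ + h(x) ≥ inf_a [c(x, a) + ∫ h(y) q(dy | x, a)] for every state x. The sketch computes the α-discounted value function v_α by value iteration and sets m_α = min_x v_α(x). It then forms ρ_α = (1 − α) m_α and h_α = v_α − m_α, which are the kind of quantities typically sent to the limit α ↑ 1 in this approach.

```python
"""Vanishing-discount sketch on a hypothetical 2-state, 2-action MDP.

Finite spaces only: this illustrates the general approach, not the
Borel-space / universally measurable setting of the paper.
"""

# Hypothetical example data (not from the paper):
# COST[x][a] = c(x, a) and TRANS[x][a][y] = q(y | x, a).
COST = [[1.0, 2.0],   # state 0: action 0 stays at 0, action 1 moves to state 1
        [0.0, 3.0]]   # state 1: action 0 returns to 0 w.p. 1/2, action 1 stays
TRANS = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.5, 0.5], [0.0, 1.0]]]


def q_value(x, a, f, alpha=1.0):
    """Return c(x, a) + alpha * sum_y q(y | x, a) * f(y)."""
    return COST[x][a] + alpha * sum(p * fy for p, fy in zip(TRANS[x][a], f))


def bellman(f, alpha=1.0):
    """Apply the minimizing Bellman operator (discount alpha) to the vector f."""
    return [min(q_value(x, a, f, alpha) for a in range(len(COST[x])))
            for x in range(len(COST))]


def discounted_value(alpha, tol=1e-8, max_iter=10**6):
    """Value iteration for v_alpha, stopped once the sup-norm error is below tol."""
    v = [0.0] * len(COST)
    for _ in range(max_iter):
        v_next = bellman(v, alpha)
        diff = max(abs(u - w) for u, w in zip(v_next, v))
        v = v_next
        # Contraction bound: ||v - v_alpha|| <= alpha / (1 - alpha) * diff.
        if diff * alpha / (1.0 - alpha) < tol:
            return v
    raise RuntimeError("value iteration did not converge")


def vanishing_discount(alpha):
    """Return rho_alpha = (1 - alpha) * m_alpha and h_alpha = v_alpha - m_alpha."""
    v = discounted_value(alpha)
    m = min(v)
    return (1.0 - alpha) * m, [vx - m for vx in v]


def acoi_gap(rho, h):
    """Return min_x [rho + h(x) - min_a (c(x, a) + sum_y q(y | x, a) h(y))].

    The ACOI holds for the pair (rho, h) exactly when this is >= 0.
    """
    return min(rho + hx - tx for hx, tx in zip(h, bellman(h)))


for alpha in (0.9, 0.99, 0.999):
    rho, h = vanishing_discount(alpha)
    print(f"alpha={alpha}: rho={rho:.6f}, h={[round(y, 6) for y in h]}, "
          f"ACOI gap={acoi_gap(rho, h):.2e}")
```

For this toy model a hand calculation gives ρ_α = 2α/(2 + α) and h_α = (4/(2 + α), 0) for α ≥ 2/3, where moving from state 0 to state 1 is discount-optimal. So ρ_α tends to 2/3, the optimal average cost, while `acoi_gap` equals −2(1 − α)/(2 + α). At a fixed α < 1 the inequality can therefore fail slightly, by at most (1 − α) max h_α because h_α ≥ 0, and the deficit vanishes in the limit. In general state and action spaces, passing to this limit is the step that typically calls for compactness or continuity assumptions. According to the abstract, the paper handles it with Egoroff's theorem and a majorization condition on the transition kernel instead.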