Paper title
Compactly Restrictable Metric Policy Optimization Problems
Paper authors
Paper abstract
We study policy optimization problems for deterministic Markov decision processes (MDPs) with metric state and action spaces, which we refer to as Metric Policy Optimization Problems (MPOPs). Our goal is to establish theoretical results on the well-posedness of MPOPs that can characterize practically relevant continuous control systems. To do so, we define a special class of MPOPs called Compactly Restrictable MPOPs (CR-MPOPs), which are flexible enough to capture the complex behavior of robotic systems but specific enough to admit solutions using dynamic programming methods such as value iteration. We show how to arrive at CR-MPOPs using forward-invariance. We further show that our theoretical results on CR-MPOPs can be used to characterize feedback linearizable control affine systems.
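To make the role of forward invariance and value iteration concrete, here is a minimal sketch on a toy problem. Everything specific in it is an illustrative assumption and not taken from the paper: the scalar dynamics x' = 0.9x + 0.1u, the quadratic cost, the discount factor, and the function name `value_iteration`. The compact set [-1, 1] is forward invariant under these dynamics whenever |u| <= 1, since |x'| <= 0.9 + 0.1 = 1, so the problem can be restricted to it. The sketch also discretizes the state and action spaces onto grids, which is a simplification; the abstract concerns continuous metric spaces.

```python
import numpy as np


def value_iteration(n_states=41, n_actions=21, gamma=0.95, tol=1e-8, max_iters=10_000):
    """Grid-based value iteration for a toy deterministic MDP (illustrative only)."""
    # Assumed compact state and action sets, both [-1, 1], sampled on uniform grids.
    xs = np.linspace(-1.0, 1.0, n_states)
    us = np.linspace(-1.0, 1.0, n_actions)
    X, U = np.meshgrid(xs, us, indexing="ij")

    # Assumed deterministic dynamics; [-1, 1] is forward invariant because
    # |0.9 x + 0.1 u| <= 0.9 + 0.1 = 1 for all |x| <= 1, |u| <= 1.
    nxt = 0.9 * X + 0.1 * U

    # Assumed stage cost (hypothetical choice for illustration).
    cost = X**2 + 0.1 * U**2

    # Map each successor state to the index of the nearest grid point.
    h = xs[1] - xs[0]
    idx = np.clip(np.rint((nxt - xs[0]) / h).astype(int), 0, n_states - 1)

    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman backup: minimize stage cost plus discounted cost-to-go.
        V_new = (cost + gamma * V[idx]).min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Greedy policy with respect to the final value estimate.
    policy = us[(cost + gamma * V[idx]).argmin(axis=1)]
    return xs, V, policy


if __name__ == "__main__":
    xs, V, policy = value_iteration()
    for i in (0, len(xs) // 2, len(xs) - 1):
        print(f"x = {xs[i]:+.2f}  V = {V[i]:.4f}  u = {policy[i]:+.2f}")
```

Because successors never leave the grid's range, the Bellman backup is a gamma-contraction on a finite table and the iteration converges. This is the kind of behavior a compact restriction is meant to secure, here only in a discretized setting.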