Paper title
Policy Iteration for Multiplicative Noise Output Feedback Control
Paper authors
Paper abstract
We propose a policy iteration algorithm for solving the multiplicative noise linear quadratic output feedback design problem. The algorithm solves a set of coupled Riccati equations for estimation and control arising from a partially observable Markov decision process (POMDP) under a class of linear dynamic control policies. We show in numerical experiments far faster convergence than a value iteration algorithm, formerly the only known algorithm for solving this class of problem. The results suggest promising future research directions for policy optimization algorithms in more general POMDPs, including the potential to develop novel approximate data-driven approaches when model parameters are not available.
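To make the idea of policy iteration on a generalized Riccati equation concrete, the following is a minimal sketch for the simpler full-state-feedback case, not the output feedback algorithm summarized in the abstract (which couples estimation and control equations). It assumes dynamics x_{t+1} = (A + Σ γ_i A_i) x_t + (B + Σ δ_j B_j) u_t with zero-mean, mutually independent scalar noises γ_i, δ_j of variances α_i, β_j, a quadratic cost with weights Q and R, and a mean-square stabilizing initial gain K0. All function names, the noise notation, and the example matrices are illustrative assumptions.

```python
import numpy as np

def evaluate_policy(K, A, B, Q, R, A_noise, B_noise):
    """Solve the generalized Lyapunov equation for the cost matrix P of u = K x.

    A_noise and B_noise are lists of (variance, direction matrix) pairs (assumed notation).
    """
    n = A.shape[0]
    AK = A + B @ K
    # Linear operator P -> AK' P AK + sum_i alpha_i Ai' P Ai + sum_j beta_j (Bj K)' P (Bj K),
    # written as a matrix acting on the flattened P.
    T = np.kron(AK.T, AK.T)
    for alpha, Ai in A_noise:
        T += alpha * np.kron(Ai.T, Ai.T)
    for beta, Bj in B_noise:
        BjK = Bj @ K
        T += beta * np.kron(BjK.T, BjK.T)
    rhs = (Q + K.T @ R @ K).reshape(-1)
    P = np.linalg.solve(np.eye(n * n) - T, rhs).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def improve_policy(P, A, B, R, B_noise):
    """Greedy gain with respect to the current cost matrix P."""
    G = R + B.T @ P @ B
    for beta, Bj in B_noise:
        G += beta * Bj.T @ P @ Bj
    return -np.linalg.solve(G, B.T @ P @ A)

def policy_iteration(A, B, Q, R, A_noise, B_noise, K0, tol=1e-12, max_iter=50):
    """Alternate policy evaluation and improvement until P stops changing."""
    K, P_prev = K0, None
    for _ in range(max_iter):
        P = evaluate_policy(K, A, B, Q, R, A_noise, B_noise)
        K = improve_policy(P, A, B, R, B_noise)
        if P_prev is not None and np.max(np.abs(P - P_prev)) < tol:
            break
        P_prev = P
    return P, K

# Hypothetical 2-state example; K0 = 0 is mean-square stabilizing for these values.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
A_noise = [(0.05, np.eye(2))]  # state-multiplicative noise
B_noise = [(0.05, B.copy())]   # input-multiplicative noise
P, K = policy_iteration(A, B, Q, R, A_noise, B_noise, K0=np.zeros((1, 2)))
```

Each evaluation step is a single linear solve, which is why policy iteration typically needs far fewer iterations than repeatedly applying the Riccati recursion (value iteration). The Kronecker-product solve is only practical for small state dimensions.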