Paper title
When to stop value iteration: stability and near-optimality versus computation
Paper authors
Paper abstract
Value iteration (VI) is a ubiquitous algorithm in optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool for generating inputs with desirable properties for the controlled system, such as optimality and Lyapunov stability. As VI usually requires an infinite number of iterations to solve general nonlinear optimal control problems, a key question is when to terminate the algorithm so as to produce a "good" solution, with a measurable impact on optimality and stability guarantees. By carefully analysing VI under general stabilizability and detectability properties, we provide explicit and novel relationships quantifying the stopping criterion's impact on near-optimality, stability and performance, thus allowing these desirable properties to be traded off against the induced computational cost. The considered class of stopping criteria encompasses those encountered in the control, dynamic programming and reinforcement learning literature, and it also allows new criteria to be considered, which may be useful to further reduce the computational cost while still satisfying stability and near-optimality properties. We therefore lay a foundation for endowing machine learning schemes based on VI with stability and performance guarantees, while reducing computational complexity.
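For illustration only, the sketch below (Python/NumPy) shows value iteration on a finite discounted problem, terminated when the sup-norm of successive Bellman updates drops below a tolerance. The names value_iteration, P, r, gamma and tol are illustrative assumptions, and the criterion shown is just one familiar member of the broader class of stopping criteria the paper studies; the paper's analysis concerns general nonlinear systems under stabilizability and detectability assumptions, not only finite problems.

import numpy as np

def value_iteration(P, r, gamma=0.95, tol=1e-6, max_iter=10_000):
    # Hypothetical finite-state/action example, not the paper's construction.
    # P: (A, S, S) transition probabilities P[a, s, s']; r: (A, S) stage costs.
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    for k in range(max_iter):
        Q = r + gamma * (P @ V)          # Bellman backup, shape (A, S)
        V_next = Q.min(axis=0)           # cost-minimizing update of the value function
        # Common stopping criterion: ||V_{k+1} - V_k||_inf < tol.
        if np.max(np.abs(V_next - V)) < tol:
            V = V_next
            break
        V = V_next
    policy = (r + gamma * (P @ V)).argmin(axis=0)   # greedy policy from the returned V
    return V, policy, k + 1

Tightening tol yields a value function (and induced policy) with better near-optimality at the price of more Bellman backups; loosening it saves computation but weakens the guarantees. This is the computation-versus-guarantees trade-off that the abstract refers to, which the paper makes explicit for a general class of stopping criteria.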