Paper Title
A Piecewise Learning Framework for Control of Unknown Nonlinear Systems with Stability Guarantees
Paper Authors
Paper Abstract
We propose a piecewise learning framework for controlling nonlinear systems with unknown dynamics. Model-based reinforcement learning techniques that represent the model with a set of basis functions are well known in the literature, but for more complex dynamics a limited number of bases yields only a local approximation of the model. Obtaining an approximation over a larger domain can make the identifier and the controller considerably more complex. To overcome this limitation, we propose a general piecewise nonlinear framework in which each piece is responsible for locally learning and controlling over some region of the domain. We obtain rigorous uncertainty bounds for the learned piecewise models. The piecewise affine (PWA) model is then studied as a special case, for which we propose an optimization-based verification technique for stability analysis of the closed-loop system. Accordingly, given a time-discretization of the learned PWA system, we iteratively search for a common piecewise Lyapunov function in a set of positive definite functions, where non-monotonic convergence is allowed. This Lyapunov candidate is verified on the uncertain system to either provide a certificate of stability or find a counter-example when verification fails. The counter-example is added to a set of samples to facilitate further learning of a Lyapunov function. We demonstrate the results on two examples and show that the proposed approach yields a less conservative region of attraction (ROA) than alternative state-of-the-art approaches. Moreover, we provide runtime results to demonstrate the potential of the proposed framework in real-world implementations.
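To make the idea of "each piece learns locally" concrete, the following is a minimal toy sketch, not the paper's identifier. It assumes a scalar discrete-time map, a fixed partition of the domain into intervals, and an ordinary least-squares affine fit per interval. The names `fit_affine`, `fit_pwa`, `predict`, and `unknown_map` are hypothetical, and the sketch computes neither the uncertainty bounds nor the Lyapunov verification described in the abstract.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y ~ a*x + b on one region; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_pwa(samples, edges):
    """Fit one affine piece per interval [edges[i], edges[i+1])."""
    pieces = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        region = [(x, y) for x, y in samples if lo <= x < hi]
        xs, ys = zip(*region)
        a, b = fit_affine(xs, ys)
        pieces.append((lo, hi, a, b))
    return pieces


def predict(pieces, x):
    """Evaluate the learned piecewise affine model at x."""
    for lo, hi, a, b in pieces:
        if lo <= x < hi:
            return a * x + b
    raise ValueError("x lies outside the partitioned domain")


def unknown_map(x):
    """Hypothetical stand-in for the unknown dynamics x_{k+1} = f(x_k)."""
    return 0.5 * x if x < 0 else -0.8 * x


# Sample the map on a grid over [-1, 0.9] and fit two pieces split at 0.
grid = [(i - 10) / 10 for i in range(20)]
samples = [(x, unknown_map(x)) for x in grid]
pieces = fit_pwa(samples, edges=[-1.0, 0.0, 1.0])

for lo, hi, a, b in pieces:
    print(f"region [{lo}, {hi}): x_next ~ {a:.3f} * x + {b:.3f}")
```

Because the stand-in map is itself piecewise affine on the chosen partition, the fit recovers it up to rounding. For a genuinely nonlinear map, each piece would carry a residual error, which is where uncertainty bounds of the kind mentioned in the abstract would come in.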