Learn and Control while Switching: with Guaranteed Stability and Sublinear Regret

Paper Title

Learn and Control while Switching: with Guaranteed Stability and Sublinear Regret

Authors

Chekan, Jafar Abbaszadeh; Langbort, Cédric

Abstract

Over-actuated systems often make it possible to achieve specific performance objectives by switching between different subsets of actuators. However, when the system parameters are unknown, transferring authority among subsets of actuators is challenging because of stability and performance-efficiency concerns. This paper presents an efficient algorithm for the so-called "learn and control while switching between different actuating modes" problem in the Linear Quadratic (LQ) setting. The proposed strategy builds on an Optimism in the Face of Uncertainty (OFU)-based algorithm equipped with a projection toolbox that keeps the algorithm efficient in terms of regret. Along the way, we derive an optimal duration for the warm-up phase, made possible by the existence of a stabilizing neighborhood. Stability of the switched system is guaranteed by designing a minimum average dwell time. The proposed strategy is proved to achieve a regret bound of $\bar{\mathcal{O}}\big(\sqrt{T}\big)+\mathcal{O}\big(ns\sqrt{T}\big)$ over a horizon $T$ with $ns$ switches, provably outperforming a naive application of the basic OFU algorithm.
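The abstract states only that stability follows from a minimum average dwell time (ADT); it does not give the condition itself. As an illustration, the sketch below assumes the standard ADT definition of Hespanha and Morse: a switching signal has ADT $\tau_a$ if the number of switches $N(\tau, t)$ in any interval satisfies $N(\tau, t) \le N_0 + (t - \tau)/\tau_a$, where $N_0 \ge 0$ is a chatter bound. The function names and the switch times are hypothetical and not taken from the paper. One consequence of the definition is that over a horizon $T$ there are at most $N_0 + T/\tau_a$ switches, so a dwell-time constraint would also cap the switch count $ns$ appearing in the regret bound.

```python
import math
from typing import Sequence


def satisfies_adt(switch_times: Sequence[float], tau_a: float, n0: float = 1.0) -> bool:
    """Return True if N(tau, t) <= n0 + (t - tau) / tau_a holds for every window.

    Only windows that start at one switch and end at another need checking:
    a window holding switches s[i]..s[j] is never shorter than s[j] - s[i].
    """
    if tau_a <= 0:
        raise ValueError("tau_a must be positive")
    s = sorted(switch_times)
    for i in range(len(s)):
        for j in range(i, len(s)):
            count = j - i + 1  # switches in the window [s[i], s[j]]
            if count > n0 + (s[j] - s[i]) / tau_a:
                return False
    return True


def achieved_adt(switch_times: Sequence[float], n0: float = 1.0) -> float:
    """Largest tau_a the switch times satisfy (math.inf if no window exceeds n0)."""
    s = sorted(switch_times)
    best = math.inf
    for i in range(len(s)):
        for j in range(i, len(s)):
            excess = (j - i + 1) - n0  # switches beyond the chatter bound
            if excess > 0:
                best = min(best, (s[j] - s[i]) / excess)
    return best


def max_switches(horizon: float, tau_a: float, n0: float = 1.0) -> int:
    """Upper bound on the number of switches in [0, horizon] implied by the ADT."""
    return math.floor(n0 + horizon / tau_a)


if __name__ == "__main__":
    times = [10, 25, 40, 42, 70]  # hypothetical switch instants
    print(achieved_adt(times, n0=1))  # 2.0: the switches at 40 and 42 form a burst
    print(achieved_adt(times, n0=2))  # 16.0: a larger chatter bound absorbs the burst
    print(satisfies_adt(times, tau_a=16.0, n0=2))  # True
    print(max_switches(100, tau_a=16.0, n0=2))  # 8
```

A switching policy respects a designed minimum ADT $\tau_{\min}$ exactly when `achieved_adt(times, n0) >= tau_min`. The pairwise scan is quadratic in the number of switches, which is adequate for a check; a sliding window over the sorted times would scale better for long logs.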
