Paper title
Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret
Paper authors
Paper abstract
We propose an online learning algorithm that adaptively designs a decentralized linear quadratic regulator when the system model is unknown a priori and new data samples from a single system trajectory become progressively available. The algorithm uses a disturbance-feedback representation of state-feedback controllers coupled with online convex optimization with memory and delayed feedback. Assuming that the system is stable or that a stabilizing controller is known, we show that our controller enjoys an expected regret that scales as $\sqrt{T}$ with the time horizon $T$ for partially nested information patterns. For more general information patterns, the optimal controller is unknown even if the system model is known. In this case, the regret of our controller is shown with respect to a linear sub-optimal controller. We validate our theoretical findings using numerical experiments.
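The disturbance-feedback representation mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it ignores the decentralized information constraints and the online learning step, and it assumes a known model $x_{t+1} = A x_t + B u_t + w_t$ with $x_0 = 0$. The function names, the matrices `A`, `B`, `K`, and the horizon are all hypothetical choices made for illustration. The sketch checks the standard fact that a state-feedback law $u_t = K x_t$ can be rewritten as $u_t = \sum_{k \ge 1} M_k w_{t-k}$ with $M_k = K (A + BK)^{k-1}$, which is linear in the parameters $M_k$.

```python
import numpy as np

def disturbance_feedback_rollout(A, B, M, w, x0):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with u_t = sum_k M[k-1] @ w_{t-k}."""
    T = w.shape[0]
    n, m = B.shape
    x = np.zeros((T + 1, n))
    u = np.zeros((T, m))
    w_hat = np.zeros_like(w)  # disturbances reconstructed from observed states
    x[0] = x0
    for t in range(T):
        # Control is a linear function of the past (reconstructed) disturbances.
        for k in range(1, len(M) + 1):
            if t - k >= 0:
                u[t] += M[k - 1] @ w_hat[t - k]
        x[t + 1] = A @ x[t] + B @ u[t] + w[t]
        # Assumption: the model (A, B) is known, so w_t is recoverable exactly.
        w_hat[t] = x[t + 1] - A @ x[t] - B @ u[t]
    return x, u, w_hat

def state_feedback_rollout(A, B, K, w, x0):
    """Simulate the same system under the state-feedback law u_t = K x_t."""
    T = w.shape[0]
    x = np.zeros((T + 1, A.shape[0]))
    u = np.zeros((T, B.shape[1]))
    x[0] = x0
    for t in range(T):
        u[t] = K @ x[t]
        x[t + 1] = A @ x[t] + B @ u[t] + w[t]
    return x, u

# Hypothetical stable system and gain, chosen only for illustration.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
K = np.array([[-0.2, -0.1]])
T = 20
w = rng.normal(size=(T, 2))
x0 = np.zeros(2)

# M_k = K (A + B K)^(k-1); a memory of T terms makes the match exact over T steps.
A_cl = A + B @ K
M = [K @ np.linalg.matrix_power(A_cl, k - 1) for k in range(1, T + 1)]

x_df, u_df, w_hat = disturbance_feedback_rollout(A, B, M, w, x0)
x_sf, u_sf = state_feedback_rollout(A, B, K, w, x0)
print(np.allclose(u_df, u_sf), np.allclose(x_df, x_sf))
```

Because the cost is quadratic in the state and input, and both are linear in the $M_k$ under this parameterization, the per-step cost becomes convex in the $M_k$. This is the usual reason such a representation pairs naturally with online convex optimization. In practice the memory is truncated to a length much shorter than $T$, which is a reasonable approximation when $A + BK$ is stable.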