A hybrid learning method for system identification and optimal control

Paper title

A hybrid learning method for system identification and optimal control

Authors

Baptiste Schubnel, Rafael E. Carrillo, Pierre-Jean Alet, Andreas Hutter

Abstract

We present a three-step method to perform system identification and optimal control of non-linear systems. Our approach is mainly data-driven and does not require active excitation of the system to perform system identification. In particular, it is designed for systems for which only historical data under closed-loop control are available and where historical control commands exhibit low variability. In the first step, simple simulation models of the system are built and run under various conditions. In the second step, a neural network architecture is extensively trained on the simulation outputs to learn the system physics, and then retrained on historical data from the real system, subject to stopping rules. These stopping rules prevent the overfitting that arises when fitting closed-loop controlled systems. By doing so, we obtain one or more system models, represented by this architecture, whose behaviour can be chosen to match the real system more or less closely. Finally, state-of-the-art reinforcement learning with a variant of domain randomization and distributed learning is used for optimal control of the system. We first illustrate the model identification strategy with a simple example, the pendulum with external torque. We then apply our method to model and optimize the control of a large building facility located in Switzerland. Simulation results demonstrate that this approach generates stable, functional controllers that outperform benchmark rule-based controllers on both comfort and energy.
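
The first step described above (building a simple simulation model and running it under various conditions) can be illustrated on the pendulum with external torque. The following is only a minimal sketch under stated assumptions, not the authors' model: the damped-pendulum equation, the semi-implicit Euler integrator, the function names `simulate_pendulum` and `sample_runs`, and all parameter ranges are hypothetical choices made for illustration.

```python
import math
import random


def simulate_pendulum(theta0, omega0, torques, length=1.0, mass=1.0,
                      damping=0.1, dt=0.01, g=9.81):
    """Integrate theta'' = -(g/l) sin(theta) - (b/(m l^2)) theta' + u/(m l^2).

    Assumed model: a point-mass pendulum with viscous damping b and an
    external torque u applied at each time step.
    """
    theta, omega = theta0, omega0
    inertia = mass * length ** 2
    trajectory = [(theta, omega)]
    for u in torques:
        alpha = (-(g / length) * math.sin(theta)
                 - (damping / inertia) * omega
                 + u / inertia)
        omega += alpha * dt   # semi-implicit Euler: update velocity first
        theta += omega * dt   # then position, using the new velocity
        trajectory.append((theta, omega))
    return trajectory


def sample_runs(n_runs, n_steps, seed=0):
    """Run the simulator under randomly drawn conditions.

    The parameter ranges below are hypothetical; they only show how
    varied simulation outputs could be produced for pre-training.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        params = {
            "length": rng.uniform(0.8, 1.2),
            "mass": rng.uniform(0.8, 1.2),
            "damping": rng.uniform(0.05, 0.2),
        }
        torques = [rng.uniform(-1.0, 1.0) for _ in range(n_steps)]
        theta0 = rng.uniform(-math.pi, math.pi)
        trajectory = simulate_pendulum(theta0, 0.0, torques, **params)
        runs.append((params, torques, trajectory))
    return runs
```

Each run pairs a torque sequence with the resulting state trajectory, which is the kind of input/output data a neural network could be pre-trained on before being retrained on historical data from the real system.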
