Paper Title
Constrained Model-Free Reinforcement Learning for Process Optimization
Paper Authors
Paper Abstract
Reinforcement learning (RL) is a control approach that can handle nonlinear stochastic optimal control problems. However, despite the promise exhibited, RL has yet to see marked translation to industrial practice, primarily due to its inability to satisfy state constraints. In this work, we aim to address this challenge. We propose an 'oracle'-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. To achieve this, constraint tightenings (backoffs) are introduced and adjusted using Broyden's method, hence making them self-tuned. This results in a general methodology that can be imbued into approximate dynamic programming-based algorithms to ensure constraint satisfaction with high probability. Finally, we present case studies that analyze the performance of the proposed approach and compare this algorithm with model predictive control (MPC). The favorable performance of this algorithm signifies a step toward the incorporation of RL into real-world optimization and control of engineering systems, where constraints are essential in ensuring safety.
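To make the self-tuning backoff idea concrete, the sketch below shows one plausible way to adjust a scalar constraint backoff so that the empirical probability of joint constraint satisfaction under the learned policy reaches a target level 1 - alpha. All names (rollout_fn, tune_backoff, empirical_violation_gap) are hypothetical illustrations, not the paper's implementation, and a one-dimensional secant update stands in for the multi-dimensional Broyden update applied to a vector of backoffs in the paper.

```python
import numpy as np

# Hypothetical sketch: tune a scalar backoff 'b' so that the empirical
# probability of joint constraint satisfaction matches the target 1 - alpha.
# The secant iteration used here is Broyden's method in one dimension.

def empirical_violation_gap(backoff, rollout_fn, n_mc=500, alpha=0.05):
    """Return F(b) = (1 - alpha) - empirical satisfaction probability.

    rollout_fn(backoff) is assumed to simulate one closed-loop episode with
    the policy trained against constraints tightened by 'backoff' and to
    return True if the joint (path-wise) constraints were satisfied.
    """
    satisfied = sum(rollout_fn(backoff) for _ in range(n_mc))
    return (1.0 - alpha) - satisfied / n_mc


def tune_backoff(rollout_fn, b0=0.0, b1=0.5, tol=1e-3, max_iter=20):
    """Find a root of F(b) = 0 by secant (1-D Broyden) iteration."""
    f0 = empirical_violation_gap(b0, rollout_fn)
    f1 = empirical_violation_gap(b1, rollout_fn)
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        # Secant update: approximate the derivative from the last two iterates.
        b0, b1 = b1, b1 - f1 * (b1 - b0) / (f1 - f0)
        f0, f1 = f1, empirical_violation_gap(b1, rollout_fn)
    return b1
```

In this reading, each evaluation of F(b) requires retraining (or re-evaluating) the constrained Q-learning policy with the tightened constraints and estimating the joint satisfaction probability by Monte Carlo rollouts, which is why a derivative-free quasi-Newton update such as Broyden's method is attractive.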