Paper Title
Constrained Reinforcement Learning for Dynamic Optimization under Uncertainty
Paper Authors
Paper Abstract
Dynamic real-time optimization (DRTO) is a challenging task because optimal operating conditions must be computed in real time. The main bottleneck in the industrial application of DRTO is the presence of uncertainty. Many stochastic systems present the following obstacles: 1) plant-model mismatch, 2) process disturbances, and 3) the risk of violating process constraints. To accommodate these difficulties, we present a constrained reinforcement learning (RL) approach. RL naturally handles process uncertainty by computing an optimal feedback policy, but state constraints cannot be incorporated in a straightforward way. To address this problem, we present a chance-constrained RL methodology. We use chance constraints to guarantee probabilistic satisfaction of the process constraints, which is accomplished by introducing backoffs, such that the optimal policy and the backoffs are computed simultaneously. The backoffs are adjusted using the empirical cumulative distribution function to guarantee satisfaction of a joint chance constraint. The advantages and performance of this strategy are illustrated on a stochastic dynamic bioprocess optimization problem for the production of sustainable high-value bioproducts.
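The joint chance constraint referred to in the abstract can be written in a standard form (the notation below is a common convention, not necessarily the paper's own): every path constraint g_j must hold at every time step simultaneously with probability at least 1 - \alpha,

\mathbb{P}\left( \bigcap_{t=0}^{T} \bigcap_{j=1}^{n_g} \left\{ g_j(x_t) \le 0 \right\} \right) \ge 1 - \alpha,

while training instead enforces the deterministic, backoff-tightened surrogates g_j(x_t) + b_{j,t} \le 0 with b_{j,t} \ge 0, the backoffs being sized so that satisfying the surrogates implies the chance constraint above.

The following Python sketch illustrates one plausible reading of the backoff-adjustment loop described in the abstract. The names train_policy and rollout_worst_g are hypothetical stand-ins for the RL training step and a Monte Carlo rollout, and the damped update rule is our assumption, not the paper's algorithm.

import numpy as np

def tune_backoffs(train_policy, rollout_worst_g, b0, n_iter=10,
                  n_mc=1000, alpha=0.05, gamma=0.5):
    """Alternate between training a policy against backoff-tightened
    constraints g_j(x_t) + b <= 0 and adjusting the backoffs b so that
    the empirical (1 - alpha)-quantile of the worst raw constraint value
    over Monte Carlo rollouts is non-positive (joint satisfaction)."""
    b = np.asarray(b0, dtype=float)
    for _ in range(n_iter):
        # RL step against the tightened constraints (hypothetical callable).
        policy = train_policy(b)
        # Worst (largest) g_j(x_t) over all j, t in each episode: a value
        # <= 0 means every constraint held at every step of that episode.
        worst = np.array([rollout_worst_g(policy) for _ in range(n_mc)])
        # Read the empirical CDF of the worst value at the 1 - alpha level.
        q = np.quantile(worst, 1.0 - alpha)
        # Damped update (an assumption): inflate b while violations are too
        # likely (q > 0), relax it once the quantile goes negative.
        b = np.maximum(b + gamma * q, 0.0)
    return b, policy

The update direction follows directly from the quantile: a positive (1 - alpha)-quantile of the worst constraint value means the joint constraint is violated with probability greater than alpha, so the backoffs must grow; a negative quantile means the policy is more conservative than required and the backoffs can shrink.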