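Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning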

Paper title

Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning

Authors

Pablo G. Morato, Charalampos P. Andriotis, Konstantinos G. Papakonstantinou, Philippe Rigo

Abstract

In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.
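As a rough illustration of two ingredients the abstract mentions, the sketch below shows a discrete belief update of the kind used in partially observable Markov decision processes, and the failure probability of a k-out-of-n system. It is not the authors' implementation. The function names, the three-state damage model, and all numbers are hypothetical, and the system-level function assumes independent component failures, which is a simplification the paper explicitly moves beyond by modelling deterioration dependence.

```python
def belief_update(belief, transition, obs_likelihood):
    """One belief step: predict with the transition model, then correct
    with the likelihood of the received observation (Bayes' rule).

    belief[i]         : P(state = i) before the step
    transition[i][j]  : P(next state = j | state = i), rows sum to 1
    obs_likelihood[j] : P(observation received | next state = j)
    """
    n = len(belief)
    # Prediction: push the belief through the deterioration model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    unnormalized = [predicted[j] * obs_likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


def k_out_of_n_failure_prob(p_fail, k):
    """P(system fails) when the system works only if at least k components
    work. ASSUMPTION: component failures are independent."""
    # dist[m] = probability that exactly m of the components seen so far work.
    dist = [1.0]
    for p in p_fail:
        new = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * p            # this component has failed
            new[m + 1] += prob * (1 - p)  # this component works
        dist = new
    # The system fails when fewer than k components work.
    return sum(dist[:k])


if __name__ == "__main__":
    # Hypothetical states: 0 = intact, 1 = damaged, 2 = failed.
    T = [[0.9, 0.1, 0.0],
         [0.0, 0.8, 0.2],
         [0.0, 0.0, 1.0]]
    # Hypothetical likelihood of a "no damage detected" inspection outcome.
    no_detection = [0.95, 0.3, 0.0]
    print(belief_update([1.0, 0.0, 0.0], T, no_detection))
    print(k_out_of_n_failure_prob([0.1] * 10, k=9))
```

In a system-level formulation such as the one the abstract describes, the beliefs of all components would feed the policy and the cost model jointly. Here the two functions are kept separate only to keep the sketch short.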
