Paper Title

Multi-Agent Reinforcement Learning for Adaptive Mesh Refinement

Authors

Jiachen Yang, Ketan Mittal, Tarik Dzanic, Socratis Petrides, Brendan Keith, Brenden Petersen, Daniel Faissol, Robert Anderson

Abstract

Adaptive mesh refinement (AMR) is necessary for efficient finite element simulations of complex physical phenomena, as it allocates limited computational budget based on the need for higher or lower resolution, which varies over space and time. We present a novel formulation of AMR as a fully-cooperative Markov game, in which each element is an independent agent that makes refinement and de-refinement choices based on local information. We design a novel deep multi-agent reinforcement learning (MARL) algorithm called Value Decomposition Graph Network (VDGN), which solves the two core challenges that AMR poses for MARL: posthumous credit assignment due to agent creation and deletion, and unstructured observations due to the diversity of mesh geometries. For the first time, we show that MARL enables anticipatory refinement of regions that will encounter complex features at future times, thereby unlocking entirely new regions of the error-cost objective landscape that are inaccessible by traditional methods based on local error estimators. Comprehensive experiments show that VDGN policies significantly outperform error threshold-based policies in global error and cost metrics. We show that learned policies generalize to test problems with physical features, mesh geometries, and longer simulation times that were not seen in training. We also extend VDGN with multi-objective optimization capabilities to find the Pareto front of the tradeoff between cost and error.
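The value-decomposition idea behind VDGN can be illustrated with a minimal VDN-style sketch: each mesh element (agent) scores its own refinement actions from a local observation via a shared network, and the team value is the sum of the chosen per-agent values, so the greedy joint action decomposes into independent per-element argmaxes. This is a toy illustration, not the authors' implementation; the linear "network", observation dimension, and three-action set are illustrative assumptions, and the paper's graph-network message passing between neighboring elements is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-element action set for AMR decisions.
ACTIONS = ["de-refine", "no-op", "refine"]

def agent_q(obs, W, b):
    """Shared (toy, linear) Q-network applied to one agent's local observation."""
    return obs @ W + b  # shape (3,): one Q-value per action

# Toy shared parameters: observation dim 4 -> 3 actions.
W = rng.normal(size=(4, 3))
b = np.zeros(3)

# Local observations for a mesh with 5 elements. In AMR the number of
# agents changes between steps as elements are refined (created) or
# de-refined (deleted), which is what makes credit assignment hard.
obs = rng.normal(size=(5, 4))

q = np.stack([agent_q(o, W, b) for o in obs])  # (5, 3) per-agent Q-values
greedy_actions = q.argmax(axis=1)              # decentralized per-agent argmax
team_q = q.max(axis=1).sum()                   # value decomposition: Q_tot = sum_i Q_i

print([ACTIONS[a] for a in greedy_actions], float(team_q))
```

Because the team value is an additive sum, maximizing it jointly is equivalent to each element maximizing its own Q-values, which is what makes decentralized execution over a variable number of elements tractable.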
