Title
Efficient Estimation of Material Property Curves and Surfaces via Active Learning
Authors
Abstract
The relationship between material properties and independent variables such as temperature, external field, or time is usually represented by a curve or surface in a multi-dimensional space. Determining such a curve or surface requires a series of experiments or calculations, which are often time- and cost-consuming. A general strategy uses an appropriate utility function to sample the space and recommend the next optimal experiment or calculation within an active learning loop. However, which sampling strategy minimizes the number of experiments remains an open problem. Using a Kriging-based model, we compare a number of strategies based on directed exploration on several materials problems of varying complexity. These include one-dimensional curves, such as the fatigue life curve for 304L stainless steel and the liquidus line of the Fe-C phase diagram; surfaces, such as the Hartmann 3 function in 3D space and the fitted intermolecular potential for Ar-SH; and a four-dimensional data set of experimental measurements for BaTiO3-based ceramics. We also consider the effects of experimental noise on the Hartmann 3 function. We find that directed exploration guided by maximum variance provides better overall performance, converging faster across several data sets. However, for certain problems, trade-off methods that incorporate exploitation can perform as well as, if not better than, maximum variance. We therefore discuss how the choice of utility function depends on the distribution of the data, the model performance and uncertainties, additive noise, and the budget.
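The active learning loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a zero-mean Kriging (Gaussian process) model with a squared-exponential kernel and fixed, hand-picked hyperparameters, a hypothetical 1-D test curve in place of real materials data, and a discrete pool of candidate "experiments". Maximum variance is the pure-exploration utility named in the abstract; upper confidence bound is used here only as a generic stand-in for the exploration/exploitation trade-off utilities the paper compares. All function names are hypothetical.

```python
import numpy as np

# Hypothetical 1-D stand-in for a material property curve (not data from the paper).
def true_curve(x):
    return np.sin(2 * np.pi * x)

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean Kriging (GP) posterior mean and variance at x_query.

    `noise` is a nugget added to the diagonal; raising it models additive
    measurement noise, while a tiny value only stabilises the Cholesky factor.
    """
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)              # shape (n_train, n_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)                   # clip tiny negative round-off

def max_variance(mean, var):
    """Pure exploration: choose where the model is least certain."""
    return var

def upper_confidence_bound(mean, var, kappa=2.0):
    """Illustrative exploration/exploitation trade-off (stand-in, not from the paper)."""
    return mean + kappa * np.sqrt(var)

def active_learning(f, x_pool, x_init, n_iter, utility):
    """Greedy loop: fit the model, score the pool, 'measure' the best candidate."""
    x_train = np.asarray(x_init, dtype=float)
    y_train = f(x_train)
    for _ in range(n_iter):
        mean, var = gp_posterior(x_train, y_train, x_pool)
        scores = utility(mean, var).copy()
        scores[np.isin(x_pool, x_train)] = -np.inf     # never repeat a measurement
        x_next = x_pool[np.argmax(scores)]
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, f(np.array([x_next])))
    return x_train, y_train

x_pool = np.linspace(0.0, 1.0, 101)
xs, ys = active_learning(true_curve, x_pool, [0.0, 1.0], n_iter=8, utility=max_variance)
mean, var = gp_posterior(xs, ys, x_pool)
rmse = np.sqrt(np.mean((mean - true_curve(x_pool)) ** 2))
print(f"sampled x: {np.round(xs, 2)}")
print(f"RMSE over pool: {rmse:.4f}, max posterior variance: {var.max():.4f}")
```

Swapping `utility=max_variance` for `utility=upper_confidence_bound` changes only the selection rule, which is the comparison the abstract describes; increasing the `noise` nugget in `gp_posterior` is one simple way to mimic additive experimental noise.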