Computational Discovery of Energy-Efficient Heat Treatment for Microstructure Design using Deep Reinforcement Learning

Paper Title

Computational Discovery of Energy-Efficient Heat Treatment for Microstructure Design using Deep Reinforcement Learning

Authors

Mianroodi, Jaber R.; Siboni, Nima H.; Raabe, Dierk

Abstract

Deep Reinforcement Learning (DRL) is employed to develop autonomously optimized and custom-designed heat-treatment processes that are both microstructure-sensitive and energy-efficient. Unlike conventional supervised machine learning, DRL does not rely on static neural network training from data alone; instead, a learning agent autonomously develops optimal solutions based on reward and penalty elements, with reduced or no supervision. In our approach, a temperature-dependent Allen-Cahn model for phase transformation is used as the environment for the DRL agent, serving as the model world in which it gains experience and makes autonomous decisions. The DRL agent controls the temperature of the system, which acts as a model furnace for the heat treatment of alloys. Microstructure goals are defined for the agent based on the desired microstructure of the phases. After training, the agent can generate temperature-time profiles for a variety of initial microstructure states to reach the final desired microstructure state. The agent's performance and the physical meaning of the generated heat-treatment profiles are investigated in detail. In particular, the agent is capable of controlling the temperature to reach the desired microstructure starting from a variety of initial conditions. This ability to handle a variety of conditions paves the way for using such an approach for recycling-oriented heat-treatment process design, where the initial composition can vary from batch to batch due to impurity intrusion, and for the design of energy-efficient heat treatments. To test this hypothesis, an agent without a penalty on the total consumed energy is compared with one that considers energy costs. The energy cost penalty is imposed as an additional criterion on the agent for finding the optimal temperature-time profile.
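
The environment described in the abstract can be pictured with a minimal sketch. This is not the authors' model or code. It assumes a 1D order parameter `phi` on a periodic grid, a double-well free energy tilted linearly by a normalized temperature, explicit Euler time stepping, and a reward that combines the distance to a target phase fraction with an optional energy penalty proportional to the temperature. The class name `AllenCahnFurnace`, every parameter value, and the form of the reward are hypothetical choices made for illustration only.

```python
import numpy as np


class AllenCahnFurnace:
    """Toy 1D furnace: the action is a normalized temperature in [0, 1].

    All names and constants here are illustrative assumptions,
    not taken from the paper.
    """

    def __init__(self, n=64, dt=0.05, target_fraction=0.5,
                 energy_weight=0.0, seed=0):
        self.n = n
        self.dt = dt            # explicit Euler time step (grid spacing is 1)
        self.kappa = 4.0        # gradient-energy coefficient (assumed)
        self.well = 1.0         # double-well height W (assumed)
        self.slope = 0.5        # driving force per unit temperature (assumed)
        self.t_eq = 0.5         # temperature at which both phases have equal energy
        self.target_fraction = target_fraction
        self.energy_weight = energy_weight  # 0.0 means no energy penalty
        self.rng = np.random.default_rng(seed)
        self.phi = None
        self.reset()

    def reset(self, initial_fraction=None):
        """Start from a slab of phase phi = 1 covering part of the domain."""
        if initial_fraction is None:
            initial_fraction = self.rng.uniform(0.2, 0.8)
        self.phi = np.zeros(self.n)
        self.phi[: int(round(initial_fraction * self.n))] = 1.0
        return self.phi.copy()

    def fraction(self):
        """Volume fraction of the phi = 1 phase."""
        return float(self.phi.mean())

    def step(self, temperature):
        """Advance one Allen-Cahn time step at the chosen temperature."""
        t = float(np.clip(temperature, 0.0, 1.0))
        phi = self.phi
        # Periodic finite-difference Laplacian.
        lap = np.roll(phi, 1) - 2.0 * phi + np.roll(phi, -1)
        # Tilt > 0 above t_eq, which disfavours the phi = 1 phase.
        tilt = self.slope * (t - self.t_eq)
        # Derivative of W*phi^2*(1-phi)^2 + tilt*phi^2*(3-2*phi).
        dfdphi = phi * (1.0 - phi) * (
            2.0 * self.well * (1.0 - 2.0 * phi) + 6.0 * tilt
        )
        self.phi = np.clip(
            phi + self.dt * (self.kappa * lap - dfdphi), 0.0, 1.0
        )
        # Reward: closeness to the microstructure goal minus energy cost.
        reward = (-abs(self.fraction() - self.target_fraction)
                  - self.energy_weight * t)
        return self.phi.copy(), reward
```

In this sketch, setting `energy_weight` to zero corresponds to the agent without an energy penalty, and a positive value corresponds to the agent that considers energy costs. A trained DRL agent would choose the `temperature` argument at each step; the sketch leaves that policy out.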
