Paper Title

Controlled Online Optimization Learning (COOL): Finding the ground state of spin Hamiltonians with reinforcement learning

Paper Authors

Kyle Mills, Pooya Ronagh, Isaac Tamblyn

Paper Abstract

Reinforcement learning (RL) has become a proven method for optimizing a procedure for which success has been defined, but the specific actions needed to achieve it have not. We apply the so-called "black box" method of RL to what has been referred to as the "black art" of simulated annealing (SA), demonstrating that an RL agent based on proximal policy optimization can, through experience alone, arrive at a temperature schedule that surpasses the performance of standard heuristic temperature schedules for two classes of Hamiltonians. When the system is initialized at a cool temperature, the RL agent learns to heat the system to "melt" it, and then slowly cool it in an effort to anneal to the ground state; if the system is initialized at a high temperature, the algorithm immediately cools the system. We investigate the performance of our RL-driven SA agent in generalizing to all Hamiltonians of a specific class; when trained on random Hamiltonians of nearest-neighbour spin glasses, the RL agent is able to control the SA process for other Hamiltonians, reaching the ground state with a higher probability than a simple linear annealing schedule. Furthermore, the scaling performance (with respect to system size) of the RL approach is far more favourable, achieving a performance improvement of one order of magnitude on L=14x14 systems. We demonstrate the robustness of the RL approach when the system operates in a "destructive observation" mode, an allusion to a quantum system where measurements destroy the state of the system. The success of the RL agent could have far-reaching impact, from classical optimization, to quantum annealing, to the simulation of physical systems.
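To make the setup described in the abstract concrete, below is a minimal Python sketch of simulated annealing on a 2D nearest-neighbour spin glass in which the temperature at each step is supplied by an external controller. This is not the authors' implementation: the class and method names (SpinGlassSAEnv, step), the assumed Edwards-Anderson-style Hamiltonian H = -Σ J_ij s_i s_j with random ±1 couplings, and all parameter choices are illustrative assumptions. The linear temperature schedule shown at the end is the baseline the paper compares against; in the RL-driven version, the temperature passed to step() would instead come from a proximal policy optimization agent acting on observations of the spin configuration, with a reward tied to reaching the ground state.

```python
import numpy as np

class SpinGlassSAEnv:
    """Illustrative annealing environment for a 2D nearest-neighbour spin glass.
    An external agent chooses the temperature for each block of Monte Carlo sweeps.
    Assumed Hamiltonian: H = -sum_<ij> J_ij s_i s_j with random +/-1 couplings."""

    def __init__(self, L=8, sweeps_per_step=10, seed=None):
        self.L = L
        self.sweeps_per_step = sweeps_per_step
        self.rng = np.random.default_rng(seed)
        # Random nearest-neighbour couplings (horizontal and vertical bonds).
        self.Jh = self.rng.choice([-1.0, 1.0], size=(L, L))
        self.Jv = self.rng.choice([-1.0, 1.0], size=(L, L))
        self.spins = self.rng.choice([-1, 1], size=(L, L))

    def energy(self):
        # Periodic boundary conditions; each bond counted once.
        e = -np.sum(self.Jh * self.spins * np.roll(self.spins, -1, axis=1))
        e += -np.sum(self.Jv * self.spins * np.roll(self.spins, -1, axis=0))
        return e

    def _metropolis_sweep(self, T):
        L = self.L
        for _ in range(L * L):
            i, j = self.rng.integers(0, L, size=2)
            # Effective field on spin (i, j) from its four bonds.
            nn = (self.Jh[i, j] * self.spins[i, (j + 1) % L]
                  + self.Jh[i, (j - 1) % L] * self.spins[i, (j - 1) % L]
                  + self.Jv[i, j] * self.spins[(i + 1) % L, j]
                  + self.Jv[(i - 1) % L, j] * self.spins[(i - 1) % L, j])
            dE = 2.0 * self.spins[i, j] * nn  # energy change if this spin flips
            if dE <= 0 or self.rng.random() < np.exp(-dE / max(T, 1e-8)):
                self.spins[i, j] *= -1

    def step(self, temperature):
        """The agent's action is the temperature for the next block of sweeps.
        Returns the spin configuration (the observation) and the energy,
        from which a reward (e.g. negative final energy) can be derived."""
        for _ in range(self.sweeps_per_step):
            self._metropolis_sweep(temperature)
        return self.spins.copy(), self.energy()


# Baseline: a fixed linear annealing schedule from a hot to a cold temperature.
env = SpinGlassSAEnv(L=8, seed=0)
for T in np.linspace(2.5, 0.05, 50):
    obs, E = env.step(T)
print("final energy:", E)
```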
