论文标题
K-Spin Hamiltonian用于量子反应的马尔可夫决策过程
K-spin Hamiltonian for quantum-resolvable Markov decision processes
论文作者
论文摘要
马尔可夫决策过程是当过渡和奖励功能未知时,现代增强学习领域的数学形式化。我们得出了一个伪树状的成本函数,相当于与无限范围内离散,有限,折扣的马尔可夫决策过程的K-Spin Hamiltonian表示。这款K-Spin Hamiltonian提供了使用启发式量子算法(例如绝热量子退火)和近期量子硬件上的量子近似优化算法来解决最佳策略的起点。在证明我们的哈密顿量的变异最小化等同于贝尔曼最佳条件,我们与经典田地理论建立了有趣的类比。除了概念验证的计算以通过模拟和量子退火来证实我们的配方,我们分析了在量子硬件上解决我们的汉密尔顿所需的物理资源规模。
The Markov decision process is the mathematical formalization underlying the modern field of reinforcement learning when transition and reward functions are unknown. We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process with infinite horizon. This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using heuristic quantum algorithms such as adiabatic quantum annealing and the quantum approximate optimization algorithm on near-term quantum hardware. In proving that the variational minimization of our Hamiltonian is equivalent to the Bellman optimality condition we establish an interesting analogy with classical field theory. Along with proof-of-concept calculations to corroborate our formulation by simulated and quantum annealing against classical Q-Learning, we analyze the scaling of physical resources required to solve our Hamiltonian on quantum hardware.