Quantum Policy Iteration via Amplitude Estimation and Grover Search -- Towards Quantum Advantage for Reinforcement Learning

Paper title

Quantum Policy Iteration via Amplitude Estimation and Grover Search -- Towards Quantum Advantage for Reinforcement Learning

Authors

Wiedemann, Simon; Hein, Daniel; Udluft, Steffen; Mendl, Christian

Abstract

We present a full implementation and simulation of a novel quantum reinforcement learning method. Our work is a detailed and formal proof of concept for how quantum algorithms can be used to solve reinforcement learning problems and shows that, given access to error-free, efficient quantum realizations of the agent and environment, quantum methods can yield provable improvements over classical Monte-Carlo based methods in terms of sample complexity. Our approach shows in detail how to combine amplitude estimation and Grover search into a policy evaluation and improvement scheme. We first develop quantum policy evaluation (QPE) which is quadratically more efficient compared to an analogous classical Monte Carlo estimation and is based on a quantum mechanical realization of a finite Markov decision process (MDP). Building on QPE, we derive a quantum policy iteration that repeatedly improves an initial policy using Grover search until the optimum is reached. Finally, we present an implementation of our algorithm for a two-armed bandit MDP which we then simulate.
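The Grover-search step mentioned in the abstract can be illustrated with a small classical statevector simulation. This is only a sketch of generic Grover amplitude amplification, not the authors' implementation: the candidate policy values, the threshold `current_value`, and the function names are all hypothetical, and the oracle is assumed to mark exactly those policies whose value exceeds the threshold.

```python
import numpy as np


def optimal_iterations(n_items, n_marked):
    """Standard Grover iteration count floor(pi / (4 * theta)); requires n_marked >= 1."""
    theta = np.arcsin(np.sqrt(n_marked / n_items))
    return int(np.floor(np.pi / (4.0 * theta)))


def grover_probabilities(n_items, marked, n_iterations):
    """Simulate Grover iterations on a uniform superposition.

    Returns the probability of measuring each index afterwards.
    """
    state = np.full(n_items, 1.0 / np.sqrt(n_items))
    for _ in range(n_iterations):
        state[marked] *= -1.0               # oracle: flip the sign of marked amplitudes
        state = 2.0 * state.mean() - state  # diffusion: inversion about the mean
    return state ** 2


# Hypothetical values of 8 candidate policies (illustrative numbers only).
policy_values = np.array([0.2, 0.5, 0.1, 0.4, 0.9, 0.3, 0.6, 0.35])
current_value = 0.6  # assumed value of the policy we want to improve on

# Assumed oracle: mark every policy that is strictly better than the current one.
marked = np.flatnonzero(policy_values > current_value)

k = optimal_iterations(len(policy_values), len(marked))
probs = grover_probabilities(len(policy_values), marked, k)
print("Grover iterations:", k)
print("Probability of measuring an improved policy:", probs[marked].sum())
```

With a single marked policy among eight candidates, two Grover iterations concentrate most of the measurement probability on the improved policy, whereas a uniform random guess would find it with probability 1/8. In the quantum setting the policy values would not be known in advance; the abstract indicates they are obtained through quantum policy evaluation based on amplitude estimation.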
