Quantum Reinforcement Learning via Policy Iteration

Paper title

Quantum Reinforcement Learning via Policy Iteration

Authors

El Amine Cherrat, Iordanis Kerenidis, Anupam Prakash

Abstract

Quantum computing has shown the potential to substantially speed up machine learning applications, in particular for supervised and unsupervised learning. Reinforcement learning, on the other hand, has become essential for solving many decision-making problems, and policy iteration methods remain the foundation of such approaches. In this paper, we provide a general framework for performing quantum reinforcement learning via policy iteration. We validate our framework by designing and analyzing two components: quantum policy evaluation methods for infinite-horizon discounted problems, which build quantum states that approximately encode the value function of a policy $π$; and quantum policy improvement methods, which post-process measurement outcomes on these quantum states. Finally, we study the theoretical and experimental performance of our quantum algorithms on two environments from OpenAI's Gym.
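For orientation, the classical policy iteration loop that the abstract builds on can be sketched as follows. This is a minimal classical illustration only, not the paper's quantum algorithm: the function names, the list-based MDP layout (`P[s][a][t]` for transition probabilities, `R[s][a]` for rewards), and the two-state toy problem are all assumptions made for this example. Policy evaluation here uses plain fixed-point iteration on the Bellman equation for an infinite-horizon discounted problem, and policy improvement is the usual greedy step.

```python
def evaluate_policy(P, R, policy, gamma, tol=1e-10, max_iters=10_000):
    """Approximate the value function of a fixed deterministic policy by
    iterating V <- R_pi + gamma * P_pi V until successive iterates agree."""
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iters):
        new_V = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


def improve_policy(P, R, V, gamma):
    """Return the policy that is greedy with respect to the value function V."""
    n = len(P)
    policy = []
    for s in range(n):
        # One-step lookahead value of each action available in state s.
        q = [
            R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
            for a in range(len(P[s]))
        ]
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


def policy_iteration(P, R, gamma=0.9, max_rounds=100):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = [0] * len(P)
    V = evaluate_policy(P, R, policy, gamma)
    for _ in range(max_rounds):
        new_policy = improve_policy(P, R, V, gamma)
        if new_policy == policy:
            break
        policy = new_policy
        V = evaluate_policy(P, R, policy, gamma)
    return policy, V


# Hypothetical two-state, two-action MDP used only to exercise the sketch.
# P[s][a][t]: probability of moving from state s to state t under action a.
TOY_P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
# R[s][a]: immediate reward for taking action a in state s.
TOY_R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

if __name__ == "__main__":
    best_policy, values = policy_iteration(TOY_P, TOY_R, gamma=0.9)
    print(best_policy, values)
```

On this toy problem the loop settles on moving from state 0 to state 1 and then staying there, with values close to 19 and 20. In the framework the abstract describes, the evaluation step is instead carried out by preparing a quantum state that approximately encodes the value function, and the improvement step by post-processing measurement outcomes on that state.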
