Paper Title

Unentangled quantum reinforcement learning agents in the OpenAI Gym

Paper Authors

Jen-Yueh Hsiao, Yuxuan Du, Wei-Yin Chiang, Min-Hsiu Hsieh, Hsi-Sheng Goan

Paper Abstract

Classical reinforcement learning (RL) has generated excellent results in different areas; however, its sample inefficiency remains a critical issue. In this paper, we provide concrete numerical evidence that the sample efficiency (the speed of convergence) of quantum RL could be better than that of classical RL, and for achieving comparable learning performance, quantum RL could use much (at least one order of magnitude) fewer trainable parameters than classical RL. Specifically, we employ the popular benchmarking environments of RL in the OpenAI Gym, and show that our quantum RL agent converges faster than classical fully-connected neural networks (FCNs) in the tasks of CartPole and Acrobot under the same optimization process. We also successfully train the first quantum RL agent that can complete the task of LunarLander in the OpenAI Gym. Our quantum RL agent only requires a single-qubit-based variational quantum circuit without entangling gates, followed by a classical neural network (NN) to post-process the measurement output. Finally, we could accomplish the aforementioned tasks on the real IBM quantum machines. To the best of our knowledge, none of the earlier quantum RL agents could do that.
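
The abstract describes the agent architecture as a single-qubit-based variational quantum circuit with no entangling gates, whose measurement outputs are post-processed by a classical NN. The following is a minimal, self-contained sketch of that idea under stated assumptions, not the authors' actual implementation: one unentangled qubit per observation feature, data re-uploading-style trainable rotations, Pauli-Z expectations, and a small classical layer producing action logits. All function names (single_qubit_expectation, quantum_features, policy_logits), the layer count, and the parameter shapes are illustrative assumptions.

```python
import numpy as np

# --- Minimal single-qubit simulator (no entangling gates are ever applied) ---

def ry(theta):
    """Single-qubit Y-rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    """Single-qubit Z-rotation matrix."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=complex)

def single_qubit_expectation(x, weights):
    """Encode one observation feature x on one qubit with trainable
    rotations (data re-uploading style) and return the <Z> expectation."""
    state = np.array([1.0, 0.0], dtype=complex)        # start in |0>
    for (a, b) in weights:                              # a few variational layers
        state = ry(a * x + b) @ state                   # feature encoding with trainable scale/shift
        state = rz(a) @ state                           # additional trainable rotation
    probs = np.abs(state) ** 2
    return probs[0] - probs[1]                          # Pauli-Z expectation value

# --- Quantum "feature extractor" followed by classical post-processing ---

def quantum_features(obs, theta):
    """One unentangled qubit per observation dimension."""
    return np.array([single_qubit_expectation(x, w) for x, w in zip(obs, theta)])

def policy_logits(obs, theta, W, b):
    """Classical single-layer NN maps the qubit expectations to action logits."""
    return W @ quantum_features(obs, theta) + b

# Example: a CartPole-like observation with 4 features and 2 actions (illustrative only)
rng = np.random.default_rng(0)
obs = rng.normal(size=4)                                # stand-in for an environment observation
theta = rng.normal(size=(4, 2, 2))                      # (qubit, layer, 2 angles) trainable parameters
W, b = rng.normal(size=(2, 4)), np.zeros(2)             # classical post-processing weights
print(policy_logits(obs, theta, W, b))                  # action preferences
```

In a full agent, the rotation angles theta and the classical weights W, b would be trained jointly with a standard RL objective; the point of the sketch is only that each qubit evolves independently, so no two-qubit (entangling) gate is needed anywhere in the circuit.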
