Paper Title
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
Paper Authors
Paper Abstract
Real-world reinforcement learning tasks often involve some form of partial observability, where the observations give only a partial or noisy view of the true state of the world. Such tasks typically require some form of memory, in which the agent has access to multiple past observations, in order to perform well. One popular way to incorporate memory is to use a recurrent neural network over the agent's history. However, recurrent neural networks in reinforcement learning are often fragile and difficult to train; they are susceptible to catastrophic forgetting and, as a result, sometimes fail completely. In this work, we propose Deep Transformer Q-Networks (DTQN), a novel architecture that utilizes transformers and self-attention to encode an agent's history. DTQN is designed modularly, and we compare results against several modifications to our base model. Our experiments demonstrate that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
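To make the architecture concrete, the following is a minimal PyTorch-style sketch of the idea described in the abstract: a transformer encoder over the agent's recent observation history, with a linear head producing Q-values per action. This is an illustrative assumption of how such a model could look, not the paper's reference implementation; all names and hyperparameters (`DTQN`, `embed_dim`, `context_len`, the causal mask, the per-timestep Q-head) are placeholders.

```python
import torch
import torch.nn as nn

class DTQN(nn.Module):
    """Sketch of a transformer Q-network over an observation history.

    Hyperparameters and structure are illustrative, not taken from
    the paper's code.
    """

    def __init__(self, obs_dim, num_actions, embed_dim=64,
                 num_heads=8, num_layers=2, context_len=50):
        super().__init__()
        self.obs_embed = nn.Linear(obs_dim, embed_dim)
        # Learned positional embeddings over the history window.
        self.pos_embed = nn.Embedding(context_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.q_head = nn.Linear(embed_dim, num_actions)

    def forward(self, obs_history):
        # obs_history: (batch, hist_len, obs_dim), hist_len <= context_len
        hist_len = obs_history.size(1)
        pos = torch.arange(hist_len, device=obs_history.device)
        x = self.obs_embed(obs_history) + self.pos_embed(pos)
        # Causal mask (True = blocked) so each timestep attends only
        # to itself and earlier observations.
        mask = torch.triu(
            torch.ones(hist_len, hist_len, dtype=torch.bool,
                       device=obs_history.device), diagonal=1)
        x = self.encoder(x, mask=mask)
        return self.q_head(x)  # (batch, hist_len, num_actions)
```

At decision time, such an agent would act greedily with respect to the Q-values at the most recent timestep, e.g. `model(history)[:, -1].argmax(-1)`, while the Q-values at earlier timesteps could also receive training signal under a DQN-style temporal-difference loss.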