Paper Title
Evaluating Long-Term Memory in 3D Mazes
Paper Authors
Paper Abstract
Intelligent agents need to remember salient information to reason in partially observed environments. For example, agents with a first-person view should remember the positions of relevant objects even when those objects go out of view. Similarly, to navigate effectively through a set of rooms, agents need to remember the floor plan of how the rooms are connected. However, most benchmark tasks in reinforcement learning do not test long-term memory in agents, slowing down progress in this important research direction. In this paper, we introduce the Memory Maze, a 3D domain of randomized mazes specifically designed for evaluating long-term memory in agents. Unlike existing benchmarks, Memory Maze measures long-term memory separately from confounding agent abilities and requires the agent to localize itself by integrating information over time. With Memory Maze, we propose an online reinforcement learning benchmark, a diverse offline dataset, and an offline probing evaluation. Recording a human player establishes a strong baseline and verifies the need to build up and retain memories, which is reflected in their gradually increasing rewards within each episode. We find that current algorithms benefit from training with truncated backpropagation through time and succeed on small mazes, but fall short of human performance on the large mazes, leaving room for future algorithmic designs to be evaluated on the Memory Maze.
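The abstract notes that current algorithms benefit from training with truncated backpropagation through time (TBPTT). As a rough illustration of what that means in practice, the sketch below splits a long episode into fixed-length chunks and carries recurrent state across chunk boundaries while gradients would only flow within each chunk. All function and parameter names here are illustrative assumptions, not taken from the Memory Maze codebase.

```python
# Hypothetical sketch of truncated backpropagation through time (TBPTT).
# A long episode is split into fixed-length chunks; the recurrent state is
# carried forward across chunk boundaries, but gradient flow is cut at each
# boundary, so backpropagation only spans one chunk at a time.

def tbptt_chunks(observations, chunk_len):
    """Split one long episode into consecutive chunks of length chunk_len."""
    return [observations[i:i + chunk_len]
            for i in range(0, len(observations), chunk_len)]

def train_episode(observations, step_fn, initial_state, chunk_len=64):
    """Run step_fn over each chunk, carrying recurrent state across chunks.

    step_fn(chunk, state) -> (loss, new_state). In a real autodiff
    framework (e.g. PyTorch), the returned state would be detached at
    this point, which is exactly where backpropagation gets truncated.
    """
    state = initial_state
    losses = []
    for chunk in tbptt_chunks(observations, chunk_len):
        loss, state = step_fn(chunk, state)
        # state = state.detach()  # gradient truncation point in PyTorch
        losses.append(loss)
    return sum(losses) / len(losses), state
```

The key design point is that the agent's memory (the recurrent state) persists over the whole episode, while the training signal is computed over short windows, keeping memory and compute costs bounded for very long episodes.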