Exploration Based Language Learning for Text-Based Games

Paper Title

Exploration Based Language Learning for Text-Based Games

Authors

Madotto, Andrea, Namazifar, Mahdi, Huizinga, Joost, Molino, Piero, Ecoffet, Adrien, Zheng, Huaixiu, Papangelis, Alexandros, Yu, Dian, Khatri, Chandra, Tur, Gokhan

Abstract

This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, and language generation by artificial agents. Moreover, they provide a learning environment in which these skills can be acquired through interactions with an environment rather than using fixed corpora. One aspect that makes these games particularly challenging for learning agents is the combinatorially large action space. Existing methods for solving text-based games are limited to games that are either very simple or have an action space restricted to a predetermined set of admissible actions. In this work, we propose to use the exploration approach of Go-Explore for solving text-based games. More specifically, in an initial exploration phase, we first extract trajectories with high rewards, after which we train a policy to solve the game by imitating these trajectories. Our experiments show that this approach outperforms existing solutions in solving text-based games, and it is more sample efficient in terms of the number of interactions with the environment. Moreover, we show that the learned policy can generalize better than existing solutions to unseen games without using any restriction on the action space.
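The abstract describes a two-phase recipe: explore first to collect high-reward trajectories, then train a policy by imitating them. The exploration phase can be illustrated with a minimal Go-Explore-style sketch. Everything in it is a hypothetical assumption, not the authors' implementation: the toy game `ToyTextGame`, its fixed command list, the cell representation (current score plus observation text), and the 1/sqrt(1 + visits) cell-selection weight. The sketch also assumes a deterministic game, so "returning" to a cell is simply replaying its stored action sequence.

```python
import random


class ToyTextGame:
    """Hypothetical deterministic text game: take the key, unlock the door,
    go north, and open the chest. Each of the three subgoals is worth 1 point."""

    ACTIONS = ["take key", "unlock door", "go north", "go south", "open chest", "look"]

    def reset(self):
        self.room = "hall"
        self.has_key = False
        self.door_open = False
        self.score = 0
        self.done = False
        return self._obs()

    def _obs(self):
        return f"You are in the {self.room}."

    def step(self, action):
        reward = 0
        if self.room == "hall":
            if action == "take key" and not self.has_key:
                self.has_key = True
                reward = 1
            elif action == "unlock door" and self.has_key and not self.door_open:
                self.door_open = True
                reward = 1
            elif action == "go north" and self.door_open:
                self.room = "vault"
        else:  # in the vault
            if action == "open chest":
                self.done = True
                reward = 1
            elif action == "go south":
                self.room = "hall"
        self.score += reward
        return self._obs(), reward, self.done


def go_explore(env, iterations=200, explore_steps=10, seed=0):
    """Exploration phase only (sketch): select a cell, return to it, explore."""
    rng = random.Random(seed)
    obs = env.reset()
    # Assumed cell representation: (current score, observation text).
    archive = {(env.score, obs): {"traj": [], "visits": 0, "done": False}}
    for _ in range(iterations):
        candidates = [c for c, e in archive.items() if not e["done"]]
        if not candidates:
            break
        # Prefer rarely visited cells (a simple count-based heuristic).
        weights = [1.0 / (1 + archive[c]["visits"]) ** 0.5 for c in candidates]
        cell = rng.choices(candidates, weights=weights)[0]
        archive[cell]["visits"] += 1
        # Return: replay the stored actions (valid only for deterministic games).
        env.reset()
        for action in archive[cell]["traj"]:
            env.step(action)
        traj = list(archive[cell]["traj"])
        # Explore: issue random commands from the restored state.
        for _ in range(explore_steps):
            action = rng.choice(env.ACTIONS)
            obs, _, done = env.step(action)
            traj.append(action)
            new_cell = (env.score, obs)
            entry = archive.get(new_cell)
            # The score is part of the cell key, so only a shorter path can improve a cell.
            if entry is None or len(traj) < len(entry["traj"]):
                visits = entry["visits"] if entry else 0
                archive[new_cell] = {"traj": list(traj), "visits": visits, "done": done}
            if done:
                break
    # Extract the highest-scoring (then shortest) trajectory as a demonstration.
    best_cell = max(archive, key=lambda c: (c[0], -len(archive[c]["traj"])))
    return archive[best_cell]["traj"], best_cell[0]


if __name__ == "__main__":
    trajectory, score = go_explore(ToyTextGame())
    print(score, trajectory)
```

The returned trajectory is the kind of high-reward demonstration that the second phase would use for imitation learning; training that policy is not shown here.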
