Paper title
Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning
Authors
Abstract
The Obstacle Tower Challenge is the task of mastering a procedurally generated chain of levels that get successively harder to complete. Whereas the top-performing entries of last year's competition used human demonstrations or reward shaping to learn how to cope with the challenge, we present an approach that performed competitively (placing 7th) but starts completely from scratch, using Deep Reinforcement Learning with a relatively simple feed-forward deep network structure. We especially examine the generalization performance of this approach across different seeds and the various visual themes that became available after the competition, and we investigate where the agent fails and why. Note that our approach has no short-term memory, such as recurrent hidden states. With this work, we hope to contribute to a better understanding of what is possible with a relatively simple, flexible solution that can be applied to learning in environments featuring complex 3D visual input, where the abstract task structure itself is still fairly simple.