Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft

Paper Title

Sample Efficient Reinforcement Learning through Learning from Demonstrations in Minecraft

Authors

Christian Scheller, Yanick Schraner, Manfred Vogel

Abstract

Sample inefficiency of deep reinforcement learning methods is a major obstacle for their use in real-world applications. In this work, we show how human demonstrations can improve final performance of agents on the Minecraft minigame ObtainDiamond with only 8M frames of environment interaction. We propose a training procedure where policy networks are first trained on human data and later fine-tuned by reinforcement learning. Using a policy exploitation mechanism, experience replay and an additional loss against catastrophic forgetting, our best agent was able to achieve a mean score of 48. Our proposed solution placed 3rd in the NeurIPS MineRL Competition for Sample-Efficient Reinforcement Learning.
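
The abstract mentions fine-tuning by reinforcement learning with an additional loss against catastrophic forgetting, but it does not give the form of that loss. One common way to realise the idea is to add a behaviour-cloning term on demonstration data to the RL objective. The following is a minimal sketch under that assumption; the function names, the REINFORCE-style surrogate, and the `demo_weight` value are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behaviour_cloning_loss(logits, demo_action):
    """Negative log-likelihood of the action a human demonstrator took."""
    return -math.log(softmax(logits)[demo_action])


def policy_gradient_loss(logits, action, advantage):
    """REINFORCE-style surrogate: -advantage * log pi(action | state)."""
    return -advantage * math.log(softmax(logits)[action])


def combined_loss(rl_batch, demo_batch, demo_weight=0.5):
    """RL loss plus a weighted imitation term on demonstration samples.

    rl_batch:   list of (logits, action, advantage) from environment rollouts
    demo_batch: list of (logits, demo_action) from human demonstrations
    demo_weight is an assumed hyperparameter, not a value from the paper.
    """
    rl = sum(policy_gradient_loss(l, a, adv) for l, a, adv in rl_batch) / len(rl_batch)
    bc = sum(behaviour_cloning_loss(l, a) for l, a in demo_batch) / len(demo_batch)
    return rl + demo_weight * bc


rl_batch = [([0.0, 0.0], 1, 2.0)]
demo_batch = [([0.0, 0.0], 0)]
loss = combined_loss(rl_batch, demo_batch, demo_weight=0.5)
```

Keeping the imitation term active during fine-tuning pulls the policy back toward demonstrated behaviour, which is one plausible reading of "an additional loss against catastrophic forgetting".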
