Paper Title
Intrinsic Reward Driven Imitation Learning via Generative Model
Authors
Abstract
Imitation learning in high-dimensional environments is challenging. In such environments, e.g., the Atari domain, most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator. To address this challenge, we propose a novel reward learning module that generates intrinsic reward signals via a generative model. Our generative method performs better forward state transition and backward action encoding, which improves the module's ability to model the dynamics of the environment. The module therefore provides the imitation agent with both the intrinsic intention of the demonstrator and better exploration ability, which is critical for the agent to outperform the demonstrator. Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration. Remarkably, our method achieves up to 5 times the performance of the demonstration.
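The abstract describes the reward learning module only at a high level. Below is a minimal, illustrative sketch of how such a generative intrinsic-reward module could be structured, assuming a conditional-VAE-style design in which an encoder performs backward action encoding from a state transition and a decoder performs forward state transition prediction. The class name `GenerativeRewardModule`, the layer sizes, and the prediction-error reward are assumptions for illustration, not the paper's exact architecture.

```python
# A minimal sketch of a generative intrinsic-reward module (assumed design):
# the encoder maps a transition (s_t, s_{t+1}) to a latent action code
# (backward action encoding), and the decoder maps (s_t, latent action) back
# to a prediction of s_{t+1} (forward state transition). All names,
# dimensions, and the reward definition are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GenerativeRewardModule(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Backward model: encode (s_t, s_{t+1}) into a latent action distribution.
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * action_dim),  # mean and log-variance
        )
        # Forward model: decode (s_t, latent action) into a prediction of s_{t+1}.
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, next_state):
        # Encode the transition, sample a latent action, and predict s_{t+1}.
        mu, log_var = self.encoder(
            torch.cat([state, next_state], dim=-1)
        ).chunk(2, dim=-1)
        latent_action = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        pred_next_state = self.decoder(torch.cat([state, latent_action], dim=-1))
        return pred_next_state, mu, log_var

    def intrinsic_reward(self, state, next_state):
        # One plausible choice (an assumption here): use the forward model's
        # prediction error as the intrinsic reward signal for the agent.
        with torch.no_grad():
            pred_next_state, _, _ = self(state, next_state)
            return F.mse_loss(
                pred_next_state, next_state, reduction="none"
            ).mean(dim=-1)
```

Under this assumed design, the module would be trained on demonstration transitions with a standard VAE objective (reconstruction loss plus a KL term on the latent action), and the resulting reward would then be used to train the imitation agent with any reinforcement learning algorithm.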