Paper title
Augmented Behavioral Cloning from Observation
Paper authors
Paper abstract
Imitation from observation is a computational technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states in the expert demonstrations. Recent approaches learn the inverse dynamics of the environment and an imitation policy by interleaving training epochs of the two models while changing the demonstration data. However, such approaches often get stuck in sub-optimal solutions that are distant from the expert, limiting their imitation effectiveness. We address this problem with a novel approach that avoids bad local minima by exploring: (i) a self-attention mechanism that better captures global features of the states; and (ii) a sampling strategy that regulates the observations that are used for learning. We show empirically that our approach outperforms state-of-the-art approaches in four different environments by a large margin.
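The baseline scheme the abstract refers to (learn an inverse dynamics model, use it to infer the expert's missing actions, then clone the resulting state-action pairs) can be illustrated with a minimal tabular sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy integer-line environment, the `step` dynamics, the lookup-table models, and all function names. The self-attention mechanism and the sampling strategy proposed in the abstract are not reproduced.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, 0, 1)  # hypothetical toy action set: move left, stay, move right


def step(state, action):
    """Toy deterministic dynamics on the integer line (assumed for illustration)."""
    return state + action


def random_policy(state, rng):
    """Exploration policy used before any imitation policy exists."""
    return rng.choice(ACTIONS)


def collect_transitions(policy, n_steps, rng):
    """Roll out a policy and record (state, action, next_state) tuples."""
    transitions, state = [], 0
    for _ in range(n_steps):
        action = policy(state, rng)
        next_state = step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


def fit_inverse_dynamics(transitions):
    """Tabular inverse dynamics: most frequent action for each observed state change."""
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[next_state - state][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in counts.items()}


def label_demonstrations(expert_states, inverse_model):
    """Infer the unobserved expert actions from consecutive expert states."""
    labeled = []
    for s, s_next in zip(expert_states, expert_states[1:]):
        delta = s_next - s
        if delta in inverse_model:  # skip state changes the model has never seen
            labeled.append((s, inverse_model[delta]))
    return labeled


def fit_policy(labeled):
    """Behavioral cloning as a lookup table: most frequent inferred action per state."""
    counts = defaultdict(Counter)
    for state, action in labeled:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


rng = random.Random(0)
expert_states = [0, 1, 2, 3, 4, 5]  # state-only demonstration: the expert walks right
agent_data = collect_transitions(random_policy, 200, rng)
inverse_model = fit_inverse_dynamics(agent_data)
policy_table = fit_policy(label_demonstrations(expert_states, inverse_model))
```

This sketch runs a single pass. An iterative variant in the spirit of the interleaving described above would roll out the learned policy to gather new transitions, refit the inverse dynamics model on them, relabel the demonstrations, and repeat.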