Paper Title
Support-weighted Adversarial Imitation Learning
Paper Authors
Paper Abstract
Adversarial Imitation Learning (AIL) is a broad family of imitation learning methods designed to mimic expert behaviors from demonstrations. While AIL has shown state-of-the-art performance on imitation learning with only a small number of demonstrations, it faces several practical challenges, such as potential training instability and implicit reward bias. To address these challenges, we propose Support-weighted Adversarial Imitation Learning (SAIL), a general framework that extends a given AIL algorithm with information derived from support estimation of the expert policy. SAIL improves the quality of the reinforcement signals by weighting the adversarial reward with a confidence score obtained from support estimation of the expert policy. We also show that SAIL is always at least as efficient as the underlying AIL algorithm it uses to learn the adversarial reward. Empirically, the proposed method achieves better performance and training stability than baseline methods on a wide range of benchmark control tasks.
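The abstract's central idea, weighting the adversarial reward by a support-estimation confidence score, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the multiplicative combination, and the clipping of the confidence score to [0, 1] are all assumptions, since the abstract does not specify the exact combination rule or the support estimator.

```python
import numpy as np

def sail_reward(adversarial_reward, support_score):
    """Hypothetical sketch of SAIL's reward weighting.

    adversarial_reward: reward from the underlying AIL discriminator
        for a batch of state-action pairs.
    support_score: confidence that each state-action pair lies in the
        support of the expert policy (assumed to be in [0, 1]).
    """
    # Clip the confidence score to [0, 1] (an assumption for this sketch).
    w = np.clip(np.asarray(support_score, dtype=float), 0.0, 1.0)
    # Down-weight adversarial rewards outside the estimated expert support.
    return w * np.asarray(adversarial_reward, dtype=float)

# Example: pairs with low support confidence contribute smaller rewards.
rewards = sail_reward([2.0, 2.0, 2.0], [1.0, 0.5, 0.0])
```

Under this sketch, state-action pairs far from the expert's support receive attenuated reinforcement signals, which is one plausible way the weighting could mitigate implicit reward bias in the underlying AIL algorithm.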