Paper Title

Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning

Paper Authors

Lionel Blondé, Pablo Strasser, Alexandros Kalousis

Paper Abstract

Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward-shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition; nothing is specific to imitation. As such, these may be of independent interest.
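The practical lever the abstract refers to, forcing the learned reward (i.e., the discriminator) to be locally Lipschitz-continuous, is commonly implemented with a gradient penalty in the spirit of WGAN-GP. Below is a minimal PyTorch sketch of such a regularizer, given only as an illustration under assumptions: the paper compares several penalty variants (one-sided vs. two-sided, target Lipschitz constant, where the penalty is sampled), and the names `reward_net`, `expert_sa`, and `policy_sa` are hypothetical placeholders, not the authors' code.

```python
import torch

def gradient_penalty(reward_net, expert_sa, policy_sa, k=1.0):
    """One-sided gradient penalty encouraging the learned reward
    (discriminator) to be locally k-Lipschitz, WGAN-GP style.

    Sketch only; the paper studies several such penalty variants.
    reward_net           -- maps a (batch, dim) state-action tensor to scalar rewards
    expert_sa, policy_sa -- (batch, dim) tensors of expert / policy samples
    """
    # Interpolate between expert and policy samples so the constraint is
    # enforced where the reward is actually evaluated (local Lipschitzness).
    eps = torch.rand(expert_sa.size(0), 1, device=expert_sa.device)
    interp = eps * expert_sa + (1.0 - eps) * policy_sa
    interp.requires_grad_(True)

    rewards = reward_net(interp)
    # Gradient of the reward w.r.t. its inputs; create_graph=True so the
    # penalty itself is differentiable and can be trained through.
    grads, = torch.autograd.grad(
        outputs=rewards.sum(), inputs=interp, create_graph=True)

    # Penalize only gradient norms exceeding the target constant k
    # (one-sided variant; a two-sided variant would use (norm - k) ** 2).
    grad_norm = grads.norm(2, dim=1)
    return ((grad_norm - k).clamp(min=0.0) ** 2).mean()
```

In training, this term would be added to the discriminator objective, e.g. `loss = bce_loss + lam * gradient_penalty(reward_net, expert_sa, policy_sa)`, with the penalty weight `lam` a tunable hyper-parameter.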
