Paper Title
Watch and Match: Supercharging Imitation with Regularized Optimal Transport
Paper Authors
Paper Abstract
Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal-transport-based trajectory matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
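The recipe the abstract describes has two ingredients: per-timestep rewards obtained by optimal-transport matching between the agent's trajectory and an expert demonstration, and a behavior-cloning term whose weight is adapted during training. Below is a minimal NumPy sketch of the trajectory-matching reward, assuming observations are already embedded as fixed-length vectors; the function names (`sinkhorn_plan`, `ot_rewards`, `combined_actor_loss`), the uniform-marginal entropic-OT setup, and the scalar weight `lam` are illustrative simplifications, not the authors' implementation.

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.05, n_iters=100):
    """Entropy-regularized optimal transport (Sinkhorn iterations)
    between uniform marginals, given a (T, T') pairwise cost matrix."""
    T, Tp = cost.shape
    mu = np.full(T, 1.0 / T)        # uniform mass over agent timesteps
    nu = np.full(Tp, 1.0 / Tp)      # uniform mass over expert timesteps
    K = np.exp(-cost / eps)         # Gibbs kernel
    u = np.ones(T)
    v = np.ones(Tp)
    for _ in range(n_iters):        # alternating marginal projections
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan, shape (T, T')

def ot_rewards(agent_obs, expert_obs):
    """Per-timestep pseudo-reward: the negative transport cost each agent
    step pays to align with the expert trajectory."""
    # Pairwise Euclidean costs between (embedded) observations,
    # rescaled so the Gibbs kernel stays numerically stable.
    diff = agent_obs[:, None, :] - expert_obs[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)
    cost = cost / (cost.max() + 1e-8)
    plan = sinkhorn_plan(cost)
    return -(plan * cost).sum(axis=1)    # one scalar reward per agent step

def combined_actor_loss(rl_loss, bc_loss, lam):
    """Adaptive regularization in its simplest form: lam trades off the
    reward-driven RL term against behavior cloning. How lam is adapted
    over training is specified in the paper, not here."""
    return (1.0 - lam) * rl_loss + lam * bc_loss

# Hypothetical usage: a 50-step agent rollout matched against a 60-step
# expert demo, both embedded as 32-dimensional feature vectors.
rng = np.random.default_rng(0)
rewards = ot_rewards(rng.normal(size=(50, 32)), rng.normal(size=(60, 32)))
print(rewards.shape)   # (50,)
```

In the full algorithm, these pseudo-rewards would drive the reinforcement-learning update, with the behavior-cloning term folded in via the adaptive weight; the sketch above only fixes the shape of that combination, not the adaptation rule itself.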