Paper Title
PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale
Paper Authors
Paper Abstract
The predictive information, the mutual information between the past and future, has been shown to be a useful representation learning auxiliary loss for training reinforcement learning agents, as the ability to model what will happen next is critical to success on many control tasks. While existing studies are largely restricted to training specialist agents on single-task settings in simulation, in this work, we study modeling the predictive information for robotic agents and its importance for general-purpose agents that are trained to master a large repertoire of diverse skills from large amounts of data. Specifically, we introduce Predictive Information QT-Opt (PI-QT-Opt), a QT-Opt agent augmented with an auxiliary loss that learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world with a single set of parameters. We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks. Finally, we evaluate PI-QT-Opt on real robots, achieving substantial and consistent improvement over QT-Opt in multiple experimental settings of varying environments, skills, and multi-task configurations.
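To make the core idea concrete, here is a minimal sketch of one common way to learn representations of the predictive information: a contrastive InfoNCE objective that pulls together embeddings of past and future observation windows from the same trajectory and pushes apart mismatched pairs. This is an illustrative toy, not the paper's actual architecture or loss; the encoder outputs, batch construction, and temperature below are assumptions for demonstration.

```python
import numpy as np

def info_nce_loss(past_emb, future_emb, temperature=0.1):
    """InfoNCE lower bound on the mutual information I(past; future).

    past_emb, future_emb: (batch, dim) L2-normalized embeddings of past
    and future observation windows. Row i of each array comes from the
    same trajectory (a positive pair); all other rows in the batch act
    as negatives.
    """
    # Cosine-similarity logits between every past/future pair.
    logits = past_emb @ future_emb.T / temperature            # (B, B)
    # Cross-entropy with the diagonal (matching pairs) as targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Stand-in "encoder" outputs: each future is correlated with its past.
past = l2_normalize(rng.normal(size=(8, 16)))
future = l2_normalize(past + 0.1 * rng.normal(size=(8, 16)))

loss_aligned = info_nce_loss(past, future)
# Rolling the futures breaks every positive pair.
loss_shuffled = info_nce_loss(past, np.roll(future, 1, axis=0))
print(loss_aligned, loss_shuffled)  # aligned pairs yield the lower loss
```

In a setup like PI-QT-Opt's, a loss of this family would be added as an auxiliary term alongside the Q-learning objective, so the shared visual encoder is trained both to predict returns and to retain information useful for predicting the future.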