Paper Title
Hypernetworks in Meta-Reinforcement Learning
Paper Authors
Paper Abstract
Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they can replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and they are applicable to meta-RL. However, evidence from supervised learning suggests that hypernetwork performance is highly sensitive to initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for the supervised setting, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL, evaluating on multiple simulated robotics benchmarks.
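For readers unfamiliar with the architecture the abstract refers to, a hypernetwork generates the weights of a target network (here, a policy) from a task embedding, so each task can receive its own effective parameters while the hypernetwork itself is shared. Below is a minimal NumPy sketch of this idea; all dimensions, variable names, and the plain Gaussian initialization are illustrative assumptions, not the paper's architecture or its proposed initialization scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
task_dim, hidden, obs_dim, act_dim = 8, 32, 10, 4

# Hypernetwork parameters: map a task embedding z to the
# weights W (act_dim x obs_dim) and bias b of a linear policy.
# Naive Gaussian init, the kind the paper argues is sensitive.
n_out = act_dim * obs_dim + act_dim
H1 = rng.normal(0.0, 1.0 / np.sqrt(task_dim), (hidden, task_dim))
H2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (n_out, hidden))

def generate_policy_params(z):
    """Hypernetwork forward pass: task embedding -> policy parameters."""
    h = np.tanh(H1 @ z)
    theta = H2 @ h
    W = theta[: act_dim * obs_dim].reshape(act_dim, obs_dim)
    b = theta[act_dim * obs_dim:]
    return W, b

def policy(obs, z):
    """Task-conditioned policy: weights are generated, not learned per task."""
    W, b = generate_policy_params(z)
    return np.tanh(W @ obs + b)

z = rng.normal(size=task_dim)    # embedding for one task
obs = rng.normal(size=obs_dim)   # one observation
action = policy(obs, z)
print(action.shape)  # (4,)
```

Because distinct embeddings `z` yield distinct `(W, b)`, the hypernetwork can in principle recover the per-task policies of the degenerate solution while still sharing structure across tasks through `H1` and `H2`.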