Paper Title

TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL

Paper Authors

Yanchao Sun, Xiangyu Yin, Furong Huang

Paper Abstract

Transferring knowledge among various environments is important to efficiently learn multiple tasks online. Most existing methods directly use the previously learned models or previously learned optimal policies to learn new tasks. However, these methods may be inefficient when the underlying models or optimal policies are substantially different across tasks. In this paper, we propose Template Learning (TempLe), the first PAC-MDP method for multi-task reinforcement learning that can be applied to tasks with varying state/action spaces. TempLe generates transition dynamics templates, abstractions of the transition dynamics across tasks, to gain sample efficiency by extracting similarities between tasks even when their underlying models or optimal policies have limited commonalities. We present two algorithms for an "online" and a "finite-model" setting respectively. We prove that our proposed TempLe algorithms achieve much lower sample complexity than single-task learners or state-of-the-art multi-task methods. We show via systematically designed experiments that our TempLe method universally outperforms the state-of-the-art multi-task methods (PAC-MDP or not) in various settings and regimes.
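To make the "template" idea concrete, here is a minimal Python sketch, assuming tabular MDPs with a shared space of next-state outcomes. It pools transition samples across (task, state, action) pairs whose empirical next-state distributions are close in L1 distance, so matched pairs are estimated from the union of their samples. The class name `TemplatePool`, the threshold `tol`, the `min_count` gate, and the greedy matching rule are all illustrative assumptions, not the paper's algorithm; TempLe's actual template construction and its PAC-MDP guarantees are substantially more involved.

```python
from collections import defaultdict
import numpy as np

def l1_distance(p, q):
    """L1 distance between two discrete distributions."""
    return float(np.abs(p - q).sum())

class TemplatePool:
    """Illustrative pooling of transition samples across (task, state,
    action) pairs whose empirical next-state distributions agree.
    Hypothetical sketch of the template idea, not TempLe itself."""

    def __init__(self, n_next_states, tol=0.2, min_count=50):
        self.n = n_next_states
        self.tol = tol              # max L1 gap to join an existing template
        self.min_count = min_count  # samples required before matching
        self.counts = defaultdict(lambda: np.zeros(n_next_states))
        self.templates = []         # pooled next-state counts per template

    def observe(self, key, next_state):
        """Record one observed transition for the (task, s, a) pair `key`."""
        self.counts[key][next_state] += 1.0

    def estimate(self, key):
        """Estimate the next-state distribution for `key`, pooling samples
        with the first template within `tol` in L1 distance."""
        c = self.counts[key]
        total = c.sum()
        if total == 0:
            return np.full(self.n, 1.0 / self.n)  # uninformed prior
        p = c / total
        if total < self.min_count:
            return p                              # too little data to match
        for t in self.templates:
            if l1_distance(p, t / t.sum()) <= self.tol:
                pooled = t + c                    # share samples via template
                return pooled / pooled.sum()
        self.templates.append(c.copy())           # start a new template
        return p

# Two pairs from different tasks that share the same underlying dynamics:
pool = TemplatePool(n_next_states=3)
rng = np.random.default_rng(0)
for key in [("task0", 1, 0), ("task1", 4, 2)]:
    for s_next in rng.choice(3, size=200, p=[0.7, 0.2, 0.1]):
        pool.observe(key, s_next)
print(pool.estimate(("task0", 1, 0)))  # creates the first template
print(pool.estimate(("task1", 4, 2)))  # matches it and pools 400 samples
```

Once two pairs are matched to the same template, samples gathered in one task directly tighten the estimate used in the other, even if the tasks' state/action spaces or optimal policies differ; this is the intuition behind the lower sample complexity the abstract claims.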
