Paper Title
MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration
Paper Authors
Paper Abstract
Meta reinforcement learning (meta-RL) extracts knowledge from previous tasks and achieves fast adaptation to new tasks. Despite recent progress, efficient exploration in meta-RL remains a key challenge in sparse-reward tasks, as it requires quickly finding informative task-relevant experiences in both meta-training and adaptation. To address this challenge, we explicitly model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning, and introduce a novel empowerment-driven exploration objective, which aims to maximize information gain for task identification. We derive a corresponding intrinsic reward and develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies by sharing the knowledge of task inference. Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on various sparse-reward MuJoCo locomotion tasks and more complex sparse-reward Meta-World tasks.
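As a rough formalization of the stated objective (the paper's exact derivation may differ; the task variable $c$, exploration trajectory $\tau_{0:t}$, and posterior $p(c \mid \tau_{0:t})$ below are illustrative notation, not taken from the paper), "maximizing information gain for task identification" can be read as maximizing the mutual information between the task and the collected experience, which telescopes into a per-step information-gain intrinsic reward:

$$
\max_{\pi^{\text{exp}}}\; I(c;\tau)
\qquad\Longrightarrow\qquad
r^{\text{int}}_t \;=\; \log p\!\left(c \mid \tau_{0:t+1}\right) \;-\; \log p\!\left(c \mid \tau_{0:t}\right),
$$

since $I(c;\tau) = \mathbb{E}\!\left[\log p(c \mid \tau) - \log p(c)\right]$ and the log-posterior terms telescope over timesteps. In practice this posterior is intractable; per the abstract, the framework instead learns task-inference models whose knowledge is shared between the context-aware exploration and exploitation policies.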