Continual Auxiliary Task Learning

Paper Title

Continual Auxiliary Task Learning

Authors

Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White

Abstract

Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy learning to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove the separation into learning successor features and rewards provides convergence rate improvements. We conduct an in-depth study into the resulting multi-prediction learning system.
