Paper Title

Model-based Reinforcement Learning: A Survey

Paper Authors

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

Paper Abstract

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.
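To make the abstract's two steps concrete, here is a minimal Dyna-Q-style sketch: the agent learns a dynamics model from real transitions (step one) and then performs extra value updates on transitions simulated from that model (step two). The 5-state chain MDP, the tabular model, and all hyperparameters below are hypothetical choices for illustration, not details taken from the survey itself.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: a 5-state chain MDP with actions
# "left" (-1) and "right" (+1); reward 1.0 on reaching state 4.
ACTIONS, GOAL = (-1, 1), 4

def env_step(state, action):
    """True environment dynamics, unknown to the agent."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, (1.0 if next_state == GOAL else 0.0)

Q = defaultdict(float)   # learned action values, keyed by (state, action)
model = {}               # learned deterministic model: (s, a) -> (s', r)
alpha, gamma, eps, n_plan = 0.1, 0.95, 0.2, 10

for episode in range(100):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection with random tie-breaking.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: (Q[(s, b)], random.random()))
        s2, r = env_step(s, a)

        # Step 1 (model learning): record the observed transition.
        model[(s, a)] = (s2, r)

        # Direct RL update from the real transition (Q-learning).
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])

        # Step 2 (planning): extra Q-learning updates on transitions
        # simulated from the learned model.
        for _ in range(n_plan):
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
        s = s2

print("Greedy action per state:",
      {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)})
```

This sketch illustrates only the simplest integration point from the abstract's taxonomy: planning updates and real-experience updates share a single value function, and the planning budget is a fixed number of simulated updates per real step. The survey's categorization covers many alternatives, such as where planning starts and how its budget is allocated.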
