A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems

Paper title

A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems

Authors

Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, Esther Luna Colombini

Abstract

With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, there is still a wide range of domains inaccessible to RL due to the high cost and danger of interacting with the environment. Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, making it feasible to extract policies from large and diverse training datasets. Effective offline RL algorithms have a much wider range of applications than online RL and are particularly appealing for real-world applications such as education, healthcare, and robotics. In this work, we contribute a unifying taxonomy to classify offline RL methods. Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field using a unified notation, as well as a review of the properties and shortcomings of existing benchmarks. Additionally, we provide a figure that summarizes the performance of each method and class of methods on different dataset properties, equipping researchers with the tools to decide which type of algorithm is best suited for the problem at hand and to identify which classes of algorithms look the most promising. Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field.
