Paper Title

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

Authors

Rolf A. N. Starre, Marco Loog, Elena Congeduti, Frans A. Oliehoek

Abstract

Many methods for Model-Based Reinforcement Learning (MBRL) in Markov decision processes (MDPs) provide guarantees for both the accuracy of the model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of an MDP while maintaining a bounded loss with respect to the original problem. Therefore, it may come as a surprise that no such guarantees are available when combining both techniques, i.e., where MBRL merely observes abstract states. Our theoretical analysis shows that abstraction can introduce a dependence between samples collected online (e.g., in the real world). That means that, without taking this dependence into account, results for MBRL do not directly extend to this setting. Our result shows that we can use concentration inequalities for martingales to overcome this problem. This result makes it possible to extend the guarantees of existing MBRL algorithms to the setting with abstraction. We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based 'RL from Abstracted Observations': model-based reinforcement learning with an abstract model.
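
The pivotal technical step in the abstract above is that samples collected online under abstraction are no longer independent, so the usual i.i.d. concentration bounds (e.g., Hoeffding's inequality) do not directly apply, whereas bounds for martingale difference sequences still do. For reference, the Azuma-Hoeffding inequality is the textbook result of this kind (a standard result, not a formula quoted from the paper):

```latex
% Azuma-Hoeffding: for a martingale difference sequence X_1, ..., X_n
% with |X_i| <= c_i almost surely,
\[
  \Pr\!\left( \left| \sum_{i=1}^{n} X_i \right| \ge \varepsilon \right)
  \;\le\;
  2 \exp\!\left( -\frac{\varepsilon^2}{2 \sum_{i=1}^{n} c_i^2} \right)
\]
```

To make the R-MAX combination concrete, the sketch below shows how R-MAX could operate on abstracted observations only: the agent sees phi(s) rather than the ground state s, keeps visit counts per abstract state-action pair, assigns the optimistic reward r_max to "unknown" pairs, and plans on the learned abstract model. All names (AbstractRMax, m_known, phi) are hypothetical; this is a minimal illustration under those assumptions, not the authors' implementation, and it omits the dependence analysis that the paper contributes.

```python
from collections import defaultdict

class AbstractRMax:
    """Illustrative sketch: R-MAX driven purely by abstracted observations phi(s)."""

    def __init__(self, abstract_states, actions, m_known=10, r_max=1.0, gamma=0.95):
        self.S = list(abstract_states)  # abstract state space (the image of phi)
        self.A = list(actions)
        self.m = m_known                # visits before a pair counts as "known"
        self.r_max = r_max              # optimistic reward for unknown pairs
        self.gamma = gamma
        self.counts = defaultdict(int)                      # n(s_abs, a)
        self.trans = defaultdict(lambda: defaultdict(int))  # n(s_abs, a, s_abs')
        self.reward_sums = defaultdict(float)               # summed rewards per pair

    def update(self, s_abs, a, r, s_abs_next):
        """Record one abstracted transition; freeze statistics once the pair is known."""
        if self.counts[(s_abs, a)] < self.m:
            self.counts[(s_abs, a)] += 1
            self.reward_sums[(s_abs, a)] += r
            self.trans[(s_abs, a)][s_abs_next] += 1

    def model(self, s_abs, a):
        """Empirical abstract model if the pair is known, else an optimistic self-loop."""
        n = self.counts[(s_abs, a)]
        if n < self.m:
            return self.r_max, {s_abs: 1.0}
        r_hat = self.reward_sums[(s_abs, a)] / n
        p_hat = {s2: c / n for s2, c in self.trans[(s_abs, a)].items()}
        return r_hat, p_hat

    def plan(self, n_iter=200):
        """Value iteration on the optimistic abstract model; returns a greedy policy."""
        v = {s: 0.0 for s in self.S}
        for _ in range(n_iter):
            v = {s: max(self._q(s, a, v) for a in self.A) for s in self.S}
        return {s: max(self.A, key=lambda a: self._q(s, a, v)) for s in self.S}

    def _q(self, s_abs, a, v):
        r_hat, p_hat = self.model(s_abs, a)
        return r_hat + self.gamma * sum(p * v.get(s2, 0.0) for s2, p in p_hat.items())
```

In use, a control loop would alternate planning and learning: act greedily according to plan(), observe the abstracted transition, and feed it back via update(phi(s), a, r, phi(s')).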
