Paper Title

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Authors

Dilip Arumugam, Satinder Singh

Abstract

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally intractable, exact planning algorithm that takes advantage of this measure to show more efficient planning. We then conclude by introducing a specific form of state abstraction with the potential to reduce BAMDP complexity and give rise to a computationally tractable, approximate planning algorithm.
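To make the BAMDP planning setup concrete, the following is a minimal sketch, not the paper's algorithm: exact Bayes-adaptive planning for a two-armed Bernoulli bandit. Here the epistemic (hyper)state is the tuple of Beta posterior counts for each arm, and planning is finite-horizon dynamic programming over hyperstates. All names (`bayes_value`, `HORIZON`) and the bandit instance are illustrative assumptions; the exponential growth of reachable hyperstates with the horizon is what makes exact BAMDP planning intractable in general.

```python
from functools import lru_cache

HORIZON = 10  # illustrative planning horizon

@lru_cache(maxsize=None)
def bayes_value(hyperstate, steps_left):
    """Bayes-optimal expected return from a hyperstate with `steps_left` pulls.

    hyperstate = ((a0, b0), (a1, b1)): Beta(a, b) posterior counts per arm.
    """
    if steps_left == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(hyperstate):
        p = a / (a + b)  # posterior-mean probability that this arm pays 1
        # Epistemic-state transition: posterior update for each observation.
        win = list(hyperstate)
        win[arm] = (a + 1, b)
        lose = list(hyperstate)
        lose[arm] = (a, b + 1)
        q = p * (1.0 + bayes_value(tuple(win), steps_left - 1)) \
            + (1.0 - p) * bayes_value(tuple(lose), steps_left - 1)
        best = max(best, q)
    return best

if __name__ == "__main__":
    # Uniform Beta(1, 1) priors on both arms.
    start = ((1, 1), (1, 1))
    print(f"Bayes-optimal value over {HORIZON} steps: "
          f"{bayes_value(start, HORIZON):.3f}")
```

Because the Bayes-optimal policy pulls arms partly to gather information, the computed value exceeds the myopic value of 0.5 reward per step; abstraction schemes of the kind the paper introduces aim to collapse hyperstates with similar posteriors so that this dynamic program stays tractable at longer horizons.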
