Paper Title

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Authors

Dilip Arumugam, Satinder Singh

Abstract

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally intractable, exact planning algorithm that takes advantage of this measure to show more efficient planning. We then conclude by introducing a specific form of state abstraction with the potential to reduce BAMDP complexity and give rise to a computationally tractable, approximate planning algorithm.
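To make the BAMDP planning setup concrete, the following is a minimal sketch, not the paper's algorithm: exact Bayes-adaptive planning for a two-armed Bernoulli bandit. Here the epistemic (hyper)state is the tuple of Beta posterior counts for each arm, and planning is finite-horizon dynamic programming over hyperstates. All names (`bayes_value`, `HORIZON`) and the bandit instance are illustrative assumptions; the exponential growth of reachable hyperstates with the horizon is what makes exact BAMDP planning intractable in general.

```python
from functools import lru_cache

HORIZON = 10  # illustrative planning horizon

@lru_cache(maxsize=None)
def bayes_value(hyperstate, steps_left):
    """Bayes-optimal expected return from a hyperstate with `steps_left` pulls.

    hyperstate = ((a0, b0), (a1, b1)): Beta(a, b) posterior counts per arm.
    """
    if steps_left == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(hyperstate):
        p = a / (a + b)  # posterior-mean probability that this arm pays 1
        # Epistemic-state transition: posterior update for each observation.
        win = list(hyperstate)
        win[arm] = (a + 1, b)
        lose = list(hyperstate)
        lose[arm] = (a, b + 1)
        q = p * (1.0 + bayes_value(tuple(win), steps_left - 1)) \
            + (1.0 - p) * bayes_value(tuple(lose), steps_left - 1)
        best = max(best, q)
    return best

if __name__ == "__main__":
    # Uniform Beta(1, 1) priors on both arms.
    start = ((1, 1), (1, 1))
    print(f"Bayes-optimal value over {HORIZON} steps: "
          f"{bayes_value(start, HORIZON):.3f}")
```

Because the Bayes-optimal policy pulls arms partly to gather information, the computed value exceeds the myopic value of 0.5 reward per step; abstraction schemes of the kind the paper introduces aim to collapse hyperstates with similar posteriors so that this dynamic program stays tractable at longer horizons.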
