论文标题
在具有未知结构的MDP中,Oracle有效的遗憾最小化
Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure
论文作者
论文摘要
我们研究了在非偶然的马尔可夫决策过程(FMDP)中最小化的遗憾最小化,在该过程中,所有现有算法都证明了FMDP的分类结构是事先对学习者知道的。在本文中,我们提供了第一种了解FMDP结构的同时最大程度地减少遗憾的算法。我们的算法基于面对不确定性原理的乐观态度,并结合了一种简单的结构学习方法,并且可以有效地将Oracle-Access实施给FMDP计划者。此外,我们提供了算法的一种变体,即使甲骨文仅限于非因素动作,几乎所有现有的近似计划者都是这种情况。最后,我们利用我们的技术证明了已知结构案例的新颖下限,从而缩小了Chen等人的遗憾。 [2021]。
We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. Moreover, we give a variant of our algorithm that remains efficient even when the oracle is limited to non-factored actions, which is the case with almost all existing approximate planners. Finally, we leverage our techniques to prove a novel lower bound for the known structure case, closing the gap to the regret bound of Chen et al. [2021].