Paper Title
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Paper Authors
Paper Abstract
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity at the cost of slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms with several structure-dependent lower bounds on regret for FMDPs, which reveal the difficulty hidden in the intricacy of the factored structures.
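To illustrate the kind of bonus-term-guided exploration the abstract describes, here is a minimal sketch of a count-based exploration bonus that exploits a factored transition structure. The class name, the per-factor `scopes` representation, and the exact bonus form (a sum of per-factor `sqrt(log(1/delta)/n)` terms) are illustrative assumptions, not the paper's actual construction; the point is only that uncertainty is measured per transition component over its small scope rather than over the exponentially large joint state space.

```python
import math
from collections import defaultdict

class FactoredBonus:
    """Hypothetical count-based bonus for a factored MDP (illustrative only).

    Each transition component i is assumed to depend only on a small
    "scope" of state coordinates, so visit counts are kept per
    (scoped state, action) pair rather than per full state.
    """

    def __init__(self, scopes, delta=0.05):
        self.scopes = scopes      # scopes[i] = tuple of state indices factor i reads
        self.delta = delta        # confidence parameter (assumed form)
        self.counts = [defaultdict(int) for _ in scopes]

    def update(self, state, action):
        # Record a visit for each factor's local (scoped state, action) pair.
        for i, scope in enumerate(self.scopes):
            key = (tuple(state[j] for j in scope), action)
            self.counts[i][key] += 1

    def bonus(self, state, action):
        # Sum per-factor terms ~ sqrt(log(1/delta) / n_i): a factor whose
        # local context is rarely visited contributes a large bonus and
        # thereby drives exploration toward it.
        total = 0.0
        for i, scope in enumerate(self.scopes):
            key = (tuple(state[j] for j in scope), action)
            n = max(1, self.counts[i][key])
            total += math.sqrt(math.log(1.0 / self.delta) / n)
        return total
```

As a usage sanity check: with two factors reading scopes `(0,)` and `(0, 1)`, repeatedly visiting the same (state, action) pair shrinks its bonus, while unvisited pairs keep the maximal bonus.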