Paper Title
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Paper Authors
Paper Abstract
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity at the cost of slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms with several structure-dependent lower bounds on regret for FMDPs, which reveal the difficulty hidden in the intricacy of the factored structures.
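To illustrate the kind of bonus-term-guided exploration the abstract describes, here is a minimal sketch of a count-based exploration bonus that exploits a factored transition structure. The class name, the per-factor `scopes` representation, and the exact bonus form (a sum of per-factor `sqrt(log(1/delta)/n)` terms) are illustrative assumptions, not the paper's actual construction; the point is only that uncertainty is measured per transition component over its small scope rather than over the exponentially large joint state space.

```python
import math
from collections import defaultdict

class FactoredBonus:
    """Hypothetical count-based bonus for a factored MDP (illustrative only).

    Each transition component i is assumed to depend only on a small
    "scope" of state coordinates, so visit counts are kept per
    (scoped state, action) pair rather than per full state.
    """

    def __init__(self, scopes, delta=0.05):
        self.scopes = scopes      # scopes[i] = tuple of state indices factor i reads
        self.delta = delta        # confidence parameter (assumed form)
        self.counts = [defaultdict(int) for _ in scopes]

    def update(self, state, action):
        # Record a visit for each factor's local (scoped state, action) pair.
        for i, scope in enumerate(self.scopes):
            key = (tuple(state[j] for j in scope), action)
            self.counts[i][key] += 1

    def bonus(self, state, action):
        # Sum per-factor terms ~ sqrt(log(1/delta) / n_i): a factor whose
        # local context is rarely visited contributes a large bonus and
        # thereby drives exploration toward it.
        total = 0.0
        for i, scope in enumerate(self.scopes):
            key = (tuple(state[j] for j in scope), action)
            n = max(1, self.counts[i][key])
            total += math.sqrt(math.log(1.0 / self.delta) / n)
        return total
```

As a usage sanity check: with two factors reading scopes `(0,)` and `(0, 1)`, repeatedly visiting the same (state, action) pair shrinks its bonus, while unvisited pairs keep the maximal bonus.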