Paper Title
A Contraction Approach to Model-based Reinforcement Learning
Paper Authors
Paper Abstract
Despite its experimental success, Model-based Reinforcement Learning still lacks a complete theoretical understanding. To this end, we analyze the error in the cumulative reward using a contraction approach. We consider both stochastic and deterministic state transitions for continuous (non-discrete) state and action spaces. This approach does not require strong assumptions and recovers the typical quadratic error in the horizon. We prove that branched rollouts can reduce this error and are essential for deterministic transitions to have a Bellman contraction. Our analysis of the policy mismatch error also applies to Imitation Learning. In this case, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.
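For context, a standard simulation-lemma-style bound illustrates the quadratic dependence on the horizon mentioned in the abstract. This is only a sketch under assumed notation (discount factor \gamma, reward bound R_{\max}, one-step model error \epsilon measured in total variation, and cumulative rewards J_M(\pi), J_{\hat M}(\pi) under the true and learned transition models); it is not the paper's exact statement, and the constants in the paper may differ:

\left| J_M(\pi) - J_{\hat M}(\pi) \right| \;\le\; \frac{\gamma\, R_{\max}\, \epsilon}{(1-\gamma)^2} \;=\; O\!\left(H^2 \epsilon\right), \qquad H = \frac{1}{1-\gamma}.

Since the effective horizon is H = 1/(1-\gamma), the right-hand side scales quadratically in the horizon; the abstract's contraction argument recovers a bound of this type and asserts that branched rollouts can reduce it.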