Paper Title
A Hybrid PAC Reinforcement Learning Algorithm
Paper Authors
Paper Abstract
This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of its parents. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free and model-based learning approaches while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results are provided to support the claim regarding the new algorithm's sample efficiency compared to its parents as well as the best known model-free and model-based algorithms in application.