Paper Title
Reinforcement Learning with Uncertainty Estimation for Tactical Decision-Making in Intersections
Paper Authors
Paper Abstract
This paper investigates how a Bayesian reinforcement learning method can be used to create a tactical decision-making agent for autonomous driving in an intersection scenario, where the agent can estimate the confidence of its recommended actions. An ensemble of neural networks, with additional randomized prior functions (RPF), is trained using a bootstrapped experience replay memory. The coefficient of variation in the estimated $Q$-values of the ensemble members is used to approximate the uncertainty, and a criterion that determines whether the agent is sufficiently confident to make a particular decision is introduced. The performance of the ensemble RPF method is evaluated in an intersection scenario and compared to a standard Deep Q-Network method. It is shown that the trained ensemble RPF agent can detect cases with high uncertainty, both in situations that are far from the training distribution and in situations that seldom occur within the training distribution. In this study, the uncertainty information is used to choose safe actions in unknown situations, which removes all collisions from within the training distribution and most collisions outside of the distribution.
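To make the uncertainty criterion concrete, the sketch below shows, in Python with NumPy, how the coefficient of variation of the ensemble members' $Q$-value estimates could gate the greedy action and fall back to a safe action when the agent is not sufficiently confident. This is a minimal illustration under stated assumptions: the function name `select_action`, the threshold value, and the specific fallback mechanism are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def select_action(q_ensemble, safe_action, cv_threshold=0.02):
    """Pick an action from ensemble Q-value estimates (illustrative sketch).

    q_ensemble: array of shape (K, num_actions), one row of Q-values per
                ensemble member for the current state.
    safe_action: index of a predefined fallback action (e.g. stop/yield);
                 the choice of fallback is an assumption of this sketch.
    cv_threshold: maximum coefficient of variation allowed for the greedy
                  action to be considered sufficiently certain (assumed value).
    """
    q_ensemble = np.asarray(q_ensemble, dtype=float)
    mean_q = q_ensemble.mean(axis=0)                # mean Q-value per action
    std_q = q_ensemble.std(axis=0)                  # spread across ensemble members
    cv = std_q / np.maximum(np.abs(mean_q), 1e-8)   # coefficient of variation

    greedy = int(np.argmax(mean_q))
    if cv[greedy] <= cv_threshold:
        return greedy        # members agree: follow the ensemble's greedy choice
    return safe_action       # high uncertainty: fall back to the safe action

# Example: 3 ensemble members, 4 actions; the members disagree on action 2,
# so the agent falls back to the safe action (index 3) despite the high mean.
q = [[1.0, 0.2, 2.0, 0.5],
     [1.1, 0.3, 0.4, 0.6],
     [0.9, 0.1, 3.1, 0.4]]
print(select_action(q, safe_action=3))  # -> 3
```

The same gating logic applies regardless of how the ensemble is trained; in the paper's setup the disagreement arises from bootstrapped replay data and the fixed randomized prior functions added to each member.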