Paper Title


A New Approach for Tactical Decision Making in Lane Changing: Sample Efficient Deep Q Learning with a Safety Feedback Reward

Authors

Yavas, M. Ugur, Ure, N. Kemal, Kumbasar, Tufan

Abstract


Automated lane changing is one of the most challenging tasks for highly automated vehicles due to its safety-critical, uncertain, and multi-agent nature. This paper presents a novel deployment of a state-of-the-art Q-learning method, namely Rainbow DQN, that uses a new safety-driven rewarding scheme to tackle these issues in a dynamic and uncertain simulation environment. We present various comparative results to show that our novel approach of taking reward feedback from the safety layer dramatically increases both the agent's performance and sample efficiency. Furthermore, through this deployment of Rainbow DQN, we show that more intuition about the agent's actions can be extracted by examining the distributions of the Q values it generates. The proposed algorithm shows superior performance to the baseline algorithm in challenging scenarios with only 200,000 training steps (i.e., equivalent to 55 hours of driving).
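The abstract's central idea, feeding the safety layer's interventions back into the reward signal, could be sketched roughly as follows. This is a minimal illustration under assumed semantics: the function name, the penalty weights, and the specific shaping terms are hypothetical and are not taken from the paper.

```python
def safety_feedback_reward(base_reward, safety_override, collision,
                           override_penalty=0.5, collision_penalty=10.0):
    """Combine the environment's base reward with feedback from a
    rule-based safety layer.

    safety_override -- True if the safety layer had to veto/override
                       the agent's chosen action this step
    collision       -- True if the step ended in a collision

    Both penalty weights are illustrative assumptions, not values
    reported in the paper.
    """
    reward = base_reward
    if safety_override:
        # Penalize actions the safety layer rejected, so the agent
        # learns to avoid unsafe proposals instead of relying on the
        # safety layer to mask them.
        reward -= override_penalty
    if collision:
        reward -= collision_penalty
    return reward
```

The point of such shaping is that an override event is a much denser learning signal than a rare collision, which is one plausible reason a safety-feedback reward would improve sample efficiency.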
