Meta Reinforcement Learning-Based Lane Change Strategy for Autonomous Vehicles

Paper Title

Meta Reinforcement Learning-Based Lane Change Strategy for Autonomous Vehicles

Authors

Fei Ye, Pin Wang, Ching-Yao Chan, Jiucai Zhang

Abstract

Recent advances in supervised learning and reinforcement learning have provided new opportunities to apply related methodologies to automated driving. However, challenges remain in achieving automated driving maneuvers in dynamically changing environments. Supervised learning algorithms such as imitation learning can generalize to new environments by training on a large amount of labeled data; however, it is often impractical or cost-prohibitive to obtain sufficient data for each new environment. Although reinforcement learning methods can mitigate this data-dependency issue by training the agent in a trial-and-error way, they still need to re-train policies from scratch when adapting to new environments. In this paper, we therefore propose a meta reinforcement learning (MRL) method to improve the agent's ability to generalize automated lane-changing maneuvers across different traffic environments, which are formulated as different traffic congestion levels. Specifically, we train the model at light to moderate traffic densities and test it under a new heavy traffic density condition. We use both collision rate and success rate to quantify the safety and effectiveness of the proposed model. A benchmark model is developed based on a pretraining method, which uses the same network structure and training tasks as our proposed model for fair comparison. The simulation results show that the proposed method achieves an overall success rate up to 20% higher than the benchmark model when it is generalized to the new environment of heavy traffic density. The collision rate is also up to 18% lower than that of the benchmark model. Finally, the proposed model shows more stable and efficient generalization when adapting to the new environment, and it can achieve a 100% success rate and a 0% collision rate with only a few steps of gradient updates.
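The abstract names success rate and collision rate as its two evaluation metrics but does not spell out how they are computed. The following is a minimal sketch of one plausible way to compute them. It assumes each evaluation episode ends with exactly one of the labels "success", "collision", or "timeout"; these labels and the function name `evaluate` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def evaluate(outcomes):
    """Return (success_rate, collision_rate) as fractions of all episodes.

    `outcomes` is a list of per-episode labels. The label names used here
    ("success", "collision", "timeout") are assumptions for illustration.
    """
    if not outcomes:
        raise ValueError("need at least one episode outcome")
    counts = Counter(outcomes)
    total = len(outcomes)
    # Counter returns 0 for labels that never occurred.
    return counts["success"] / total, counts["collision"] / total


# Hypothetical outcomes of 10 evaluation episodes in heavy traffic.
episodes = ["success"] * 7 + ["collision"] * 2 + ["timeout"]
success_rate, collision_rate = evaluate(episodes)
print(f"success rate: {success_rate:.0%}, collision rate: {collision_rate:.0%}")
```

Under this assumed labeling, the two rates need not sum to one, since an episode can end without either a completed lane change or a collision.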
