Paper Title
Learning from Demonstrations of Critical Driving Behaviours Using Driver's Risk Field
Paper Authors
Paper Abstract
In recent years, imitation learning (IL) has been widely used in industry as the core of autonomous vehicle (AV) planning modules. However, previous IL works show sample inefficiency and low generalisation in safety-critical scenarios, on which they are rarely tested. As a result, IL planners can reach a performance plateau where adding more training data ceases to improve the learnt policy. First, our work presents an IL model using spline coefficient parameterisation and offline expert queries to enhance safety and training efficiency. Then, we expose the weaknesses of the learnt IL policy by synthetically generating critical scenarios through optimisation of the parameters of the driver's risk field (DRF), a parametric human driving behaviour model implemented in a multi-agent traffic simulator based on the Lyft Prediction Dataset. To continuously improve the learnt policy, we retrain the IL model with the augmented data. Thanks to the expressivity and interpretability of the DRF, the desired driving behaviours can be encoded and aggregated into the original training data. Our work constitutes a full development cycle that can efficiently and continuously improve the learnt IL policies in closed loop. Finally, we show that our IL planner, developed with fewer training resources, still achieves superior performance compared to the previous state of the art.
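The abstract describes a closed-loop development cycle: the IL planner predicts a spline-parameterised trajectory, an optimiser searches over DRF parameters for scenarios where the planner behaves critically, and those scenarios are folded back into the training data. The sketch below is a minimal, illustrative Python rendering of that loop, not the authors' implementation: the DRF is reduced to a toy parameter vector, the Lyft-based simulator rollout is replaced by a surrogate minimum-separation metric, and all function names (`eval_spline`, `safety_score`, `generate_critical_scenario`) are hypothetical.

```python
# Minimal sketch of the closed-loop cycle described in the abstract.
# All names and the surrogate rollout are illustrative assumptions,
# not the paper's actual API.
import numpy as np

def eval_spline(coeffs: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Evaluate a polynomial trajectory x(t) from predicted coefficients.

    The IL model predicts spline coefficients (here: highest order first)
    instead of raw waypoints, giving smooth rollouts by construction.
    """
    return np.polyval(coeffs, t)

def safety_score(planner_coeffs, drf_params, rng) -> float:
    """Placeholder closed-loop rollout returning a scalar safety metric.

    In the paper this would be a rollout in the Lyft-based multi-agent
    simulator with surrounding agents driven by the DRF behaviour model.
    Here a toy surrogate keeps the sketch runnable end to end.
    """
    t = np.linspace(0.0, 5.0, 50)
    ego = eval_spline(planner_coeffs, t)
    # Toy "adversary": DRF params shape an agent trajectory near the ego.
    agent = eval_spline(drf_params, t) + rng.normal(0.0, 0.05, t.shape)
    return float(np.min(np.abs(ego - agent)))  # min separation (higher = safer)

def generate_critical_scenario(planner_coeffs, n_iters=200, seed=0):
    """Black-box search over DRF parameters that minimises the planner's
    safety score, i.e. synthesises a critical scenario."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, np.inf
    for _ in range(n_iters):
        candidate = rng.uniform(-1.0, 1.0, size=3)  # assumed DRF param vector
        score = safety_score(planner_coeffs, candidate, rng)
        if score < best_score:  # lower separation = more critical
            best_params, best_score = candidate, score
    return best_params, best_score

if __name__ == "__main__":
    il_coeffs = np.array([0.1, -0.3, 1.0])  # stand-in for IL model output
    params, score = generate_critical_scenario(il_coeffs)
    print(f"most critical DRF params: {params}, min separation: {score:.3f}")
    # The discovered scenario would then be added to the training set and
    # the IL model retrained, closing the loop the abstract describes.
```

In the paper's setting, the random search above would be replaced by a proper optimiser over the DRF's interpretable parameters, which is what makes the generated scenarios both targeted and explainable.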