Paper Title

Driving Behavior Modeling using Naturalistic Human Driving Data with Inverse Reinforcement Learning

Paper Authors

Zhiyu Huang, Jingda Wu, Chen Lv

Paper Abstract

Driving behavior modeling is of great importance for designing safe, smart, and personalized autonomous driving systems. In this paper, an internal reward function-based driving model that emulates the human's decision-making mechanism is utilized. To infer the reward function parameters from naturalistic human driving data, we propose a structural assumption about human driving behavior that focuses on discrete latent driving intentions. It converts the continuous behavior modeling problem to a discrete setting and thus makes maximum entropy inverse reinforcement learning (IRL) tractable to learn reward functions. Specifically, a polynomial trajectory sampler is adopted to generate candidate trajectories considering high-level intentions and approximate the partition function in the maximum entropy IRL framework. An environment model considering interactive behaviors among the ego and surrounding vehicles is built to better estimate the generated trajectories. The proposed method is applied to learn personalized reward functions for individual human drivers from the NGSIM highway driving dataset. The qualitative results demonstrate that the learned reward functions are able to explicitly express the preferences of different drivers and interpret their decisions. The quantitative results reveal that the learned reward functions are robust, which is manifested by only a marginal decline in proximity to the human driving trajectories when applying the reward function in the testing conditions. For the testing performance, the personalized modeling method outperforms the general modeling approach, significantly reducing the modeling errors in human likeness (a custom metric to gauge accuracy), and these two methods deliver better results compared to other baseline methods.
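The abstract describes two coupled ideas: sampling candidate trajectories from polynomial curves for a discrete set of driving intentions, and using those samples to approximate the partition function in maximum entropy IRL. The following is a minimal sketch of how such a pipeline could look; the quintic lateral profile, the toy feature vector, the learning rate, and all function and variable names are illustrative assumptions, not the authors' implementation or features.

```python
# Minimal sketch (not the paper's code) of: (1) polynomial candidate
# trajectories over discrete intentions, and (2) a maximum entropy IRL
# update whose partition function is approximated by the sampled candidates.
import numpy as np


def quintic_lateral_profile(d0, d_target, T, n_steps=50):
    """Lateral offset d(t) from a quintic polynomial with zero boundary
    velocity/acceleration -- a common choice for lane-keep / lane-change
    candidates (assumed here, not specified by the paper)."""
    s = np.linspace(0.0, 1.0, n_steps)          # normalized time
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5    # quintic smoothstep
    return d0 + (d_target - d0) * blend


def sample_candidate_trajectories(d0, lane_offsets, horizons):
    """Enumerate candidates over discrete intentions (target lane offset)
    and planning horizons."""
    return [quintic_lateral_profile(d0, d_t, T)
            for d_t in lane_offsets for T in horizons]


def trajectory_features(traj):
    """Toy feature vector: mean |lateral offset| and mean |lateral jerk|.
    The paper uses richer features; these are placeholders."""
    jerk = np.diff(traj, n=3) if len(traj) > 3 else np.zeros(1)
    return np.array([np.mean(np.abs(traj)), np.mean(np.abs(jerk))])


def maxent_irl_update(theta, human_traj, candidates, lr=0.05):
    """One gradient-ascent step of maximum entropy IRL:
    P(traj) ~ exp(theta . f(traj)) / sum_k exp(theta . f(candidate_k)),
    grad    = f(human_traj) - E_P[f(traj)]."""
    feats = np.stack([trajectory_features(c) for c in candidates])
    rewards = feats @ theta
    probs = np.exp(rewards - rewards.max())
    probs /= probs.sum()                        # softmax over candidates
    expected_feats = probs @ feats              # model feature expectation
    grad = trajectory_features(human_traj) - expected_feats
    return theta + lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=2) * 0.1
    # Intentions: keep lane (0 m) or change lane (+/- 3.7 m), two horizons
    # (values are illustrative).
    candidates = sample_candidate_trajectories(
        d0=0.0, lane_offsets=[-3.7, 0.0, 3.7], horizons=[4.0, 6.0])
    human_traj = candidates[2]  # pretend the recorded driver kept the lane
    for _ in range(200):
        theta = maxent_irl_update(theta, human_traj, candidates)
    print("learned reward weights:", theta)
```

In this approximation, each candidate is scored by a linear reward over its features, the softmax over the sampled candidates stands in for the intractable partition function, and the gradient is the gap between the demonstrated trajectory's features and the expected features under the current reward.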
