PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer

Paper title

PreTR: Spatio-Temporal Non-Autoregressive Trajectory Prediction Transformer

Authors

Achaji, Lina; Barry, Thierno; Fouqueray, Thibault; Moreau, Julien; Aioun, Francois; Charpillet, Francois

Abstract

Mobility systems are moving into an era of intelligent vehicles that aim to improve road safety. Because of their vulnerability, pedestrians are the road users who stand to benefit most from these developments. Predicting pedestrian trajectories, however, remains one of the most challenging problems: accurate prediction requires a good understanding of multi-agent interactions, which can be complex. Learning the underlying spatial and temporal patterns produced by these interactions is a competitive, open problem that many researchers are tackling. In this paper, we introduce a model called PRediction Transformer (PReTR) that extracts features from multi-agent scenes using a factorized spatio-temporal attention module. It requires less computation than previously studied models while achieving empirically better results. In addition, previous work in motion prediction suffers from the exposure bias problem, which arises when future sequences are generated conditioned on the model's own predicted samples rather than on ground-truth samples. To go beyond previously proposed solutions, we use an encoder-decoder Transformer network to decode a set of learned object queries in parallel. This non-autoregressive approach avoids the need for iterative conditioning and arguably reduces computation time in both training and testing. We evaluate our model on the ETH/UCY datasets, a publicly available benchmark for pedestrian trajectory prediction. Finally, we justify the use of parallel decoding by showing that trajectory prediction can be better solved as a non-autoregressive task.
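
The idea of a factorized spatio-temporal attention module can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temporal-then-spatial ordering, and the absence of learned projections, masks, and multiple heads are all simplifying assumptions made here for illustration. The sketch applies plain dot-product self-attention along each agent's time axis and then across agents at each time step, and counts the attention scores needed compared with joint attention over every (agent, time step) pair.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Assumption: no learned projections, to keep the sketch short.
    """
    dim = len(vectors[0])
    output = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        )
    return output


def factorized_attention(scene):
    """scene[agent][step] is a feature vector (hypothetical layout).

    Step 1: temporal attention inside each agent's own track.
    Step 2: spatial attention across agents at each time step.
    """
    temporal = [self_attention(track) for track in scene]
    num_agents, num_steps = len(temporal), len(temporal[0])
    result = [[None] * num_steps for _ in range(num_agents)]
    for step in range(num_steps):
        mixed = self_attention([temporal[a][step] for a in range(num_agents)])
        for agent in range(num_agents):
            result[agent][step] = mixed[agent]
    return result


def pair_score_counts(num_agents, num_steps):
    """Attention scores computed by joint vs. factorized attention."""
    joint = (num_agents * num_steps) ** 2
    factorized = num_agents * num_steps ** 2 + num_steps * num_agents ** 2
    return joint, factorized
```

For a toy scene of 3 agents observed over 8 time steps, `pair_score_counts(3, 8)` returns `(576, 264)`: joint attention scores all 24 x 24 token pairs, while the factorized variant scores 3 x 8 x 8 temporal pairs plus 8 x 3 x 3 spatial pairs. This is one common reason factorized attention is cheaper; whether it fully accounts for the savings reported in the abstract is not stated there.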
