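Narrowing the Coordinate-frame Gap in Behavior Prediction Models: Distillation for Efficient and Accurate Scene-centric Motion Forecasting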

Paper Title

Narrowing the Coordinate-frame Gap in Behavior Prediction Models: Distillation for Efficient and Accurate Scene-centric Motion Forecasting

Authors

DiJia Su, Bertrand Douillard, Rami Al-Rfou, Cheolho Park, Benjamin Sapp

Abstract

Behavior prediction models have proliferated in recent years, especially in the popular real-world robotics application of autonomous driving, where representing the distribution over possible futures of moving agents is essential for safe and comfortable motion planning. In these models, the choice of coordinate frames to represent inputs and outputs has crucial trade-offs, which broadly fall into one of two categories. Agent-centric models transform inputs and perform inference in agent-centric coordinates. These models are intrinsically invariant to translation and rotation between scene elements and perform best on public leaderboards, but scale quadratically with the number of agents and scene elements. Scene-centric models use a fixed coordinate system to process all agents. This gives them the advantage of sharing representations among all agents, offering efficient amortized inference computation which scales linearly with the number of agents. However, these models have to learn invariance to translation and rotation between scene elements, and typically underperform agent-centric models. In this work, we develop knowledge distillation techniques between probabilistic motion forecasting models, and apply these techniques to close the gap in performance between agent-centric and scene-centric models. This improves scene-centric model performance by 13.2% on the public Argoverse benchmark, 7.8% on the Waymo Open Dataset, and up to 9.4% on a large in-house dataset. These improved scene-centric models rank highly in public leaderboards and are up to 15 times more efficient than their agent-centric teacher counterparts in busy scenes.
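
The coordinate-frame distinction in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the dictionary layout for agents, and the 2D point representation are all assumptions made for illustration. It only shows how inputs are re-expressed, and why an agent-centric pipeline prepares one transformed copy of the scene per agent while a scene-centric pipeline shares a single copy.

```python
import math


def to_agent_frame(points, agent_xy, agent_heading):
    """Express scene-frame (x, y) points in one agent's frame.

    The agent sits at the origin of its own frame and faces along +x.
    (Hypothetical helper, not taken from the paper.)
    """
    cos_h, sin_h = math.cos(agent_heading), math.sin(agent_heading)
    ax, ay = agent_xy
    out = []
    for x, y in points:
        dx, dy = x - ax, y - ay
        # Rotate by -heading so the agent's heading aligns with +x.
        out.append((cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy))
    return out


def agent_centric_inputs(agents, scene_points):
    """One transformed copy of the scene per agent: A * N point transforms."""
    return [to_agent_frame(scene_points, a["xy"], a["heading"]) for a in agents]


def scene_centric_inputs(scene_points):
    """A single shared copy of the scene in the fixed frame: N points."""
    return list(scene_points)


# Two agents and two scene points, all given in the fixed scene frame.
agents = [
    {"xy": (0.0, 0.0), "heading": 0.0},
    {"xy": (10.0, 0.0), "heading": math.pi / 2},
]
scene_points = [(10.0, 5.0), (0.0, 0.0)]

per_agent = agent_centric_inputs(agents, scene_points)
shared = scene_centric_inputs(scene_points)
```

For the second agent, which is located at (10, 0) and faces along the scene's +y axis, the point (10, 5) lands directly ahead at roughly (5, 0) in its own frame. The agent-centric path stores `len(agents) * len(scene_points)` transformed points, whereas the scene-centric path stores `len(scene_points)` once for all agents.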
