Real-time 3D Single Object Tracking with Transformer

Paper title

Real-time 3D Single Object Tracking with Transformer

Authors

Jiayao Shan, Sifan Zhou, Yubo Cui, Zheng Fang

Abstract

LiDAR-based 3D single object tracking is a challenging problem in robotics and autonomous driving. Existing approaches often struggle because distant objects tend to have very sparse or partially occluded point clouds, which makes the features extracted by the model ambiguous. Ambiguous features make it hard to locate the target object and ultimately lead to poor tracking results. To address this, we use the Transformer architecture and propose a Point-Track-Transformer (PTT) module for the point cloud-based 3D single object tracking task. Specifically, the PTT module generates fine-tuned attention features by computing attention weights, which guides the tracker to focus on the important features of the target and improves tracking in complex scenarios. To evaluate the PTT module, we embed it into the dominant method and construct a novel 3D SOT tracker named PTT-Net. In PTT-Net, we embed PTT into the voting stage and the proposal generation stage. The PTT module in the voting stage models the interactions among point patches and thereby learns context-dependent features, while the PTT module in the proposal generation stage captures the contextual information between object and background. We evaluate PTT-Net on the KITTI and NuScenes datasets. Experimental results demonstrate the effectiveness of the PTT module and the superiority of PTT-Net, which surpasses the baseline by a noticeable margin (~10% in the Car category). Our method also improves performance significantly in sparse scenarios. Overall, combining the Transformer with the tracking pipeline enables PTT-Net to achieve state-of-the-art performance on both datasets. Additionally, PTT-Net runs in real time at 40 FPS on an NVIDIA 1080Ti GPU. Our code is open-sourced for the research community at https://github.com/shanjiayao/PTT.
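
The abstract describes re-weighting point features by computed attention weights but gives no implementation details. The sketch below is only a minimal illustration of that general idea, using plain scaled dot-product self-attention with identity projections (query = key = value = input feature). It is not the PTT module itself. The function names and the toy input are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of N per-point feature vectors, each of length D.
    Returns (attended, weights): attended is N x D, weights is N x N.
    Assumption: no learned projections, so query = key = value = input.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    attended = []
    for query in features:
        # Similarity of this point to every point, scaled by sqrt(D).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        row = softmax(scores)
        weights.append(row)
        # Each output feature is a weighted average of all input features.
        attended.append([sum(w * value[d] for w, value in zip(row, features))
                         for d in range(dim)])
    return attended, weights


# Hypothetical toy input: three points with 2-D features.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended, weights = self_attention(points)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and points with similar features receive larger mutual weights, so each output feature is dominated by the points most similar to it. A real Transformer block would add learned query, key, and value projections and typically multiple heads.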
