LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention

Paper title

LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention

Authors

Junbo Yin, Jianbing Shen, Chenye Guan, Dingfu Zhou, Ruigang Yang

Abstract

Existing LiDAR-based 3D object detectors usually focus on single-frame detection and ignore the spatiotemporal information in consecutive point cloud frames. In this paper, we propose an end-to-end online 3D video object detector that operates on point cloud sequences. The proposed model comprises a spatial feature encoding component and a spatiotemporal feature aggregation component. In the former, a novel Pillar Message Passing Network (PMPNet) is proposed to encode each discrete point cloud frame. It adaptively collects information for a pillar node from its neighbors by iterative message passing, which effectively enlarges the receptive field of the pillar feature. In the latter, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU) to aggregate the spatiotemporal information, which enhances the conventional ConvGRU with an attentive memory gating mechanism. AST-GRU contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module, which emphasize the foreground objects and align the dynamic objects, respectively. Experimental results demonstrate that the proposed 3D video object detector achieves state-of-the-art performance on the large-scale nuScenes benchmark.
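
The claim that iterative message passing enlarges the receptive field of a pillar feature can be illustrated with a small sketch. The code below is not PMPNet itself: it assumes a k-nearest-neighbor graph over pillar centers, mean aggregation, and a simple tanh update, and the names `knn_indices`, `message_passing`, `w_self`, and `w_msg` are hypothetical. It only shows that each extra round lets a node see features one more hop away.

```python
import numpy as np


def knn_indices(centers, k):
    """For each node, return the indices of its k nearest other nodes."""
    # Pairwise squared distances between pillar centers, shape (N, N).
    diff = centers[:, None, :] - centers[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a node is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]


def message_passing(features, neighbors, w_self, w_msg, steps):
    """Run `steps` rounds of mean-aggregation message passing.

    Assumed update rule (illustrative, not the paper's):
        h_i <- tanh(h_i @ w_self + mean_j(h_j) @ w_msg)
    where j ranges over the neighbors of node i.
    """
    h = features
    for _ in range(steps):
        msg = h[neighbors].mean(axis=1)  # (N, k, C) -> (N, C)
        h = np.tanh(h @ w_self + msg @ w_msg)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six hypothetical pillar centers on a line, with 4-channel features.
    centers = np.stack([np.arange(6.0), np.zeros(6)], axis=1)
    feats = 0.1 * rng.standard_normal((6, 4))
    nbrs = knn_indices(centers, k=2)
    w = 0.5 * np.eye(4)
    out = message_passing(feats, nbrs, w, w, steps=4)
    print(out.shape)
```

In this toy graph, node 0 only aggregates from nodes 1 and 2 in a single round, so a change at node 5 reaches it only after several rounds. That is the sense in which more iterations widen the receptive field.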

Paper title