Paper Title
Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds
Authors
Abstract
Previous work on LiDAR-based 3D object detection mainly focuses on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information across multiple frames, i.e., point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which treats each grid (i.e., a group of points) as a node and constructs a k-NN graph with its neighboring grids. To update the features of a grid, GMPNet iteratively collects information from its neighbors, thereby mining motion cues in grids from nearby frames. To further aggregate long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA enhance the vanilla GRU so that it focuses on small objects and better aligns moving objects. Our overall framework supports both online and offline video object detection in point clouds. We implement our algorithm on top of prevalent anchor-based and anchor-free detectors. Evaluation on the challenging nuScenes benchmark shows the superior performance of our method, which ranked 1st on the leaderboard, without any bells and whistles, at the time of submission.
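The grid-level message passing described in the abstract can be illustrated with a small sketch. It is not the GMPNet implementation: the function names `knn_graph` and `message_passing`, the blending weight `alpha`, and the mean-aggregation update rule are all assumptions chosen for illustration. It shows only the general pattern of building a k-NN graph over grid centers pooled from nearby frames and iteratively updating each grid's feature from its neighbors.

```python
import math

def knn_graph(centers, k):
    """For each grid center, return the indices of its k nearest other centers."""
    neighbors = []
    for i, ci in enumerate(centers):
        # Euclidean distance from node i to every other node j.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def message_passing(features, neighbors, alpha=0.5, steps=2):
    """Iteratively blend each node's feature with the mean of its neighbors'.

    Hypothetical update rule (not from the paper): a convex combination of
    the node's own feature and the element-wise mean over its neighbors.
    """
    h = [list(f) for f in features]
    for _ in range(steps):
        new_h = []
        for i, nbrs in enumerate(neighbors):
            dim = len(h[i])
            # Aggregate: element-wise mean of the neighbor features.
            agg = [sum(h[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
            # Update: mix the node's own feature with the aggregated message.
            new_h.append([(1 - alpha) * h[i][d] + alpha * agg[d] for d in range(dim)])
        h = new_h
    return h

# Toy input: two grids from one frame, then the same two grids slightly
# shifted in the next frame (made-up coordinates and features).
centers = [(0.0, 0.0), (5.0, 0.0), (0.2, 0.0), (5.2, 0.0)]
features = [[1.0], [0.0], [0.0], [1.0]]

neighbors = knn_graph(centers, k=1)
updated = message_passing(features, neighbors, alpha=0.5, steps=1)
print(neighbors)
print(updated)
```

In this toy input each grid's nearest neighbor is its shifted counterpart in the other frame, so a single update step mixes features across frames. That is the sense in which neighbor aggregation can pick up short-term motion cues.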