Paper Title
Point2Seq: Detecting 3D Objects as Sequences
Paper Authors
Paper Abstract
We present a simple and effective framework, named Point2Seq, for 3D object detection from point clouds. In contrast to previous methods that normally predict attributes of 3D objects all at once, we expressively model the interdependencies between attributes of 3D objects, which in turn enables better detection accuracy. Specifically, we view each 3D object as a sequence of words and reformulate the 3D object detection task as decoding words from 3D scenes in an auto-regressive manner. We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words. The predicted words eventually constitute a set of sequences that completely describe the 3D objects in the scene, and all the predicted sequences are then automatically assigned to the respective ground truths through similarity-based sequence matching. Our approach is conceptually intuitive and can be readily plugged into most existing 3D-detection backbones without adding much computational overhead; the sequential decoding paradigm we propose, on the other hand, can better exploit information from complex 3D scenes with the aid of preceding predicted words. Without bells and whistles, our method significantly outperforms previous anchor- and center-based 3D object detection frameworks, yielding a new state of the art on the challenging ONCE dataset as well as the Waymo Open Dataset. Code is available at \url{https://github.com/ocNflag/point2seq}.
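To make the auto-regressive "scene-to-sequence" idea concrete, the following is a minimal PyTorch sketch of decoding box attributes one word at a time, each conditioned on the scene feature and the previously predicted words. The module name, the GRU-based fusion, the word ordering, and all dimensions are illustrative assumptions for this sketch, not the authors' implementation (see the linked repository for the actual code).

```python
# Illustrative sketch only: names, dimensions, and the fusion mechanism are assumptions.
import torch
import torch.nn as nn

class AutoRegressiveBoxDecoder(nn.Module):
    """Predict each box attribute ("word") conditioned on a scene feature
    and on embeddings of the words predicted so far."""

    def __init__(self, feat_dim=256, num_words=4):
        super().__init__()
        self.num_words = num_words                  # e.g. center, size, heading, class score
        self.word_embed = nn.Linear(1, feat_dim)    # embed a predicted scalar word
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, 1) for _ in range(num_words)]
        )
        self.fuse = nn.GRUCell(feat_dim, feat_dim)  # fold word cues back into the state

    def forward(self, scene_feat):                  # scene_feat: (B, feat_dim)
        state, words = scene_feat, []
        for head in self.heads:
            word = head(state)                      # predict the next word
            words.append(word)
            # condition subsequent predictions on this word's embedding
            state = self.fuse(self.word_embed(word), state)
        return torch.cat(words, dim=-1)             # (B, num_words)

# Usage: decode per-location box words from a pooled scene (e.g. BEV) feature vector.
decoder = AutoRegressiveBoxDecoder()
print(decoder(torch.randn(2, 256)).shape)           # torch.Size([2, 4])
```

The key property the sketch preserves is the sequential dependency: later attributes see the earlier predictions through the recurrent state, rather than all attributes being regressed independently from the same feature.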