Paper Title
3DGTN: 3D Dual-Attention GLocal Transformer Network for Point Cloud Classification and Segmentation
Paper Authors
Paper Abstract
Although the application of Transformers to 3D point cloud processing has achieved significant progress, it remains challenging for existing 3D Transformer methods to efficiently and accurately learn both valuable global features and valuable local features. This paper presents a novel point cloud representation learning network, the 3D Dual Self-attention Global Local (GLocal) Transformer Network (3DGTN), which improves feature learning for both classification and segmentation tasks through the following key contributions. First, a GLocal Feature Learning (GFL) block with a dual self-attention mechanism (i.e., a novel Point-Patch Self-Attention, called PPSA, and a channel-wise self-attention) is designed to efficiently learn GLocal context information. Second, the GFL block is integrated with a multi-scale Graph Convolution-based Local Feature Aggregation (LFA) block, yielding a Global-Local (GLocal) information extraction module that efficiently captures critical information. Third, a series of GLocal modules is used to construct a new hierarchical encoder-decoder structure that learns "GLocal" information at different scales in a hierarchical manner. The proposed framework is evaluated on both classification and segmentation datasets, demonstrating that it outperforms many state-of-the-art methods on both tasks.
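To make the described architecture concrete, the following is a minimal PyTorch sketch of a single GLocal-style module under stated assumptions: it combines a standard point-wise multi-head self-attention (used here only as a simplified stand-in for the paper's PPSA) with a channel-wise self-attention and a kNN graph-convolution-style local aggregation. All class and function names are hypothetical illustrations, not the authors' implementation.

```python
# Illustrative sketch only; names (GLocalModule, knn_group, etc.) are hypothetical.
import torch
import torch.nn as nn

def knn_group(x, k):
    """Group each point's k nearest neighbors. x: (B, N, C) -> (B, N, k, C)."""
    dist = torch.cdist(x, x)                            # (B, N, N) pairwise distances
    idx = dist.topk(k, dim=-1, largest=False).indices   # (B, N, k) neighbor indices
    b = torch.arange(x.size(0), device=x.device).view(-1, 1, 1)
    return x[b, idx]                                    # gather neighbor features

class LocalFeatureAggregation(nn.Module):
    """Graph-convolution-style aggregation over a kNN neighborhood."""
    def __init__(self, dim, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x):                               # x: (B, N, C)
        neighbors = knn_group(x, self.k)                # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(neighbors)
        edge = torch.cat([center, neighbors - center], dim=-1)  # edge features
        return self.mlp(edge).max(dim=2).values         # max-pool over neighbors

class ChannelSelfAttention(nn.Module):
    """Self-attention computed across channels instead of points."""
    def forward(self, x):                               # x: (B, N, C)
        xt = x.transpose(1, 2)                          # (B, C, N)
        attn = torch.softmax(xt @ xt.transpose(1, 2) / x.size(1) ** 0.5, dim=-1)
        return (attn @ xt).transpose(1, 2)              # back to (B, N, C)

class GLocalModule(nn.Module):
    """One GLocal block: dual self-attention (point-wise + channel-wise)
    fused with kNN-based local feature aggregation."""
    def __init__(self, dim, heads=4, k=16):
        super().__init__()
        self.point_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.channel_attn = ChannelSelfAttention()
        self.local = LocalFeatureAggregation(dim, k)
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, x):                               # x: (B, N, C)
        g, _ = self.point_attn(x, x, x)                 # global point-wise context
        c = self.channel_attn(x)                        # global channel-wise context
        l = self.local(x)                               # local neighborhood context
        return x + self.fuse(torch.cat([g, c, l], dim=-1))

# Usage: a batch of 2 clouds, 1024 points each, 64-dim features per point.
feats = torch.randn(2, 1024, 64)
out = GLocalModule(dim=64)(feats)
print(out.shape)  # torch.Size([2, 1024, 64])
```

In the full network, several such modules would presumably be stacked with progressive downsampling (e.g., farthest point sampling) to form the hierarchical encoder-decoder described in the abstract; the sketch above illustrates only a single block.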