SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification

Paper title

SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification

Authors

Xingzhao Jia, Changlei Dongye, Yanjun Peng

Abstract

RGB-D salient object detection (SOD) uses depth information to handle challenging scenes and obtain high-quality saliency maps. Existing state-of-the-art RGB-D saliency detection methods overwhelmingly rely on directly fusing depth information. Although these methods improve the accuracy of saliency prediction through various cross-modality fusion strategies, misinformation from poor-quality depth images can degrade the prediction result. To address this issue, this paper proposes a novel RGB-D salient object detection model, SiaTrans, which trains depth image quality classification jointly with SOD. Because RGB and depth images share common information about salient objects, SiaTrans uses a Siamese transformer network with shared weight parameters as the encoder and extracts RGB and depth features concatenated along the batch dimension, saving space resources without compromising performance. SiaTrans uses the class token of the backbone network (T2T-ViT) to classify the quality of depth images, while the token sequence continues to serve the saliency detection task. A transformer-based cross-modality fusion module (CMF) effectively fuses RGB and depth information. During testing, the CMF either fuses cross-modality information or enhances RGB information, depending on the quality classification signal for the depth image. The greatest benefit of the designed CMF and decoder is that they keep RGB and RGB-D decoding consistent: SiaTrans decodes RGB-D or RGB information with the same model parameters, according to the classification signal at test time. Comprehensive experiments on nine RGB-D SOD benchmark datasets show that SiaTrans achieves the best overall performance with the least computation among recent state-of-the-art methods.
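
The two mechanisms named in the abstract, a shared-weight encoder fed with RGB and depth samples joined along the batch dimension, and a fusion step that switches on the depth-quality signal, can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder stands in for T2T-ViT, elementwise addition stands in for the transformer-based CMF, and the names `make_encoder`, `siamese_encode`, and `fuse` are hypothetical. Substituting the RGB feature for the depth feature when depth is judged poor is likewise only one assumed reading of "enhance RGB information".

```python
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build a toy linear encoder with fixed weights (stand-in for T2T-ViT)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(batch):
        # Apply the same weight matrix to every sample in the batch.
        return [[sum(w * x for w, x in zip(row, sample)) for row in weights]
                for sample in batch]

    return encode


def siamese_encode(encoder, rgb_batch, depth_batch):
    """Run RGB and depth through ONE encoder by joining them on the batch axis."""
    stacked = rgb_batch + depth_batch      # concatenate along the batch dimension
    features = encoder(stacked)            # single pass, shared weights
    n = len(rgb_batch)
    return features[:n], features[n:]      # split back into the two modalities


def fuse(rgb_feat, depth_feat, depth_is_good):
    """Assumed gating: fuse with depth if it is good, else reuse the RGB feature."""
    partner = depth_feat if depth_is_good else rgb_feat
    # Elementwise addition is a placeholder for attention-based fusion.
    return [r + p for r, p in zip(rgb_feat, partner)]


if __name__ == "__main__":
    encoder = make_encoder(in_dim=4, out_dim=3)
    rgb = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
    depth = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    rgb_f, depth_f = siamese_encode(encoder, rgb, depth)
    print(fuse(rgb_f[0], depth_f[0], depth_is_good=True))
    print(fuse(rgb_f[0], depth_f[0], depth_is_good=False))
```

Because the same `fuse` function handles both cases, the downstream decoder sees inputs of identical shape whether or not depth is used, which mirrors the consistency property the abstract emphasizes.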

