Paper Title
Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation
Paper Authors
Abstract
Recently, synthetic data-based instance segmentation has become an exceedingly favorable optimization paradigm, since it leverages simulation rendering and physics to generate high-quality image-annotation pairs. In this paper, we propose a Parallel Pre-trained Transformers (PPT) framework to accomplish the synthetic data-based instance segmentation task. Specifically, we leverage off-the-shelf pre-trained vision Transformers to alleviate the gap between natural and synthetic data, which helps provide good generalization in the downstream synthetic-data scenario with few samples. Swin-B-based CBNet V2, Swin-L-based CBNet V2, and Swin-L-based Uniformer are employed for parallel feature learning, and the results of these three models are fused by a pixel-level Non-Maximum Suppression (NMS) algorithm to obtain more robust predictions. The experimental results reveal that PPT ranks first in the CVPR 2022 AVA Accessibility Vision and Autonomy Challenge, with an mAP of 65.155%.
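One plausible reading of the mask-fusion step above, sketched as greedy mask-level NMS over the pooled predictions of the three models (the IoU threshold and the exact suppression criterion are assumptions, not details from the paper):

```python
import numpy as np

def mask_iou(a, b):
    # IoU between two boolean segmentation masks of the same shape
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def mask_nms(masks, scores, iou_thr=0.5):
    """Greedy mask-level NMS: pool instance masks from all models,
    keep them in descending score order, and suppress any mask whose
    IoU with an already-kept mask exceeds the threshold.
    Returns the indices of the kept masks."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) < iou_thr for j in keep):
            keep.append(int(i))
    return keep
```

In an ensemble setting, `masks` and `scores` would be the concatenated per-instance outputs of the three parallel models, so a high-confidence detection from any one model survives while near-duplicate detections of the same object are suppressed.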