Paper Title

Unified Object Detector for Different Modalities based on Vision Transformers

Paper Authors

Xiaoke Shen, Ioannis Stamos

Paper Abstract

Traditional systems typically require different models for processing different modalities, such as one model for RGB images and another for depth images. Recent research has demonstrated that a single model for one modality can be adapted for another using cross-modality transfer learning. In this paper, we extend this approach by combining cross/inter-modality transfer learning with vision transformers to develop a unified detector that achieves superior performance across diverse modalities. Our research envisions an application scenario for robotics, where the unified system seamlessly switches between RGB cameras and depth sensors under varying lighting conditions. Importantly, the system requires no model architecture or weight updates to enable this smooth transition. Specifically, the system uses the depth sensor in low-light conditions (nighttime), and either both the RGB camera and depth sensor or the RGB camera alone in well-lit environments. We evaluate our unified model on the SUN RGB-D dataset and demonstrate that it achieves similar or better mAP50 than state-of-the-art methods in the SUNRGBD16 category, and comparable performance in point-cloud-only mode. We also introduce a novel inter-modality mixing method that enables our model to achieve significantly better results than previous approaches. We provide our code, including training/inference logs and model checkpoints, to facilitate reproducibility and further research: \url{https://github.com/liketheflower/UODDM}
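The abstract names a novel inter-modality mixing method but does not spell out its mechanics here. Below is a minimal PyTorch sketch of one plausible training-time interpretation: each training sample is fed to the shared detector as an RGB image, as a depth image, or as a blend of both. The function name, the sampling probabilities p_rgb/p_depth, and the convex-blend branch are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

import torch

def inter_modality_mix(rgb, depth, p_rgb=0.4, p_depth=0.4):
    """Per-sample inter-modality mixing (illustrative sketch only).

    rgb, depth: (B, 3, H, W) tensors; the depth map is assumed to
    already be encoded as a 3-channel image (e.g., colorized depth),
    so one detector backbone can consume either modality.
    """
    mixed = []
    for r, d in zip(rgb, depth):
        u = torch.rand(1).item()
        if u < p_rgb:                      # RGB-only sample
            mixed.append(r)
        elif u < p_rgb + p_depth:          # depth-only sample
            mixed.append(d)
        else:                              # mixed sample: convex blend of the two
            lam = torch.rand(1).item()
            mixed.append(lam * r + (1.0 - lam) * d)
    return torch.stack(mixed)

Trained this way, a single set of weights can accept whichever modality is available at inference time, e.g., the depth image at night and the RGB image in daylight, with no architecture or weight changes, which is the switching behavior the abstract describes.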
