Title
Self-Supervised Intensity-Event Stereo Matching
Authors
Abstract
Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes with microsecond accuracy, high dynamic range, and low power consumption. Despite these advantages, event cameras cannot be directly applied to computational imaging tasks, because high-quality intensity frames and events cannot be obtained simultaneously from a single sensor. This paper aims to connect a standalone event camera and a modern intensity camera so that applications can take advantage of both sensors. We establish this connection through a multi-modal stereo matching task. We first convert events into a reconstructed image and extend existing stereo networks to this multi-modal condition. We propose a self-supervised method to train the multi-modal stereo network without using ground-truth disparity data. A structure loss computed on image gradients enables self-supervised learning on such multi-modal data. Exploiting the internal stereo constraints between views of different modalities, we introduce general stereo loss functions, including a disparity cross-consistency loss and an internal disparity loss, leading to improved performance and robustness compared to existing approaches. Experiments demonstrate the effectiveness of the proposed method, especially the proposed general stereo loss functions, on both synthetic and real datasets. Finally, we shed light on employing the aligned events and intensity images in downstream tasks, e.g., video interpolation.
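The abstract does not give the exact loss formulations, but the general recipe it describes (warp one view into the other via the predicted disparity, then penalize gradient-domain photometric error and left-right disparity disagreement) can be sketched as below. This is a minimal illustrative sketch in PyTorch, assuming horizontal-only disparity and bilinear warping; all function names and tensor shapes here are our own choices, not the paper's.

```python
import torch
import torch.nn.functional as F


def image_gradients(img):
    # img: (B, C, H, W); forward differences along x and y.
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return gx, gy


def warp_by_disparity(src, disp):
    # Warp the source view into the target view using a per-pixel
    # horizontal disparity map disp of shape (B, 1, H, W).
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=src.dtype),
        torch.arange(w, dtype=src.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disp.squeeze(1)      # shift sampling by disparity
    ys = ys.unsqueeze(0).expand_as(xs)
    # Normalize coordinates to [-1, 1] for grid_sample.
    grid = torch.stack(
        (2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1
    )
    return F.grid_sample(src, grid, align_corners=True)


def structure_loss(left, right, disp):
    # Photometric consistency on image gradients rather than raw
    # intensities, so the loss tolerates the brightness gap between
    # a reconstructed event image and a regular camera frame.
    warped = warp_by_disparity(right, disp)
    lgx, lgy = image_gradients(left)
    wgx, wgy = image_gradients(warped)
    return (lgx - wgx).abs().mean() + (lgy - wgy).abs().mean()


def disparity_cross_consistency_loss(disp_left, disp_right):
    # The left disparity map should agree with the right disparity
    # map warped into the left view (left-right consistency check).
    warped_right = warp_by_disparity(disp_right, disp_left)
    return (disp_left - warped_right).abs().mean()
```

With identical views and zero disparity the structure loss vanishes, which is a quick sanity check that the warping and gradient terms are wired up consistently.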