Paper Title

Box Supervised Video Segmentation Proposal Network

Paper Authors

Tanveer Hannan, Rajat Koner, Jonathan Kobold, Matthias Schubert

Paper Abstract

Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much attention. However, self-supervised approaches exhibit a significant performance gap. Box-level annotations provide a balanced compromise between labeling effort and result quality for image segmentation but have not been exploited for the video domain. In this work, we propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties. Our method incorporates object motion in the following way: first, motion is computed using a bidirectional temporal difference and a novel bounding box-guided motion compensation. Second, we introduce a novel motion-aware affinity loss that encourages the network to predict positive pixel pairs if they share similar motion and color. The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9% $\mathcal{J}\&\mathcal{F}$ score and the majority of fully supervised methods on the DAVIS and YouTube-VOS datasets without imposing network architectural specifications. We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
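The abstract only sketches the motion-aware affinity loss at a high level. The snippet below is a minimal illustrative sketch, not the authors' exact formulation: it assumes a simple bidirectional temporal difference as the motion cue and a pairwise term that pulls the predicted foreground probabilities of neighboring pixels together when both their color and their motion agree. All function names, tensor shapes, sigmas, and the neighbor-pairing scheme are assumptions made for this example.

```python
import torch


def bidirectional_temporal_difference(prev_frame, frame, next_frame):
    """Rough motion cue (assumption): average absolute difference of the
    current frame to its previous and next frames, reduced over RGB channels."""
    diff_prev = (frame - prev_frame).abs().mean(dim=1, keepdim=True)
    diff_next = (frame - next_frame).abs().mean(dim=1, keepdim=True)
    return 0.5 * (diff_prev + diff_next)  # shape (B, 1, H, W)


def motion_aware_affinity_loss(mask_logits, frame, motion,
                               color_sigma=0.1, motion_sigma=0.1):
    """Illustrative pairwise loss: neighboring pixels with similar color AND
    similar motion are encouraged to receive similar foreground probabilities.

    mask_logits: (B, 1, H, W) raw network output
    frame:       (B, 3, H, W) RGB frame in [0, 1]
    motion:      (B, 1, H, W) motion magnitude, e.g. from the function above
    """
    prob = torch.sigmoid(mask_logits)
    loss = 0.0

    # Compare each pixel with its right and bottom neighbor.
    for dy, dx in [(0, 1), (1, 0)]:
        h, w = prob.shape[2] - dy, prob.shape[3] - dx
        p0, p1 = prob[:, :, :h, :w], prob[:, :, dy:, dx:]
        c0, c1 = frame[:, :, :h, :w], frame[:, :, dy:, dx:]
        m0, m1 = motion[:, :, :h, :w], motion[:, :, dy:, dx:]

        # Affinity is high only when color AND motion of the pair agree.
        color_aff = torch.exp(-((c0 - c1) ** 2).sum(dim=1, keepdim=True)
                              / (2 * color_sigma ** 2))
        motion_aff = torch.exp(-((m0 - m1) ** 2) / (2 * motion_sigma ** 2))
        affinity = color_aff * motion_aff

        # Penalize probability disagreement on high-affinity pairs.
        loss = loss + (affinity * (p0 - p1).abs()).mean()

    return loss
```

In a training setup one would presumably add such a term, suitably weighted, to a box-supervised segmentation loss; here it serves only to show how similar color and motion can jointly gate a pairwise consistency penalty.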
