Paper Title

Box Supervised Video Segmentation Proposal Network

Paper Authors

Tanveer Hannan, Rajat Koner, Jonathan Kobold, Matthias Schubert

Paper Abstract

Video Object Segmentation (VOS) has been targeted by various fully-supervised and self-supervised approaches. While fully-supervised methods demonstrate excellent results, self-supervised ones, which do not use pixel-level ground truth, attract much attention. However, self-supervised approaches exhibit a significant performance gap. Box-level annotations provide a balanced compromise between labeling effort and result quality for image segmentation but have not been exploited for the video domain. In this work, we propose a box-supervised video object segmentation proposal network, which takes advantage of intrinsic video properties. Our method incorporates object motion in the following way: first, motion is computed using a bidirectional temporal difference and a novel bounding box-guided motion compensation. Second, we introduce a novel motion-aware affinity loss that encourages the network to predict positive pixel pairs if they share similar motion and color. The proposed method outperforms the state-of-the-art self-supervised benchmark by 16.4% and 6.9% $\mathcal{J}\&\mathcal{F}$ score and the majority of fully supervised methods on the DAVIS and YouTube-VOS datasets without imposing network architectural specifications. We provide extensive tests and ablations on the datasets, demonstrating the robustness of our method.
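The abstract only sketches the motion-aware affinity loss at a high level. The snippet below is a minimal illustrative sketch, not the authors' exact formulation: it assumes a simple bidirectional temporal difference as the motion cue and a pairwise term that pulls the predicted foreground probabilities of neighboring pixels together when both their color and their motion agree. All function names, tensor shapes, sigmas, and the neighbor-pairing scheme are assumptions made for this example.

```python
import torch


def bidirectional_temporal_difference(prev_frame, frame, next_frame):
    """Rough motion cue (assumption): average absolute difference of the
    current frame to its previous and next frames, reduced over RGB channels."""
    diff_prev = (frame - prev_frame).abs().mean(dim=1, keepdim=True)
    diff_next = (frame - next_frame).abs().mean(dim=1, keepdim=True)
    return 0.5 * (diff_prev + diff_next)  # shape (B, 1, H, W)


def motion_aware_affinity_loss(mask_logits, frame, motion,
                               color_sigma=0.1, motion_sigma=0.1):
    """Illustrative pairwise loss: neighboring pixels with similar color AND
    similar motion are encouraged to receive similar foreground probabilities.

    mask_logits: (B, 1, H, W) raw network output
    frame:       (B, 3, H, W) RGB frame in [0, 1]
    motion:      (B, 1, H, W) motion magnitude, e.g. from the function above
    """
    prob = torch.sigmoid(mask_logits)
    loss = 0.0

    # Compare each pixel with its right and bottom neighbor.
    for dy, dx in [(0, 1), (1, 0)]:
        h, w = prob.shape[2] - dy, prob.shape[3] - dx
        p0, p1 = prob[:, :, :h, :w], prob[:, :, dy:, dx:]
        c0, c1 = frame[:, :, :h, :w], frame[:, :, dy:, dx:]
        m0, m1 = motion[:, :, :h, :w], motion[:, :, dy:, dx:]

        # Affinity is high only when color AND motion of the pair agree.
        color_aff = torch.exp(-((c0 - c1) ** 2).sum(dim=1, keepdim=True)
                              / (2 * color_sigma ** 2))
        motion_aff = torch.exp(-((m0 - m1) ** 2) / (2 * motion_sigma ** 2))
        affinity = color_aff * motion_aff

        # Penalize probability disagreement on high-affinity pairs.
        loss = loss + (affinity * (p0 - p1).abs()).mean()

    return loss
```

In a training setup one would presumably add such a term, suitably weighted, to a box-supervised segmentation loss; here it serves only to show how similar color and motion can jointly gate a pairwise consistency penalty.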
