Paper Title
Video Region Annotation with Sparse Bounding Boxes
Paper Authors
Paper Abstract
Video analysis has been moving towards more detailed interpretation (e.g. segmentation) with encouraging progress. These tasks, however, increasingly rely on training data that is densely annotated in both space and time. Since such annotation is labour-intensive, few densely annotated video datasets with detailed region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparsely annotated bounding boxes of target regions. We achieve this with a Volumetric Graph Convolutional Network (VGCN), which learns to iteratively find keypoints on the region boundaries using the spatio-temporal volume of surrounding appearance and motion. The global optimization of VGCN makes it significantly stronger, and better at generalizing, than existing solutions. Experimental results on two recent datasets (one real and one synthetic), including ablation studies, demonstrate the effectiveness and superiority of our method.
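To make the abstract's core idea concrete, the sketch below shows, in PyTorch, one way a graph convolutional network can iteratively refine boundary keypoints linked across neighbouring frames. It is a minimal illustration under assumed design choices, not the authors' VGCN implementation: the class names (`KeypointGCNLayer`, `BoundaryRefiner`), feature dimensions, and the mean-aggregation rule are all hypothetical.

```python
# Minimal sketch (not the authors' code): graph convolution over boundary
# keypoints in a spatio-temporal graph, predicting per-keypoint offsets that
# iteratively move the keypoints onto the region boundary.
import torch
import torch.nn as nn

class KeypointGCNLayer(nn.Module):
    """One GCN layer: mean-aggregate neighbour features via the adjacency
    matrix, then apply a shared linear transform and a ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (N, in_dim) keypoint features (e.g. appearance + motion cues)
        # adj: (N, N) adjacency; edges link neighbouring keypoints on the same
        #      boundary and corresponding keypoints in adjacent frames.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        x = adj @ x / deg              # mean aggregation over neighbours
        return torch.relu(self.linear(x))

class BoundaryRefiner(nn.Module):
    """Stack of GCN layers plus a head predicting (dx, dy) offsets, applied
    iteratively to refine keypoint positions."""
    def __init__(self, feat_dim=64, hidden_dim=128, num_layers=3):
        super().__init__()
        dims = [feat_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            KeypointGCNLayer(d_in, d_out) for d_in, d_out in zip(dims, dims[1:])
        )
        self.offset_head = nn.Linear(hidden_dim, 2)

    def forward(self, feats, adj, coords, num_iters=3):
        # feats:  (N, feat_dim) features sampled around the current keypoints
        # coords: (N, 2) current keypoint positions in normalized image coords
        for _ in range(num_iters):
            h = feats
            for layer in self.layers:
                h = layer(h, adj)
            coords = coords + self.offset_head(h)   # move keypoints
            # A full system would re-sample feats at the updated coords here.
        return coords
```

In this reading, the "volumetric" aspect comes from the adjacency connecting keypoints both along each frame's boundary and across time, so each refinement step is informed by the surrounding spatio-temporal volume rather than a single frame.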