Paper Title

Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection

Paper Authors

Yifan Lu, Gurkirt Singh, Suman Saha, Luc Van Gool

Paper Abstract

We propose a novel domain-adaptive action detection approach and a new adaptation protocol that leverages recent advancements in image-level unsupervised domain adaptation (UDA) techniques and handles the vagaries of instance-level video data. Self-training combined with cross-domain mixed sampling has shown remarkable performance gains for semantic segmentation in the UDA setting. Motivated by this, we propose an approach for human action detection in videos that transfers knowledge from the source domain (annotated dataset) to the target domain (unannotated dataset) using mixed sampling and pseudo-label-based self-training. Existing UDA techniques follow the ClassMix algorithm designed for semantic segmentation. However, simply adopting ClassMix for action detection does not work, mainly because these are two entirely different problems, i.e., pixel-label classification vs. instance-label detection. To tackle this, we propose a novel action instance mixed sampling technique that combines information across domains based on action instances instead of action classes. Moreover, we propose a new UDA training protocol that addresses the long-tail sample distribution and the domain shift problem by using supervision from an auxiliary source domain (ASD). For the ASD, we propose a new action detection dataset with dense frame-level annotations. We name our proposed framework domain-adaptive action instance mixing (DA-AIM). We demonstrate that DA-AIM consistently outperforms prior works on challenging domain adaptation benchmarks. The source code is available at https://github.com/wwwfan628/DA-AIM.
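
For intuition, here is a minimal sketch of what instance-based mixed sampling (as opposed to class-based ClassMix) could look like. All names, tensor layouts, and the box-based instance representation are illustrative assumptions, not the DA-AIM implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch: paste randomly chosen source action instances onto a
# target clip, and mix the corresponding labels. Assumes instances are given
# as per-frame bounding boxes (one tube per instance) and equal clip shapes.
import numpy as np

def instance_mix(src_clip, tgt_clip, src_boxes, src_labels,
                 tgt_pseudo_boxes, tgt_pseudo_labels, num_instances=1):
    """Mix source instances into a target clip.

    src_clip, tgt_clip: (T, H, W, 3) uint8 clips of identical shape.
    src_boxes: (N, T, 4) per-frame boxes (x1, y1, x2, y2) for N source instances.
    src_labels: (N,) action labels of the source instances.
    tgt_pseudo_boxes, tgt_pseudo_labels: pseudo annotations predicted on the
    target clip by the current model (the self-training signal).
    Returns the mixed clip and the union of pasted and pseudo annotations.
    """
    mixed = tgt_clip.copy()
    chosen = np.random.choice(len(src_boxes),
                              size=min(num_instances, len(src_boxes)),
                              replace=False)
    for i in chosen:
        for t in range(src_clip.shape[0]):
            x1, y1, x2, y2 = src_boxes[i, t].astype(int)
            # Copy the instance region frame by frame, so the whole
            # spatio-temporal action tube is transferred across domains.
            mixed[t, y1:y2, x1:x2] = src_clip[t, y1:y2, x1:x2]
    mixed_boxes = np.concatenate([src_boxes[chosen], tgt_pseudo_boxes], axis=0)
    mixed_labels = np.concatenate([src_labels[chosen], tgt_pseudo_labels], axis=0)
    return mixed, mixed_boxes, mixed_labels
```

The sketch ignores occlusion handling (a pasted source instance may cover a target pseudo box), which a real implementation would need to resolve; the point is only that mixing operates on whole action instances rather than on per-pixel class masks.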
