Paper Title

Omni-sourced Webly-supervised Learning for Video Recognition

Paper Authors

Haodong Duan, Yue Zhao, Yuanjun Xiong, Wentao Liu, Dahua Lin

Paper Abstract

We introduce OmniSource, a novel framework for leveraging web data to train video recognition models. OmniSource overcomes the barriers between data formats, such as images, short videos, and long untrimmed videos, for webly-supervised learning. First, data samples in multiple formats, curated by task-specific data collection and automatically filtered by a teacher model, are transformed into a unified form. Then a joint-training strategy is proposed to deal with the domain gaps between multiple data sources and formats in webly-supervised learning. Several good practices, including data balancing, resampling, and cross-dataset mixup, are adopted in joint training. Experiments show that by utilizing data from multiple sources and formats, OmniSource is more data-efficient in training. With only 3.5M images and 800K minutes of videos crawled from the internet without human labeling (less than 2% of prior works), our models learned with OmniSource improve the Top-1 accuracy of 2D- and 3D-ConvNet baseline models by 3.0% and 3.9%, respectively, on the Kinetics-400 benchmark. With OmniSource, we establish new records with different pretraining strategies for video recognition. Our best models achieve 80.4%, 80.5%, and 83.6% Top-1 accuracy on the Kinetics-400 benchmark for training-from-scratch, ImageNet pre-training, and IG-65M pre-training, respectively.
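The cross-dataset mixup mentioned in the abstract can be illustrated with a minimal sketch: a web image is replicated along the temporal axis so it matches the shape of a video clip, and the two are blended with a Beta-sampled coefficient, as in standard mixup. The function name, array layout, and hyperparameter values below are illustrative assumptions, not taken from the paper's actual implementation.

```python
import numpy as np

def cross_dataset_mixup(web_frame, target_clip, alpha=0.2, rng=None):
    """Mix a web image with a video clip from the labeled dataset.

    web_frame:   (H, W, C) still image crawled from the web
    target_clip: (T, H, W, C) video clip from the target dataset
    alpha:       Beta-distribution parameter (illustrative value)

    Returns the mixed clip and the mixing coefficient lam; the two
    samples' labels would be mixed with the same lam during training.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    # Replicate the still image along the temporal axis so the
    # image and clip formats are unified before blending.
    web_clip = np.broadcast_to(web_frame, target_clip.shape)
    mixed = lam * target_clip + (1 - lam) * web_clip
    return mixed, lam
```

In practice, the same coefficient `lam` weights the cross-entropy losses (or one-hot labels) of the two sources, which is one way the joint-training strategy can soften the domain gap between formats.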
