Paper Title

Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods

Paper Authors

Skanda Koppula, Yazhe Li, Evan Shelhamer, Andrew Jaegle, Nikhil Parthasarathy, Relja Arandjelovic, João Carreira, Olivier Hénaff

Paper Abstract

Self-supervised methods have achieved remarkable success in transfer learning, often achieving the same or better accuracy than supervised pre-training. Most prior work has done so by increasing pre-training computation by adding complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related but orthogonal question: given a fixed FLOP budget, what are the best datasets, models, and (self-)supervised training methods for obtaining high accuracy on representative visual tasks? Given the availability of large datasets, this setting is often more relevant for academic and industry labs alike. We examine five large-scale datasets (JFT-300M, ALIGN, ImageNet-1K, ImageNet-21K, and COCO) and six pre-training methods (CLIP, DINO, SimCLR, BYOL, Masked Autoencoding, and supervised). In a like-for-like fashion, we characterize their FLOP and CO$_2$ footprints relative to their accuracy when transferred to a canonical image segmentation task. Our analysis reveals strong disparities in the computational efficiency of pre-training methods and their dependence on dataset quality. In particular, our results call into question the commonly held assumption that self-supervised methods inherently scale to large, uncurated data. We therefore advocate for (1) paying closer attention to dataset curation and (2) reporting accuracies in the context of the total computational cost.
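
To illustrate the kind of fixed-budget comparison the abstract describes, here is a minimal Python sketch that normalizes each pre-training run by its total compute before comparing transfer accuracy. All method/dataset pairings, FLOP counts, step counts, and mIoU values below are hypothetical placeholders, not numbers from the paper.

```python
# Minimal sketch: report transfer accuracy alongside total pre-training compute,
# rather than accuracy alone. All values are hypothetical placeholders.

# Hypothetical per-step training cost (FLOPs), step count, and measured
# segmentation transfer accuracy for a few (method, dataset) configurations.
runs = {
    ("SimCLR", "ImageNet-1K"): {"flops_per_step": 3.6e12, "steps": 500_000, "seg_mIoU": 0.72},
    ("BYOL",   "ImageNet-1K"): {"flops_per_step": 4.1e12, "steps": 500_000, "seg_mIoU": 0.74},
    ("MAE",    "JFT-300M"):    {"flops_per_step": 1.9e12, "steps": 800_000, "seg_mIoU": 0.70},
}

def total_flops(run):
    """Total pre-training compute for one configuration."""
    return run["flops_per_step"] * run["steps"]

# Sort by total compute so configurations are compared at a glance
# in the context of their overall cost.
for (method, dataset), run in sorted(runs.items(), key=lambda kv: total_flops(kv[1])):
    print(f"{method:7s} on {dataset:12s}: "
          f"{total_flops(run):.2e} FLOPs -> mIoU {run['seg_mIoU']:.2f}")
```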
