Title
The Effects of Regularization and Data Augmentation are Class Dependent
Authors
Abstract
Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with a reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found from cross-validation leads to disastrous model performance on some classes, e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ merely by introducing random crop DA during training. Even more surprisingly, such a performance drop also appears when introducing uninformative regularization techniques such as weight decay. These results demonstrate that our search for ever-increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks, e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performance fall from $70\%$ to $30\%$ on class \#8889 when random crop DA is introduced during the Imagenet pre-training phase. These results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
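The class-dependent effect described above only becomes visible when accuracy is broken out per class rather than averaged over the whole test set. A minimal sketch of such an evaluation is shown below; the function name and the toy label/prediction arrays are illustrative, not from the paper.

```python
import numpy as np

def per_class_accuracy(labels, preds, num_classes):
    """Return a vector of per-class test accuracies.

    Classes with no test samples get accuracy NaN.
    """
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = (preds[mask] == c).mean()
    return acc

# Toy comparison of two models (e.g. trained without vs. with random
# crop DA): both have the same overall accuracy (5/6), yet they fail
# on different classes -- the averaged metric hides the disparity.
labels = np.array([0, 0, 1, 1, 2, 2])
preds_a = np.array([0, 0, 1, 0, 2, 2])  # misses one "class 1" sample
preds_b = np.array([0, 0, 1, 1, 2, 0])  # misses one "class 2" sample
print(per_class_accuracy(labels, preds_a, 3))  # [1.  0.5 1. ]
print(per_class_accuracy(labels, preds_b, 3))  # [1.  1.  0.5]
```

Averaging either output reproduces the identical overall score, which is why cross-validation on mean accuracy alone can silently select a regularization level that sacrifices individual classes.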