Paper Title

A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning

Paper Authors

Paul Janson, Wenxuan Zhang, Rahaf Aljundi, Mohamed Elhoseiny

Paper Abstract

With the success of pretraining techniques in representation learning, a number of continual learning methods based on pretrained models have been proposed. Some of these methods design continual learning mechanisms on top of the pretrained representations and allow only minimal updates, or even no updates, of the backbone model during continual learning. In this paper, we question whether the complexity of these models is needed to achieve good performance by comparing them to a simple baseline that we designed. We argue that the pretrained feature extractor itself can be strong enough to achieve competitive or even better continual learning performance on the Split-CIFAR100 and CORe50 benchmarks. To validate this, we evaluate a very simple baseline that 1) uses the frozen pretrained model to extract image features for every class encountered during the continual learning stage and computes the corresponding mean feature of each class on the training data, and 2) predicts the class of an input based on the nearest distance between the test sample and the class mean features, i.e., a Nearest Mean Classifier (NMC). This baseline is single-headed, exemplar-free, and can be task-free (by updating the means continually). It achieves 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that are all initialized with the same pretrained transformer model. We hope our baseline encourages future progress in designing learning systems that can continually add quality to the learned representations even when they start from pretrained weights.
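The procedure described in the abstract amounts to only a few lines of code. The sketch below is a minimal illustration rather than the authors' released implementation: it assumes a frozen ViT backbone loaded via timm (the specific vit_base_patch16_224 checkpoint is an illustrative choice, not necessarily the model used in the paper) and keeps running per-class feature sums so the class means can be updated continually, which is what makes the baseline task-free.

```python
import torch
import timm  # assumption: any frozen pretrained backbone works; a timm ViT is used here for illustration

# 1) Frozen pretrained feature extractor (num_classes=0 returns pooled features, no classification head).
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

feat_dim = backbone.num_features
class_sums: dict[int, torch.Tensor] = {}  # running sum of features per class
class_counts: dict[int, int] = {}         # number of training samples seen per class

@torch.no_grad()
def update_means(images: torch.Tensor, labels: torch.Tensor) -> None:
    """Accumulate per-class feature statistics from a training batch (no task boundary needed)."""
    feats = backbone(images)  # (B, feat_dim)
    for feat, label in zip(feats, labels.tolist()):
        class_sums[label] = class_sums.get(label, torch.zeros(feat_dim)) + feat
        class_counts[label] = class_counts.get(label, 0) + 1

@torch.no_grad()
def predict(images: torch.Tensor) -> torch.Tensor:
    """2) Nearest Mean Classifier: assign each test image to the class with the closest mean feature."""
    classes = sorted(class_sums)
    means = torch.stack([class_sums[c] / class_counts[c] for c in classes])  # (C, feat_dim)
    feats = backbone(images)                                                 # (B, feat_dim)
    dists = torch.cdist(feats, means)                                        # (B, C) Euclidean distances
    return torch.tensor(classes)[dists.argmin(dim=1)]
```

In a 10-Split-CIFAR-100 style setting, update_means would be called on the training data of each incoming group of classes as it arrives, and predict can be queried at any point over all classes seen so far.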
