Paper Title
Rethinking the Hyperparameters for Fine-tuning
Paper Authors
Paper Abstract
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters and keeping them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based on extensive empirical evaluation of fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that the value of momentum also affects fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyperparameters for fine-tuning, in particular the effective learning rate, are not only dataset dependent but also sensitive to the similarity between the source and target domains. This is in contrast to hyperparameters for training from scratch. (3) Reference-based regularization that keeps models close to the initial model does not necessarily apply to "dissimilar" datasets. Our findings challenge common practices of fine-tuning and encourage deep learning practitioners to rethink the hyperparameters for fine-tuning.
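To make the abstract's terms concrete, below is a minimal PyTorch-style sketch (an illustration, not the paper's released code) of the three quantities it discusses: the momentum hyperparameter, the effective learning rate lr / (1 - momentum), and a reference-based (L2-SP-style) penalty that keeps weights near the pre-trained initialization. It assumes a recent torchvision; `num_classes`, `effective_lr`, and `strength` are illustrative placeholders, not values from the paper.

```python
import torch
import torchvision

num_classes = 100  # illustrative: number of classes in the target dataset

# Load an ImageNet pre-trained backbone and replace its classification head.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

# Keep a frozen copy of the pre-trained weights (excluding the new head)
# to serve as the reference point for L2-SP-style regularization.
initial_params = {name: p.detach().clone()
                  for name, p in model.named_parameters()
                  if not name.startswith("fc.")}

# The effective learning rate is lr / (1 - momentum); tuning this joint
# quantity per target dataset, rather than lr alone at a fixed momentum of
# 0.9, is the point the abstract refers to. Values here are illustrative.
momentum = 0.9
effective_lr = 0.01
optimizer = torch.optim.SGD(model.parameters(),
                            lr=effective_lr * (1.0 - momentum),
                            momentum=momentum)

def l2_sp_penalty(model, initial_params, strength=1e-3):
    """Penalty pulling current weights toward the pre-trained initialization."""
    penalty = sum(((p - initial_params[name]) ** 2).sum()
                  for name, p in model.named_parameters()
                  if name in initial_params)
    return strength * penalty

# Inside the training loop, the penalty is simply added to the task loss:
#   loss = criterion(model(images), labels) + l2_sp_penalty(model, initial_params)
```

Per the abstract's third finding, a penalty of this kind helps mainly when the target data is similar to the source domain; for dissimilar targets its strength (or its use at all) should be validated rather than assumed.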