Paper Title
Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
Paper Authors
Paper Abstract
Large-scale pre-trained language models have recently achieved impressive results on a wide range of downstream tasks. However, fine-tuning an extremely large-scale pre-trained language model on limited target datasets is often plagued by overfitting and representation degradation. In this paper, we propose a Dynamic Parameter Selection (DPS) algorithm for large-scale pre-trained models during fine-tuning, which adaptively selects a more promising subnetwork to perform staged updates based on back-propagated gradients. Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results across various pre-trained language models. In addition, DPS brings substantial improvements in out-of-domain transfer experiments and low-resource scenarios, which shows that it can maintain stable general contextual features and reduce representation collapse. We release our code at https://github.com/ZhangHaojie077/DPS
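The core idea described in the abstract, selecting a subnetwork to update based on back-propagated gradients, can be illustrated with a minimal sketch. This is not the authors' actual DPS implementation (which performs adaptive, staged selection; see the repository linked above); the function name `mask_gradients_to_subnetwork` and the `update_ratio` value are illustrative assumptions, and the sketch simply keeps the largest-magnitude gradients in each weight tensor so that the optimizer only updates that subset of parameters.

```python
# Minimal sketch of gradient-based subnetwork selection during fine-tuning.
# NOT the authors' DPS algorithm; it only illustrates updating a subnetwork
# chosen from the largest back-propagated gradients.
import torch
import torch.nn as nn


def mask_gradients_to_subnetwork(model: nn.Module, update_ratio: float = 0.3) -> None:
    """Keep only the top `update_ratio` fraction of each weight tensor's
    gradients (by absolute value) and zero out the rest, so the optimizer
    step touches only the selected subnetwork. `update_ratio` is illustrative."""
    for param in model.parameters():
        if param.grad is None:
            continue
        g = param.grad.abs().flatten()
        k = max(1, int(update_ratio * g.numel()))
        threshold = torch.topk(g, k, largest=True).values.min()
        # Zero gradients below the per-tensor threshold.
        param.grad.mul_((param.grad.abs() >= threshold).float())


# Hypothetical usage inside an ordinary fine-tuning step
# (model, optimizer, loss_fn, and batch are assumed to be defined):
#
#   loss = loss_fn(model(batch["inputs"]), batch["labels"])
#   loss.backward()
#   mask_gradients_to_subnetwork(model, update_ratio=0.3)
#   optimizer.step()
#   optimizer.zero_grad()
```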