Paper Title

Different Tunes Played with Equal Skill: Exploring a Unified Optimization Subspace for Delta Tuning

Paper Authors

Jing Yi, Weize Chen, Yujia Qin, Yankai Lin, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou

Paper Abstract

Delta tuning (DET, also known as parameter-efficient tuning) is deemed as the new paradigm for using pre-trained language models (PLMs). Up to now, various DETs with distinct design elements have been proposed, achieving performance on par with fine-tuning. However, the mechanisms behind the above success are still under-explored, especially the connections among various DETs. To fathom the mystery, we hypothesize that the adaptations of different DETs could all be reparameterized as low-dimensional optimizations in a unified optimization subspace, which could be found by jointly decomposing independent solutions of different DETs. Then we explore the connections among different DETs by conducting optimization within the subspace. In experiments, we find that, for a certain DET, conducting optimization simply in the subspace could achieve comparable performance to its original space, and the found solution in the subspace could be transferred to another DET and achieve non-trivial performance. We also visualize the performance landscape of the subspace and find that there exists a substantial region where different DETs all perform well. Finally, we extend our analysis and show the strong connections between fine-tuning and DETs.
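
To make the subspace idea in the abstract concrete, here is a minimal sketch (not the authors' released code) of the reparameterization it describes: independently found delta-tuning solutions are flattened into vectors and jointly decomposed to obtain a shared low-dimensional basis, and a new adaptation is then optimized only through its coordinates in that subspace. The dimensions, the PCA/SVD-style decomposition, and the toy quadratic loss below are illustrative assumptions; the paper's actual procedure may differ.

```python
import torch

torch.manual_seed(0)

d_full = 1024        # flattened size of one DET's tunable parameters (assumed)
n_solutions = 32     # number of independently trained DET solutions (assumed)
d_sub = 8            # dimensionality of the shared subspace (assumed)

# Step 1: pretend we already trained several DETs; each row is one flattened solution.
solutions = torch.randn(n_solutions, d_full)

# Step 2: joint decomposition -- center the solutions and take the top singular
# directions as a shared low-dimensional basis (a PCA-style stand-in).
mean = solutions.mean(dim=0)
_, _, vh = torch.linalg.svd(solutions - mean, full_matrices=False)
basis = vh[:d_sub]                      # (d_sub, d_full); rows span the subspace

# Step 3: reparameterize an adaptation as mean + z @ basis and optimize only the
# low-dimensional coordinates z; a toy quadratic loss stands in for the task loss.
target = torch.randn(d_full)            # hypothetical "good" delta parameters
z = torch.zeros(d_sub, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-1)

for step in range(200):
    delta = mean + z @ basis            # back-project into the full parameter space
    loss = ((delta - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss after optimizing in the {d_sub}-dim subspace: {loss.item():.4f}")
```

A solution found this way is just a low-dimensional vector z; transferring it to another DET, as the abstract describes, amounts to projecting mean + z @ basis back into that method's parameterization.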
