Paper Title
On the Transformation of Latent Space in Fine-Tuned NLP Models
Paper Authors
Paper Abstract
We study the evolution of the latent space in fine-tuned NLP models. Unlike the commonly used probing framework, we opt for an unsupervised method to analyze representations. More specifically, we discover latent concepts in the representational space using hierarchical clustering. We then use an alignment function to gauge the similarity between the latent space of a pre-trained model and that of its fine-tuned version. We use traditional linguistic concepts to facilitate our understanding and also study how the model's space transforms towards task-specific information. We perform a thorough analysis, comparing the pre-trained and fine-tuned variants of three models across three downstream tasks. The notable findings of our work are: i) the latent space of the higher layers evolves towards task-specific concepts, ii) the lower layers retain the generic concepts acquired in the pre-trained model, iii) some concepts in the higher layers acquire polarity towards the output class, and iv) these concepts can be used for generating adversarial triggers.
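To make the pipeline concrete, below is a minimal sketch of the two steps the abstract names: hierarchical clustering of contextual token representations to discover latent concepts, and a simple alignment score between the pre-trained and fine-tuned spaces. It assumes HuggingFace Transformers and scikit-learn; the checkpoint names, sample sentences, cluster count, and the overlap-based alignment criterion are illustrative assumptions, not the paper's exact method.

```python
# Sketch: latent-concept discovery + alignment between a pre-trained model
# and its fine-tuned version. Assumes: transformers, scikit-learn, torch.
import numpy as np
import torch
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer


def token_representations(model_name: str, sentences: list[str], layer: int) -> np.ndarray:
    """Collect contextual token embeddings from one layer of a model."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    reps = []
    with torch.no_grad():
        for s in sentences:
            out = model(**tok(s, return_tensors="pt"))
            # hidden_states[layer] has shape (1, seq_len, hidden_dim)
            reps.append(out.hidden_states[layer][0].numpy())
    return np.concatenate(reps, axis=0)


def discover_concepts(reps: np.ndarray, n_clusters: int) -> np.ndarray:
    """Hierarchical (agglomerative) clustering of token representations;
    each cluster is treated as one latent concept."""
    return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(reps)


def alignment(labels_a: np.ndarray, labels_b: np.ndarray, threshold: float = 0.9) -> float:
    """Fraction of concepts in space A whose member tokens mostly land in a
    single concept of space B (a hypothetical overlap criterion)."""
    aligned = 0
    for c in np.unique(labels_a):
        members = np.where(labels_a == c)[0]
        _, counts = np.unique(labels_b[members], return_counts=True)
        if counts.max() / len(members) >= threshold:
            aligned += 1
    return aligned / len(np.unique(labels_a))


if __name__ == "__main__":
    # Both checkpoints share the bert-base-uncased tokenizer, so token
    # positions line up across the two representation spaces.
    sents = ["The movie was great.", "Service was slow but the food was fine."]
    pre = token_representations("bert-base-uncased", sents, layer=12)
    ft = token_representations("textattack/bert-base-uncased-SST-2", sents, layer=12)
    score = alignment(discover_concepts(pre, 10), discover_concepts(ft, 10))
    print(f"aligned concepts at the last layer: {score:.2f}")
```

Per the abstract's findings, one would expect such a score to stay high in the lower layers (generic concepts are retained) and drop in the higher layers, where concepts drift towards task-specific, class-polarized clusters.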