Paper Title

Analyzing Encoded Concepts in Transformer Language Models

Paper Authors

Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Rafae Khan, Jia Xu

Paper Abstract

We propose a novel framework, ConceptX, to analyze how latent concepts are encoded in the representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning them with a large set of human-defined concepts. Our analysis of seven transformer language models reveals interesting insights: i) the latent space within the learned representations overlaps with different linguistic concepts to varying degrees; ii) the lower layers of the model are dominated by lexical concepts (e.g., affixation), whereas core linguistic concepts (e.g., morphological or syntactic relations) are better represented in the middle and higher layers; iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.
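To make the pipeline concrete, below is a minimal, hypothetical sketch of the kind of analysis the abstract describes: cluster contextualized token representations from one transformer layer, then check how well each cluster aligns with a human-defined concept (POS tags serve as a stand-in here). The model name, layer index, cluster count, alignment threshold, and toy data are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of a ConceptX-style analysis: discover latent concepts by
# clustering token representations, then explain clusters via human-defined labels.
import torch
from collections import Counter
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"   # assumed model; any transformer LM works
LAYER = 9                        # assumed middle/higher layer (cf. finding ii)
N_CLUSTERS = 4                   # toy value; the paper's setting may differ
ALIGNMENT_THRESHOLD = 0.9        # a cluster "aligns" with a concept if >=90%
                                 # of its tokens share that concept label

# Toy corpus with per-token concept labels (POS tags as the human-defined concept).
sentences = [
    ("The cat sat on the mat".split(), ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
    ("A dog ran into the park".split(), ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

vectors, labels = [], []
for words, tags in sentences:
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[LAYER][0]  # (num_subwords, dim)
    # Use the first subword of each word as that word's representation.
    seen = set()
    for idx, word_id in enumerate(enc.word_ids()):
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            vectors.append(hidden[idx].numpy())
            labels.append(tags[word_id])

# Discover encoded concepts by clustering the layer's token representations.
clustering = AgglomerativeClustering(n_clusters=N_CLUSTERS).fit(vectors)

# Explain each discovered cluster by aligning it with the human-defined labels.
for cluster_id in range(N_CLUSTERS):
    members = [labels[i] for i, c in enumerate(clustering.labels_) if c == cluster_id]
    tag, count = Counter(members).most_common(1)[0]
    purity = count / len(members)
    status = (f"aligns with {tag}" if purity >= ALIGNMENT_THRESHOLD
              else "unexplained (possibly multi-faceted)")
    print(f"cluster {cluster_id}: {len(members)} tokens, {status} (purity={purity:.2f})")
```

Clusters whose purity falls below the threshold correspond to finding iii): encoded concepts that no single human-defined concept adequately explains.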
