Paper Title

Analyzing Encoded Concepts in Transformer Language Models

Paper Authors

Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Rafae Khan, Jia Xu

Paper Abstract

We propose a novel framework, ConceptX, to analyze how latent concepts are encoded in the representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning them with a large set of human-defined concepts. Our analysis of seven transformer language models reveals interesting insights: i) the latent space within the learned representations overlaps with different linguistic concepts to varying degrees; ii) the lower layers of the model are dominated by lexical concepts (e.g., affixation), whereas core linguistic concepts (e.g., morphological or syntactic relations) are better represented in the middle and higher layers; iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.
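To make the pipeline concrete, below is a minimal, hypothetical sketch of the kind of analysis the abstract describes: cluster contextualized token representations from one transformer layer, then check how well each cluster aligns with a human-defined concept (POS tags serve as a stand-in here). The model name, layer index, cluster count, alignment threshold, and toy data are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of a ConceptX-style analysis: discover latent concepts by
# clustering token representations, then explain clusters via human-defined labels.
import torch
from collections import Counter
from sklearn.cluster import AgglomerativeClustering
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"   # assumed model; any transformer LM works
LAYER = 9                        # assumed middle/higher layer (cf. finding ii)
N_CLUSTERS = 4                   # toy value; the paper's setting may differ
ALIGNMENT_THRESHOLD = 0.9        # a cluster "aligns" with a concept if >=90%
                                 # of its tokens share that concept label

# Toy corpus with per-token concept labels (POS tags as the human-defined concept).
sentences = [
    ("The cat sat on the mat".split(), ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
    ("A dog ran into the park".split(), ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]),
]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

vectors, labels = [], []
for words, tags in sentences:
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[LAYER][0]  # (num_subwords, dim)
    # Use the first subword of each word as that word's representation.
    seen = set()
    for idx, word_id in enumerate(enc.word_ids()):
        if word_id is not None and word_id not in seen:
            seen.add(word_id)
            vectors.append(hidden[idx].numpy())
            labels.append(tags[word_id])

# Discover encoded concepts by clustering the layer's token representations.
clustering = AgglomerativeClustering(n_clusters=N_CLUSTERS).fit(vectors)

# Explain each discovered cluster by aligning it with the human-defined labels.
for cluster_id in range(N_CLUSTERS):
    members = [labels[i] for i, c in enumerate(clustering.labels_) if c == cluster_id]
    tag, count = Counter(members).most_common(1)[0]
    purity = count / len(members)
    status = (f"aligns with {tag}" if purity >= ALIGNMENT_THRESHOLD
              else "unexplained (possibly multi-faceted)")
    print(f"cluster {cluster_id}: {len(members)} tokens, {status} (purity={purity:.2f})")
```

Clusters whose purity falls below the threshold correspond to finding iii): encoded concepts that no single human-defined concept adequately explains.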
