Paper Title

How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis

Paper Authors

Shaobo Li, Xiaoguang Li, Lifeng Shang, Zhenhua Dong, Chengjie Sun, Bingquan Liu, Zhenzhou Ji, Xin Jiang, Qun Liu

Paper Abstract

Recently, there has been a trend to investigate the factual knowledge captured by Pre-trained Language Models (PLMs). Many works show the PLMs' ability to fill in the missing factual words in cloze-style prompts such as "Dante was born in [MASK]." However, it is still a mystery how PLMs generate the results correctly: relying on effective clues or shortcut patterns? We try to answer this question by a causal-inspired analysis that quantitatively measures and evaluates the word-level patterns that PLMs depend on to generate the missing words. We check the words that have three typical associations with the missing words: knowledge-dependent, positionally close, and highly co-occurred. Our analysis shows: (1) PLMs generate the missing factual words more by the positionally close and highly co-occurred words than the knowledge-dependent words; (2) the dependence on the knowledge-dependent words is more effective than the positionally close and highly co-occurred words. Accordingly, we conclude that the PLMs capture the factual knowledge ineffectively because of depending on the inadequate associations.
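
The cloze-style probing described in the abstract can be illustrated in a few lines. Below is a minimal sketch using the Hugging Face transformers fill-mask pipeline; the choice of bert-base-uncased and the example prompt are assumptions for demonstration only, not the authors' actual evaluation setup.

```python
# Minimal sketch of cloze-style factual probing with a masked PLM.
# Assumptions: bert-base-uncased as the probed model and the prompt below;
# this is illustrative only, not the paper's evaluation code.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Ask the PLM to fill in the missing factual word in the prompt.
for prediction in unmasker("Dante was born in [MASK]."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.4f}")
```

The paper's analysis goes beyond this kind of top-1 filling: it measures which context words (knowledge-dependent, positionally close, or highly co-occurred) the prediction actually depends on.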
