Paper Title

LM-CORE: Language Models with Contextually Relevant External Knowledge

Paper Authors

Jivat Neet Kaur, Sumit Bhatia, Milan Aggarwal, Rachit Bansal, Balaji Krishnamurthy

Paper Abstract

Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provide explicit access to contextually relevant structured knowledge to the model and train it to use that knowledge. We present LM-CORE -- a general framework to achieve this -- that allows decoupling of the language model training from the external knowledge source and allows the latter to be updated without affecting the already trained model. Experimental results show that LM-CORE, having access to external knowledge, achieves significant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks. We also present a thorough error analysis highlighting the successes and failures of LM-CORE.
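
The abstract describes a general pattern: keep factual knowledge in an external, updatable store, retrieve the entries relevant to the current input, and let the language model condition on them. Below is a minimal, illustrative Python sketch of that retrieve-then-condition idea; the toy triple store, the word-overlap retriever, and the [MASK]-style probe input are assumptions made for illustration, not the paper's actual LM-CORE architecture or retrieval method.

```python
# A minimal, illustrative sketch of the retrieve-then-condition pattern
# described in the abstract. The triple store, the word-overlap retriever,
# and the prompt layout below are assumptions made for illustration only;
# they are not the paper's actual LM-CORE implementation.

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A single (subject, relation, object) entry in the external store."""
    subject: str
    relation: str
    obj: str

    def as_text(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}."


# Hypothetical external knowledge source; editing it does not require
# retraining the model, which is the decoupling the abstract emphasizes.
KNOWLEDGE_STORE = [
    Fact("Paris", "is the capital of", "France"),
    Fact("Marie Curie", "was born in", "Warsaw"),
]


def retrieve(query: str, store: list[Fact], k: int = 2) -> list[Fact]:
    """Rank facts by simple word overlap with the query (toy retriever)."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(f.as_text().lower().split())), f) for f in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for score, f in scored[:k] if score > 0]


def build_model_input(query: str, store: list[Fact]) -> str:
    """Prepend retrieved facts so the language model can condition on them."""
    context = " ".join(f.as_text() for f in retrieve(query, store))
    return f"{context} {query}".strip()


if __name__ == "__main__":
    # A [MASK]-style probe, as used in knowledge probing benchmarks.
    print(build_model_input("Marie Curie was born in [MASK].", KNOWLEDGE_STORE))
```

Because the knowledge lives outside the model in this sketch, updating a fact only means editing the store entries, which mirrors the decoupling of model training from the knowledge source that the abstract emphasizes.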
