Paper Title
DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning
Paper Authors
Paper Abstract
Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge-driven tasks. Despite the many efforts made to inject knowledge into PLMs, this problem remains open. To address the challenge, we propose \textbf{DictBERT}, a novel approach that enhances PLMs with dictionary knowledge, which is easier to acquire than a knowledge graph (KG). During pre-training, we present two novel pre-training tasks to inject dictionary knowledge into PLMs via contrastive learning: \textit{dictionary entry prediction} and \textit{entry description discrimination}. In fine-tuning, we use the pre-trained DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for identified entries in an input sequence, and infuse the retrieved knowledge into the input to enhance its representation via a novel extra-hop attention mechanism. We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it achieves substantial improvements of 0.5\%, 2.9\%, 9.0\%, 7.1\% and 3.3\% over BERT-large on these tasks, respectively, and is also effective on RoBERTa-large.
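The abstract describes pre-training with contrastive learning over dictionary entries and their descriptions. The sketch below illustrates one plausible reading of that objective: encode an entry and its description separately, treat matching (entry, description) pairs in a batch as positives and mismatched pairs as negatives, and apply an InfoNCE-style loss. The pooling, temperature, and symmetric-loss choices are assumptions for illustration only, not the paper's exact formulation.

```python
# Hypothetical sketch of a contrastive objective over (dictionary entry, description)
# pairs, in the spirit of the abstract's "entry description discrimination" task.
# The encoder outputs are assumed to be pooled sentence embeddings (e.g. [CLS]).
import torch
import torch.nn.functional as F


def contrastive_entry_description_loss(entry_emb: torch.Tensor,
                                       desc_emb: torch.Tensor,
                                       temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss over a batch of (entry, description) embedding pairs.

    entry_emb, desc_emb: [batch, hidden] pooled representations.
    Row i of entry_emb is the positive for row i of desc_emb; all other
    rows in the batch serve as in-batch negatives.
    """
    entry_emb = F.normalize(entry_emb, dim=-1)
    desc_emb = F.normalize(desc_emb, dim=-1)
    logits = entry_emb @ desc_emb.t() / temperature        # [batch, batch] similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: entry -> description and description -> entry matching.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Toy usage with random tensors standing in for encoder outputs.
    batch, hidden = 8, 768
    entries = torch.randn(batch, hidden)
    descriptions = torch.randn(batch, hidden)
    print(contrastive_entry_description_loss(entries, descriptions).item())
```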