Paper Title
Entity-Assisted Language Models for Identifying Check-worthy Sentences
Paper Authors
Paper Abstract
We propose a new uniform framework for text classification and ranking that can automate the process of identifying check-worthy sentences in political debates and speech transcripts. Our framework combines the semantic analysis of the sentences with additional entity embeddings obtained through the entities identified within the sentences. In particular, we analyse the semantic meaning of each sentence using state-of-the-art neural language models such as BERT, ALBERT, and RoBERTa, while embeddings for entities are obtained from knowledge graph (KG) embedding models. Specifically, we instantiate our framework using five different language models, entity embeddings obtained from six different KG embedding models, and two combination methods, leading to several Entity-Assisted neural language models. We extensively evaluate the effectiveness of our framework using two publicly available datasets from the CLEF 2019 & 2020 CheckThat! Labs. Our results show that the neural language models significantly outperform traditional TF.IDF and LSTM methods. In addition, we show that the ALBERT model is consistently the most effective model among all the tested neural language models. When used alongside a KG embedding, our entity embeddings significantly outperform other existing approaches from the literature that are based on similarity and relatedness scores between the entities in a sentence.
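To make the described pipeline concrete, the sketch below shows one plausible way to combine a sentence embedding from a pretrained language model with KG entity embeddings via concatenation. This is only an illustrative sketch, not the authors' implementation: the `kg_entity_vectors` lookup table, the mean-pooling of hidden states, and the concatenation strategy are assumptions standing in for the paper's KG embedding models and combination methods.

```python
# Illustrative sketch (not the authors' implementation) of an entity-assisted
# sentence representation: a language-model sentence embedding concatenated
# with averaged KG entity embeddings, assuming a hypothetical lookup table.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModel.from_pretrained("albert-base-v2")

def sentence_embedding(sentence: str) -> np.ndarray:
    """Mean-pool the last hidden states of the language model."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0).numpy()

# Hypothetical entity embeddings, e.g. produced by a KG embedding model
# (TransE, ComplEx, ...) trained over a knowledge graph; random here.
kg_entity_vectors = {
    "Barack Obama": np.random.rand(100),
    "Iraq": np.random.rand(100),
}

def combine(sentence: str, entities: list) -> np.ndarray:
    """Concatenate the sentence embedding with the averaged entity embeddings
    (one of several possible combination strategies)."""
    sent_vec = sentence_embedding(sentence)
    ent_vecs = [kg_entity_vectors[e] for e in entities if e in kg_entity_vectors]
    ent_vec = np.mean(ent_vecs, axis=0) if ent_vecs else np.zeros(100)
    return np.concatenate([sent_vec, ent_vec])

features = combine("Barack Obama sent troops to Iraq.", ["Barack Obama", "Iraq"])
# `features` could then feed a downstream classifier or ranker that scores
# the sentence for check-worthiness.
```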