Title
Entity Linking in Tabular Data Needs the Right Attention
Authors
Abstract
Understanding the semantic meaning of tabular data requires Entity Linking (EL), in order to associate each cell value with a real-world entity in a Knowledge Base (KB). In this work, we focus on end-to-end solutions for EL on tabular data that do not rely on fact lookup in the target KB. Tabular data contains heterogeneous and sparse context, including column headers, cell values and table captions. We experiment with various models to generate a vector representation for each cell value to be linked. Our results show that it is critical to apply an attention mechanism as well as an attention mask, so that the model can only attend to the most relevant context and avoid information dilution. The most relevant context includes: same-row cells, same-column cells, headers and caption. Computational complexity, however, grows quadratically with the size of tabular data for such a complex model. We achieve constant memory usage by introducing a Tabular Entity Linking Lite model (TELL) that generates a vector representation for a cell based only on its value, the table headers and the table caption. TELL achieves 80.8% accuracy on Wikipedia tables, which is only 0.1% lower than the state-of-the-art model with quadratic memory usage.
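To make the masking idea in the abstract concrete, here is a minimal, hypothetical sketch (not the authors' code) of how such an attention mask could be constructed for one target cell. It assumes the table is flattened into a token sequence of one caption token, one header token per column, and one token per cell in row-major order; only the caption, all headers, same-row cells and same-column cells are attendable.

```python
def build_attention_mask(n_rows, n_cols, target_row, target_col):
    """Boolean mask over [caption, headers..., cells...] for one target cell.

    True marks positions the target cell may attend to: the caption,
    every column header, all cells in the same row, and all cells in
    the same column (including the target cell itself).
    """
    mask = [True]            # caption token is always attendable
    mask += [True] * n_cols  # all column-header tokens are attendable
    for r in range(n_rows):
        for c in range(n_cols):
            # Cell tokens: attendable only if same row or same column.
            mask.append(r == target_row or c == target_col)
    return mask
```

A full-attention model would need such a mask (or attention scores) over all cell pairs, which is the source of the quadratic growth the abstract mentions; TELL avoids it by restricting each cell's context to its own value, the headers and the caption, so the attendable set no longer grows with the number of rows.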