Paper Title
Improving Graph-Based Text Representations with Character and Word Level N-grams
Paper Authors
Paper Abstract
Graph-based text representation focuses on how text documents are represented as graphs that exploit dependency information between tokens and documents within a corpus. Despite the increasing interest in graph representation learning, there is limited research exploring new ways of constructing graph-based text representations, which are important for downstream natural language processing tasks. In this paper, we first propose a new heterogeneous word-character text graph that combines word and character n-gram nodes together with document nodes, allowing us to better learn dependencies among these entities. Additionally, we propose two new graph-based neural models, WCTextGCN and WCTextGAT, for modeling our proposed text graph. Extensive experiments on text classification and automatic text summarization benchmarks demonstrate that our proposed models consistently outperform competitive baselines and state-of-the-art graph-based models.
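The heterogeneous graph described above connects three node types: documents, words, and character n-grams. A minimal sketch of how such a graph's edge structure could be assembled is shown below; the node naming scheme, the n-gram size, and the boundary-marker padding are illustrative assumptions, and the edge weighting the models would use (e.g. TF-IDF for document-word edges, as in prior TextGCN-style work) is omitted for brevity.

```python
from typing import List, Set, Tuple

def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Character n-grams of a word, padded with boundary markers
    (n = 3 is an illustrative choice, not the paper's setting)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_word_char_graph(docs: List[str], n: int = 3) -> Set[Tuple[str, str]]:
    """Edge set of a heterogeneous word-character text graph:
    document-word edges for each word occurrence, and
    word-(character n-gram) edges for each word's n-grams.
    Edge weights are intentionally left out of this sketch."""
    edges: Set[Tuple[str, str]] = set()
    for d, doc in enumerate(docs):
        for w in doc.split():
            edges.add((f"doc:{d}", f"word:{w}"))
            for g in char_ngrams(w, n):
                edges.add((f"word:{w}", f"char:{g}"))
    return edges
```

Under this scheme, character n-gram nodes are shared across words with common substrings, which is what lets a graph neural model propagate subword information between morphologically related words and the documents containing them.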