Paper title
Unsupervised Extractive Summarization with Heterogeneous Graph Embeddings for Chinese Document
Paper authors
Abstract
In unsupervised extractive summarization, learning high-quality sentence representations is essential for selecting salient sentences from the input document. Previous studies have focused mainly on statistical approaches or pre-trained language models (PLMs) to extract sentence embeddings, ignoring the rich information inherent in the heterogeneous types of interaction between words and sentences. In this paper, we are the first to propose an unsupervised extractive summarization method with heterogeneous graph embeddings (HGEs) for Chinese documents. A heterogeneous text graph is constructed to capture interactions at different granularities by incorporating graph structural information. Moreover, the proposed graph is general and flexible: additional node types, such as keywords, can be easily integrated. Experimental results demonstrate that our method consistently outperforms the strong baseline on three summarization datasets.
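To make the idea of a heterogeneous text graph concrete, the following is a minimal sketch and not the paper's implementation. It assumes the document has already been split into sentences and segmented into tokens, and it builds a graph with two node types, sentences and words. The function names `build_hetero_graph` and `rank_sentences` are hypothetical, and the scoring step is a simple stand-in (counting sentences reachable through shared word nodes) rather than the learned graph embeddings the abstract describes.

```python
from collections import defaultdict


def build_hetero_graph(sentences):
    """Build an undirected word-sentence graph.

    `sentences` is a list of token lists (assumed to be pre-segmented).
    Nodes are tagged tuples: ("sent", index) or ("word", token).
    """
    graph = defaultdict(set)
    for i, tokens in enumerate(sentences):
        s_node = ("sent", i)
        graph[s_node]  # register the sentence node even if it has no tokens
        for tok in set(tokens):
            w_node = ("word", tok)
            graph[s_node].add(w_node)
            graph[w_node].add(s_node)
    return graph


def rank_sentences(graph):
    """Rank sentence indices by a stand-in salience score.

    The score is the number of other sentences that share at least one
    word node with the sentence (a two-hop neighbourhood count). This is
    an illustrative heuristic, not the embedding-based scoring of the paper.
    """
    scores = {}
    for node, neighbors in graph.items():
        if node[0] != "sent":
            continue
        linked = set()
        for w_node in neighbors:
            linked |= graph[w_node]
        linked.discard(node)
        scores[node[1]] = len(linked)
    # Highest score first; ties broken by document order.
    return sorted(scores, key=lambda i: (-scores[i], i))
```

Because every node carries a type tag, a further node type such as `("keyword", term)` could be linked to sentences in the same way, which mirrors the flexibility the abstract claims for the graph.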