GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized Discourse State

Paper Title

GoSum: Extractive Summarization of Long Documents by Reinforcement Learning and Graph Organized Discourse State

Authors

Bian, Junyi, Huang, Xiaodi, Zhou, Hong, Zhu, Shanfeng

Abstract

Extractive summarization of long documents can be framed as sentence classification that uses the structural information of the documents. How to use such structural information to summarize a document remains challenging. In this paper, we propose GoSum, a novel graph and reinforcement learning based extractive model for long-paper summarization. In particular, GoSum encodes sentence states in reinforcement learning by building a heterogeneous graph for each input document at different discourse levels. The edges of the graph reflect the discourse hierarchy of the document, which restrains semantic drift across section boundaries. We evaluate GoSum on two scientific-article summarization datasets: PubMed and arXiv. The experimental results show that GoSum achieves state-of-the-art results compared with strong baselines of both extractive and abstractive models. Ablation studies further validate that the performance of GoSum benefits from the use of discourse information.
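
The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the node types or edge rules, so everything here is an assumption made for illustration: the function name `build_discourse_graph`, the two node types ("section" and "sentence"), and the rule that a sentence links only to its own section while sections link to each other. It is not the authors' implementation.

```python
from itertools import combinations


def build_discourse_graph(sections):
    """Build a toy heterogeneous graph from a sectioned document.

    `sections` is a list of sections, each a list of sentence strings.
    Node and edge types are illustrative assumptions, not GoSum's actual design.
    """
    nodes, edges = [], set()
    sent_id = 0
    for sec_id, sentences in enumerate(sections):
        sec = ("section", sec_id)
        nodes.append(sec)
        for _ in sentences:
            sent = ("sentence", sent_id)
            nodes.append(sent)
            # A sentence connects only to the section that contains it,
            # so no sentence-level edge crosses a section boundary.
            edges.add((sent, sec))
            sent_id += 1
    # Sections are linked to each other to share document-level context.
    section_nodes = [n for n in nodes if n[0] == "section"]
    edges.update(combinations(section_nodes, 2))
    return nodes, edges


if __name__ == "__main__":
    doc = [["Intro sentence one.", "Intro sentence two."], ["Method sentence."]]
    nodes, edges = build_discourse_graph(doc)
    print(len(nodes), "nodes,", len(edges), "edges")
```

In this sketch, information can pass between sentences in different sections only through their section nodes, which is one simple way to reflect a discourse hierarchy in the graph structure.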
