Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Paper Title

Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization

Authors

Ma, Congbo; Zhang, Wei Emma; Pitawela, Pitawelayalage Dasun Dileepa; Qu, Yutong; Zhuang, Haojie; Wang, Hu

Abstract

One key challenge in multi-document summarization (MDS) is capturing the relations among input documents, which is what distinguishes MDS from single-document summarization (SDS). Few existing MDS works address this issue. One effective approach is to encode document positional information to help models capture cross-document relations. However, existing MDS models, such as Transformer-based models, consider only token-level positional information. Moreover, these models fail to capture the linguistic structure of sentences, which inevitably causes confusion in the generated summaries. Therefore, in this paper, we propose document-aware positional encoding and linguistic-guided encoding that can be fused with the Transformer architecture for MDS. For document-aware positional encoding, we introduce a general protocol to guide the selection of document encoding functions. For linguistic-guided encoding, we propose to embed syntactic dependency relations into a dependency relation mask, with a simple but effective non-linear encoding learner for feature learning. Extensive experiments show that the proposed model can generate high-quality summaries.
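
To make the idea of document-level positional information concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract only states that a general protocol guides the choice of document encoding function, so the sinusoidal function used for the document index, the element-wise sum with the token-level encoding, and the names `sinusoidal` and `document_aware_encoding` are all assumptions made for illustration.

```python
import math


def sinusoidal(position, d_model):
    """Standard Transformer-style sinusoidal encoding for one integer position."""
    vec = []
    for i in range(d_model):
        # Dimensions 2k and 2k+1 share the same frequency.
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def document_aware_encoding(doc_lengths, d_model):
    """Return one encoding vector per token of the concatenated input.

    Assumption (hypothetical design): each token receives the sum of its
    token-level encoding and an encoding of the index of its source document.
    """
    encodings = []
    token_pos = 0
    for doc_index, length in enumerate(doc_lengths):
        doc_vec = sinusoidal(doc_index, d_model)  # assumed document encoding function
        for _ in range(length):
            tok_vec = sinusoidal(token_pos, d_model)
            encodings.append([t + d for t, d in zip(tok_vec, doc_vec)])
            token_pos += 1
    return encodings


if __name__ == "__main__":
    # Two input documents with 2 and 3 tokens, model dimension 4.
    for row in document_aware_encoding([2, 3], 4):
        print([round(x, 3) for x in row])
```

In this sketch, tokens from the same document share the same document-level component, so a model could in principle tell which document a token came from in addition to where it sits in the concatenated sequence.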
