Title


Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation

Authors

Xiaocheng Feng, Yawei Sun, Bing Qin, Heng Gong, Yibo Sun, Wei Bi, Xiaojiang Liu, Ting Liu

Abstract


In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer: it aims to preserve text style while altering the content. In detail, the input is a set of structured records and a reference text describing another record set. The output is a summary that accurately describes the partial content in the source record set in the same writing style as the reference. The task is unsupervised due to the lack of parallel data, and it is challenging to select suitable records and style words from the bi-aspect inputs and to generate a high-fidelity long document. To tackle these problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with an interactive attention mechanism, which learns the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we explore the effectiveness of back-translation in our task for constructing pseudo-training pairs. Empirical results show the superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.
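The interactive attention between the two input aspects can be sketched as a simple co-attention over record and reference-token encodings: an affinity matrix is normalized along each axis so that every record attends over style words and every style word attends over records. This is a minimal NumPy sketch under assumed shapes and variable names, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: R records, T reference tokens, embedding dim d.
rng = np.random.default_rng(0)
R, T, d = 4, 6, 8
records = rng.standard_normal((R, d))    # encoded structured records
reference = rng.standard_normal((T, d))  # encoded reference-text tokens

# Affinity between every record and every reference token.
scores = records @ reference.T           # shape (R, T)

# Record-to-text attention: which style words each record aligns with.
rec2txt = softmax(scores, axis=1)        # each row sums to 1

# Text-to-record attention: which records each style word aligns with.
txt2rec = softmax(scores.T, axis=1)      # each row sums to 1

# Context vectors fuse the two views for a downstream decoder.
record_context = rec2txt @ reference     # shape (R, d)
token_context = txt2rec @ records        # shape (T, d)
```

In a full model these context vectors would feed the decoder, letting content selection and style preservation condition on each other.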
