Paper Title
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
Paper Authors
Paper Abstract
Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models. However, a standing limitation of these models is that they are trained against limited references and with plain maximum-likelihood objectives. As for many other generative tasks, reinforcement learning (RL) offers the potential to improve the training of MDS models; yet, it requires a carefully-designed reward that can ensure appropriate leverage of both the reference summaries and the input documents. For this reason, in this paper we propose fine-tuning an MDS baseline with a reward that balances a reference-based metric such as ROUGE with coverage of the input documents. To implement the approach, we utilize RELAX (Grathwohl et al., 2018), a contemporary gradient estimator which is both low-variance and unbiased, and we fine-tune the baseline in a few-shot style for both stability and computational efficiency. Experimental results over the Multi-News and WCEP MDS datasets show significant improvements of up to +0.95 pp average ROUGE score and +3.17 pp METEOR score over the baseline, and competitive results with the literature. In addition, they show that the coverage of the input documents is increased, and evenly across all documents.
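As a concrete illustration of the reward described in the abstract, the following is a minimal sketch of a reward that balances a reference-based score with coverage of the input documents. The function names, the unigram-recall proxy for ROUGE, and the `alpha` weighting are assumptions for illustration, not the paper's exact formulation.

```python
from collections import Counter


def unigram_recall(summary: str, text: str) -> float:
    """Crude stand-in for a ROUGE-style recall: the fraction of `text`
    unigrams that also occur in `summary`."""
    summary_counts = Counter(summary.lower().split())
    text_counts = Counter(text.lower().split())
    if not text_counts:
        return 0.0
    overlap = sum(min(count, summary_counts[token])
                  for token, count in text_counts.items())
    return overlap / sum(text_counts.values())


def balanced_reward(summary: str, reference: str, documents: list[str],
                    alpha: float = 0.5) -> float:
    """Balance a reference-based score against average per-document coverage.

    alpha=1.0 recovers a purely reference-based reward; alpha=0.0 rewards
    input-document coverage alone. The 0.5 default is a placeholder, not
    the paper's setting.
    """
    reference_score = unigram_recall(summary, reference)
    coverage = sum(unigram_recall(summary, doc) for doc in documents)
    coverage /= len(documents)
    return alpha * reference_score + (1.0 - alpha) * coverage
```

Averaging the coverage term over the individual input documents, rather than scoring their concatenation, is one natural way to encourage the even per-document coverage the abstract reports. For reference, the RELAX estimator (Grathwohl et al., 2018) used to fine-tune against such a non-differentiable reward takes the form

$$\hat{g}_{\text{RELAX}} = \big[f(b) - c_\phi(\tilde{z})\big]\,\nabla_\theta \log p(b \mid \theta) + \nabla_\theta c_\phi(z) - \nabla_\theta c_\phi(\tilde{z}),$$

where $b = H(z)$ with $z \sim p(z \mid \theta)$, $\tilde{z} \sim p(z \mid b, \theta)$, and $c_\phi$ is a learned control variate that keeps the estimator unbiased while reducing its variance.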