Paper Title


A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization

Paper Authors

Jacob Parnell, Inigo Jauregi Unanue, Massimo Piccardi

Paper Abstract


Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models. However, a standing limitation of these models is that they are trained against limited references and with plain maximum-likelihood objectives. As for many other generative tasks, reinforcement learning (RL) offers the potential to improve the training of MDS models; yet, it requires a carefully designed reward that can ensure appropriate leverage of both the reference summaries and the input documents. For this reason, in this paper we propose fine-tuning an MDS baseline with a reward that balances a reference-based metric such as ROUGE with coverage of the input documents. To implement the approach, we utilize RELAX (Grathwohl et al., 2018), a contemporary gradient estimator which is both low-variance and unbiased, and we fine-tune the baseline in a few-shot style for both stability and computational efficiency. Experimental results over the Multi-News and WCEP MDS datasets show significant improvements of up to +0.95 pp average ROUGE score and +3.17 pp METEOR score over the baseline, and competitive results with the literature. In addition, they show that coverage of the input documents increases, and does so evenly across all documents.
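To make the reward design concrete, below is a minimal Python sketch of a reward that mixes a reference-based score with coverage of the input documents, in the spirit of the abstract. It is not the authors' implementation: the unigram-F1 stand-in for ROUGE, the per-document coverage definition, and the mixing weight `alpha` are all illustrative assumptions.

```python
# Sketch of a reward balancing a reference-based metric (here a unigram-F1
# stand-in for ROUGE) with coverage of the input documents. All names and
# the mixing weight `alpha` are illustrative assumptions, not the paper's
# actual implementation.

from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 score; a crude stand-in for a real ROUGE metric."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # min counts of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def coverage(summary: str, documents: list[str]) -> float:
    """Average per-document overlap: rewards summaries that draw evenly
    on all input documents rather than concentrating on one."""
    return sum(unigram_f1(summary, d) for d in documents) / len(documents)


def reward(summary: str, reference: str, documents: list[str],
           alpha: float = 0.5) -> float:
    """Convex combination of the reference-based score and input coverage."""
    return (alpha * unigram_f1(summary, reference)
            + (1 - alpha) * coverage(summary, documents))


# Toy usage: two input documents, one reference summary.
docs = ["the storm hit the coast on monday",
        "officials reported flood damage across the region"]
print(reward("storm caused flood damage on monday",
             "storm floods the coast causing damage", docs))
```

In an RL fine-tuning loop of the kind the abstract describes, a scalar reward like this would score each sampled summary, with the gradient estimated by a method such as RELAX to keep variance low while remaining unbiased.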
