Paper Title

ToTTo: A Controlled Table-To-Text Generation Dataset

Authors

Parikh, Ankur P.; Wang, Xuezhi; Gehrmann, Sebastian; Faruqui, Manaal; Dhingra, Bhuwan; Yang, Diyi; Das, Dipanjan

Abstract

We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
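To make the controlled generation task concrete, here is a minimal sketch of what a model's input might look like: a table plus a set of highlighted cells, flattened into a single source string. The table contents, the `(row, column)` index format, and the `linearize_highlighted` helper are all hypothetical, invented for illustration. They are not the dataset's actual schema or the authors' preprocessing.

```python
from typing import List, Tuple

# Hypothetical toy table, invented for illustration (not taken from ToTTo).
table: List[List[str]] = [
    ["Year", "Title", "Role"],
    ["2001", "Example Film", "Lead"],
    ["2003", "Another Film", "Supporting"],
]

# Hypothetical (row, column) indices of the highlighted cells.
highlighted: List[Tuple[int, int]] = [(1, 0), (1, 1)]


def linearize_highlighted(
    table: List[List[str]],
    highlighted: List[Tuple[int, int]],
    header_row: int = 0,
) -> str:
    """Pair each highlighted cell with its column header and join the pairs."""
    parts = []
    for row, col in highlighted:
        header = table[header_row][col]
        parts.append(f"{header}: {table[row][col]}")
    return " | ".join(parts)


source_text = linearize_highlighted(table, highlighted)
print(source_text)
```

A generation model would take a string like `source_text` as input and be expected to produce one sentence that states only what the highlighted cells support. A sentence that mentions the "Lead" role here would count as unsupported, since that cell is not highlighted.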
