Paper Title
ToTTo: A Controlled Table-To-Text Generation Dataset
Paper Authors
Paper Abstract
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.