Paper Title

Break It Down: A Question Understanding Benchmark

Authors

Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant

Abstract

Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, showing that quality QDMRs can be annotated at scale, and release the Break dataset, containing over 83K pairs of questions and their QDMRs. We demonstrate the utility of QDMR by showing that (a) it can be used to improve open-domain question answering on the HotpotQA dataset, (b) it can be deterministically converted to a pseudo-SQL formal language, which can alleviate annotation in semantic parsing applications. Last, we use Break to train a sequence-to-sequence model with copying that parses questions into QDMR structures, and show that it substantially outperforms several natural baselines.
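A QDMR, as described above, is an ordered list of natural-language steps in which a later step can build on the result of an earlier one. The Python sketch below illustrates that structure. It is not the authors' code. It assumes that a decomposition is serialized as `;`-separated steps, that a step refers to an earlier step's result with a token such as `#1`, and that a step may only reference steps that come before it. The names `Step`, `parse_qdmr`, and `REF_PATTERN` are hypothetical. The example decomposition is hand-written for illustration and is not taken from the Break dataset.

```python
import re
from dataclasses import dataclass

# Assumed convention: "#k" inside a step refers to the result of step k.
REF_PATTERN = re.compile(r"#(\d+)")


@dataclass
class Step:
    index: int        # 1-based position of the step in the QDMR
    text: str         # the step itself, in natural language
    refs: list[int]   # indices of earlier steps whose results this step uses


def parse_qdmr(serialized: str) -> list[Step]:
    """Parse a ';'-separated QDMR string (assumed format) into ordered steps.

    Raises ValueError if a step refers to itself, to a later step, or to a
    step that does not exist, since each step may only consume results that
    have already been computed.
    """
    texts = [" ".join(part.split()) for part in serialized.split(";")]
    texts = [t for t in texts if t]  # drop empty fragments, e.g. a trailing ';'
    steps = []
    for index, text in enumerate(texts, start=1):
        refs = [int(k) for k in REF_PATTERN.findall(text)]
        for r in refs:
            if not 1 <= r < index:
                raise ValueError(f"step {index} references invalid step #{r}")
        steps.append(Step(index, text, refs))
    return steps


# Hand-written illustrative decomposition of
# "Which is taller, the Eiffel Tower or the Empire State Building?"
qdmr = ("return the Eiffel Tower ; return the Empire State Building ; "
        "return height of #1 ; return height of #2 ; "
        "return which is higher of #3 , #4")

for step in parse_qdmr(qdmr):
    print(step.index, step.refs, step.text)
```

The last step depends on steps 3 and 4, which in turn depend on steps 1 and 2, so the references form a graph over the ordered steps. Rejecting forward references is one simple way to enforce the "ordered" part of the representation in this sketch.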