Paper Title
Understanding Unnatural Questions Improves Reasoning over Text
Authors
Abstract
Complex question answering (CQA) over raw text is a challenging task. A prominent approach to this task is based on the programmer-interpreter framework, where the programmer maps the question into a sequence of reasoning actions, which the interpreter then executes on the raw text. Learning an effective CQA model requires large amounts of human-annotated data, consisting of the ground-truth sequences of reasoning actions, which are time-consuming and expensive to collect at scale. In this paper, we address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions, which are more convenient to parse. We first generate synthetic (question, action sequence) pairs with a data generator, and train a semantic parser that associates synthetic questions with their corresponding action sequences. To handle the diversity of natural questions, we learn a projection model that maps each natural question to its most similar unnatural question, on which the parser works well. Without any natural training data, our projection model provides high-quality action sequences for the CQA task. Experimental results show that the QA model trained exclusively with synthetic data generated by our method outperforms its state-of-the-art counterpart trained on human-labeled data.
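The pipeline described in the abstract (generate synthetic pairs, then project a natural question onto its closest unnatural question and reuse that question's action sequence) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: the templates, the action names (`FIND`, `COUNT`, `ARGMAX`, `ARGMIN`), and the function names are hypothetical, and the learned projection model is replaced by a simple token-overlap (Jaccard) similarity. A lookup table also stands in for the trained semantic parser.

```python
from itertools import product

# Hypothetical question templates and action sequences (not from the paper).
TEMPLATES = [
    ("how many {noun} are mentioned", ["FIND({noun})", "COUNT"]),
    ("what is the longest {noun}", ["FIND({noun})", "ARGMAX"]),
    ("what is the shortest {noun}", ["FIND({noun})", "ARGMIN"]),
]
NOUNS = ["touchdowns", "field goals"]


def generate_synthetic_pairs():
    """Enumerate synthetic (question, action sequence) pairs from templates."""
    pairs = {}
    for (question_tpl, action_tpl), noun in product(TEMPLATES, NOUNS):
        question = question_tpl.format(noun=noun)
        pairs[question] = [action.format(noun=noun) for action in action_tpl]
    return pairs


def jaccard(a, b):
    """Token-overlap similarity; a stand-in for a learned projection model."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def project(natural_question, synthetic_questions):
    """Map a natural question to its most similar unnatural question."""
    return max(synthetic_questions, key=lambda q: jaccard(natural_question, q))


def parse(natural_question, pairs):
    """Project the question, then look up the action sequence of its match."""
    unnatural = project(natural_question, pairs)
    return unnatural, pairs[unnatural]


if __name__ == "__main__":
    pairs = generate_synthetic_pairs()
    print(parse("How many touchdowns were scored in the game?", pairs))
```

In this sketch, no natural question ever appears in the "training" data: only the synthetic pairs are enumerated, and natural input is handled entirely by the projection step, which mirrors the division of labor the abstract describes.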