Paper Title


Natural Language Inference in Context -- Investigating Contextual Reasoning over Long Texts

Paper Authors

Hanmeng Liu, Leyang Cui, Jian Liu, Yue Zhang

Paper Abstract


Natural language inference (NLI) is a fundamental NLP task that investigates the entailment relationship between two texts. Popular NLI datasets present the task at the sentence level. While adequate for testing semantic representations, they fall short for testing contextual reasoning over long texts, which is a natural part of the human inference process. We introduce ConTRoL, a new dataset for ConTextual Reasoning over Long texts. Consisting of 8,325 expert-designed "context-hypothesis" pairs with gold labels, ConTRoL is a passage-level NLI dataset with a focus on complex contextual reasoning types such as logical reasoning. It is derived from competitive selection and recruitment tests (verbal reasoning tests) used in police recruitment, and is therefore of expert-level quality. Compared with previous NLI benchmarks, the materials in ConTRoL are far more challenging, involving a range of reasoning types. Empirical results show that state-of-the-art language models perform far worse than educated humans. Our dataset can also serve as a test set for downstream tasks, such as checking the factual correctness of summaries.
