Paper Title

L2R2: Leveraging Ranking for Abductive Reasoning

Paper Authors

Yunchang Zhu, Liang Pang, Yanyan Lan, Xueqi Cheng

Paper Abstract

The abductive natural language inference task ($α$NLI) is proposed to evaluate the abductive reasoning ability of a learning system. In the $α$NLI task, two observations are given, and the task is to pick out the most plausible hypothesis from the candidates. Existing methods simply formulate it as a classification problem, so a cross-entropy log-loss objective is used during training. However, discriminating true from false does not measure the plausibility of a hypothesis, since every hypothesis has a chance of happening and only the probabilities differ. To fill this gap, we switch to a ranking perspective that sorts the hypotheses in order of their plausibility. From this new perspective, a novel $L2R^2$ approach is proposed under the learning-to-rank framework. Firstly, training samples are reorganized into a ranking form, where the two observations and their hypotheses are treated as the query and a set of candidate documents, respectively. Then, an ESIM model or a pre-trained language model, e.g., BERT or RoBERTa, is used as the scoring function. Finally, the loss function for the ranking task can be either pair-wise or list-wise during training. The experimental results on the ART dataset reach the state-of-the-art on the public leaderboard.
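To make the ranking formulation concrete, below is a minimal sketch of the two loss families the abstract mentions, assuming a PyTorch scorer over the candidate hypotheses of one observation pair; the function names and the graded plausibility labels are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the ranking view of the αNLI task, assuming a
# PyTorch scorer; names and label values below are illustrative only.
import torch
import torch.nn.functional as F

def listwise_loss(scores, labels):
    # ListNet-style top-one loss: cross entropy between the softmax of
    # the graded plausibility labels and the softmax of the model scores.
    target = F.softmax(labels, dim=-1)
    return -(target * F.log_softmax(scores, dim=-1)).sum()

def pairwise_loss(scores, labels, margin=1.0):
    # Hinge loss over every hypothesis pair whose labels differ: the
    # more plausible one should score higher by at least `margin`.
    diff_scores = scores.unsqueeze(1) - scores.unsqueeze(0)  # s_i - s_j
    diff_labels = labels.unsqueeze(1) - labels.unsqueeze(0)  # y_i - y_j
    mask = diff_labels > 0                                   # i ranked above j
    return F.relu(margin - diff_scores[mask]).sum()

# One "query": an observation pair (O1, O2) with candidate hypotheses,
# each carrying a graded plausibility label rather than a 0/1 class.
scores = torch.tensor([1.2, -0.4, 0.7, 0.1], requires_grad=True)  # scorer output
labels = torch.tensor([3.0, 0.0, 2.0, 1.0])                       # graded plausibility

loss = listwise_loss(scores, labels) + pairwise_loss(scores, labels)
loss.backward()  # gradients flow back into the scoring model
```

The list-wise loss uses the full graded ordering of all hypotheses at once, while the pair-wise loss only constrains relative pairs; per the abstract, either family can drive training under the learning-to-rank framework.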
