Paper Title

L2R2: Leveraging Ranking for Abductive Reasoning

Paper Authors

Yunchang Zhu, Liang Pang, Yanyan Lan, Xueqi Cheng

Paper Abstract

The abductive natural language inference task ($α$NLI) is proposed to evaluate the abductive reasoning ability of a learning system. In the $α$NLI task, two observations are given, and the task is to pick out the most plausible hypothesis from the candidates. Existing methods simply formulate it as a classification problem, so a cross-entropy log-loss objective is used during training. However, discriminating true from false does not measure the plausibility of a hypothesis, since every hypothesis has a chance of happening and only the probabilities differ. To fill this gap, we switch to a ranking perspective that sorts the hypotheses in order of their plausibility. From this new perspective, a novel $L2R^2$ approach is proposed under the learning-to-rank framework. Firstly, training samples are reorganized into a ranking form, where the two observations and their hypotheses are treated as the query and a set of candidate documents, respectively. Then, an ESIM model or a pre-trained language model, e.g., BERT or RoBERTa, is used as the scoring function. Finally, the loss function for the ranking task can be either pair-wise or list-wise during training. The experimental results on the ART dataset reach the state-of-the-art on the public leaderboard.
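To make the ranking formulation concrete, below is a minimal sketch of the two loss families the abstract mentions, assuming a PyTorch scorer over the candidate hypotheses of one observation pair; the function names and the graded plausibility labels are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of the ranking view of the αNLI task, assuming a
# PyTorch scorer; names and label values below are illustrative only.
import torch
import torch.nn.functional as F

def listwise_loss(scores, labels):
    # ListNet-style top-one loss: cross entropy between the softmax of
    # the graded plausibility labels and the softmax of the model scores.
    target = F.softmax(labels, dim=-1)
    return -(target * F.log_softmax(scores, dim=-1)).sum()

def pairwise_loss(scores, labels, margin=1.0):
    # Hinge loss over every hypothesis pair whose labels differ: the
    # more plausible one should score higher by at least `margin`.
    diff_scores = scores.unsqueeze(1) - scores.unsqueeze(0)  # s_i - s_j
    diff_labels = labels.unsqueeze(1) - labels.unsqueeze(0)  # y_i - y_j
    mask = diff_labels > 0                                   # i ranked above j
    return F.relu(margin - diff_scores[mask]).sum()

# One "query": an observation pair (O1, O2) with candidate hypotheses,
# each carrying a graded plausibility label rather than a 0/1 class.
scores = torch.tensor([1.2, -0.4, 0.7, 0.1], requires_grad=True)  # scorer output
labels = torch.tensor([3.0, 0.0, 2.0, 1.0])                       # graded plausibility

loss = listwise_loss(scores, labels) + pairwise_loss(scores, labels)
loss.backward()  # gradients flow back into the scoring model
```

The list-wise loss uses the full graded ordering of all hypotheses at once, while the pair-wise loss only constrains relative pairs; per the abstract, either family can drive training under the learning-to-rank framework.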
