Paper Title
$\rm{C {\small IS}}^2$: A Simplified Commonsense Inference Evaluation for Story Prose
Paper Authors
Paper Abstract
Transformers have shown near-human performance on a variety of tasks, but they are not without their limitations. We discuss the issue of conflating results of transformers that are instructed to perform multiple tasks simultaneously. In particular, we focus on the domain of commonsense reasoning within story prose, which we call contextual commonsense inference (CCI). We examine the GLUCOSE (Mostafazadeh et al., 2020) dataset and its task of predicting implicit commonsense inferences between story sentences. Since the GLUCOSE task simultaneously generates sentences and predicts the CCI relation, there is a conflation in the results: is the model really measuring CCI, or is its ability to generate grammatical text carrying the results? In this paper, we introduce the contextual commonsense inference in sentence selection task ($\rm{C {\small IS}}^2$), a simplified task that avoids conflation by eliminating language generation altogether. Our findings emphasize the necessity of future work to disentangle language generation from the desired NLP tasks at hand.
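The selection-based reframing described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual model or the GLUCOSE data: the example story, the `select_sentence` helper, and its token-overlap scorer are all hypothetical stand-ins for a trained model's scoring function. The point is that the output is an index into existing sentences, so evaluation reduces to comparing a predicted index with a gold index, with no judgment of generated-text fluency involved.

```python
def select_sentence(story_sentences, query):
    """Return the index of the story sentence most similar to `query`.

    Token overlap is an illustrative placeholder for a real model's
    relevance scores; a selection task only needs *some* scorer that
    ranks the candidate sentences.
    """
    query_tokens = set(query.lower().split())
    scores = [len(query_tokens & set(s.lower().split()))
              for s in story_sentences]
    # The prediction is just an index, so accuracy against a gold
    # index can be computed without any language-generation metric.
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical three-sentence story (not from GLUCOSE).
story = [
    "Gage was riding his bike.",
    "A car turned in front of him.",
    "Gage fell off his bike.",
]
pred = select_sentence(story, "Gage fell off his bike.")  # selects index 2
```

Because the model never emits free-form text, grammaticality can no longer inflate or mask the CCI result, which is the conflation the abstract describes.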