Paper Title
Benchmarking Machine Reading Comprehension: A Psychological Perspective
Paper Authors
Abstract
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. However, the conventional task design of MRC lacks explainability beyond the model interpretation, i.e., reading comprehension by a model cannot be explained in human terms. To this end, this position paper provides a theoretical basis for the design of MRC datasets based on psychology as well as psychometrics, and summarizes it in terms of the prerequisites for benchmarking MRC. We conclude that future datasets should (i) evaluate the capability of the model for constructing a coherent and grounded representation to understand context-dependent situations and (ii) ensure substantive validity by shortcut-proof questions and explanation as a part of the task design.