Paper Title

Introducing a framework to assess newly created questions with Natural Language Processing

Authors

Luca Benedetto, Andrea Cappelli, Roberto Turrin, Paolo Cremonesi

Abstract

Statistical models such as those derived from Item Response Theory (IRT) enable the assessment of students on a specific subject, which can be useful for several purposes (e.g., learning path customization, drop-out prediction). However, the questions have to be assessed as well and, although IRT can estimate the characteristics of questions that have already been answered by several students, it cannot be applied to newly generated questions. In this paper, we propose a framework to train and evaluate models for estimating the difficulty and discrimination of newly created Multiple Choice Questions by extracting meaningful features from the text of the question and of the possible choices. We implement one model using this framework and test it on a real-world dataset provided by CloudAcademy, showing that it outperforms previously proposed models, reducing the RMSE for difficulty estimation by 6.7% and the RMSE for discrimination estimation by 10.8%. We also present the results of an ablation study performed to support our choice of features and to show how different characteristics of the questions' text affect difficulty and discrimination.
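For context, here is what "difficulty" and "discrimination" refer to, assuming the standard two-parameter logistic (2PL) IRT model (the abstract does not state which IRT variant the authors calibrate against): the probability that a student $j$ with latent ability $\theta_j$ answers item $i$ correctly is

$$P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + e^{-a_i(\theta_j - b_i)}}$$

where $b_i$ is the item's difficulty and $a_i$ its discrimination, i.e., how sharply the item separates students of lower and higher ability. Under this reading, the per-item parameters $a_i$ and $b_i$ are exactly the quantities the proposed framework predicts from the question and answer text, since IRT calibration alone can only recover them after many students have answered the item.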
