Paper Title

Quiz Design Task: Helping Teachers Create Quizzes with Automated Question Generation

Paper Authors

Philippe Laban, Chien-Sheng Wu, Lidiya Murakhovs'ka, Wenhao Liu, Caiming Xiong

Paper Abstract

Question generation (QGen) models are often evaluated with standardized NLG metrics that are based on n-gram overlap. In this paper, we measure whether these metric improvements translate to gains in a practical setting, focusing on the use case of helping teachers automate the generation of reading comprehension quizzes. In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason. Even though we find that recent progress in QGen leads to a significant increase in question acceptance rates, there is still large room for improvement, with the best model having only 68.4% of its questions accepted by the ten teachers who participated in our study. We then leverage the annotations we collected to analyze standard NLG metrics and find that model performance has reached projected upper-bounds, suggesting new automatic metrics are needed to guide QGen research forward.
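The n-gram overlap metrics the abstract refers to (BLEU- and ROUGE-style scores) all build on the same primitive: clipped n-gram precision between a generated text and a reference. As a minimal illustrative sketch of that primitive, the Python snippet below scores a generated question against a reference question; the function names and example strings are hypothetical and are not taken from the paper's evaluation code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a reference.

    This is the building block of BLEU-style overlap metrics: each
    candidate n-gram counts at most as many times as it appears in
    the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / sum(cand_counts.values())

# Hypothetical example: a generated question scored against a
# teacher-written reference question.
generated = "what year did the battle end".split()
reference = "in what year did the war end".split()
for n in (1, 2):
    print(f"{n}-gram precision: {ngram_precision(generated, reference, n):.2f}")
# 1-gram precision: 0.83
# 2-gram precision: 0.60
```

In practice, BLEU combines such precisions geometrically over n = 1..4 with a brevity penalty, while ROUGE measures n-gram recall; the paper's finding is that model performance on metrics of this family has already reached projected upper bounds, motivating new automatic metrics for QGen.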
