Paper Title
Using Elo Rating as a Metric for Comparative Judgement in Educational Assessment
Paper Authors
Abstract
Marking and feedback are essential features of teaching and learning across the overwhelming majority of educational settings and contexts. However, marking assessments and providing useful feedback to students can take teachers a great deal of time and effort. It also places a significant cognitive load on assessors, especially in ensuring fairness and equity. An alternative approach to marking, called comparative judgement (CJ), has therefore been proposed in the educational space. Inspired by the law of comparative judgment (LCJ), CJ relies on comparing submissions in pairs; these pairwise comparisons, for as many pairs as possible, can then be used to rank all submissions. Some studies suggest that CJ is highly reliable and accurate while being quick for teachers. Other studies have questioned this claim, suggesting that the process can increase bias in the results because the same submission is shown to an assessor many times to increase reliability. Studies have also found that CJ can make the overall marking process take longer than more traditional marking methods, because information about many pairs must be collected. In this paper, we investigate Elo, which has been used extensively to rate players in zero-sum games such as chess. We experimented on a large-scale Twitter dataset on the topic of a recent major UK political event ("Brexit", the UK's political exit from the European Union), asking users which tweet they found funnier between a pair selected from ten tweets. Our analysis of the data reveals that the Elo rating is statistically significantly similar to the CJ ranking, with a Kendall's tau score of 0.96 and a p-value of 1.5x10^(-5). We finish with an informed discussion of the potential wider application of this approach to a range of educational contexts.
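To make the idea in the abstract concrete, the following is a minimal sketch of how pairwise "which is better" judgements could be turned into Elo ratings and how the resulting ranking could be compared with a reference ranking using Kendall's tau. It is not the paper's implementation: the K-factor of 32, the starting rating of 1500, the function names, the tweet labels, and the judgement data are all assumptions chosen for illustration, and the reference ranking is a hypothetical stand-in for a CJ ranking.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B (logistic curve, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(items, judgements, k=32.0, start=1500.0):
    """Apply (winner, loser) judgements in order and return the final ratings.

    k and start are conventional chess-style defaults, assumed here;
    the abstract does not state which values were used.
    """
    ratings = {item: start for item in items}
    for winner, loser in judgements:
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_win)  # the winner scored 1, expected e_win
        ratings[winner] += delta
        ratings[loser] -= delta    # zero-sum: the loser gives up the same amount
    return ratings


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical judgements: each tuple is (preferred item, other item).
tweets = ["t1", "t2", "t3", "t4"]
judgements = [
    ("t1", "t2"), ("t1", "t3"), ("t1", "t4"),
    ("t2", "t3"), ("t2", "t4"), ("t3", "t4"),
]

ratings = elo_ratings(tweets, judgements)
elo_ranking = sorted(tweets, key=ratings.get, reverse=True)

# Hypothetical stand-in for a ranking produced by a CJ model.
reference_ranking = ["t1", "t2", "t3", "t4"]

print(elo_ranking)
print(kendall_tau(elo_ranking, reference_ranking))
```

Because each update moves the same number of points from the loser to the winner, the total rating across all items stays constant. Elo is also order-dependent: presenting the same judgements in a different sequence can give slightly different final ratings, which is worth keeping in mind when comparing it with a model fitted to all comparisons at once. The `kendall_tau` helper returns only the coefficient; a p-value such as the one quoted in the abstract would need a statistical test on top of it.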