Paper Title
A Re-Examination of the Evidence Used by Hooge et al. (2018) "Is human classification by experienced untrained observers a gold standard in fixation detection?"
Paper Authors
Abstract
Hooge et al. asked the question: "Is human classification by experienced untrained observers a gold standard in fixation detection?" They conclude that the answer is no. If they had entitled their paper "Is human classification by experienced untrained observers a gold standard in fixation detection when data quality is very poor, data are error-filled, data presentation was not optimal, and the analysis was seriously flawed?", I would have no case to make. In the present report, I present evidence to support my view that this latter title is justified. The assessment of low data quality is based on the use of a relatively imprecise eye-tracker, the absence of head restraint for any subject, and the use of infants as the majority of subjects (60 of 70). The inclusion of subjects with more than 50% missing data (as much as 95%) is further evidence of low data quality. The error-filled assessment is based on evidence that a number of the "fixations" classified by "experts" have obvious saccades within them and that, apparently, a number of fixations were classified on the basis of no signal at all. The evidence for non-optimal data presentation stems from the fact that, in a number of cases, perfectly good data were not presented to the coders. The flaws in the analysis are evidenced by the fact that entire stretches of missing data were considered classified, and that the measurement of saccade amplitude was based on many cases in which there was no saccade at all. Without general evidence to the contrary, it is correct to assume that some human classifiers under some conditions may meet the criteria for a gold standard, and that classifiers under other conditions may not. This conditionality is not recognized by Hooge et al. A fair assessment would conclude that whether humans can be considered a gold standard is still very much an open question.