Paper Title
Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models
Paper Authors
Paper Abstract
Cognitive psychologists have documented that humans use cognitive heuristics, or mental shortcuts, to make quick decisions while expending less effort. While performing annotation work on crowdsourcing platforms, we hypothesize that such heuristic use among annotators cascades on to data quality and model robustness. In this work, we study cognitive heuristic use in the context of annotating multiple-choice reading comprehension datasets. We propose tracking annotator heuristic traces, where we tangibly measure low-effort annotation strategies that could indicate usage of various cognitive heuristics. We find evidence that annotators might be using multiple such heuristics, based on correlations with a battery of psychological tests. Importantly, heuristic use among annotators determines data quality along several dimensions: (1) known biased models, such as partial-input models, more easily solve examples authored by annotators who rate highly on heuristic use, (2) models trained on data from annotators scoring highly on heuristic use do not generalize as well, and (3) heuristic-seeking annotators tend to create qualitatively less challenging examples. Our findings suggest that tracking heuristic usage among annotators can potentially help with collecting challenging datasets and diagnosing model biases.