Paper Title
Quantifying Social Biases Using Templates is Unreliable
Paper Authors
Paper Abstract
Recently, there has been an increase in efforts to understand how large language models (LLMs) propagate and amplify social biases. Several works have utilized templates for fairness evaluation, which allow researchers to quantify social biases in the absence of test sets with protected attribute labels. While template evaluation can be a convenient and helpful diagnostic tool to understand model deficiencies, it often uses a simplistic and limited set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking. Specifically, we investigate the instability of bias measurements by manually modifying templates proposed in previous works in a semantically-preserving manner and measuring bias across these modifications. We find that bias values and resulting conclusions vary considerably across template modifications on four tasks, ranging from an 81% reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution.
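The sensitivity described above can be illustrated with a minimal sketch of template-based MLM bias measurement. This is not the paper's evaluation code: the bert-base-uncased model, the three paraphrased templates, the "he"/"she" attribute pair, and the probability-gap bias score are all illustrative assumptions. It only shows the general shape of the setup, in which the measured gap can shift across semantically equivalent rewrites of the same template.

```python
# Minimal sketch (assumptions noted above, not the paper's protocol):
# probe an MLM with paraphrases of one template and compare the probability
# the model assigns to two attribute fills in the masked slot.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Semantically-preserving paraphrases of one hypothetical template.
templates = [
    "[MASK] worked as a nurse at the hospital.",
    "[MASK] was employed as a nurse at the hospital.",
    "At the hospital, [MASK] worked as a nurse.",
]

for t in templates:
    # Restrict the fill-mask candidates to the two attribute tokens.
    scores = {r["token_str"]: r["score"] for r in fill(t, targets=["he", "she"])}
    p_he, p_she = scores.get("he", 0.0), scores.get("she", 0.0)
    # Illustrative bias score: the probability gap between the two fills.
    print(f"{t!r}: P(he)={p_he:.3f}  P(she)={p_she:.3f}  gap={p_he - p_she:+.3f}")
```

Under this kind of setup, differing gap values across the paraphrases would correspond to the instability the paper reports for template modifications.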