Paper Title

Evaluating Proposed Fairness Models for Face Recognition Algorithms

Paper Authors

Howard, John J., Laird, Eli J., Sirotin, Yevgeniy B., Rubin, Rebecca E., Tipton, Jerry L., Vemury, Arun R.

Paper Abstract

The development of face recognition algorithms by academic and commercial organizations is growing rapidly due to the onset of deep learning and the widespread availability of training data. Though tests of face recognition algorithm performance indicate yearly performance gains, error rates for many of these systems differ based on the demographic composition of the test set. These "demographic differentials" in algorithm performance can contribute to unequal or unfair outcomes for certain groups of people, raising concerns with increased worldwide adoption of face recognition systems. Consequently, regulatory bodies in both the United States and Europe have proposed new rules requiring audits of biometric systems for "discriminatory impacts" (European Union Artificial Intelligence Act) and "fairness" (U.S. Federal Trade Commission). However, no standard for measuring fairness in biometric systems yet exists. This paper characterizes two proposed measures of face recognition algorithm fairness (fairness measures) from scientists in the U.S. and Europe. We find that both proposed methods are challenging to interpret when applied to disaggregated face recognition error rates as they are commonly experienced in practice. To address this, we propose a set of interpretability criteria, termed the Functional Fairness Measure Criteria (FFMC), that outlines a set of properties desirable in a face recognition algorithm fairness measure. We further develop a new fairness measure, the Gini Aggregation Rate for Biometric Equitability (GARBE), and show how, in conjunction with Pareto optimization, this measure can be used to select among alternative algorithms based on the accuracy/fairness trade-space. Finally, we have open-sourced our dataset of machine-readable, demographically disaggregated error rates. We believe this is currently the largest open-source dataset of its kind.
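The abstract describes GARBE as a Gini-based aggregation over demographically disaggregated error rates, used alongside Pareto optimization to pick among candidate algorithms. The following is a minimal sketch of that idea, not the paper's exact formulation: it assumes GARBE is a weighted sum of Gini coefficients computed over per-group false match rates (FMR) and false non-match rates (FNMR), with a weighting parameter `alpha`, and that candidates on the accuracy/fairness trade-space are screened by simple Pareto dominance. The function names, the `n/(n-1)` small-sample correction, and the `alpha` weighting are assumptions for illustration; the paper's normalization may differ.

```python
def gini(rates):
    """Gini coefficient of per-group error rates (mean absolute
    difference, normalized by twice the mean), with an assumed
    n/(n-1) small-sample correction. 0 = perfectly equitable."""
    n = len(rates)
    mean = sum(rates) / n
    if n < 2 or mean == 0:
        return 0.0
    total = sum(abs(a - b) for a in rates for b in rates)
    return (n / (n - 1)) * total / (2 * n * n * mean)

def garbe(fmr_by_group, fnmr_by_group, alpha=0.5):
    """Sketch of GARBE: weighted aggregation of the Gini
    coefficients of FMR and FNMR across demographic groups.
    alpha trades off the two error types (an assumption here)."""
    return alpha * gini(fmr_by_group) + (1 - alpha) * gini(fnmr_by_group)

def pareto_front(candidates):
    """Non-dominated candidates among (error_rate, fairness) pairs,
    both minimized; used to narrow the accuracy/fairness trade-space."""
    return [c for c in candidates
            if not any(o != c and o[0] <= c[0] and o[1] <= c[1]
                       for o in candidates)]
```

For example, an algorithm whose per-group error rates are identical yields a GARBE of 0, and `pareto_front` drops any algorithm that another candidate beats (or ties) on both overall error and fairness.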
