Abstracting Fairness: Oracles, Metrics, and Interpretability

Paper Title

Abstracting Fairness: Oracles, Metrics, and Interpretability

Authors

Cynthia Dwork, Christina Ilvento, Guy N. Rothblum, Pragya Sur

Abstract

It is well understood that classification algorithms, for example those used to decide on loan applications, cannot be evaluated for fairness without taking context into account. We examine what can be learned from a fairness oracle equipped with an underlying understanding of "true" fairness. The oracle takes as input a (context, classifier) pair satisfying an arbitrary fairness definition, and accepts or rejects the pair according to whether the classifier satisfies the underlying fairness truth. Our principal conceptual result is an extraction procedure that learns the underlying truth; moreover, the procedure can learn an approximation to this truth given access to a weak form of the oracle. Since every "truly fair" classifier induces a coarse metric, in which those receiving the same decision are at distance zero from one another and those receiving different decisions are at distance one, this extraction process provides the basis for ensuring a rough form of metric fairness, also known as individual fairness. Our principal technical result is a higher fidelity extractor under a mild technical constraint on the weak oracle's conception of fairness. Our framework permits the scenario in which many classifiers, with differing outcomes, may all be considered fair. Our results have implications for interpretability, a highly desired but poorly defined property of classification systems that endeavors to permit a human arbiter to reject classifiers deemed to be "unfair" or illegitimately derived.
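
The coarse metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's extraction procedure, only the 0/1 distance that a single classifier induces on individuals. The `approve` classifier and its income threshold of 50 are hypothetical, chosen purely for illustration.

```python
from typing import Callable, Hashable

# A classifier maps an individual to a discrete decision (e.g. 0 = deny, 1 = approve).
Classifier = Callable[[Hashable], int]


def coarse_metric(classifier: Classifier) -> Callable[[Hashable, Hashable], int]:
    """Return the 0/1 distance induced by a classifier's decisions.

    Two individuals are at distance 0 if they receive the same decision
    and at distance 1 otherwise.
    """
    def distance(x: Hashable, y: Hashable) -> int:
        return 0 if classifier(x) == classifier(y) else 1

    return distance


# Hypothetical toy classifier (an assumption, not from the paper):
# approve a loan when the applicant's income is at least 50.
def approve(applicant_income: int) -> int:
    return 1 if applicant_income >= 50 else 0


d = coarse_metric(approve)
print(d(60, 80))  # both approved, so distance 0
print(d(40, 80))  # one denied, one approved, so distance 1
```

Because the distance depends only on the decisions, it is symmetric and satisfies the triangle inequality, but distinct individuals can be at distance zero, which is what makes it coarse.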
