Paper Title
Group Fairness by Probabilistic Modeling with Latent Fair Decisions
Paper Authors
Paper Abstract
Machine learning systems are increasingly being used to make impactful decisions such as loan applications and criminal justice risk assessments, and as such, ensuring the fairness of these systems is critical. This is often challenging because the labels in the data are biased. This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label. In particular, we aim to achieve demographic parity by enforcing certain independencies in the learned model. We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data. To closely model the data distribution, we employ probabilistic circuits, an expressive and tractable family of probabilistic models, and propose an algorithm to learn them from incomplete data. We evaluate our approach on a synthetic dataset in which the observed labels indeed come from fair labels but with added bias, and demonstrate that the fair labels are successfully retrieved. Moreover, we show on real-world datasets that our approach not only models how the data was generated better than existing methods but also achieves competitive accuracy.
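As a hedged illustration of the independence constraint described in the abstract (the notation below is ours and not taken from the paper: S denotes a sensitive attribute, D the observed, possibly biased label, and D_f the latent fair decision), demographic parity with respect to the latent label can be written as:

% Illustrative sketch only; symbols S, D, D_f are our own notation.
\[
D_f \perp S
\quad\Longleftrightarrow\quad
\Pr(D_f = 1 \mid S = s) = \Pr(D_f = 1) \quad \text{for every value } s \text{ of } S,
\]
% while the observed label may still depend on both, via $\Pr(D \mid D_f, S)$.

Under this reading, the independence is enforced in the learned joint distribution itself rather than in a downstream classifier, which is consistent with the abstract's point that the guarantee is only meaningful insofar as that distribution captures the real-world data.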