论文标题
个人公平的概念
A Notion of Individual Fairness for Clustering
论文作者
论文摘要
公平的机器学习中的一个共同区别,尤其是在公平分类中,是群体公平与个人公平之间。在聚类的背景下,近年来对群体公平进行了广泛的研究。但是,几乎没有探索个人聚类的公平性。在本文中,我们提出了一个自然的个人公平概念。我们的概念要求,平均而言,每个数据点都更接近其自身集群中的点,而不是其他群集中的点。我们研究了与我们提出的个人公平概念有关的几个问题。在负面方面,我们表明,确定给定数据集通常允许使用这种单独的公平聚类是NP-HARD。从积极的一面来看,对于位于实际线路上的数据集的特殊情况,我们提出了一种有效的动态编程方法来找到单独的公平聚类。对于一般数据集,我们研究了旨在最大程度地减少个人公平违规数量的启发式方法,并将其与实际数据集的标准聚类方法进行比较。
A common distinction in fair machine learning, in particular in fair classification, is between group fairness and individual fairness. In the context of clustering, group fairness has been studied extensively in recent years; however, individual fairness for clustering has hardly been explored. In this paper, we propose a natural notion of individual fairness for clustering. Our notion asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. We study several questions related to our proposed notion of individual fairness. On the negative side, we show that deciding whether a given data set allows for such an individually fair clustering in general is NP-hard. On the positive side, for the special case of a data set lying on the real line, we propose an efficient dynamic programming approach to find an individually fair clustering. For general data sets, we investigate heuristics aimed at minimizing the number of individual fairness violations and compare them to standard clustering approaches on real data sets.