论文标题
可扩展的光谱聚类和群体公平约束
Scalable Spectral Clustering with Group Fairness Constraints
论文作者
论文摘要
在建模公平性和纠正机器学习中的算法偏见方面,有许多研究兴趣和工业努力的协同作用。在本文中,我们提出了一种具有群体公平约束的光谱聚类(SC)的可扩展算法。群体公平也称为统计奇偶校验,在每个集群中,每个受保护的组的比例与整体相同。尽管Fairsc算法(Kleindessner等人,2019年)能够找到更公平的聚类,但由于计算NullSpaces的内核和密集矩阵的正方形根部,它被高成本损害。我们通过结合零扎的投影和Hotelling的放气提出了基本光谱计算的新公式,使得所得算法称为S-FairSC,仅涉及稀疏的矩阵矢量产物,并能够完全利用公平SC模型的稀疏性。修改的随机块模型的实验结果表明,S-Fairsc与Fairsc恢复了公平聚类相当。同时,对于中等模型大小,它的加速为12倍。从S-FairsC的意义上说,S-Fairsc进一步证明是可扩展的,即S-FairsC的计算成本仅与SC相比而没有公平限制。
There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entirety. While FairSC algorithm (Kleindessner et al., 2019) is able to find the fairer clustering, it is compromised by high costs due to the kernels of computing nullspaces and the square roots of dense matrices explicitly. We present a new formulation of underlying spectral computation by incorporating nullspace projection and Hotelling's deflation such that the resulting algorithm, called s-FairSC, only involves the sparse matrix-vector products and is able to fully exploit the sparsity of the fair SC model. The experimental results on the modified stochastic block model demonstrate that s-FairSC is comparable with FairSC in recovering fair clustering. Meanwhile, it is sped up by a factor of 12 for moderate model sizes. s-FairSC is further demonstrated to be scalable in the sense that the computational costs of s-FairSC only increase marginally compared to the SC without fairness constraints.