Socially Fair Center-based and Linear Subspace Clustering

Paper title

Socially Fair Center-based and Linear Subspace Clustering

Authors

Sruthi Gorantla, Kishen N. Gowda, Amit Deshpande, Anand Louis

Abstract

Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two of the most popular techniques for partitioning real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different per-point clustering costs across those groups can lead to fairness-related harms (e.g., different quality of service). The goal of socially fair clustering is to minimize the maximum per-point clustering cost over all groups. In this work, we propose a unified framework for solving socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. Extensive experiments on multiple benchmark datasets show that our algorithms either closely match or outperform state-of-the-art baselines.
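
To make the objective stated in the abstract concrete, the sketch below evaluates a socially fair cost for a fixed set of centers: it averages each group's per-point cost and returns the largest group average. It is only an illustration of the objective, not the paper's approximation algorithm. The function names, the use of squared Euclidean distance (a $k$-means-style cost), and the toy data are assumptions made for this example.

```python
from typing import Dict, Hashable, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def socially_fair_cost(
    points: Sequence[Point],
    groups: Sequence[Hashable],
    centers: Sequence[Point],
) -> float:
    """Maximum, over groups, of the average per-point cost.

    Assumption: the per-point cost is the squared distance to the
    nearest center (k-means style). Other costs could be swapped in.
    """
    totals: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for point, group in zip(points, groups):
        cost = min(squared_distance(point, c) for c in centers)
        totals[group] = totals.get(group, 0.0) + cost
        counts[group] = counts.get(group, 0) + 1
    return max(totals[g] / counts[g] for g in totals)


# Hypothetical toy data: two points in group "a", one point in group "b".
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
groups = ["a", "a", "b"]

# One center placed near group "a" leaves group "b" with a high cost.
one_center_cost = socially_fair_cost(points, groups, [(1.0, 0.0)])

# Adding a center at the group "b" point lowers the worst group's cost.
two_center_cost = socially_fair_cost(points, groups, [(1.0, 0.0), (10.0, 0.0)])
```

In this toy instance the single center serves group "a" well but not group "b", so the fair cost is driven entirely by group "b"; a standard clustering objective that sums the cost over all points would not expose that gap as directly.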
