Paper title
Inv-SENnet: Invariant Self Expression Network for clustering under biased data
Authors
Abstract
Subspace clustering algorithms are used to uncover the cluster structure that explains a dataset well. These methods are used extensively for data-exploration tasks across the natural sciences. However, most of them fail to handle unwanted biases in datasets. For datasets in which each sample reflects multiple attributes, naively applying a clustering method can produce undesired output. To this end, we propose a novel framework that jointly removes unwanted attributes (biases) while learning to cluster data points into their individual subspaces. Assuming information about the bias is available, we regularize the clustering method by adversarially learning to minimize the mutual information between the data and the unwanted attributes. Our experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach.
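As a rough illustration of the kind of objective the abstract describes, and not the paper's exact formulation, a self-expressive clustering loss with an adversarial bias penalty might be written as below. All symbols are assumptions introduced here: $Z = f_\theta(X)$ is a learned embedding with one row per sample, $C$ is the self-expression coefficient matrix, $b$ holds the known bias labels, $D_\phi$ is a discriminator that tries to predict $b$ from $Z$, $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, and $\lambda, \gamma$ are trade-off weights.

```latex
% Hypothetical sketch: self-expression loss plus an adversarial bias penalty.
% The notation is assumed for illustration and is not taken from the paper.
\min_{\theta,\, C}\ \max_{\phi}\;
  \lVert Z - C Z \rVert_F^2
  + \lambda \lVert C \rVert_F^2
  - \gamma\, \mathcal{L}_{\mathrm{CE}}\bigl(D_\phi(Z),\, b\bigr)
\quad \text{s.t.}\quad \operatorname{diag}(C) = 0, \qquad Z = f_\theta(X)
```

In this sketch the inner maximization trains the discriminator to predict the bias, while the outer minimization pushes the embedding toward one from which $b$ is hard to predict. That is a common surrogate for reducing the mutual information between $Z$ and $b$. Cluster labels would then typically come from spectral clustering on an affinity such as $|C| + |C|^\top$.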