Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity

Paper Title

Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity

Authors

Dong Huang, Chang-Dong Wang, Jian-Huang Lai

Abstract

Despite significant progress, there remain three limitations to the previous multi-view clustering algorithms. First, they often suffer from high computational complexity, restricting their feasibility for large-scale datasets. Second, they typically fuse multi-view information via one-stage fusion, neglecting the possibilities in multi-stage fusions. Third, dataset-specific hyperparameter-tuning is frequently required, further undermining their practicability. In light of this, we propose a fast multi-view clustering via ensembles (FastMICE) approach. Particularly, the concept of random view groups is presented to capture the versatile view-wise relationships, through which the hybrid early-late fusion strategy is designed to enable efficient multi-stage fusions. With multiple views extended to many view groups, three levels of diversity (w.r.t. features, anchors, and neighbors, respectively) are jointly leveraged for constructing the view-sharing bipartite graphs in the early-stage fusion. Then, a set of diversified base clusterings for different view groups are obtained via fast graph partitioning, which are further formulated into a unified bipartite graph for final clustering in the late-stage fusion. Notably, FastMICE has almost linear time and space complexity, and is free of dataset-specific tuning. Experiments on 22 multi-view datasets demonstrate its advantages in scalability (for extremely large datasets), superiority (in clustering performance), and simplicity (to be applied) over the state-of-the-art. Code available: https://github.com/huangdonghere/FastMICE.
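
The two fusion stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how view groups are sampled or how graph edges are weighted, so the uniform subset sampling in `sample_view_groups` and the unweighted 0/1 edges in `late_fusion_bipartite` are assumptions, and both function names are hypothetical. The sketch only shows how random view groups might be drawn and how a set of base clusterings can be encoded as one bipartite graph between objects and base clusters, which a graph-partitioning step would then cut to obtain the final clustering.

```python
import random


def sample_view_groups(num_views, num_groups, seed=0):
    """Draw random non-empty subsets of view indices.

    Assumption: group size and membership are sampled uniformly; the
    abstract does not state the actual sampling rule.
    """
    rng = random.Random(seed)
    groups = []
    for _ in range(num_groups):
        size = rng.randint(1, num_views)  # inclusive on both ends
        groups.append(sorted(rng.sample(range(num_views), size)))
    return groups


def late_fusion_bipartite(base_clusterings):
    """Encode base clusterings as an object-by-cluster incidence matrix.

    base_clusterings: list of label lists, one per view group, each of
    length n. Returns (B, cluster_nodes), where B[i][j] == 1 if object i
    belongs to cluster node j and cluster_nodes[j] is the pair
    (clustering index, label). Assumption: edges are unweighted.
    """
    n = len(base_clusterings[0])
    cluster_nodes = []
    for m, labels in enumerate(base_clusterings):
        if len(labels) != n:
            raise ValueError("all base clusterings must label the same objects")
        for label in sorted(set(labels)):
            cluster_nodes.append((m, label))

    column = {node: j for j, node in enumerate(cluster_nodes)}
    B = [[0] * len(cluster_nodes) for _ in range(n)]
    for m, labels in enumerate(base_clusterings):
        for i, label in enumerate(labels):
            B[i][column[(m, label)]] = 1
    return B, cluster_nodes


if __name__ == "__main__":
    print(sample_view_groups(num_views=3, num_groups=4, seed=1))
    # Two toy base clusterings of four objects.
    B, nodes = late_fusion_bipartite([[0, 0, 1, 1], [0, 1, 1, 1]])
    print(nodes)
    for row in B:
        print(row)
```

Each row of `B` contains exactly one 1 per base clustering, so the matrix has n rows and as many columns as there are base clusters in total. That size grows linearly with the number of objects, which is consistent with the near-linear complexity the abstract claims, although the sketch itself makes no attempt to reproduce the paper's cost analysis.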

Paper Title