Paper Title

Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity

Authors

Dong Huang, Chang-Dong Wang, Jian-Huang Lai

Abstract

Despite significant progress, there remain three limitations to the previous multi-view clustering algorithms. First, they often suffer from high computational complexity, restricting their feasibility for large-scale datasets. Second, they typically fuse multi-view information via one-stage fusion, neglecting the possibilities in multi-stage fusions. Third, dataset-specific hyperparameter-tuning is frequently required, further undermining their practicability. In light of this, we propose a fast multi-view clustering via ensembles (FastMICE) approach. Particularly, the concept of random view groups is presented to capture the versatile view-wise relationships, through which the hybrid early-late fusion strategy is designed to enable efficient multi-stage fusions. With multiple views extended to many view groups, three levels of diversity (w.r.t. features, anchors, and neighbors, respectively) are jointly leveraged for constructing the view-sharing bipartite graphs in the early-stage fusion. Then, a set of diversified base clusterings for different view groups are obtained via fast graph partitioning, which are further formulated into a unified bipartite graph for final clustering in the late-stage fusion. Notably, FastMICE has almost linear time and space complexity, and is free of dataset-specific tuning. Experiments on 22 multi-view datasets demonstrate its advantages in scalability (for extremely large datasets), superiority (in clustering performance), and simplicity (to be applied) over the state-of-the-art. Code available: https://github.com/huangdonghere/FastMICE.
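
To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows one way to draw random view groups and one way to stack a set of base clusterings into an object-by-cluster bipartite incidence matrix for late-stage fusion. The function names `sample_view_groups` and `late_fusion_bipartite`, the uniform group-size rule, and the 0/1 incidence encoding are all assumptions made for illustration; the early-stage anchor graphs and the fast graph partitioning step are omitted.

```python
import random


def sample_view_groups(n_views, n_groups, seed=0):
    """Draw random non-empty subsets of view indices.

    Assumption: the group size is chosen uniformly from 1..n_views.
    The abstract does not specify the sampling rule.
    """
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        size = rng.randint(1, n_views)  # inclusive on both ends
        groups.append(sorted(rng.sample(range(n_views), size)))
    return groups


def late_fusion_bipartite(base_clusterings):
    """Stack base clusterings into an object-by-cluster incidence matrix.

    Each row is an object. Each base clustering contributes one block of
    columns, one column per cluster; an entry of 1 means the object
    belongs to that cluster (hypothetical encoding of the bipartite graph).
    """
    n_objects = len(base_clusterings[0])
    rows = [[] for _ in range(n_objects)]
    for labels in base_clusterings:
        clusters = sorted(set(labels))
        for i, label in enumerate(labels):
            rows[i].extend(1 if label == c else 0 for c in clusters)
    return rows


# Two toy base clusterings of four objects, e.g. from two view groups.
groups = sample_view_groups(n_views=3, n_groups=4, seed=0)
B = late_fusion_bipartite([[0, 0, 1, 1], [0, 1, 1, 1]])
```

In this toy encoding every object has exactly one edge per base clustering, so the matrix stays sparse and its size grows linearly with the number of objects, which is in the spirit of the near-linear complexity claimed in the abstract. A final clustering would then be obtained by partitioning this bipartite graph.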
