Paper title
Biclustering with Alternating K-Means
Paper authors
Paper abstract
Biclustering is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing block-diagonal biclusters. We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk. We develop and prove a consistency result with respect to the empirical clustering risk. Since the optimization problem is combinatorial in nature, finding the global minimum is computationally intractable. In light of this fact, we propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows. We evaluate and compare the performance of our algorithm to other related biclustering methods on both simulated data and real-world gene expression data sets. The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
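The alternating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the risk function or how k-means is adapted, so the sketch assumes a plain squared-error objective over a full K x K grid of block means rather than the paper's block-diagonal formulation. The names `block_means`, `loss`, and `alternating_kmeans_bicluster` are hypothetical. Each pass reassigns rows with columns held fixed, then columns with rows held fixed, so the objective never increases and the procedure settles at a local minimum.

```python
import numpy as np

def block_means(X, r, c, K):
    # Mean of each (row group k, column group l) block.
    # Empty blocks fall back to the global mean (an assumption of this sketch).
    M = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            block = X[np.ix_(r == k, c == l)]
            M[k, l] = block.mean() if block.size else X.mean()
    return M

def loss(X, r, c, M):
    # Sum of squared deviations of each entry from its block mean.
    return float(((X - M[np.ix_(r, c)]) ** 2).sum())

def alternating_kmeans_bicluster(X, K, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape
    r = rng.integers(K, size=n)  # random initial row labels
    c = rng.integers(K, size=m)  # random initial column labels
    history = []
    for _ in range(n_iter):
        # Row step: assign each row to the group whose block means fit it best.
        M = block_means(X, r, c, K)
        row_cost = ((X[:, None, :] - M[:, c][None, :, :]) ** 2).sum(axis=2)
        r = row_cost.argmin(axis=1)
        # Column step: the same update with the roles of rows and columns swapped.
        M = block_means(X, r, c, K)
        col_cost = ((X[:, :, None] - M[r][:, None, :]) ** 2).sum(axis=0)
        c = col_cost.argmin(axis=1)
        # Record the objective at the refreshed block means.
        M = block_means(X, r, c, K)
        history.append(loss(X, r, c, M))
    return r, c, history
```

As with ordinary k-means, the result depends on the random initialization, so in practice one would likely run several seeds and keep the labeling with the lowest final objective.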