大型基质稀疏分解的统计学指导性分裂和争议

论文标题

大型基质稀疏分解的统计学指导性分裂和争议

Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

论文作者

Chen, Kun, Dong, Ruipeng, Xu, Wanwan, Zheng, Zemin

论文摘要

大型矩阵的稀疏分解在现代统计学习中至关重要。特别是，稀疏的奇异值分解及其变体已用于多变量回归，因子分析，双簇，矢量时间序列建模等。这种分解的吸引力是由于其在样本和变量之间或响应和预测因子之间发现高度解释的潜在关联网络的力量。但是，许多现有的方法要么是没有一般绩效保证的临时方法，要么是计算密集的，因此它们不适合大规模研究。我们将统计问题作为稀疏的因子回归，并通过分裂和纠纷方法对其进行解决。在分裂的第一阶段，我们考虑了将任务简化为一组共同组合单位估计（治疗）问题的顺序和平行方法，并建立了这些常见的且尚未理解的缩水方法的统计基础。在分裂的第二阶段，我们创新了一种竞争的阶段学习技术，该技术由一系列简单的增量更新组成，以有效地删除整个解决方案的治疗路径。我们的算法的计算复杂性要比交替的凸搜索要低得多，并且步骤尺寸的选择可以使统计准确性和计算效率之间的灵活和原则性的权衡。我们的工作是最早为非凸问题进行舞台学习的工作之一，并且这个想法可以适用于许多多凸问题。广泛的仿真研究和遗传学的应用证明了我们方法的有效性和可扩展性。

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly-adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea can be applicable in many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.

下载PDF全文

下载文献需遵守相关版权规定

论文标题