Paper Title
Support Recovery in Sparse PCA with Incomplete Data
Authors
Abstract
We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data. Our algorithm is based on the semidefinite program (SDP) relaxation of the non-convex $l_1$-regularized PCA problem. We provide theoretical and experimental evidence that SDP enables us to exactly recover the true support of the sparse leading eigenvector of the unknown true matrix, despite only observing an incomplete (missing uniformly at random) and noisy version of it. We derive sufficient conditions for exact recovery, which involve matrix incoherence, the spectral gap between the largest and second-largest eigenvalues, the observation probability and the noise variance. We validate our theoretical results with incomplete synthetic data, and show encouraging and meaningful results on a gene expression dataset.
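For orientation, the SDP relaxation mentioned in the abstract can be sketched in its commonly used form. This is an illustration under stated assumptions, not necessarily the paper's exact formulation: the symbols $M$ (the observed symmetric matrix, assumed here to have its unobserved entries filled with zeros), $X$ (the matrix variable), $x$ (the candidate eigenvector) and $\rho > 0$ (the regularization parameter) do not appear in the abstract and are introduced only for this sketch. The non-convex $l_1$-regularized PCA problem is

$$\max_{\|x\|_2 = 1} \; x^\top M x - \rho \|x\|_1^2 .$$

Substituting $X = x x^\top$ gives $\operatorname{tr}(X) = \|x\|_2^2 = 1$ and $\|X\|_{1,1} = \sum_{i,j} |X_{ij}| = \|x\|_1^2$, and dropping the rank-one constraint on $X$ yields the convex program

$$\hat{X} \in \arg\max_{X \succeq 0,\; \operatorname{tr}(X) = 1} \; \langle M, X \rangle - \rho \|X\|_{1,1} .$$

Under this sketch, a natural estimate of the support of the sparse leading eigenvector is the set of indices $i$ with $\hat{X}_{ii} \neq 0$.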