Paper title
Grouping effects of sparse CCA models in variable selection
Paper authors
Paper abstract
Sparse canonical correlation analysis (SCCA) is a bi-multivariate association model that finds sparse linear combinations of two sets of variables that are maximally correlated with each other. In addition to the standard SCCA model, a simplified SCCA criterion, which maximizes the cross-covariance between a pair of canonical variables instead of their cross-correlation, is widely used in the literature because of its computational simplicity. However, the behaviors and properties of the solutions of these two models remain unknown in theory. In this paper, we analyze the grouping effect of the standard and simplified SCCA models in variable selection. In high-dimensional settings, the variables often form groups with high within-group correlation and low between-group correlation. Our theoretical analysis shows that, for grouped variable selection, the simplified SCCA jointly selects or deselects a group of variables, while the standard SCCA randomly selects a few dominant variables from each relevant group of correlated variables. Empirical results on synthetic data and real imaging genetics data verify the findings of our theoretical analysis.
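For orientation, the two criteria contrasted in the abstract can be written in a constrained form that is common in the SCCA literature. This is a sketch under assumptions, not necessarily the exact formulation used in the paper: the symbols X (n x p) and Y (n x q) for the column-centered data matrices, u and v for the canonical weight vectors, and c_1, c_2 for the sparsity bounds are all introduced here for illustration.

```latex
% Standard SCCA (assumed form): maximize cross-correlation,
% i.e. cross-covariance under unit-variance constraints on Xu and Yv.
\max_{u,\,v}\; u^{\top} X^{\top} Y v
\quad \text{s.t.} \quad
\|Xu\|_2^2 \le 1,\;\; \|Yv\|_2^2 \le 1,\;\;
\|u\|_1 \le c_1,\;\; \|v\|_1 \le c_2 .

% Simplified SCCA (assumed form): maximize cross-covariance,
% with the variance constraints replaced by unit-norm constraints on u and v.
\max_{u,\,v}\; u^{\top} X^{\top} Y v
\quad \text{s.t.} \quad
\|u\|_2^2 \le 1,\;\; \|v\|_2^2 \le 1,\;\;
\|u\|_1 \le c_1,\;\; \|v\|_1 \le c_2 .
```

In the simplified form, with v held fixed, the optimal u is a normalized soft-thresholding of the vector X^T Y v. Standardized columns of X that are highly correlated have similar inner products with Yv, so they tend to cross the threshold together, which gives some intuition for the joint selection behavior described in the abstract.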