Paper title
The non-overlapping statistical approximation to overlapping group lasso
Paper authors
Paper abstract
Group lasso is a commonly used regularization method in statistical learning in which parameters are eliminated from the model according to predefined groups. However, when the groups overlap, optimizing the group lasso penalized objective can be time-consuming on large-scale problems because of the non-separability induced by the overlapping groups. This bottleneck has seriously limited the application of overlapping group lasso regularization in many modern problems, such as gene pathway selection and graphical model estimation. In this paper, we propose a separable penalty as an approximation of the overlapping group lasso penalty. Thanks to the separability, the computation of regularization based on our penalty is substantially faster than that of the overlapping group lasso, especially for large-scale and high-dimensional problems. We show that the penalty is the tightest separable relaxation of the overlapping group lasso norm within the family of $\ell_{q_1}/\ell_{q_2}$ norms. Moreover, we show that the estimator based on the proposed separable penalty is statistically equivalent to the one based on the overlapping group lasso penalty with respect to their error bounds and the rate-optimal performance under the squared loss. We demonstrate the faster computational time and statistical equivalence of our method compared with the overlapping group lasso in simulation examples and a classification problem of cancer tumors based on gene expression and multiple gene pathways.
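To make the separability argument concrete, here is a minimal sketch of the standard weighted group lasso penalty, $\sum_g w_g \|\beta_{G_g}\|_2$, together with the proximal step that is available when the groups do not overlap. It does not implement the penalty proposed in the paper, whose exact form is not given in the abstract. The function names, example groups, weights, and step size are all hypothetical choices made for illustration.

```python
import math

def group_lasso_penalty(beta, groups, weights):
    """Weighted group lasso penalty: sum_g w_g * ||beta[g]||_2.
    Works for overlapping or disjoint groups (illustrative helper)."""
    return sum(w * math.sqrt(sum(beta[j] ** 2 for j in g))
               for g, w in zip(groups, weights))

def prox_disjoint_groups(beta, groups, weights, step):
    """Proximal operator of step * penalty, valid ONLY for disjoint groups.
    Each group is shrunk independently (block soft-thresholding); this
    closed form is what overlapping groups break."""
    out = list(beta)
    for g, w in zip(groups, weights):
        norm = math.sqrt(sum(beta[j] ** 2 for j in g))
        # Shrink the whole block toward zero; zero it out if the norm is small.
        scale = max(0.0, 1.0 - step * w / norm) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * beta[j]
    return out

# Hypothetical example: coordinate 1 is shared by both overlapping groups.
beta = [3.0, 4.0, 0.1, -0.1]
overlapping = [[0, 1], [1, 2, 3]]
disjoint = [[0, 1], [2, 3]]

penalty_overlap = group_lasso_penalty(beta, overlapping, [1.0, 1.0])
shrunk = prox_disjoint_groups(beta, disjoint, [1.0, 1.0], step=1.0)
```

With disjoint groups, each block is updated in closed form and the cost is linear in the number of coefficients. With overlapping groups, a shared coordinate such as index 1 above couples the blocks, so no such one-pass update exists. This is the computational bottleneck the abstract refers to.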