Paper title
Parallel subgroup analysis of high-dimensional data via M-regression
Paper authors
Paper abstract
Identifying subgroup structures is an important problem in data analysis because populations are often heterogeneous in practice. In this paper, we consider M-estimators combined with both concave and pairwise fusion penalties, which can handle high-dimensional data containing some outliers. The penalties are applied to both covariates and treatment effects, so that the estimation is expected to achieve variable selection and data clustering simultaneously. We propose an algorithm based on parallel computing to process relatively large datasets. We establish the convergence of the proposed algorithm, the oracle property of the penalized M-estimators, and the selection consistency of the proposed criterion. Our numerical study suggests that the proposed method is promising for efficiently identifying subgroups hidden in high-dimensional data.
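To make the ingredients named in the abstract concrete, the following is a minimal sketch of what a penalized M-estimation objective with concave and pairwise fusion penalties could look like. Everything specific here is an assumption for illustration and is not taken from the paper: the Huber loss stands in for a generic M-estimation loss, the minimax concave penalty (MCP) stands in for a generic concave penalty, and the model with subject-specific effects `mu` and shared coefficients `beta`, along with the function names and tuning parameters, is hypothetical.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it.
    Assumed here as one common robust M-estimation loss."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def mcp(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty, assumed here as one example of a concave penalty.
    It grows like lam*|t| near zero and is constant for |t| > gamma*lam."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def objective(y, X, mu, beta, lam1, lam2, gamma=3.0):
    """Illustrative objective (hypothetical form, not the paper's):
    robust loss + concave sparsity penalty on beta
    + concave pairwise fusion penalty on the subject-specific effects mu."""
    resid = y - mu - X @ beta
    loss = huber_loss(resid).mean()
    sparsity = mcp(beta, lam1, gamma).sum()      # encourages variable selection
    i, j = np.triu_indices(len(mu), k=1)         # all pairs with i < j
    fusion = mcp(mu[i] - mu[j], lam2, gamma).sum()  # encourages equal mu within a subgroup
    return float(loss + sparsity + fusion)
```

In this sketch, observations whose fitted `mu` values are fused to the same value would be read as one subgroup, while zero entries of `beta` correspond to unselected covariates. Note that the pairwise term touches all n(n-1)/2 pairs, which hints at why a parallel algorithm is attractive for larger datasets.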