Paper title
Sparse semiparametric discriminant analysis for high-dimensional zero-inflated data
Paper authors
Paper abstract
Sequencing-based technologies provide an abundance of high-dimensional biological datasets with skewed and zero-inflated measurements. Classification of such data with linear discriminant analysis leads to poor performance due to the violation of the Gaussian distribution assumption. To address this limitation, we propose a new semiparametric discriminant analysis framework based on the truncated latent Gaussian copula model that accommodates both skewness and zero inflation. By applying sparsity regularization, we demonstrate that the proposed method leads to the consistent estimation of classification direction in high-dimensional settings. On simulated data, the proposed method shows superior performance compared to the existing method. We apply the method to discriminate healthy controls from patients with Crohn's disease based on microbiome data and to identify genera with the most influence on the classification rule.
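To make the data setting concrete, the following is a minimal simulation sketch of skewed, zero-inflated measurements generated from a latent Gaussian vector. It is not the authors' estimator or code. It assumes the usual form of a truncated latent Gaussian copula, in which a latent Gaussian variable is passed through a monotone transformation and any value at or below a threshold is recorded as zero. The function name `simulate_truncated_copula`, the exponential transformation, the threshold of 1.0, and the two-class mean shift are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def simulate_truncated_copula(n, mean, cov, threshold, seed=0):
    """Draw n samples of skewed, zero-inflated data (hypothetical helper).

    Assumed model: latent Z ~ N(mean, cov); the skewed variable is exp(Z);
    the observed value equals exp(Z) when it exceeds `threshold`, else 0.
    """
    rng = np.random.default_rng(seed)
    latent = rng.multivariate_normal(mean, cov, size=n)
    skewed = np.exp(latent)  # monotone transformation introduces skewness
    # Truncation: values at or below the threshold are observed as zeros.
    return np.where(skewed > threshold, skewed, 0.0)


# Two classes that share a latent correlation but differ in latent mean.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
controls = simulate_truncated_copula(500, np.zeros(2), cov, threshold=1.0, seed=1)
cases = simulate_truncated_copula(500, np.array([0.8, 0.0]), cov, threshold=1.0, seed=2)

# Per-variable share of zeros in each class.
zero_fraction_controls = (controls == 0).mean(axis=0)
zero_fraction_cases = (cases == 0).mean(axis=0)
print("zero fraction (controls):", zero_fraction_controls)
print("zero fraction (cases):", zero_fraction_cases)
```

In this sketch the class difference lives in the latent Gaussian scale, so it shows up in the observed data both as a shift in the nonzero values and as a change in the proportion of zeros. Data of this kind are neither Gaussian nor continuous at zero, which is the situation the abstract describes as problematic for linear discriminant analysis.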