Model-Free, Monotone Invariant and Computationally Efficient Feature Screening with Data-adaptive Threshold

Paper Title

Model-Free, Monotone Invariant and Computationally Efficient Feature Screening with Data-adaptive Threshold

Authors

Deng, Linsui; Zhang, Yilin

Abstract

Feature screening for ultrahigh-dimensional data generally proceeds in two essential steps. The first step measures and ranks the marginal dependence between the response and the covariates, and the second determines the threshold. We develop a new screening procedure, called the SIT-BY procedure, that possesses appealing statistical properties in both steps. By employing sliced independence estimates in the measuring and ranking stage, our proposed procedure requires no model assumptions, remains invariant to monotone transformations, and achieves almost linear computational complexity. Inspired by false discovery rate (FDR) control procedures, we offer a data-adaptive threshold that benefits from the asymptotic normality of the test statistics. Under moderate conditions, we demonstrate that our procedure can asymptotically control the FDR while maintaining the sure screening property. We investigate the finite-sample performance of our proposed procedure via extensive simulations and an application to a genome-wide dataset.
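
The thresholding step can be illustrated with a small sketch. The abstract does not say which FDR procedure is used, so the code below assumes a Benjamini-Yekutieli step-up rule (suggested only by the "BY" in the procedure's name) applied to one-sided p-values from asymptotically standard normal test statistics. The function names `normal_upper_tail` and `by_select` and the example z-scores are hypothetical, and the sliced independence statistic itself is not implemented here.

```python
import math


def normal_upper_tail(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def by_select(pvalues, alpha=0.1):
    """Benjamini-Yekutieli step-up rule (assumed here, not confirmed by the abstract).

    Returns the sorted indices of the features whose p-values fall
    below the data-adaptive cutoff at nominal FDR level alpha.
    """
    p = len(pvalues)
    if p == 0:
        return []
    # BY correction factor: the harmonic sum 1 + 1/2 + ... + 1/p.
    harmonic = sum(1.0 / k for k in range(1, p + 1))
    # Feature indices ordered from smallest to largest p-value.
    order = sorted(range(p), key=lambda j: pvalues[j])
    cutoff_rank = 0
    for rank, j in enumerate(order, start=1):
        # Step-up: keep the largest rank that satisfies the inequality.
        if pvalues[j] <= rank * alpha / (p * harmonic):
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical standardized statistics, one per covariate.
z_scores = [4.5, 0.3, 3.9, -0.8, 1.1, 5.2]
pvals = [normal_upper_tail(z) for z in z_scores]
selected = by_select(pvals, alpha=0.1)
print(selected)
```

With these made-up statistics the rule keeps covariates 0, 2, and 5. The cutoff depends on the ordered p-values themselves rather than on a fixed model size, which is the sense in which such a threshold is data-adaptive.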
