Imputation for High-Dimensional Linear Regression

Paper Title

Imputation for High-Dimensional Linear Regression

Authors

Kabir Aladin Chandrasekher, Ahmed El Alaoui, Andrea Montanari

Abstract

We study high-dimensional regression with missing entries in the covariates. A common strategy in practice is to impute the missing entries with an appropriate substitute and then apply a standard statistical procedure as if the covariates were fully observed. Recent literature on this subject instead proposes designing a specific, often complicated or non-convex, algorithm tailored to the case of missing covariates. We investigate a simpler approach: we fill in the missing entries with their conditional mean given the observed covariates. We show that this imputation scheme, coupled with standard off-the-shelf procedures such as the LASSO and the square-root LASSO, retains the minimax estimation rate in the random-design setting where the covariates are i.i.d. sub-Gaussian. We further show that the square-root LASSO remains pivotal in this setting, that is, its regularization parameter can be chosen without knowledge of the noise level. Often the conditional expectation cannot be computed exactly and must be approximated from data. We study two cases: the covariates either follow an autoregressive (AR) process or are jointly Gaussian with a sparse precision matrix. In both cases we propose tractable estimators for the conditional expectation, perform linear regression via the LASSO, and show similar estimation rates. We complement our theoretical results with simulations on synthetic and semi-synthetic examples, which illustrate not only the sharpness of our bounds but also the broader utility of this strategy beyond our theoretical assumptions.
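The pipeline described in the abstract (impute each missing covariate by its conditional mean given the observed ones, then run an ordinary LASSO) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Gaussian rows with zero mean and a stationary AR(1) covariance Sigma_jk = rho^|j-k|, and entries missing completely at random. The lag-1 moment estimator `estimate_ar1_rho`, the coordinate-descent solver `lasso_cd`, the tuning choice `lam = sqrt(2 log p / n)`, and all simulation parameters are hypothetical choices made for this sketch. They are not taken from the paper, whose estimators for the AR and sparse-precision cases may differ.

```python
import numpy as np


def conditional_mean_impute(X, mask, Sigma, mu=None):
    """Fill missing entries with E[x_mis | x_obs], assuming each row ~ N(mu, Sigma).

    X: (n, p) array; entries where mask is False are ignored (may be NaN).
    mask: (n, p) boolean array, True where the entry is observed.
    """
    n, p = X.shape
    mu = np.zeros(p) if mu is None else np.asarray(mu, dtype=float)
    X_imp = np.array(X, dtype=float)  # copy; observed entries are kept as-is
    for i in range(n):
        obs, mis = mask[i], ~mask[i]
        if not mis.any():
            continue  # fully observed row
        if not obs.any():
            X_imp[i, mis] = mu[mis]  # nothing to condition on
            continue
        # Gaussian conditioning: mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o)
        S_mo = Sigma[np.ix_(mis, obs)]
        S_oo = Sigma[np.ix_(obs, obs)]
        X_imp[i, mis] = mu[mis] + S_mo @ np.linalg.solve(S_oo, X[i, obs] - mu[obs])
    return X_imp


def estimate_ar1_rho(X, mask):
    """Hypothetical moment estimate of the lag-1 correlation from adjacent pairs observed together."""
    both = mask[:, :-1] & mask[:, 1:]
    a, b = X[:, :-1][both], X[:, 1:][both]
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))


def lasso_cd(X, y, lam, n_iter=500):
    """Minimize (1/(2n)) ||y - X b||^2 + lam ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.array(y, dtype=float)  # residual y - X b (b starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]  # remove coordinate j from the fit
            z = X[:, j] @ r / n
            b[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]  # add the updated coordinate back
    return b


# Synthetic AR(1) design with entries missing completely at random (illustrative choices).
rng = np.random.default_rng(0)
n, p, s, rho, sigma = 1000, 50, 5, 0.5, 0.5
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # stationary AR(1) covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + sigma * rng.standard_normal(n)
mask = rng.random((n, p)) > 0.2  # each entry observed with probability 0.8
X_obs = np.where(mask, X, np.nan)

# Plug-in covariance from the estimated rho, conditional-mean imputation, then LASSO.
rho_hat = estimate_ar1_rho(X_obs, mask)
Sigma_hat = rho_hat ** np.abs(idx[:, None] - idx[None, :])
X_imp = conditional_mean_impute(X_obs, mask, Sigma_hat)
lam = np.sqrt(2 * np.log(p) / n)  # hypothetical tuning choice
beta_hat = lasso_cd(X_imp, y, lam)
print(f"rho_hat = {rho_hat:.3f}, ||beta_hat - beta||_2 = {np.linalg.norm(beta_hat - beta):.3f}")
```

The LASSO step is deliberately off the shelf: it sees only the imputed matrix and does not need to know which entries were missing. For Gaussian rows, the imputation error x_mis - E[x_mis | x_obs] is independent of the observed entries. The model y = X_imp beta + noise therefore stays correctly specified, with a noise term inflated by the conditional variance of the missing coordinates.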
