Paper Title
Consistent regression when oblivious outliers overwhelm
Paper Authors
Paper Abstract
We consider a robust linear regression model $y=Xβ^* + η$, where an adversary oblivious to the design $X\in \mathbb{R}^{n\times d}$ may choose $η$ to corrupt all but an $α$ fraction of the observations $y$ in an arbitrary way. Prior to our work, even for Gaussian $X$, no estimator for $β^*$ was known to be consistent in this model except for quadratic sample size $n \gtrsim (d/α)^2$ or for logarithmic inlier fraction $α\ge 1/\log n$. We show that consistent estimation is possible with nearly linear sample size and inverse-polynomial inlier fraction. Concretely, we show that the Huber loss estimator is consistent for every sample size $n = ω(d/α^2)$ and achieves an error rate of $O(d/α^2 n)^{1/2}$. Both bounds are optimal (up to constant factors). Our results extend to designs far beyond the Gaussian case and only require the column span of $X$ to not contain approximately sparse vectors (similar to the kind of assumption commonly made about the kernel space for compressed sensing). We provide two technically similar proofs. One proof is phrased in terms of strong convexity, extending work of [Tsakonas et al. '14], and is particularly short. The other proof highlights a connection between the Huber loss estimator and high-dimensional median computations. In the special case of Gaussian designs, this connection leads us to a strikingly simple algorithm based on computing coordinate-wise medians that achieves optimal guarantees in nearly-linear time, and that can exploit sparsity of $β^*$. The model studied here also captures heavy-tailed noise distributions that may not even have a first moment.
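As a concrete illustration of the Huber loss estimator discussed in the abstract, the following is a minimal Python sketch for the model $y=Xβ^* + η$ with obliviously corrupted observations. All parameter choices here (the transition width `delta`, the outlier magnitude, the synthetic Gaussian design, and the sample sizes) are assumptions made for this example; this is a sketch, not the authors' implementation, and the paper's guarantees cover far more general designs.

```python
# Minimal sketch of a Huber-loss estimator for y = X @ beta_star + eta,
# where all but an alpha fraction of observations are corrupted.
# All parameter values below (delta, n, d, alpha, outlier scale) are
# illustrative assumptions, not values from the paper.
import numpy as np
from scipy.optimize import minimize

def huber_objective(beta, X, y, delta=1.0):
    """Sum of Huber penalties h_delta(y_i - <x_i, beta>) and its gradient."""
    r = y - X @ beta
    small = np.abs(r) <= delta
    loss = np.sum(np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta)))
    psi = np.clip(r, -delta, delta)   # derivative of the Huber penalty in r
    return loss, -X.T @ psi           # gradient via chain rule, dr/dbeta = -X

rng = np.random.default_rng(0)
n, d, alpha = 5000, 20, 0.2                 # roughly the n = omega(d / alpha^2) regime
X = rng.standard_normal((n, d))             # Gaussian design
beta_star = rng.standard_normal(d)
eta = np.zeros(n)
corrupted = rng.random(n) > alpha           # all but an alpha fraction, oblivious to X
eta[corrupted] = 100.0 * rng.standard_normal(corrupted.sum())
y = X @ beta_star + eta

res = minimize(huber_objective, np.zeros(d), args=(X, y), jac=True, method="L-BFGS-B")
print("||beta_hat - beta_star|| =", np.linalg.norm(res.x - beta_star))
```

The feature the sketch relies on is that the Huber penalty grows only linearly in large residuals, so the overwhelming fraction of corrupted observations cannot dominate the convex objective; on large residuals it behaves like an $\ell_1$ penalty, consistent with the connection to median computations highlighted in the abstract.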