Paper title
Robust Estimation of Covariance Matrices: Adversarial Contamination and Beyond
Paper authors
Paper abstract
We consider the problem of estimating the covariance structure of a random vector $Y\in \mathbb R^d$ from a sample $Y_1,\ldots,Y_n$. We are interested in the situation when $d$ is large compared to $n$ but the covariance matrix $Σ$ of interest has (exactly or approximately) low rank. We assume that the given sample is (a) $ε$-adversarially corrupted, meaning that an $ε$ fraction of the observations could have been replaced by arbitrary vectors, or that (b) the sample is i.i.d. but the underlying distribution is heavy-tailed, meaning that the norm of $Y$ possesses only a finite fourth moment. We propose an estimator that is adaptive to the potential low-rank structure of the covariance matrix as well as to the proportion of contaminated data, and admits tight deviation guarantees despite rather weak assumptions on the underlying distribution. Finally, we discuss algorithms that make it possible to approximate the proposed estimator in a numerically efficient way.
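The two sampling models described in the abstract can be written out as follows. This is only a sketch under assumed notation: the uncorrupted sample $X_1,\ldots,X_n$, the index set $J$ of replaced observations, and the effective rank $\mathrm{r}(Σ)$ as one possible measure of "approximately low rank" are introduced here for illustration and are not fixed by the abstract itself.

$$
\begin{aligned}
&Σ = \mathbb E\big[(Y-\mathbb E Y)(Y-\mathbb E Y)^{\top}\big] \in \mathbb R^{d\times d}, \\
&\text{(a) } X_1,\ldots,X_n \text{ i.i.d. copies of } Y,\quad Y_i = X_i \text{ for } i\notin J,\quad J\subseteq\{1,\ldots,n\},\ |J|\le ε n, \\
&\text{(b) } Y_1,\ldots,Y_n \text{ i.i.d. copies of } Y,\quad \mathbb E\,\|Y\|_2^4 < \infty, \\
&\text{(assumed) } \mathrm{r}(Σ) = \frac{\operatorname{tr}(Σ)}{\|Σ\|} \ll d .
\end{aligned}
$$

In (a) the vectors $Y_i$ with $i\in J$ are arbitrary and may depend on the whole uncorrupted sample, which is what distinguishes adversarial contamination from random outliers. In (b) no observation is replaced, but only the fourth moment of $\|Y\|_2$ is assumed finite.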