Paper title
Robust Methods for High-Dimensional Linear Learning
Paper authors
Paper abstract
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting, where the number of features $d$ may exceed the sample size $n$. In a generic learning setting, we employ two algorithms, depending on whether the considered loss function is gradient-Lipschitz or not. We then instantiate our framework on several applications, including vanilla sparse, group-sparse and low-rank matrix recovery. For each application, this leads to efficient and robust learning algorithms that reach near-optimal estimation rates under heavy-tailed distributions and in the presence of outliers. For vanilla $s$-sparsity, we are able to reach the $s\log (d)/n$ rate under heavy tails and $\eta$-corruption, at a computational cost comparable to that of non-robust analogs. We provide an efficient implementation of our algorithms in an open-source $\mathtt{Python}$ library called $\mathtt{linlearn}$, which we use to carry out numerical experiments that confirm our theoretical findings and to compare against other recent approaches proposed in the literature.
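To make the setting in the abstract concrete, the following is a minimal sketch of one generic way to pair a robust gradient estimate with a sparsity constraint when $d > n$. It is an illustration under stated assumptions, not the paper's algorithms and not the $\mathtt{linlearn}$ API: the coordinate-wise median-of-means gradient, the iterative hard-thresholding update, and all names and parameters (`mom_gradient`, `hard_threshold`, `robust_iht`, `n_blocks`, `step`) are hypothetical choices made for this example, using only NumPy.

```python
import numpy as np


def mom_gradient(X, y, w, n_blocks, rng):
    """Coordinate-wise median-of-means estimate of the least-squares gradient.

    Assumption for illustration: the loss is 0.5 * (x @ w - y) ** 2.
    """
    perm = rng.permutation(X.shape[0])
    blocks = np.array_split(perm, n_blocks)
    # One mean gradient per block, stacked into shape (n_blocks, d).
    block_grads = np.stack(
        [X[b].T @ (X[b] @ w - y[b]) / len(b) for b in blocks]
    )
    # The median across blocks limits the influence of a few corrupted blocks.
    return np.median(block_grads, axis=0)


def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and zero out the rest."""
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -s)[-s:]
    out[keep] = w[keep]
    return out


def robust_iht(X, y, s, n_blocks=11, step=0.5, n_iter=200, seed=0):
    """Iterative hard thresholding driven by median-of-means gradients."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = mom_gradient(X, y, w, n_blocks, rng)
        w = hard_threshold(w - step * grad, s)
    return w


# Synthetic high-dimensional problem (d > n) with heavy-tailed noise
# and a few grossly corrupted labels. All values are made up for the demo.
rng = np.random.default_rng(42)
n, d, s = 500, 1000, 5
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:s] = 1.0
y = X @ w_true + rng.standard_t(df=2.5, size=n)  # heavy-tailed noise
y[:3] = 1e3                                      # outliers in the labels

w_hat = robust_iht(X, y, s)
print("estimation error:", np.linalg.norm(w_hat - w_true))
```

The number of blocks trades robustness against variance: the median can tolerate corrupted samples in fewer than half of the blocks, so `n_blocks` should exceed twice the expected number of outliers, while each block still needs enough samples for its mean gradient to be informative.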