Paper Title
The Effectiveness of Johnson-Lindenstrauss Transform for High Dimensional Optimization With Adversarial Outliers, and the Recovery
Authors
Abstract
In this paper, we consider robust optimization problems in high dimensions. Because a real-world dataset may contain significant noise, or even samples specially crafted by an attacker, we are particularly interested in optimization problems with arbitrary (and potentially adversarial) outliers. We focus on two fundamental optimization problems: {\em SVM with outliers} and {\em $k$-center clustering with outliers}. Both are extremely challenging combinatorial optimization problems, since we cannot impose any restriction on the adversarial outliers. Their computational complexities are therefore quite high, especially for instances in high-dimensional spaces. The {\em Johnson-Lindenstrauss (JL) Transform} is one of the most popular methods for dimension reduction. Though the JL transform has been widely studied over the past decades, its effectiveness in dealing with adversarial outliers has, to the best of our knowledge, never been investigated before. Based on novel geometric insights, we prove that the complexities of these two problems can be significantly reduced through the JL transform. Moreover, we prove that a solution in the dimensionality-reduced space can be efficiently recovered in the original $\mathbb{R}^d$ while its quality is still preserved. In the experiments, we compare the JL transform with several other well-known dimension reduction methods and study their performance on synthetic and real datasets.
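To make the dimension-reduction step concrete, here is a minimal sketch of a JL transform using a dense Gaussian random matrix, which is one standard construction and not necessarily the one used in the paper. The names `jl_transform`, `pairwise_distances`, and `max_distortion`, the data sizes, and the target dimension are all illustrative assumptions. The sketch only checks how well pairwise Euclidean distances survive the projection. It does not implement the paper's algorithms for SVM or $k$-center clustering with outliers, nor the recovery step.

```python
import numpy as np


def jl_transform(points, target_dim, seed=0):
    """Project the rows of `points` from R^d to R^target_dim.

    Assumption: a dense Gaussian projection matrix. Other JL constructions
    (sparse or structured matrices) exist and are not shown here.
    """
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    # Entries are i.i.d. N(0, 1/target_dim), so squared Euclidean norms
    # are preserved in expectation.
    projection = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(d, target_dim))
    return points @ projection


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def max_distortion(original, projected):
    """Largest relative change in any pairwise distance after projection."""
    d_orig = pairwise_distances(original)
    d_proj = pairwise_distances(projected)
    off_diagonal = ~np.eye(len(original), dtype=bool)  # skip zero self-distances
    return np.max(np.abs(d_proj[off_diagonal] / d_orig[off_diagonal] - 1.0))


# Hypothetical synthetic data: 50 points in R^2000, reduced to R^500.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 2000))
reduced = jl_transform(data, target_dim=500)
distortion = max_distortion(data, reduced)
print(f"shape after projection: {reduced.shape}")
print(f"max relative distortion of pairwise distances: {distortion:.3f}")
```

Because the projection matrix is drawn independently of the data, the distance guarantee applies to any fixed point set, whether or not some of the points are outliers. This is only the starting point for the question the abstract raises, namely whether the optimization problems with outliers themselves remain well approximated after projection.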