论文标题

将欧几里得空间置于点云中所需的距离关系

Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds

论文作者

Rass, Stefan, König, Sandra, Ahmad, Shahzad, Goman, Maksim

论文摘要

给定欧几里得空间中的一组点$ \ mathbb {r}^\ ell $带有$ \ ell> 1 $,点之间的成对距离由它们的空间位置和我们endow $ \ mathbb {r}^\ ell $的公制$ d $确定。因此,在两个点之间的距离$ d(\ mathbf x,\ mathbf y)=δ$通过$ \ mathbf x $和$ \ mathbf y $和$ d $的选择来固定。我们研究了修复$δ$的相关问题,以及点$ \ mathbf x,\ mathbf y $,并询问是否有一个拓扑度量$ d $来计算所需距离$δ$。我们通过构造一个度量来同时提供最多$ o(\ sqrt \ ell)$多点之间的所需的成对距离来证明这个问题,可以解决此问题。然后,我们介绍了$ \ varepsilon $ -semimetric $ \ tilde {d} $的概念,以提出我们的主要结果:对于所有$ \ varepsilon> 0 $,对于所有$ m \ geq 1 $,对于$ m $ m $ $ $ $ $ $ \ nathbf y__1,\ ldots,\ ldots,\ ldots,\ ldots,\ ldots,\ mathbfffff。 y_m \ in \ Mathbb {r}^\ ell $,以及所有选择的值$ \ {δ_{δ_{ij} \ geq 0:1 \ leq i <j \ j \ leq m \ \} $ \ Mathbb {r}^\ ell \ to \ mathbb {r} $,以至于$ \ tilde {d}(\ mathbf y_i,\ mathbf y_j)=Δ_{ij {ij} $,即,所需的距离是实现的。我们通过使用它们来攻击无监督的学习算法,尤其是$ k $ -MEANS和基于密度的(DBSCAN)聚类算法来展示我们的结果。这些在人工智能中具有多种应用,并让它们以本文所示的方式构建的外部提供的距离度量,可以使聚类算法产生预先确定并因此可延展的结果。这表明,除非有标准化和固定的处方来使用特定的距离函数,否则聚类算法的结果通常不值得信赖。

Given a set of points in the Euclidean space $\mathbb{R}^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\mathbb{R}^\ell$ with. Hence, the distance $d(\mathbf x,\mathbf y)=δ$ between two points is fixed by the choice of $\mathbf x$ and $\mathbf y$ and $d$. We study the related problem of fixing the value $δ$, and the points $\mathbf x,\mathbf y$, and ask if there is a topological metric $d$ that computes the desired distance $δ$. We demonstrate this problem to be solvable by constructing a metric to simultaneously give desired pairwise distances between up to $O(\sqrt\ell)$ many points in $\mathbb{R}^\ell$. We then introduce the notion of an $\varepsilon$-semimetric $\tilde{d}$ to formulate our main result: for all $\varepsilon>0$, for all $m\geq 1$, for any choice of $m$ points $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$, and all chosen sets of values $\{δ_{ij}\geq 0: 1\leq i<j\leq m\}$, there exists an $\varepsilon$-semimetric $\tildeδ:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ such that $\tilde{d}(\mathbf y_i,\mathbf y_j)=δ_{ij}$, i.e., the desired distances are accomplished, irrespectively of the topology that the Euclidean or other norms would induce. We showcase our results by using them to attack unsupervised learning algorithms, specifically $k$-Means and density-based (DBSCAN) clustering algorithms. These have manifold applications in artificial intelligence, and letting them run with externally provided distance measures constructed in the way as shown here, can make clustering algorithms produce results that are pre-determined and hence malleable. This demonstrates that the results of clustering algorithms may not generally be trustworthy, unless there is a standardized and fixed prescription to use a specific distance function.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源