论文标题

改进了欧几里得$ k $ -Means的核心

Improved Coresets for Euclidean $k$-Means

论文作者

Cohen-Addad, Vincent, Larsen, Kasper Green, Saulpic, David, Schwiegelshohn, Chris, Sheikh-Omar, Omar Ali

论文摘要

考虑到$ d $尺寸中的一组$ n $点,欧几里得$ k $ -means问题(分别是欧几里得$ k $ -Median问题)包括查找$ k $中心,使得从每个点到其最接近的中心的平方距离之和距离的总和(分别距离之和)。可以说,在大数据设置中处理此问题的最流行方式是首先通过计算称为核心的加权子集来压缩数据,然后在此子集上运行任何算法。核心的保证是,对于任何候选解决方案,核心成本与原始实例的成本之间的比率小于$(1 \ pm \ varepsilon)$。艺术核心大小的当前状态为$ \ tilde o(\ min(k^{2} \ cdot \ cdot \ varepsilon^{ - 2},k \ cdot \ cdot \ cdot \ varepsilon^{ - 4})$ for Euclidean $ k $ k $ -means和$ k $ -means and $ \ tilde o( \ varepsilon^{ - 2},k \ cdot \ varepsilon^{ - 3}))$ for Euclidean $ k $ -Median。这两个问题最著名的下限是$ω(k \ varepsilon^{ - 2})$。在本文中,我们改善了上限$ \ tilde o(\ min(k^{3/2} \ cdot \ cdot \ varepsilon^{ - 2},k \ cdot \ cdot \ cdot \ varepsilon^{ - 4})$ for $ k $ -means和$ k $ -means and $ \ tilde o($ \ \ tilde o)( \ varepsilon^{ - 2},k \ cdot \ varepsilon^{ - 3}))$ for $ k $ -Median。特别是,我们的第一个可证明的界限突破了$ k^2 $障碍,同时保留了对$ \ varepsilon $的最佳依赖性。

Given a set of $n$ points in $d$ dimensions, the Euclidean $k$-means problem (resp. the Euclidean $k$-median problem) consists of finding $k$ centers such that the sum of squared distances (resp. sum of distances) from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that for any candidate solution, the ratio between coreset cost and the cost of the original instance is less than a $(1\pm \varepsilon)$ factor. The current state of the art coreset size is $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for Euclidean $k$-means and $\tilde O(\min(k^{2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for Euclidean $k$-median. The best known lower bound for both problems is $Ω(k \varepsilon^{-2})$. In this paper, we improve the upper bounds $\tilde O(\min(k^{3/2} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-4}))$ for $k$-means and $\tilde O(\min(k^{4/3} \cdot \varepsilon^{-2},k\cdot \varepsilon^{-3}))$ for $k$-median. In particular, ours is the first provable bound that breaks through the $k^2$ barrier while retaining an optimal dependency on $\varepsilon$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源