Paper title
On resolution coresets for constrained clustering
Paper authors
Paper abstract
Specific data compression techniques, formalized by the concept of coresets, have proved to be powerful for many optimization problems. In fact, while tightly controlling the approximation error, coresets may lead to a significant speed-up of the computations and hence allow algorithms to be extended to much larger problem sizes. The present paper deals with a weight-balanced clustering problem from imaging in materials science. Here, the class of desired coresets is naturally confined to those which can be viewed as lowering the resolution of the data. Hence one would expect such resolution coresets to be inferior to unrestricted coresets. We show, however, that the restrictions are more than compensated for by utilizing the underlying structure of the data. In particular, we prove bounds for resolution coresets which improve known bounds in the relevant dimensions and also lead to significantly faster algorithms in practice.
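To make "lowering the resolution of the data" concrete, here is a minimal sketch in Python with NumPy. It assumes the data is a 2D grid of nonnegative pixel weights and that each block × block tile is replaced by a single point at the tile center carrying the tile's total weight. The function names `resolution_coreset`, `full_grid` and `weighted_cost` and the parameter `block` are hypothetical. For simplicity the sketch evaluates the ordinary (unconstrained) weighted k-means cost for fixed centers. It does not implement the weight-balanced objective or the coreset construction and bounds from the paper, so it only illustrates the idea.

```python
import numpy as np


def full_grid(weights: np.ndarray):
    """Original data: one point per pixel (at integer coordinates) with its weight."""
    h, w = weights.shape
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    points = np.stack([rr.ravel(), cc.ravel()], axis=1).astype(float)
    return points, weights.ravel().astype(float)


def resolution_coreset(weights: np.ndarray, block: int):
    """Hypothetical resolution coreset: merge each block x block tile of pixels
    into one point at the tile center whose mass is the tile's total weight.

    Assumes both grid dimensions are divisible by `block`.
    """
    h, w = weights.shape
    if h % block or w % block:
        raise ValueError("grid dimensions must be divisible by block")
    # Sum the weights inside each tile; total mass is preserved (up to rounding).
    masses = weights.reshape(h // block, block, w // block, block).sum(axis=(1, 3))
    # Tile centers in the same integer pixel coordinates used by full_grid.
    rows = np.arange(h // block) * block + (block - 1) / 2
    cols = np.arange(w // block) * block + (block - 1) / 2
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    points = np.stack([rr.ravel(), cc.ravel()], axis=1)
    return points, masses.ravel().astype(float)


def weighted_cost(points, masses, centers):
    """Unconstrained weighted k-means cost: each point pays its mass times the
    squared distance to its nearest center (NOT the weight-balanced objective)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((masses * d2.min(axis=1)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))  # synthetic weights, for illustration only
    centers = np.array([[10.0, 10.0], [50.0, 40.0], [30.0, 55.0]])
    exact = weighted_cost(*full_grid(image), centers)
    for block in (2, 4, 8):
        pts, m = resolution_coreset(image, block)
        approx = weighted_cost(pts, m, centers)
        rel = abs(approx - exact) / exact
        print(f"block={block}: {len(m)} points, relative error {rel:.2e}")
```

With `block=4`, for instance, a 512 × 512 image (262,144 pixels) shrinks to 16,384 weighted points. The printed relative errors show how coarsening trades accuracy for size in this toy setting only.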