论文标题
异常值检测并不难:使用本地搜索技术,用于鲁棒聚类问题的近似算法
Outliers Detection Is Not So Hard: Approximation Algorithms for Robust Clustering Problems Using Local Search Techniques
论文作者
论文摘要
在本文中,我们考虑了$ k $ -Median/$ k $ -Means问题的两种类型的强大模型:异常值转换($ K $ -Medo/$ k $ -meao)和罚款转换($ K $ -MMEDP/$ K $ -MEAP),在这方面,我们可以将某些点标记为Outliers和Outliers和Outliers和Outliers和他们。在$ k $ -Medo/$ k $ -meao中,离群值的数量由给定的整数界定。在$ K $ -MEDP/$ K $ -MEAP中,我们不限制异常值的数量,但是每个异常值都会产生罚款费用。我们开发了一种新技术,通过引入一个适应性群集,可以在本地和全局最佳解决方案中捕获有关异常值的有用信息,从而分析这两个问题的本地搜索算法的近似值。对于$ k $ -meap,我们将基于本地搜索的最著名近似比率从$ 25+\ varepsilon $提高到$ 9+\ varepsilon $。对于$ K $ -MEDP,我们获得了最著名的近似值。对于$ k $ -Medo/$ k $ -meao,基于本地搜索只有两种双标准近似算法。一个违反了异常约束(对异常值的约束),而另一个违反了基数约束(群集数量的约束)。我们考虑以前的算法,并将其近似值从$ 17+\ varepsilon $提高到$ k $ -Medo的$ 3+\ varepsilon $,而从$ 274+\ varepsilon $到$ 9+\ varepsilon $ for $ k $ -meao。
In this paper, we consider two types of robust models of the $k$-median/$k$-means problems: the outlier-version ($k$-MedO/$k$-MeaO) and the penalty-version ($k$-MedP/$k$-MeaP), in which we can mark some points as outliers and discard them. In $k$-MedO/$k$-MeaO, the number of outliers is bounded by a given integer. In $k$-MedP/$k$-MeaP, we do not bound the number of outliers, but each outlier will incur a penalty cost. We develop a new technique to analyze the approximation ratio of local search algorithms for these two problems by introducing an adapted cluster that can capture useful information about outliers in the local and the global optimal solution. For $k$-MeaP, we improve the best known approximation ratio based on local search from $25+\varepsilon$ to $9+\varepsilon$. For $k$-MedP, we obtain the best known approximation ratio. For $k$-MedO/$k$-MeaO, there exists only two bi-criteria approximation algorithms based on local search. One violates the outlier constraint (the constraint on the number of outliers), while the other violates the cardinality constraint (the constraint on the number of clusters). We consider the former algorithm and improve its approximation ratios from $17+\varepsilon$ to $3+\varepsilon$ for $k$-MedO, and from $274+\varepsilon$ to $9+\varepsilon$ for $k$-MeaO.