Paper Title

Off-the-grid: Fast and Effective Hyperparameter Search for Kernel Clustering

Authors

Bruno Ordozgoiti, Lluís A. Belanche Muñoz

Abstract

Kernel functions are a powerful tool to enhance the $k$-means clustering algorithm via the kernel trick. It is known that the parameters of the chosen kernel function can have a dramatic impact on the result. In supervised settings, these can be tuned via cross-validation, but for clustering this is not straightforward and heuristics are usually employed. In this paper we study the impact of kernel parameters on kernel $k$-means. In particular, we derive a lower bound, tight up to constant factors, below which the parameter of the RBF kernel will render kernel $k$-means meaningless. We argue that grid search can be ineffective for hyperparameter search in this context and propose an alternative algorithm for this purpose. In addition, we offer an efficient implementation based on fast approximate exponentiation with provable quality guarantees. Our experimental results demonstrate the ability of our method to efficiently reveal a rich and useful set of hyperparameter values.
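
The claim that a too-small RBF parameter makes kernel $k$-means meaningless can be illustrated with a small sketch. It assumes the common parameterization $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$, which may differ from the one used in the paper, and it does not reproduce the paper's lower bound or its search algorithm. The toy points and the helper names `rbf_gram` and `max_off_diagonal` are hypothetical. As $\sigma$ shrinks, the Gram matrix approaches the identity; in that limit every partition into $k$ non-empty clusters has the same kernel $k$-means cost, $n - k$, so the objective no longer distinguishes between clusterings.

```python
import math


def rbf_gram(points, sigma):
    """Gram matrix for k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    This parameterization is an assumption; the paper may use another.
    """
    gram = []
    for x in points:
        row = []
        for y in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            row.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
        gram.append(row)
    return gram


def max_off_diagonal(gram):
    """Largest kernel value between two distinct points."""
    n = len(gram)
    return max(gram[i][j] for i in range(n) for j in range(n) if i != j)


# Two well-separated toy clusters in the plane (made-up data).
cluster_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
cluster_b = [(x + 5.0, y + 5.0) for x, y in cluster_a]
points = cluster_a + cluster_b

# A tiny sigma drives every off-diagonal entry to (numerically) zero,
# so the Gram matrix is the identity and carries no cluster structure.
for sigma in (1e-3, 1e-1, 1.0, 10.0):
    largest = max_off_diagonal(rbf_gram(points, sigma))
    print(f"sigma={sigma:g}  max off-diagonal={largest:.3e}")
```

At the other extreme, a very large $\sigma$ pushes all entries toward 1, which is also uninformative. Useful values lie in between, and a fixed grid may or may not sample that range well.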
