Paper Title

A Revenue Function for Comparison-Based Hierarchical Clustering

Authors

Aishik Mandal, Michaël Perrot, Debarghya Ghoshdastidar

Abstract

Comparison-based learning addresses the problem of learning when, instead of explicit features or pairwise similarities, one only has access to comparisons of the form "object $A$ is more similar to $B$ than to $C$". Recently, it has been shown that, in hierarchical clustering, single and complete linkage can be implemented directly using only such comparisons, while several algorithms have been proposed to emulate the behaviour of average linkage. Hence, finding hierarchies (or dendrograms) using only comparisons is a well-understood problem. However, evaluating their meaningfulness when neither ground truth nor explicit similarities are available remains an open question. In this paper, we bridge this gap by proposing a new revenue function that allows one to measure the goodness of dendrograms using only comparisons. We show that this function is closely related to Dasgupta's cost for hierarchical clustering, which uses pairwise similarities. On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons. On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue, and we empirically compare them with existing methods.
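To make the two kinds of evaluation concrete, the sketch below computes Dasgupta's cost, $\mathrm{cost}(T) = \sum_{i<j} w_{ij}\,|\mathrm{leaves}(T[i \vee j])|$, where $T[i \vee j]$ is the subtree rooted at the lowest common ancestor of $i$ and $j$ (lower is better). It also computes a simple comparison-only score, the fraction of triplets "$i$ is more similar to $j$ than to $k$" for which the dendrogram merges $i$ and $j$ strictly before $k$ joins them. This triplet score is only an illustrative stand-in and is NOT the revenue function proposed in the paper, whose exact definition is not given here. The nested-tuple tree encoding, the helper names (`leaves`, `lca_sizes`, `dasgupta_cost`, `triplet_agreement`) and the toy data are all hypothetical.

```python
from itertools import combinations

# Hypothetical dendrogram encoding (assumption, not from the paper):
# a leaf is an int, an internal node is a 2-tuple (left, right).


def leaves(node):
    """Return the list of leaf labels below `node`."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def lca_sizes(tree):
    """Map each unordered pair {i, j} to |leaves(LCA(i, j))|."""
    sizes = {}

    def visit(node):
        if isinstance(node, int):
            return [node]
        left_leaves = visit(node[0])
        right_leaves = visit(node[1])
        n = len(left_leaves) + len(right_leaves)
        # Every pair split across the two children has this node as its LCA.
        for i in left_leaves:
            for j in right_leaves:
                sizes[frozenset((i, j))] = n
        return left_leaves + right_leaves

    visit(tree)
    return sizes


def dasgupta_cost(tree, w):
    """Dasgupta's cost: sum over pairs of w[i][j] * |leaves(LCA(i, j))|."""
    sizes = lca_sizes(tree)
    return sum(w[i][j] * sizes[frozenset((i, j))]
               for i, j in combinations(leaves(tree), 2))


def triplet_agreement(tree, triplets):
    """Illustrative score (NOT the paper's revenue): fraction of triplets
    (i, j, k) = "i is more similar to j than to k" for which LCA(i, j) lies
    strictly below LCA(i, k), i.e. has fewer leaves."""
    sizes = lca_sizes(tree)
    ok = sum(sizes[frozenset((i, j))] < sizes[frozenset((i, k))]
             for i, j, k in triplets)
    return ok / len(triplets)


# Toy data (hypothetical): {0, 1} and {2, 3} are two tight groups.
w = [[0, 10, 1, 1],
     [10, 0, 1, 1],
     [1, 1, 0, 10],
     [1, 1, 10, 0]]
good = ((0, 1), (2, 3))
bad = ((0, 2), (1, 3))
triplets = [(0, 1, 2), (2, 3, 0), (1, 0, 3), (3, 2, 1)]

print(dasgupta_cost(good, w), dasgupta_cost(bad, w))                 # 56 92
print(triplet_agreement(good, triplets), triplet_agreement(bad, triplets))  # 1.0 0.0
```

Both criteria prefer the same tree on this toy example, but only the second needs nothing beyond triplet comparisons. That is the kind of setting the proposed revenue function is designed for.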
