Dissecting graph measure performance for node clustering in LFR parameter space

Paper title

Dissecting graph measure performance for node clustering in LFR parameter space

Authors

Vladimir Ivashkin, Pavel Chebotarev

Abstract

Graph measures that express closeness or distance between nodes can be employed for graph node clustering using metric clustering algorithms. There are numerous measures applicable to this task, and which one performs better is an open question. We study the performance of 25 graph measures on generated graphs with different parameters. While measure comparisons are usually limited to a general ranking of measures on a particular dataset, we aim to explore how the performance of various measures depends on graph features. Using an LFR graph generator, we create a dataset of 11780 graphs covering the whole LFR parameter space. For each graph, we assess the quality of clustering with the k-means algorithm for each considered measure. Based on this, we determine the best measure for each area of the parameter space. We find that the parameter space consists of distinct zones where one particular measure is the best. We analyze the geometry of the resulting zones and describe it with simple criteria. Given particular graph parameters, this allows us to recommend a particular measure to use for clustering.
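
The evaluation loop described in the abstract (generate a graph with known communities, compute a node-to-node measure, cluster on that measure, score the result) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: a planted-partition graph stands in for the LFR generator, shortest-path distance stands in for one of the 25 measures, k-medoids stands in for k-means so that a raw distance matrix can be clustered directly, and the plain Rand index stands in for whatever quality score the paper uses. All function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def planted_partition(sizes, p_in, p_out, rng):
    """Random graph with known communities (a simple stand-in for LFR)."""
    truth = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(truth)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Edges inside a community are more likely than edges between them.
        p = p_in if truth[u] == truth[v] else p_out
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj, truth


def shortest_path_distances(adj):
    """All-pairs hop distances by BFS; unreachable pairs keep the value n."""
    n = len(adj)
    dist = [[n] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:  # n marks "not visited yet"
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def k_medoids(dist, k, rng, iters=20):
    """Cluster nodes from a distance matrix (used here in place of k-means)."""
    n = len(dist)
    medoids = rng.sample(range(n), k)
    labels = [0] * n
    for _ in range(iters):
        # Assign every node to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[v][medoids[c]])
                  for v in range(n)]
        new_medoids = []
        for c in range(k):
            members = [v for v in range(n) if labels[v] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][v] for v in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def rand_index(a, b):
    """Fraction of node pairs on which two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


rng = random.Random(0)
adj, truth = planted_partition([15, 15, 15], p_in=0.5, p_out=0.02, rng=rng)
dist = shortest_path_distances(adj)
labels = k_medoids(dist, k=3, rng=rng)
score = rand_index(truth, labels)
print(f"Rand index: {score:.3f}")
```

Repeating this loop over a grid of generator parameters and over several measures, then recording which measure scores best at each grid point, would give a rough analogue of the zone map the abstract describes.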
