Paper title
Local Intrinsic Dimensionality Measures for Graphs, with Applications to Graph Embeddings
Paper authors
Paper abstract
The notion of local intrinsic dimensionality (LID) is an important advancement in data dimensionality analysis, with applications in data mining, machine learning and similarity search problems. Existing distance-based LID estimators were designed for tabular datasets encompassing data points represented as vectors in a Euclidean space. After discussing their limitations for graph-structured data considering graph embeddings and graph distances, we propose NC-LID, a novel LID-related measure for quantifying the discriminatory power of the shortest-path distance with respect to natural communities of nodes as their intrinsic localities. It is shown how this measure can be used to design LID-aware graph embedding algorithms by formulating two LID-elastic variants of node2vec with personalized hyperparameters that are adjusted according to NC-LID values. Our empirical analysis of NC-LID on a large number of real-world graphs shows that this measure is able to point to nodes with high link reconstruction errors in node2vec embeddings better than node centrality metrics. The experimental evaluation also shows that the proposed LID-elastic node2vec extensions improve node2vec by better preserving graph structure in generated embeddings.
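The abstract describes NC-LID only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes the score for a node is the negative logarithm of the ratio between the size of its natural community and the number of nodes lying within the shortest-path radius needed to reach every community member. That formula, the function names, and the toy graph are all assumptions made for illustration. Natural community detection itself is not shown; the community is passed in as a given set.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def nc_lid_sketch(adj, node, community):
    """Assumed form: -ln(|community| / |ball|).

    `community` is a hypothetical, precomputed natural community of `node`;
    it must contain `node` and every member must be reachable from `node`.
    `ball` is the set of nodes within the largest shortest-path distance
    from `node` to any community member.
    """
    dist = bfs_distances(adj, node)
    radius = max(dist[member] for member in community)
    ball = [v for v, d in dist.items() if d <= radius]
    return -math.log(len(community) / len(ball))


# Toy graph (illustrative only): two triangles joined by the edge 2-3.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
community = {0, 1, 2}

inner_score = nc_lid_sketch(adj, 0, community)   # ball equals the community
border_score = nc_lid_sketch(adj, 2, community)  # ball also contains node 3
```

Under this reading, a node whose shortest-path ball coincides with its community scores zero, while a node on the community border scores higher because the same radius also captures outside nodes. This matches the abstract's framing of NC-LID as a measure of how well shortest-path distance discriminates a node's natural community.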