Title
Vital Node Identification in Complex Networks Using a Machine Learning-Based Approach
Authors
Abstract
Vital node identification is the problem of finding the most important nodes in complex networks. This problem has crucial applications in various contexts, such as viral marketing or controlling the propagation of viruses or rumours in real-world networks. Existing approaches to vital node identification mainly focus on capturing the importance of a node through a mathematical expression that directly relates the structural properties of the node to its vitality. Although these heuristic approaches have achieved good performance in practice, they have weak adaptability, and their performance is limited to specific settings and certain dynamics. Inspired by the ability of machine learning models to efficiently capture different types of patterns and relations, we propose a machine learning-based, data-driven approach for vital node identification. The main idea is to train the model on a small portion of the graph, e.g., 0.5% of the nodes, and to predict the vitality of the remaining nodes. The ground-truth vitality of the training data is computed by simulating the SIR diffusion process starting from the training nodes. We use collective feature engineering, in which each node in the network is represented by combining elements of its connectivity, degree and extended coreness. Several machine learning models are trained on the node representations, and the best results are achieved by a Support Vector Regression machine with an RBF kernel. The empirical results confirm that the proposed model outperforms state-of-the-art models on a selection of datasets, while also showing greater adaptability to changes in the parameters of the dynamics.
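The labeling and feature steps described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the SIR model is taken to be discrete-time, with infection probability `beta` per contact and recovery probability `mu` per step; "extended coreness" is assumed to mean extended neighborhood coreness (the sum, over a node's neighbours, of their neighbours' k-shell indices); and mean neighbour degree is a hypothetical stand-in for the connectivity features, whose exact form the abstract does not specify. All function names are hypothetical.

```python
import random


def sir_spread(adj, source, beta, mu=1.0, runs=100, seed=0):
    """Mean fraction of nodes ever infected by a discrete-time SIR process
    seeded at `source`, averaged over `runs` simulations (assumed model)."""
    rng = random.Random(seed)
    n = len(adj)
    total = 0
    for _ in range(runs):
        infected, recovered = {source}, set()
        while infected:
            newly = set()
            for u in infected:
                for v in adj[u]:
                    # Only susceptible neighbours (never infected so far) can catch it.
                    if v not in infected and v not in recovered and v not in newly:
                        if rng.random() < beta:
                            newly.add(v)
            # random() lies in [0, 1), so mu=1.0 recovers every infected node.
            still_infected = {u for u in infected if rng.random() >= mu}
            recovered |= infected - still_infected
            infected = still_infected | newly
        total += len(recovered)
    return total / (runs * n)


def core_numbers(adj):
    """k-shell index of every node via iterative peeling."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        stack = [u for u in remaining if degree[u] <= k]
        if not stack:
            k += 1
            continue
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already peeled via a duplicate stack entry
            remaining.remove(u)
            core[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return core


def extended_coreness(adj, core):
    """Assumed definition: sum over neighbours of their neighbourhood coreness."""
    nc = {u: sum(core[v] for v in adj[u]) for u in adj}
    return {u: sum(nc[v] for v in adj[u]) for u in adj}


def node_features(adj):
    """[degree, mean neighbour degree (hypothetical connectivity proxy), extended coreness]."""
    ext = extended_coreness(adj, core_numbers(adj))
    feats = {}
    for u, nbrs in adj.items():
        deg = len(nbrs)
        mean_nbr_deg = sum(len(adj[v]) for v in nbrs) / deg if deg else 0.0
        feats[u] = [float(deg), mean_nbr_deg, float(ext[u])]
    return feats


def build_training_set(adj, train_fraction=0.005, beta=0.1, runs=100, seed=0):
    """Sample a small fraction of nodes and label each one with its SIR spread."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_train = max(1, round(train_fraction * len(nodes)))
    train = rng.sample(nodes, n_train)
    feats = node_features(adj)
    X = [feats[u] for u in train]
    y = [sir_spread(adj, u, beta, runs=runs, seed=seed + i) for i, u in enumerate(train)]
    return train, X, y


if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(build_training_set(adj, beta=0.3))
```

On a real network, `X` and `y` would then be passed to an RBF-kernel regressor, for example scikit-learn's `sklearn.svm.SVR(kernel="rbf")` via `fit(X, y)`, and `predict` would score the feature vectors of the remaining nodes; ranking nodes by predicted spread yields the vital-node list. The abstract does not give the SVR hyperparameters (`C`, `gamma`, `epsilon`) or the SIR parameters, so these would need to be chosen or tuned separately.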