Paper title
Behavioral graph fraud detection in E-commerce
Paper authors
Paper abstract
In the e-commerce industry, graph neural network (GNN) methods are a new trend in transaction risk modeling. The power of graph algorithms lies in their ability to capture the network information that links transactions, which is very hard for other algorithms to capture. However, in most existing approaches, transaction or user connections are defined by hard-link strategies on shared properties, such as the same credit card, device, IP address, or shipping address. These strategies produce sparse linkages for entities with strong identification characteristics (e.g., device) and over-linkage for entities that can be widely shared (e.g., IP address), which makes it more difficult to learn useful information from the graph. To address these problems, we present a novel method based on behavioral biometrics that establishes transaction links from user behavioral similarity, then trains an unsupervised GNN to extract embedding features for downstream fraud prediction tasks. To our knowledge, this is the first time similarity-based soft links have been used in graph embedding applications. To speed up the similarity calculation, we apply an in-house GPU-based HDBSCAN clustering method to remove highly concentrated and isolated nodes before graph construction. Our experiments show that embedding features learned from the similarity-based behavioral graph significantly improve on the baseline fraud detection model in various business scenarios. In the new guest buyer transaction scenario, a segment that is challenging for traditional methods, precision increases from 0.82 to 0.86 at the same recall of 0.27, which means the method lowers the false positive rate.
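The contrast between hard links (shared card, device, or IP address) and similarity-based soft links can be illustrated with a minimal sketch. The abstract does not specify the behavioral features, the similarity measure, or the linking rule, so everything here is an assumption made for illustration: the toy feature vectors, the use of cosine similarity, the 0.95 threshold, and the helper names `cosine_similarity` and `build_soft_links`. The HDBSCAN pre-filtering step and the GNN training are not reproduced.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_soft_links(features, threshold=0.95):
    """Return weighted edges (i, j, similarity) for every pair of
    transactions whose similarity is at least `threshold`.

    Assumption: a fixed threshold on cosine similarity stands in for
    whatever linking rule the paper actually uses.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(features), 2):
        sim = cosine_similarity(a, b)
        if sim >= threshold:
            edges.append((i, j, sim))
    return edges


# Hypothetical behavioral feature vectors, one per transaction.
transactions = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # behaves almost like transaction 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

edges = build_soft_links(transactions, threshold=0.95)
for i, j, sim in edges:
    print(f"link {i} -- {j} (similarity {sim:.3f})")
```

The all-pairs loop is quadratic in the number of transactions, which is one reason a pre-filtering step such as the clustering mentioned in the abstract could matter at scale. The weighted edge list could then serve as the input graph for an embedding model.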