Paper Title
NetDP: An Industrial-Scale Distributed Network Representation Framework for Default Prediction in Ant Credit Pay
Paper Authors
Paper Abstract
Ant Credit Pay is a consumer credit service of Ant Financial Service Group. As with credit cards, loan default is one of the major risks of this credit product. Hence, an effective algorithm for default prediction is key to reducing losses and increasing profits for the company. However, the challenges in our scenario differ from those in conventional credit card services. The first is scalability. The huge volume of users and their behaviors in Ant Financial requires the ability to process industrial-scale data and train models efficiently. The second challenge is the cold-start problem. Unlike the manual review of credit card applications in conventional banks, the credit limit of Ant Credit Pay is offered to users automatically, based on knowledge learned from big data. However, default prediction for new users suffers from a lack of sufficient credit behavior, so a solution needs to leverage other, new data sources. Considering these challenges and the special scenario of Ant Financial, we incorporate network information into default prediction to alleviate the cold-start problem. In this paper, we propose an industrial-scale distributed network representation framework, termed NetDP, for default prediction in Ant Credit Pay. The proposal explores network information generated by various interactions between users, and blends unsupervised and supervised network representation in a unified framework for the default prediction problem. Moreover, we present a parameter-server-based distributed implementation of our proposal to handle the scalability challenge. Experimental results demonstrate the effectiveness of our proposal, especially on the cold-start problem, as well as its efficiency on industrial-scale datasets.