Paper Title

A proposed simulation technique for population stability testing in credit risk scorecards

Authors

Pisanie, J. du, Allison, J. S., Visagie, I. J. H.

Abstract

Credit risk scorecards are logistic regression models, fitted to large and complex data sets, that the financial industry uses to model the probability that a potential customer defaults. To ensure that a scorecard remains a representative model of the population, one tests the hypothesis of population stability, which specifies that the distribution of clients' attributes remains constant over time. Simulating realistic data sets for this purpose is nontrivial because these data sets are multivariate and contain intricate dependencies. The simulation of these data sets is of practical interest to both practitioners and researchers: practitioners may wish to consider the effect that a specified change in the properties of the data has on the scorecard and its usefulness from a business perspective, while researchers may wish to test a newly developed technique in credit scoring. We propose a simulation technique based on the specification of bad ratios. Practitioners generally cannot be expected to provide realistic parameter values for a scorecard; these models are simply too complex and contain too many parameters to make such a specification viable. However, practitioners can often confidently specify the bad ratio associated with two different levels of a specific attribute. That is, they are often comfortable making statements such as "on average, a new customer is 1.5 times as likely to default as an existing customer with similar attributes". We propose a method for obtaining parameter values for a scorecard from specified bad ratios. The proposed technique is demonstrated using a realistic example, and we show that the simulated data sets adhere closely to the specified bad ratios. The paper provides a link to a GitHub project containing the R code used to generate the results shown.
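To make the idea of turning a bad ratio into a parameter value concrete, here is a minimal sketch for the simplest possible case: a single binary attribute (existing versus new customer). It is not the authors' method or their R code, which handle multivariate scorecards. It assumes the bad ratio is the ratio of default probabilities between the two levels, as the quoted "1.5 times as likely to default" statement suggests. The 10% base default rate, the function names, and the sample size are all hypothetical choices made for illustration.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def coefficients_from_bad_ratio(base_rate, bad_ratio):
    """Return (intercept, slope) of a one-attribute logistic model.

    base_rate: assumed default probability at the reference level (x = 0).
    bad_ratio: assumed ratio of default probabilities, level x = 1 over x = 0.
    """
    target_rate = base_rate * bad_ratio
    if not (0.0 < base_rate < 1.0 and 0.0 < target_rate < 1.0):
        raise ValueError("base_rate and base_rate * bad_ratio must lie in (0, 1)")
    intercept = logit(base_rate)
    slope = logit(target_rate) - intercept
    return intercept, slope


def simulate_bad_ratio(intercept, slope, n_per_group, seed=0):
    """Simulate defaults for both levels and return the empirical bad ratio."""
    rng = random.Random(seed)
    rates = []
    for x in (0, 1):
        p = sigmoid(intercept + slope * x)
        defaults = sum(rng.random() < p for _ in range(n_per_group))
        rates.append(defaults / n_per_group)
    return rates[1] / rates[0]


# Hypothetical inputs: 10% of existing customers default; new customers
# are 1.5 times as likely to default.
intercept, slope = coefficients_from_bad_ratio(base_rate=0.10, bad_ratio=1.5)
implied_ratio = sigmoid(intercept + slope) / sigmoid(intercept)
empirical_ratio = simulate_bad_ratio(intercept, slope, n_per_group=100_000)
print(implied_ratio, empirical_ratio)
```

In this toy setting the slope is simply the difference in log-odds between the two levels, so the model reproduces the specified bad ratio exactly and the simulated ratio should land close to it. With several correlated attributes no such closed form is available in general, which is the situation the proposed technique addresses.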
