Paper Title

Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization

Authors

Yichen Zhu, Cheng Li, David B. Dunson

Abstract

Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications.
