Paper Title
Package for Fast ABC-Boost
Authors
Abstract
This report presents an open-source package that implements our series of works on boosting from the past years. The package covers three main lines of techniques, of which the following two are already standard in popular boosted-tree platforms: (i) The histogram-based (feature-binning) approach makes the tree implementation convenient and efficient. Li et al. (2007) developed a simple fixed-length adaptive binning algorithm. In this report, we demonstrate that such a simple algorithm is still surprisingly effective compared to the more sophisticated variants in popular tree platforms. (ii) The explicit gain formula in Li (2010) for tree splitting, based on second-order derivatives of the loss function, typically improves, often considerably, over first-order methods. Although the gain formula in Li (2010) was derived for the logistic regression loss, it is a generic formula for any loss function with second derivatives. For example, the open-source package also includes $L_p$ regression for $p\geq 1$. The main contribution of this package is ABC-Boost (adaptive base class boosting) for multi-class classification. The initial work in Li (2008) derived a new set of derivatives of the classical multi-class logistic regression by specifying a "base class". The accuracy can be substantially improved if the base class is chosen properly. The major technical challenge is to design a search strategy for selecting the base class. Prior published works implemented an exhaustive search procedure to find the base class, which is computationally too expensive. Recently, a new report (Li and Zhao, 2022) presented a unified framework of "Fast ABC-Boost", which allows users to efficiently choose the proper search space for the base class. The package provides interfaces for Linux, Windows, Mac, MATLAB, R, and Python.
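The three techniques named in the abstract can be illustrated with a minimal NumPy sketch. It is not the package's code or API: the function names (`fixed_length_bins`, `best_split`, `abc_derivatives`), their parameters, and the width-doubling rule are assumptions made for illustration, and the exact binning procedure of Li et al. (2007) may differ. The split gain uses the common second-order form $G_L^2/H_L + G_R^2/H_R - G^2/H$, and the base-class derivatives are those obtained from the multi-class logistic loss under the sum-to-zero constraint on the class scores.

```python
import numpy as np


def fixed_length_bins(x, max_bins=8, init_width=1e-3):
    """Map a feature to integer bin ids using equal-width bins.

    Assumption (illustrative only): the width starts small and is
    doubled until at most max_bins bins cover the feature range.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    width = init_width
    while np.floor((hi - lo) / width) + 1 > max_bins:
        width *= 2.0
    return np.floor((x - lo) / width).astype(int), width


def best_split(bin_ids, grad, hess, eps=1e-12):
    """Scan histogram bins; return (best_gain, t), left child = bins <= t.

    Gain = G_L^2/H_L + G_R^2/H_R - G^2/H, where G and H are sums of
    first and second derivatives of the loss on each side.
    """
    n_bins = int(np.max(bin_ids)) + 1
    if n_bins < 2:
        return 0.0, None  # a single bin cannot be split
    g_hist = np.bincount(bin_ids, weights=grad, minlength=n_bins)
    h_hist = np.bincount(bin_ids, weights=hess, minlength=n_bins)
    g_tot, h_tot = g_hist.sum(), h_hist.sum()
    g_left = np.cumsum(g_hist)[:-1]
    h_left = np.cumsum(h_hist)[:-1]
    g_right, h_right = g_tot - g_left, h_tot - h_left
    gains = (g_left ** 2 / (h_left + eps)
             + g_right ** 2 / (h_right + eps)
             - g_tot ** 2 / (h_tot + eps))
    t = int(np.argmax(gains))
    return float(gains[t]), t


def abc_derivatives(F, r, base):
    """First and second derivatives of the multi-class logistic loss
    with respect to each F[k], when F[base] = -sum of the other scores.

    F: (K,) class scores, r: (K,) one-hot label, base: base class index.
    The entries at index `base` are not meaningful and should be ignored.
    """
    F = np.asarray(F, dtype=float)
    r = np.asarray(r, dtype=float)
    p = np.exp(F - F.max())
    p /= p.sum()
    grad = (r[base] - p[base]) - (r - p)
    hess = p[base] * (1 - p[base]) + p * (1 - p) + 2 * p[base] * p
    return grad, hess
```

In this sketch, a tree learner would bin each feature once, then call `best_split` per feature at every node using the per-sample derivatives; for ABC-Boost, those derivatives would come from `abc_derivatives` for each non-base class, and the choice of `base` is what the search strategy described in the abstract has to decide.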