Paper Title

Quantized Training of Gradient Boosting Decision Trees

Paper Authors

Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu

Paper Abstract

Recent years have witnessed significant success of Gradient Boosting Decision Trees (GBDT) in a wide range of machine learning applications. Generally, the consensus in GBDT training algorithms is that gradients and statistics are computed with high-precision floating-point numbers. In this paper, we investigate an essentially important question which has been largely ignored by the previous literature: how many bits are needed to represent gradients in training GBDT? To answer this question, we propose to quantize all the high-precision gradients in a very simple yet effective way within GBDT's training algorithm. Surprisingly, both our theoretical analysis and empirical studies show that the precision of gradients required without hurting performance can be quite low, e.g., 2 or 3 bits. With low-precision gradients, most arithmetic operations in GBDT training can be replaced by integer operations of 8, 16, or 32 bits. Promisingly, these findings may pave the way for much more efficient training of GBDT in several aspects: (1) speeding up the computation of gradient statistics in histograms; (2) reducing the communication cost of high-precision statistical information during distributed training; and (3) inspiring the use and development of hardware architectures that support low-precision computation well for GBDT training. Benchmarked on CPUs, GPUs, and distributed clusters, we observe up to 2$\times$ speedup of our simple quantization strategy compared with SOTA GBDT systems on extensive datasets, demonstrating the effectiveness and potential of low-precision training of GBDT. The code will be released to the official repository of LightGBM.
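To make the idea in the abstract concrete, the sketch below illustrates one way low-bit gradient quantization and integer histogram accumulation could look. It is a minimal toy example, not the paper's actual LightGBM implementation: the function names, the 3-bit setting, the single global scale per iteration, and the use of stochastic rounding are all assumptions made here for illustration.

```python
import numpy as np

def quantize_gradients(grad, num_bits=3, rng=None):
    """Illustrative sketch (assumed details, not the paper's exact method):
    map float gradients to low-bit signed integers via stochastic rounding."""
    rng = rng if rng is not None else np.random.default_rng()
    max_int = 2 ** (num_bits - 1) - 1               # e.g. 3 for 3-bit signed values
    scale = max(np.max(np.abs(grad)) / max_int, 1e-12)  # one global scale per boosting iteration
    scaled = grad / scale
    floor = np.floor(scaled)
    # stochastic rounding: round up with probability equal to the fractional part,
    # so the quantized gradients are unbiased estimates of the originals
    quantized = floor + (rng.random(grad.shape) < (scaled - floor))
    return quantized.astype(np.int8), scale

def integer_histogram(bin_ids, quant_grad, num_bins):
    """Accumulate quantized gradients per feature bin using integer arithmetic only."""
    hist = np.zeros(num_bins, dtype=np.int32)       # 32-bit integer accumulators
    np.add.at(hist, bin_ids, quant_grad.astype(np.int32))
    return hist

# toy usage
grad = np.random.randn(1000).astype(np.float32)     # per-sample gradients
bin_ids = np.random.randint(0, 16, size=1000)       # pre-computed feature bin indices
q, scale = quantize_gradients(grad, num_bits=3)
hist = integer_histogram(bin_ids, q, num_bins=16)
approx_sums = hist * scale                           # rescale to recover approximate float bin sums
```

In this toy version, all per-sample additions in histogram construction use small integers, and only a single rescaling by `scale` is needed when the histogram is read out, which is the kind of arithmetic replacement the abstract refers to.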
