Title
FeatureBox: Feature Engineering on GPUs for Massive-Scale Ads Systems
Authors
Abstract
Deep learning has been widely deployed in online ads systems to predict Click-Through Rate (CTR). Machine learning researchers and practitioners frequently retrain CTR models to test their newly extracted features. However, CTR model training often relies on a large number of raw input data logs, so feature extraction can account for a significant proportion of the training time for an industrial-scale CTR model. In this paper, we propose FeatureBox, a novel end-to-end training framework that pipelines feature extraction and training on GPU servers to eliminate the intermediate I/O of the feature extraction stage. We rewrite computation-intensive feature extraction operators as GPU operators and leave memory-intensive operators on CPUs. We introduce a layer-wise operator scheduling algorithm to schedule these heterogeneous operators. We present a lightweight GPU memory management algorithm that supports dynamic GPU memory allocation with minimal overhead. We experimentally evaluate FeatureBox and compare it with our previous in-production feature extraction framework on two real-world ads applications. The results confirm the effectiveness of our proposed method.
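The pipelining idea in the abstract can be illustrated with a minimal producer-consumer sketch (all names here are hypothetical, not FeatureBox's actual API): a producer thread extracts features from raw logs while the trainer consumes the previous batch, so extraction overlaps with training instead of running as a separate, serialized stage that writes intermediate results to disk.

```python
import queue
import threading

def extract_features(raw_log):
    # Placeholder for the heterogeneous (GPU/CPU) feature extraction
    # operators described in the paper; here we just compute token lengths.
    return [len(tok) for tok in raw_log.split()]

def pipeline_train(raw_logs, train_step, max_inflight=4):
    """Overlap feature extraction with training via a bounded queue.

    The bounded queue provides backpressure: extraction never runs more
    than `max_inflight` batches ahead of training, capping memory use.
    """
    batches = queue.Queue(maxsize=max_inflight)
    SENTINEL = object()  # marks end of the log stream

    def producer():
        for log in raw_logs:
            batches.put(extract_features(log))
        batches.put(SENTINEL)

    t = threading.Thread(target=producer)
    t.start()
    steps = 0
    while True:
        batch = batches.get()
        if batch is SENTINEL:
            break
        train_step(batch)  # consumer trains while producer extracts ahead
        steps += 1
    t.join()
    return steps
```

This is only a sketch of the dataflow; the paper's contribution lies in doing this on GPU servers with layer-wise scheduling of GPU and CPU operators and a custom GPU memory allocator, none of which the toy example models.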