Paper Title

BARS-CTR: Open Benchmarking for Click-Through Rate Prediction

Authors

Jieming Zhu, Jinyang Liu, Shuai Yang, Qi Zhang, Xiuqiang He

Abstract

Click-through rate (CTR) prediction is a critical task for many applications, as its accuracy has a direct impact on user experience and platform revenue. In recent years, CTR prediction has been widely studied in both academia and industry, resulting in a wide variety of CTR prediction models. Unfortunately, there is still a lack of standardized benchmarks and uniform evaluation protocols for CTR prediction research. This leads to non-reproducible or even inconsistent experimental results among existing studies, which largely limits the practical value and potential impact of their research. In this work, we aim to perform open benchmarking for CTR prediction and present a rigorous comparison of different models in a reproducible manner. To this end, we ran over 7,000 experiments for more than 12,000 GPU hours in total to re-evaluate 24 existing models on multiple datasets and settings. Surprisingly, our experiments show that with sufficient hyper-parameter search and model tuning, many deep models have smaller differences than expected. The results also reveal that making real progress on the modeling of CTR prediction is indeed a very challenging research task. We believe that our benchmarking work could not only allow researchers to gauge the effectiveness of new models conveniently but also enable fair comparisons with the state of the art. We have publicly released the benchmarking code, evaluation protocols, and hyper-parameter settings of our work to promote reproducible research in this field.
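
The evaluation protocols mentioned above typically score a CTR model by AUC and logloss on a held-out test split. The sketch below is only an illustration of how those two metrics are computed, using scikit-learn and synthetic placeholder labels and predictions; it is not the paper's released benchmarking code or tuning pipeline.

```python
# Minimal illustration of the two metrics commonly used in CTR benchmarking:
# AUC (ranking quality) and logloss (probability calibration).
# The labels and predicted click probabilities below are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(2021)

# Synthetic test labels (1 = clicked, 0 = not clicked).
y_true = rng.integers(0, 2, size=10_000)

# Synthetic predicted click probabilities, loosely correlated with the labels.
y_pred = np.clip(rng.normal(loc=0.3 + 0.2 * y_true, scale=0.2), 1e-6, 1 - 1e-6)

auc = roc_auc_score(y_true, y_pred)   # higher is better
logloss = log_loss(y_true, y_pred)    # lower is better
print(f"AUC = {auc:.4f}, logloss = {logloss:.4f}")
```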
