Paper Title
Capturing Delayed Feedback in Conversion Rate Prediction via Elapsed-Time Sampling
Paper Authors
Paper Abstract
Conversion rate (CVR) prediction is one of the most critical tasks in digital display advertising. Commercial systems often need to update models in an online learning manner to keep up with the evolving data distribution. However, conversions usually do not happen immediately after a user click. This may result in inaccurate labels, which is known as the delayed feedback problem. In previous studies, the delayed feedback problem is handled either by waiting a long period of time for the positive label, or by consuming the negative sample on its arrival and then inserting a positive duplicate when a conversion happens later. There is a trade-off between waiting for more accurate labels and utilizing fresh data, which existing work does not consider. To strike a balance in this trade-off, we propose the Elapsed-Time Sampling Delayed Feedback Model (ES-DFM), which models the relationship between the observed conversion distribution and the true conversion distribution. We then optimize the expectation under the true conversion distribution via importance sampling under the elapsed-time sampling distribution. We further estimate the importance weight for each instance, which is used as the weight of the loss function in CVR prediction. To demonstrate the effectiveness of ES-DFM, we conduct extensive experiments on a public dataset and a private industrial dataset. Experimental results confirm that our method consistently outperforms previous state-of-the-art methods.
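The importance-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the ES-DFM weighting itself: the abstract says the per-instance weights are estimated, whereas this toy example assumes the true and observed positive rates are known constants and uses the plain density ratio as the weight. All names and numbers (`TRUE_POS_RATE`, `OBS_POS_RATE`, `expected_log_loss`, `weighted_log_loss`) are hypothetical and chosen only for illustration.

```python
import math


def weighted_log_loss(labels, preds, weights, eps=1e-12):
    """Mean binary cross-entropy where each instance carries its own weight."""
    total = 0.0
    for y, p, w in zip(labels, preds, weights):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def expected_log_loss(pos_rate, pred, w_pos=1.0, w_neg=1.0):
    """Exact expected weighted log loss when labels are Bernoulli(pos_rate)."""
    return -(pos_rate * w_pos * math.log(pred)
             + (1.0 - pos_rate) * w_neg * math.log(1.0 - pred))


# Assumed toy numbers: 30% of clicks truly convert, but only 20% have
# converted by the time the sample is consumed (delayed feedback).
TRUE_POS_RATE = 0.3
OBS_POS_RATE = 0.2

# Importance weights: ratio of true to observed label probability.
w_pos = TRUE_POS_RATE / OBS_POS_RATE
w_neg = (1.0 - TRUE_POS_RATE) / (1.0 - OBS_POS_RATE)

pred = 0.25  # an arbitrary model prediction
true_loss = expected_log_loss(TRUE_POS_RATE, pred)
naive_loss = expected_log_loss(OBS_POS_RATE, pred)
corrected_loss = expected_log_loss(OBS_POS_RATE, pred, w_pos, w_neg)

print(f"true={true_loss:.4f} naive={naive_loss:.4f} corrected={corrected_loss:.4f}")
```

In this sketch the corrected loss matches the loss under the true label distribution, while the unweighted loss on observed labels does not. In an online training loop, `weighted_log_loss` would receive one weight per instance rather than one per label value.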