Paper Title

Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems

Paper Authors

Zhe Chen, Yuyan Wang, Dong Lin, Derek Zhiyuan Cheng, Lichan Hong, Ed H. Chi, Claire Cui

Paper Abstract

Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the same data can produce very different prediction results. The ensemble method is a state-of-the-art benchmark for prediction uncertainty estimation. However, ensembles are expensive to train and serve for web-scale traffic. In this paper, we seek to advance the understanding of the prediction variation estimated by the ensemble method. Through empirical experiments on MovieLens and Criteo, two widely used benchmark datasets in recommender systems, we observe that prediction variation comes from various randomness sources, including training data shuffling and parameter random initialization. By introducing more randomness into model training, we notice that the ensemble's mean predictions tend to be more accurate while the prediction variations tend to be higher. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate the strong predictive power of activation strength features. Our experimental results show that the average R-squared is as high as 0.56 on MovieLens and 0.81 on Criteo. Our method performs especially well when detecting the lowest and highest variation buckets, with 0.92 AUC and 0.89 AUC respectively. Our approach provides a simple way to estimate prediction variation, which opens up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning) without relying on serving expensive ensemble models.
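
To make the abstract's core idea concrete, here is a minimal, hypothetical sketch (not the authors' code): build an ensemble of identically specified models under different random seeds, take the per-example standard deviation of their predictions as the prediction variation, and fit a lightweight regressor that maps neuron activation strength statistics to that variation. The tiny numpy MLPs and synthetic data below are stand-ins for trained DNNs and the MovieLens/Criteo datasets, and the feature choices (mean activation, activation spread, fraction of active units) are illustrative assumptions only.

```python
# Sketch: infer ensemble prediction variation from neuron activation strength.
# Assumptions for illustration: randomly initialized numpy MLPs stand in for
# trained ensemble members, and synthetic data replaces MovieLens/Criteo.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))  # synthetic input features

def make_member(seed, d_in=16, d_hidden=32):
    """One 'ensemble member': a one-hidden-layer MLP with seed-dependent weights.
    In the paper's setting these would be identically specified models trained
    under different data shuffling and parameter initialization."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=0.5, size=(d_in, d_hidden))
    W2 = r.normal(scale=0.5, size=(d_hidden, 1))
    def forward(x):
        h = np.maximum(x @ W1, 0.0)   # ReLU hidden activations
        return h, (h @ W2).ravel()    # activations and scalar predictions
    return forward

members = [make_member(s) for s in range(10)]
acts, preds = zip(*(m(X) for m in members))
preds = np.stack(preds)               # shape: (n_members, n_examples)

# "Label": per-example prediction variation across the ensemble.
variation = preds.std(axis=0)

# Features: neuron activation strength statistics from a single member,
# e.g. mean activation, activation spread, and fraction of active units.
h = acts[0]
strength = np.column_stack([h.mean(axis=1), h.std(axis=1), (h > 0).mean(axis=1)])

# Fit a lightweight regressor from activation strength to prediction variation.
n_train = 1500
reg = GradientBoostingRegressor().fit(strength[:n_train], variation[:n_train])
estimate = reg.predict(strength[n_train:])
print("held-out R^2:", r2_score(variation[n_train:], estimate))
```

The point of the design is that the activation strength features come from a single served model, so prediction variation can be estimated at serving time without running the full ensemble.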
