Paper Title

Estimating Error and Bias in Offline Evaluation Results

Authors

Tian, Mucun; Ekstrand, Michael D.

Abstract

Offline evaluations of recommender systems attempt to estimate users' satisfaction with recommendations using static data from prior user interactions. These evaluations provide researchers and developers with first approximations of the likely performance of a new system and help weed out bad ideas before presenting them to users. However, offline evaluation cannot accurately assess novel, relevant recommendations, because the most novel items were previously unknown to the user, so they are missing from the historical data and cannot be judged as relevant. We present a simulation study to estimate the error that such missing data causes in commonly-used evaluation metrics in order to assess its prevalence and impact. We find that missing data in the rating or observation process causes the evaluation protocol to systematically mis-estimate metric values, and in some cases erroneously determine that a popularity-based recommender outperforms even a perfect personalized recommender. Substantial breakthroughs in recommendation quality, therefore, will be difficult to assess with existing offline techniques.
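The abstract's core claim, that a popularity-biased observation process makes offline metrics mis-estimate recommender quality, can be illustrated with a small simulation. The sketch below is not the authors' actual simulation design; it assumes a Zipfian item-popularity curve, illustrative scaling constants (50x for relevance probability, 5x for observation probability, capped at 0.8), and recall@10 as the metric, all chosen for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 1000, 500, 10

# Hypothetical long-tail popularity: a few items are far more popular
# than the rest (Zipfian draw, normalized to a probability vector).
item_pop = rng.zipf(1.5, n_items).astype(float)
item_pop /= item_pop.sum()

# "True" relevance: popular items are relevant to more users.
rel = rng.random((n_users, n_items)) < np.minimum(50 * item_pop, 1.0)

# Observation process: a relevant item enters the historical log only
# with some probability, and popular items are observed more often
# (the bias the paper studies).
obs_prob = np.clip(5 * item_pop, 0.0, 0.8)
observed = rel & (rng.random((n_users, n_items)) < obs_prob)

def recall_at_k(scores, truth, k):
    """Mean per-user recall@k of a score matrix against a 0/1 truth matrix."""
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(truth, topk, axis=1).sum(axis=1)
    return (hits / np.maximum(truth.sum(axis=1), 1)).mean()

# A "perfect" personalized recommender ranks truly relevant items first
# (tiny noise breaks ties); a popularity recommender gives every user
# the same ranking by global popularity.
perfect = rel.astype(float) + rng.random((n_users, n_items)) * 1e-6
popular = np.tile(item_pop, (n_users, 1))

for name, scores in [("perfect", perfect), ("popularity", popular)]:
    print(f"{name:10s} true recall@{k}: {recall_at_k(scores, rel, k):.3f}  "
          f"observed recall@{k}: {recall_at_k(scores, observed, k):.3f}")
```

Because the protocol only credits hits on observed interactions, the perfect recommender is penalized for recommending truly relevant items that never reached the log, while the popularity recommender concentrates on exactly the items most likely to be observed; depending on the sampled popularity curve, the observed metric can rank popularity above the perfect recommender even though its true recall is far lower, mirroring the paper's finding.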
