Paper Title
An Analysis of Variations in the Effectiveness of Query Performance Prediction
Paper Authors
Paper Abstract
A query performance predictor (QPP) estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that the ground-truth retrieval effectiveness can be measured with different metrics, so the ground truth itself is not absolute, in contrast to other retrieval tasks such as ad-hoc retrieval. Motivated by this observation, this paper investigates how such variations in the ground truth can affect the outcomes of QPP experiments. We consider not only the absolute values of the reported evaluation metrics (e.g., Pearson's $r$, Kendall's $\tau$), but also the changes in the ranks of different QPP systems when they are ordered by the QPP metric scores. Our experiments reveal that the observed QPP outcomes can vary considerably, both in absolute evaluation metric values and in relative system ranks. Through our analysis, we report the optimal combinations of QPP evaluation metric and experimental settings that are likely to lead to smaller variations in the observed results.
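To make the central point concrete, the following is a minimal sketch of how the choice of ground-truth metric can change which predictor looks better. Everything in it is an illustrative assumption rather than material from the paper: the metric names (`AP`, `nDCG`), the predictor names, and all per-query numbers are made up, and Kendall's $\tau$ is computed in its simple tau-a form, which assumes no ties.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's r between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no tied values in x or y.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-query retrieval effectiveness for five queries,
# measured with two different ground-truth metrics (made-up numbers).
ground_truth = {
    "AP": [0.10, 0.40, 0.25, 0.60, 0.05],
    "nDCG": [0.30, 0.35, 0.50, 0.70, 0.20],
}

# Hypothetical scores from two QPP methods for the same five queries.
predictors = {
    "predictor_A": [1.0, 4.0, 2.0, 5.0, 0.5],
    "predictor_B": [2.0, 2.5, 4.0, 5.0, 1.0],
}


def rank_predictors(metric, correlation):
    """Order predictor names by their correlation with one ground truth."""
    scores = {
        name: correlation(values, ground_truth[metric])
        for name, values in predictors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    for metric in ground_truth:
        print(metric, rank_predictors(metric, kendall_tau))
```

With this toy data, `predictor_A` ranks first when the ground truth is `AP` and `predictor_B` ranks first when it is `nDCG`. Only the ground-truth metric changes between the two runs, yet the relative system ranks flip.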