Paper Title

How Train-Test Leakage Affects Zero-shot Retrieval

Paper Authors

Maik Fröbe, Christopher Akiki, Martin Potthast, Matthias Hagen

Paper Abstract

Neural retrieval models are often trained on (subsets of) the millions of queries of the MS MARCO / ORCAS datasets and then tested on the 250 Robust04 queries or other TREC benchmarks with often only 50 queries. In such setups, many of the few test queries can be very similar to queries from the huge training data -- in fact, 69% of the Robust04 queries have near-duplicates in MS MARCO / ORCAS. We investigate the impact of this unintended train-test leakage by training neural retrieval models on combinations of a fixed number of MS MARCO / ORCAS queries that are highly similar to the actual test queries and an increasing number of other queries. We find that leakage can improve effectiveness and even change the ranking of systems. However, these effects diminish as the amount of leakage among all training instances decreases and thus becomes more realistic.
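To make the described setup concrete, below is a minimal Python sketch of how leakage-controlled training sets could be assembled: training queries that are near-duplicates of a test query count as "leaked", a fixed number of them is combined with increasingly many other queries, and the relative amount of leakage therefore shrinks across the resulting sets. The token-Jaccard similarity, the 0.8 threshold, the function names, and the set sizes are illustrative assumptions, not the paper's actual near-duplicate detection or training protocol.

```python
"""Sketch of leakage-controlled training-set construction (assumed details,
not the paper's exact method)."""

import random


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two queries (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def split_by_leakage(train_queries, test_queries, threshold=0.8):
    """Partition training queries into near-duplicates of some test query
    ("leaked") and the rest, using the assumed similarity threshold."""
    leaked, clean = [], []
    for q in train_queries:
        if any(jaccard(q, t) >= threshold for t in test_queries):
            leaked.append(q)
        else:
            clean.append(q)
    return leaked, clean


def build_training_sets(leaked, clean, n_leaked=50,
                        other_sizes=(1_000, 10_000, 100_000), seed=0):
    """Combine a fixed number of leaked queries with an increasing number of
    other queries, so the share of leakage decreases across training sets."""
    rng = random.Random(seed)
    fixed_leaked = rng.sample(leaked, min(n_leaked, len(leaked)))
    for n_other in other_sizes:
        others = rng.sample(clean, min(n_other, len(clean)))
        yield fixed_leaked + others
```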
