Title
A Study on the Efficiency and Generalization of Light Hybrid Retrievers
Authors
Abstract
Hybrid retrievers can take advantage of both sparse and dense retrievers. Previous hybrid retrievers rely on indexing-heavy dense retrievers. In this work, we ask: "Is it possible to reduce the indexing memory of hybrid retrievers without sacrificing performance?" Driven by this question, we leverage an indexing-efficient dense retriever (i.e., DrBoost) and introduce a LITE retriever that further reduces the memory of DrBoost. LITE is trained jointly with a contrastive learning objective and knowledge distillation from DrBoost. We then integrate BM25, a sparse retriever, with either LITE or DrBoost to form light hybrid retrievers. Our Hybrid-LITE retriever uses 13× less memory while retaining 98.0% of the performance of the hybrid retriever that combines BM25 and DPR. In addition, we study the generalization capacity of our light hybrid retrievers on an out-of-domain dataset and a set of adversarial attack datasets. Experiments show that light hybrid retrievers generalize better than individual sparse and dense retrievers. Nevertheless, our analysis shows that there is considerable room to improve the robustness of retrievers, suggesting a new research direction.
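The abstract does not say how BM25 is combined with LITE or DrBoost. One common way to build a hybrid retriever is to rescale each retriever's scores to a shared range and then interpolate them linearly. The sketch below illustrates that approach under this assumption only; it is not the paper's method. The function names, the min-max normalization, and the weight `alpha` are all hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores into [0, 1] so sparse and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: give each the same neutral score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    sparse: Dict[str, float],
    dense: Dict[str, float],
    alpha: float = 0.5,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Hypothetical fusion: alpha * normalized sparse score + (1 - alpha) * normalized dense score."""
    s_norm = min_max_normalize(sparse)
    d_norm = min_max_normalize(dense)
    # A document absent from one retriever's candidate list gets 0 from that side.
    docs = set(s_norm) | set(d_norm)
    fused = {
        doc: alpha * s_norm.get(doc, 0.0) + (1 - alpha) * d_norm.get(doc, 0.0)
        for doc in docs
    }
    # Sort by fused score (descending), breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    bm25_scores = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # illustrative BM25 scores
    dense_scores = {"d2": 0.9, "d3": 0.7, "d4": 0.5}   # illustrative dense (inner-product) scores
    print(hybrid_rank(bm25_scores, dense_scores, alpha=0.5, k=4))
```

Here `d2` ranks first because both retrievers score it well, even though neither places it alone at the top. Setting `alpha=1.0` recovers a pure sparse ranking and `alpha=0.0` a pure dense one. In practice such a weight would be tuned on held-out data.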