Paper Title

PROD: Progressive Distillation for Dense Retrieval

Paper Authors

Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan

Paper Abstract

Knowledge distillation is an effective way to transfer knowledge from a strong teacher to an efficient student model. Ideally, we expect that the better the teacher is, the better the student will be. However, this expectation does not always come true: it is common for a better teacher model to yield a worse student via distillation, due to the non-negligible gap between teacher and student. To bridge this gap, we propose PROD, a PROgressive Distillation method for dense retrieval. PROD consists of a teacher progressive distillation and a data progressive distillation that gradually improve the student. We conduct extensive experiments on five widely used benchmarks, MS MARCO Passage, TREC Passage 19, TREC Document 19, MS MARCO Document, and Natural Questions, where PROD achieves state-of-the-art results among distillation methods for dense retrieval. The code and models will be released.
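
The abstract only describes the method at a high level. For readers unfamiliar with score distillation in dense retrieval, below is a minimal sketch of the generic KL-based distillation objective that methods of this kind typically build on. It is not the paper's actual implementation: the `distillation_loss` function, the temperature value, and the random stand-in scores are illustrative assumptions, and the teacher/data progressive schedules described in the abstract are not shown.

```python
# Illustrative sketch only: a generic score-distillation loss for dense retrieval,
# not PROD's actual training code (assumed function name, temperature, and toy data).
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student score distributions
    over each query's candidate passages (batch x num_candidates)."""
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage: scores for a batch of 4 queries, each with 8 candidate passages.
# In practice these would come from the student dual-encoder and a stronger
# teacher (e.g., a cross-encoder); random tensors stand in for them here.
student_scores = torch.randn(4, 8, requires_grad=True)
teacher_scores = torch.randn(4, 8)
loss = distillation_loss(student_scores, teacher_scores, temperature=2.0)
loss.backward()
print(loss.item())
```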
