Paper Title

Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder

Authors

Zhenghao Liu, Han Zhang, Chenyan Xiong, Zhiyuan Liu, Yu Gu, Xiaohua Li

Abstract

Dense retrievers encode queries and documents and map them into a shared embedding space using pre-trained language models. These embeddings need to be high-dimensional to fit the training signals and guarantee the retrieval effectiveness of dense retrievers. However, high-dimensional embeddings lead to larger index storage and higher retrieval latency. To reduce the embedding dimension of dense retrieval, this paper proposes a Conditional Autoencoder (ConAE) that compresses high-dimensional embeddings while maintaining the same embedding distribution and better recovering the ranking features. Our experiments show that ConAE is effective in compressing embeddings: it achieves ranking performance comparable to its teacher model while making the retrieval system more efficient. Further analyses show that ConAE can alleviate the redundancy in dense retrieval embeddings with only one linear layer. All code for this work is available at https://github.com/NEUIR/ConAE.
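To make the compression idea concrete, below is a minimal PyTorch sketch, not the paper's implementation (see the linked repository for that). The names (`ConAESketch`, `distill_step`), the 768-to-128 projection, and the KL-based distillation objective are illustrative assumptions drawn from the abstract, which states that a single linear layer suffices and that the compressed model matches its teacher's ranking.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConAESketch(nn.Module):
    """Sketch of a conditional-autoencoder-style embedding compressor.

    A single linear layer projects high-dimensional teacher embeddings
    (e.g., 768-d) into a low-dimensional space (e.g., 128-d), in which
    relevance is still scored by dot product. Dimensions and names here
    are illustrative assumptions, not the paper's exact configuration.
    """

    def __init__(self, input_dim: int = 768, compressed_dim: int = 128):
        super().__init__()
        # Per the abstract, one linear layer is enough to remove redundancy.
        self.encoder = nn.Linear(input_dim, compressed_dim, bias=False)

    def forward(self, query_emb: torch.Tensor, doc_embs: torch.Tensor):
        q = self.encoder(query_emb)   # [batch, compressed_dim]
        d = self.encoder(doc_embs)    # [batch, n_docs, compressed_dim]
        # Dot-product relevance scores, as in standard dense retrieval.
        return torch.einsum("bd,bnd->bn", q, d)


def distill_step(model, query_emb, doc_embs, teacher_scores, optimizer):
    """One distillation step: match the teacher's ranking distribution
    over candidate documents via KL divergence (an assumed objective)."""
    student_scores = model(query_emb, doc_embs)
    loss = F.kl_div(
        F.log_softmax(student_scores, dim=-1),
        F.softmax(teacher_scores, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the compressed query and document embeddings live in the same low-dimensional space and are scored by dot product, the resulting index can be served by the same approximate nearest-neighbor machinery as the original embeddings, only with a smaller footprint.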
