Paper Title

Supervised Contrastive Learning Approach for Contextual Ranking

Paper Authors

Abhijit Anand, Jurek Leonhardt, Koustav Rudra, Avishek Anand

Paper Abstract

Contextual ranking models have delivered impressive performance improvements over classical models in the document ranking task. However, these highly over-parameterized models tend to be data-hungry and require large amounts of data even for fine-tuning. This paper proposes a simple yet effective method to improve ranking performance on smaller datasets using supervised contrastive learning for the document ranking problem. We perform data augmentation by creating training data using parts of the relevant documents in the query-document pairs. We then use a supervised contrastive learning objective to learn an effective ranking model from the augmented dataset. Our experiments on subsets of the TREC-DL dataset show that, although data augmentation increases the training data size, it does not necessarily improve performance under existing pointwise or pairwise training objectives. However, our proposed supervised contrastive loss objective leads to performance improvements over the standard non-augmented setting, showcasing the utility of data augmentation with contrastive losses. Finally, we show the real benefit of supervised contrastive learning objectives through marked improvements on smaller ranking datasets relating to news (Robust04), finance (FiQA), and scientific fact-checking (SciFact).
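As a reading aid, here is a minimal sketch of the two ingredients the abstract describes: splitting a relevant document into parts to create additional positives, and a supervised contrastive (SupCon-style) loss over the model's relevance scores. It assumes a scoring model that yields one score per query-document pair; the function names, the temperature `tau`, and the part length/stride values are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def split_into_parts(doc_tokens: list[str], part_len: int = 128,
                     stride: int = 64) -> list[list[str]]:
    """Data augmentation sketch: cut a relevant document into overlapping
    parts; each part is paired with the query as an extra positive."""
    last_start = max(len(doc_tokens) - part_len, 0)
    return [doc_tokens[i:i + part_len] for i in range(0, last_start + 1, stride)]

def supervised_contrastive_ranking_loss(
    scores: torch.Tensor,       # (num_docs,) relevance scores for one query
    is_positive: torch.Tensor,  # (num_docs,) bool mask marking the positives
    tau: float = 0.1,           # temperature (assumed hyperparameter)
) -> torch.Tensor:
    """Average negative log-likelihood of the positives under a softmax
    over all candidates for the query, in the spirit of SupCon."""
    log_probs = torch.log_softmax(scores / tau, dim=0)
    return -log_probs[is_positive].mean()

# Toy usage: three augmented positive parts and three negatives for one query.
scores = torch.tensor([2.1, 1.8, 1.5, 0.3, -0.2, 0.1])
is_positive = torch.tensor([True, True, True, False, False, False])
loss = supervised_contrastive_ranking_loss(scores, is_positive)
```

Unlike a pointwise or pairwise objective, this loss jointly contrasts all augmented positives against all negatives for a query, which is the property the abstract credits for making the augmentation effective.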
