Paper Title

Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation

Authors

Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin

Abstract

Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration. We treat this task as a ranking problem, which we tackle with a two-stage approach: candidate generation followed by re-ranking. Within this framework, we adapt to the scientific domain a proven combination based on "bag of words" retrieval followed by re-scoring with a BERT model. We experimentally show the effects of domain adaptation, both in terms of pretraining on in-domain data and exploiting in-domain vocabulary. In addition, we introduce a novel navigation-based document expansion strategy to enrich the candidate documents processed by our neural models. On three different collections from different scientific disciplines, we achieve the best-reported results in the citation recommendation task.
