Paper title
Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training
Paper authors
Paper abstract
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. Despite its recent flourishing, keyphrase generation for non-English languages has not been extensively investigated. In this paper, we call attention to a new setting named multilingual keyphrase generation, and we contribute two new datasets, EcommerceMKP and AcademicMKP, covering six languages. Technically, we propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages. The retrieval-augmented model leverages keyphrase annotations in English datasets to facilitate generating keyphrases in low-resource languages. Given a non-English passage, a cross-lingual dense passage retrieval module finds relevant English passages. The associated English keyphrases then serve as external knowledge for keyphrase generation in the current language. Moreover, we develop a retriever-generator iterative training algorithm that mines pseudo-parallel passage pairs to strengthen the cross-lingual passage retriever. Comprehensive experiments and ablations show that the proposed approach outperforms all baselines.
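The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-dimensional vectors stand in for the output of a cross-lingual dense encoder, cosine similarity is an assumed scoring function, and the names `cosine`, `retrieve_keyphrases`, `build_generator_input`, and the `[SEP]` separator are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy index: (dense vector of an English passage, its annotated keyphrases).
# In a real system the vectors would come from a cross-lingual passage encoder.
EnglishEntry = Tuple[List[float], List[str]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_keyphrases(query_vec: Sequence[float],
                        index: List[EnglishEntry],
                        top_k: int = 2) -> List[str]:
    """Return the de-duplicated keyphrases of the top_k most similar English passages."""
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]), reverse=True)
    keyphrases: List[str] = []
    for _, entry_keyphrases in ranked[:top_k]:
        for kp in entry_keyphrases:
            if kp not in keyphrases:
                keyphrases.append(kp)
    return keyphrases


def build_generator_input(passage: str, keyphrases: List[str],
                          sep: str = " [SEP] ") -> str:
    """Append retrieved English keyphrases to the passage as external knowledge.

    The separator and joining format are assumptions, not taken from the paper.
    """
    return passage + sep + " ; ".join(keyphrases)


# Toy usage: the query vector plays the role of an encoded non-English passage.
index: List[EnglishEntry] = [
    ([1.0, 0.0], ["wireless headphones", "noise cancelling"]),
    ([0.0, 1.0], ["graph neural network"]),
    ([0.6, 0.8], ["bluetooth earbuds", "noise cancelling"]),
]
query_vec = [0.9, 0.1]
retrieved = retrieve_keyphrases(query_vec, index, top_k=2)
generator_input = build_generator_input("Kabellose Kopfhörer mit Geräuschunterdrückung", retrieved)
print(generator_input)
```

In this sketch the generator itself is left out; the string returned by `build_generator_input` is what a sequence-to-sequence keyphrase generator would presumably consume. The iterative training step mentioned in the abstract, which mines pseudo-parallel passage pairs to improve the retriever, is not modeled here.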