Paper Title
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper Authors
Paper Abstract
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations, one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
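
The abstract names two RAG formulations without spelling them out. As a rough sketch (notation is mine: x is the input, y the output sequence, z a retrieved passage, p_eta the retriever and p_theta the seq2seq generator), the two marginalizations over the top-k retrieved passages can be written as:

\[
p_{\text{RAG-Sequence}}(y \mid x) \;\approx\; \sum_{z \in \text{top-}k\, p_\eta(\cdot \mid x)} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
\]
\[
p_{\text{RAG-Token}}(y \mid x) \;\approx\; \prod_{i=1}^{N} \sum_{z \in \text{top-}k\, p_\eta(\cdot \mid x)} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
\]

RAG-Sequence marginalizes over the retrieved passages once for the whole output, while RAG-Token marginalizes per generated token, so each token can draw on a different passage.

For readers who want to try such a model, below is a minimal usage sketch with the Hugging Face transformers implementation. This is not part of the paper itself; the checkpoint name "facebook/rag-sequence-nq" and the dummy-index setting refer to the publicly released artifacts and are assumptions, not the authors' training setup.

# Minimal sketch: query a pre-trained RAG model via Hugging Face transformers.
# Assumes the public "facebook/rag-sequence-nq" checkpoint; use_dummy_dataset=True
# loads a small toy index instead of the full Wikipedia dense vector index.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# Encode a question, let the model retrieve passages and generate an answer.
inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
generated_ids = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))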