Paper Title


Promptagator: Few-shot Dense Retrieval From 8 Examples

Paper Authors

Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, Ming-Wei Chang

Paper Abstract


Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we suggest to work on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-based Query Generation for Retriever (Promptagator), which leverages large language models (LLM) as a few-shot query generator, and creates task-specific retrievers based on the generated data. Powered by LLM's generalization ability, Promptagator makes it possible to create task-specific end-to-end retrievers solely based on a few examples, without using Natural Questions or MS MARCO to train question generators or dual encoders. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO like ColBERT v2 by more than 1.2 nDCG on average on 11 retrieval sets. Further training standard-size re-rankers using the same generated data yields another 5.0 point nDCG improvement. Our studies determine that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given.
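The core recipe the abstract describes is: build a prompt from at most 8 (document, query) examples plus a task description, use an LLM to generate synthetic queries for unlabeled corpus documents, and train a task-specific dual encoder on the resulting pairs. The sketch below illustrates only the few-shot prompt construction and synthetic-pair generation under assumed names; the prompt template, the toy data, and the `generate` callable are illustrative assumptions, not the paper's exact prompt, filtering procedure, or model.

```python
# Illustrative sketch of Promptagator-style few-shot query generation.
# The prompt format and helper names here are assumptions for illustration.
from typing import Callable, List, Tuple


def build_fewshot_prompt(examples: List[Tuple[str, str]], new_document: str) -> str:
    """Concatenate up to 8 (document, query) pairs, then append the new document
    so the LLM continues the pattern with a task-specific query."""
    parts = []
    for doc, query in examples[:8]:  # the paper uses no more than 8 examples
        parts.append(f"Document: {doc}\nQuery: {query}\n")
    parts.append(f"Document: {new_document}\nQuery:")
    return "\n".join(parts)


def generate_synthetic_pairs(
    corpus: List[str],
    examples: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Run an LLM over unlabeled corpus documents to produce (query, document)
    training pairs for a task-specific dual-encoder retriever."""
    pairs = []
    for doc in corpus:
        prompt = build_fewshot_prompt(examples, doc)
        query = generate(prompt).strip()
        if query:  # the paper additionally filters generated queries; omitted here
            pairs.append((query, doc))
    return pairs


if __name__ == "__main__":
    # Toy usage with a stand-in "LLM" that returns a fixed query.
    fewshot = [("A document about dense retrieval.", "what is dense retrieval")]
    corpus = ["Promptagator creates task-specific retrievers from 8 examples."]
    fake_llm = lambda prompt: "how does promptagator work"
    print(generate_synthetic_pairs(corpus, fewshot, fake_llm))
```

In the paper's setting, the generated pairs would then be used to train a dual encoder (and optionally a re-ranker) for that specific retrieval task.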
