Paper Title

Unsupervised Multi-hop Question Answering by Question Generation

Paper Authors

Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

Paper Abstract

Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive. We explore the possibility of training a well-performing multi-hop QA model without referencing any human-labeled multi-hop question-answer pairs, i.e., unsupervised multi-hop QA. We propose MQA-QG, an unsupervised framework that can generate human-like multi-hop training data from both homogeneous and heterogeneous data sources. MQA-QG generates questions by first selecting or generating relevant information from each data source and then integrating these pieces of information to form a multi-hop question. Using only the generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance on the HybridQA and HotpotQA datasets, respectively. We also show that pretraining the QA system with the generated data greatly reduces the demand for human-annotated training data. Our code is publicly available at https://github.com/teacherpeterpan/Unsupervised-Multi-hop-QA.
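To make the generate-then-compose idea concrete, below is a minimal Python sketch of such a two-stage pipeline in a HybridQA-style table-plus-text setting. The data layout, the operator names (select_bridge, compose_question), and the question template are illustrative assumptions for this sketch, not the actual operators of MQA-QG; see the linked repository for the real implementation.

```python
# Illustrative two-stage pipeline: (1) select bridge information linking
# two data sources, (2) compose it into a multi-hop question.
# All names and templates here are assumptions for illustration only.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class TableRow:
    entity: str     # a linked entity in a table cell, e.g., a film title
    attribute: str  # column header, e.g., "Director"
    value: str      # cell value, e.g., "Christopher Nolan"

def select_bridge(row: TableRow, passages: dict[str, str]) -> str | None:
    """Step 1: pick an entity that appears in both the table and a passage."""
    return row.entity if row.entity in passages else None

def compose_question(row: TableRow, passage_fact: str) -> tuple[str, str]:
    """Step 2: fuse the passage fact with the table fact by replacing the
    bridge entity with its textual description, yielding a 2-hop question."""
    question = (f"What is the {row.attribute.lower()} of the film "
                f"that {passage_fact}?")
    return question, row.value  # (multi-hop question, answer)

if __name__ == "__main__":
    # Toy heterogeneous sources: one table row and one linked text passage.
    passages = {"Inception": "features a heist carried out inside layered dreams"}
    row = TableRow(entity="Inception", attribute="Director",
                   value="Christopher Nolan")

    bridge = select_bridge(row, passages)
    if bridge is not None:
        question, answer = compose_question(row, passages[bridge])
        print(question)  # What is the director of the film that features ...?
        print(answer)    # Christopher Nolan
```

Answering the composed question requires two hops (passage description -> bridge entity -> table attribute), which is the kind of training pair the framework generates at scale.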
