Robust Domain Adaptation for Machine Reading Comprehension

Paper title

Robust Domain Adaptation for Machine Reading Comprehension

Authors

Jiang, Liang; Huang, Zhenyu; Liu, Jia; Wen, Zujie; Peng, Xi

Abstract

Most domain adaptation methods for machine reading comprehension (MRC) use a pre-trained question-answer (QA) construction model to generate pseudo QA pairs for MRC transfer. This process inevitably introduces mismatched pairs (i.e., noisy correspondence) because i) QA pairs are unavailable in the target documents, and ii) a domain shift occurs when the QA construction model is applied to the target domain. Noisy correspondence degrades MRC performance, yet existing work neglects it. To address this untouched problem, we propose to construct QA pairs by additionally using the dialogue related to the documents, together with a new domain adaptation method for MRC. Specifically, we propose the Robust Domain Adaptation for Machine Reading Comprehension (RMRC) method, which consists of an answer extractor (AE), a question selector (QS), and an MRC model. RMRC filters out irrelevant answers by estimating their correlation to the document via the AE, and extracts questions by fusing the candidate questions from multiple rounds of dialogue chats via the QS. The MRC model is fine-tuned on the extracted QA pairs and provides feedback to optimize the QS through a novel reinforced self-training method. Thanks to the optimization of the QS, our method greatly alleviates the noisy correspondence problem caused by the domain shift. To the best of our knowledge, this could be the first study to reveal the influence of noisy correspondence in domain adaptation MRC models and to show a feasible way to achieve robustness to mismatched pairs. Extensive experiments on three datasets demonstrate the effectiveness of our method.
