Title
Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension
Authors
Abstract
Multilingual pre-trained models can leverage training data from a high-resource source language (such as English) to improve performance on low-resource languages. However, transfer quality for multilingual Machine Reading Comprehension (MRC) is significantly worse than for sentence classification tasks, mainly because MRC requires detecting word-level answer boundaries. In this paper, we propose two auxiliary tasks in the fine-tuning stage that provide additional phrase boundary supervision: (1) a mixed MRC task, which translates the question or the passage into other languages and builds cross-lingual question-passage pairs; and (2) a language-agnostic knowledge masking task that leverages knowledge phrases mined from the web. Extensive experiments on two cross-lingual MRC datasets demonstrate the effectiveness of the proposed approach.
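The abstract describes the two auxiliary tasks only at a high level. Below is a minimal Python sketch of how training data for them might be constructed. It rests on our own assumptions, not on details given in the abstract: knowledge phrases are supplied as token tuples, phrases are matched by greedy longest match, `[MASK]` is the mask token, and translations come from some external MT system and are passed in as a dict. The function names (`find_phrase_spans`, `knowledge_mask`, `mixed_pairs`) and the rule for re-locating the answer in a translated passage are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

MASK = "[MASK]"  # assumed BERT-style mask token (hypothetical choice)


def find_phrase_spans(tokens: Sequence[str], phrases: Set[Tuple[str, ...]],
                      max_len: int = 5) -> List[Tuple[int, int]]:
    """Greedy longest-match search for knowledge phrases; returns [start, end) spans."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word phrases beat their sub-phrases.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                match = (i, i + n)
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans


def knowledge_mask(tokens: Sequence[str], phrases: Set[Tuple[str, ...]],
                   mask_prob: float = 0.5, seed: Optional[int] = None
                   ) -> Tuple[List[str], List[Optional[str]]]:
    """Mask whole knowledge phrases instead of isolated tokens.

    Returns the masked tokens and labels: the original token at masked
    positions, None elsewhere. Works for any language whose phrases are
    included in `phrases`.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for start, end in find_phrase_spans(tokens, phrases):
        if rng.random() < mask_prob:
            for j in range(start, end):
                labels[j] = tokens[j]
                masked[j] = MASK
    return masked, labels


def mixed_pairs(question: str, passage: str, answer: str,
                translations: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Build cross-lingual (question, passage, answer) triples.

    `translations` maps a language code to translated 'question', 'passage'
    and 'answer' strings (hypothetical structure). A translated-passage pair
    is kept only if the translated answer occurs verbatim in the translated
    passage, so the answer boundary remains well defined.
    """
    pairs: List[Tuple[str, str, str]] = []
    for t in translations.values():
        # Translated question + original passage: the answer span is unchanged.
        pairs.append((t["question"], passage, answer))
        # Original question + translated passage: the answer must be re-located.
        if t["answer"] in t["passage"]:
            pairs.append((question, t["passage"], t["answer"]))
    return pairs
```

For example, with `phrases = {("eiffel", "tower"), ("paris",)}` and `mask_prob=1.0`, the tokens `the eiffel tower is in paris` become `the [MASK] [MASK] is in [MASK]`: the two-token phrase is masked as a unit, so the model must recover the whole phrase and, with it, its boundaries. Masking at phrase rather than token granularity is the design choice that ties this auxiliary task to answer boundary detection.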