Paper title
Grammatical Error Correction: A Survey of the State of the Art
Paper authors
Paper abstract
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task includes not only the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.