Paper Title
A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction
Paper Authors
Paper Abstract
Existing approaches to grammatical error correction (GEC) largely rely on supervised learning with manually created GEC datasets. However, there has been little focus on verifying and ensuring the quality of these datasets, or on how lower-quality data might affect GEC performance. We indeed found a non-negligible amount of "noise" in which errors were inappropriately edited or left uncorrected. To address this, we designed a self-refinement method whose key idea is to denoise these datasets by leveraging the prediction consistency of existing models; this method outperformed strong denoising baselines. We further applied task-specific techniques and achieved state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks. We then analyzed the effect of the proposed denoising method and found that our approach improves the coverage of corrections and facilitates fluency edits, which is reflected in higher recall and overall performance.
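The core idea stated in the abstract (denoising a GEC dataset via the prediction consistency of an existing model) can be sketched as follows. This is a minimal illustrative mock-up, not the paper's actual algorithm: the `predict` stand-in, the exact agreement check, and the replace-on-disagreement policy are all assumptions made for demonstration.

```python
# Hypothetical sketch: consistency-based denoising of (source, target) GEC pairs.
# A real setup would use a trained GEC model; here `predict` is a toy stand-in.

def predict(source: str) -> str:
    """Stand-in for an existing GEC model's correction of `source`.
    This toy 'model' only knows how to fix one error pattern."""
    return source.replace("has went", "has gone")

def denoise(pairs):
    """For each pair, keep the annotated target when the model's prediction
    agrees with it; otherwise substitute the model's prediction as a
    (possibly cleaner) target. This is one crude consistency heuristic."""
    refined = []
    for source, target in pairs:
        hypothesis = predict(source)
        refined.append((source, target if hypothesis == target else hypothesis))
    return refined

noisy = [
    ("She has went home .", "She has went home ."),  # noise: left uncorrected
    ("He has went away .", "He has gone away ."),    # clean: model agrees
]
for source, target in denoise(noisy):
    print(source, "->", target)
```

In this sketch the first pair's uncorrected target is replaced by the model's correction, while the second, already-clean pair is kept unchanged; a real pipeline would presumably make this decision with a stronger model and a softer agreement criterion than exact string match.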