The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking

Paper title

The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking

Authors

Li, Yinghui; Zhou, Qingyu; Li, Yangning; Li, Zhongli; Liu, Ruiyang; Sun, Rongyi; Wang, Zizhen; Li, Chao; Cao, Yunbo; Zheng, Hai-Tao

Abstract

Chinese Spell Checking (CSC) aims to detect and correct Chinese spelling errors, which are mainly caused by phonological or visual similarity. Recently, pre-trained language models (PLMs) have promoted progress on the CSC task. However, there is a gap between the learned knowledge of PLMs and the goal of the CSC task. PLMs focus on the semantics of text and tend to correct erroneous characters to semantically proper or commonly used ones, but these are not the ground-truth corrections. To address this issue, we propose an Error-driven COntrastive Probability Optimization (ECOPO) framework for the CSC task. ECOPO refines the knowledge representations of PLMs and guides the model to avoid predicting these common characters in an error-driven way. In particular, ECOPO is model-agnostic and can be combined with existing CSC methods to achieve better performance. Extensive experiments and detailed analyses on the SIGHAN datasets demonstrate that ECOPO is simple yet effective.
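
The abstract does not spell out the objective, so the following is only a minimal sketch of what an error-driven contrastive probability loss could look like, not the authors' formulation. It assumes that, for one character position, the model outputs a probability per candidate character, and that the "past mistakes" are the candidates the model currently ranks above the ground-truth character. The function name `contrastive_probability_loss`, the parameter `k`, and the toy probabilities are all hypothetical.

```python
def contrastive_probability_loss(probs, gold, k=2):
    """Toy error-driven contrastive loss for one character position.

    probs: dict mapping candidate character -> predicted probability
    gold:  the ground-truth character (assumed to be a key of probs)
    k:     maximum number of wrongly preferred candidates to penalize
    """
    p_gold = probs[gold]
    # "Past mistakes": candidates the model currently prefers over the gold one.
    negatives = sorted(
        (p for ch, p in probs.items() if ch != gold and p > p_gold),
        reverse=True,
    )[:k]
    if not negatives:
        # The gold character is already ranked first; nothing to penalize.
        return 0.0
    # Average probability gap between each negative and the gold character.
    return sum(p - p_gold for p in negatives) / len(negatives)


# Hypothetical distribution: the model prefers two common characters
# ("a", "b") over the ground-truth correction ("c").
toy_probs = {"a": 0.5, "b": 0.3, "c": 0.2}
loss = contrastive_probability_loss(toy_probs, gold="c", k=2)
print(loss)
```

Minimizing such a term raises the probability of the ground-truth character relative to the common characters the model mistakenly prefers. Because it only needs output probabilities, a term like this could in principle be added to the training loss of any PLM-based corrector, which is consistent with the model-agnostic claim in the abstract.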
