Paper Title
NeuSpell: A Neural Spelling Correction Toolkit
Paper Authors
Paper Abstract
We introduce NeuSpell, an open-source toolkit for spelling correction in English. Our toolkit comprises ten different models, and benchmarks them on naturally occurring misspellings from multiple sources. We find that many systems do not adequately leverage the context around the misspelt token. To remedy this, (i) we train neural models using spelling errors in context, synthetically constructed by reverse engineering isolated misspellings; and (ii) use contextual representations. By training on our synthetic examples, correction rates improve by 9% (absolute) compared to the case when models are trained on randomly sampled character perturbations. Using richer contextual representations boosts the correction rate by another 3%. Our toolkit enables practitioners to use our proposed and existing spelling correction systems, both via a unified command line, as well as a web interface. Among many potential applications, we demonstrate the utility of our spell-checkers in combating adversarial misspellings. The toolkit can be accessed at neuspell.github.io. Code and pretrained models are available at http://github.com/neuspell/neuspell.
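The abstract contrasts two ways of synthesizing (misspelt, correct) training pairs: randomly sampled character perturbations, and errors derived from naturally occurring isolated misspellings placed in sentential context. The Python sketch below illustrates that contrast under stated assumptions. It is not the authors' implementation: `MISSPELLINGS` is a hypothetical toy table standing in for a collection of isolated misspellings, and `random_char_noise`, `lookup_noise` and `make_pair` are hypothetical helpers. The lookup strategy shown is only one simple way to put real misspellings into context; the paper's own reverse-engineering procedure may differ.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy table of isolated misspellings (correct word -> misspellings
# observed in the wild). Illustrative only; not data from the paper.
MISSPELLINGS: Dict[str, List[str]] = {
    "definitely": ["definately", "definitly"],
    "receive": ["recieve", "receve"],
    "their": ["thier"],
}

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def random_char_noise(word: str, rng: random.Random) -> str:
    """Baseline: one random character edit (insert, delete, replace or swap).

    A replace or swap can occasionally reproduce the original word.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["insert", "delete", "replace", "swap"])
    if op == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "replace":
        return word[:i] + rng.choice(ALPHABET) + word[i + 1:]
    # Swap two adjacent characters.
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def lookup_noise(word: str, rng: random.Random) -> str:
    """Replace a word with one of its naturally occurring misspellings, if any."""
    candidates = MISSPELLINGS.get(word)
    return rng.choice(candidates) if candidates else word


def make_pair(
    sentence: str,
    noise_fn: Callable[[str, random.Random], str],
    rate: float,
    seed: int = 0,
) -> Tuple[str, str]:
    """Build one (noisy, clean) pair by noising each token with probability `rate`."""
    rng = random.Random(seed)
    clean = sentence.split()
    noisy = [noise_fn(w, rng) if rng.random() < rate else w for w in clean]
    return " ".join(noisy), " ".join(clean)


if __name__ == "__main__":
    sentence = "i will definitely receive their letter"
    print(make_pair(sentence, random_char_noise, rate=0.3))
    print(make_pair(sentence, lookup_noise, rate=1.0))
```

In this sketch, `lookup_noise` only emits spellings that people actually produce (for example `thier` for `their`), while the untouched neighbouring tokens give a model the surrounding context to learn from. `random_char_noise`, by contrast, spreads edits uniformly over positions and letters, which need not resemble real errors.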