Paper Title

Hierarchical Attention Transformer Architecture For Syntactic Spell Correction

Paper Authors

Abhishek Niranjan, M Ali Basha Shaik, Kushal Verma

Paper Abstract

Attention mechanisms have played a boosting role in recent advances on sequence-to-sequence problems. The Transformer architecture achieved new state-of-the-art results in machine translation, and its variants have since been introduced for several other sequence-to-sequence problems. Problems that involve a shared vocabulary can benefit from the similar semantic and syntactic structure of the source and target sentences. Motivated by the need for a reliable and fast post-processing textual module to assist all text-related use cases on mobile phones, we take on the popular spell correction problem. In this paper, we propose a multi-encoder, single-decoder variation of the conventional Transformer. Outputs from three encoders, fed with character-level 1-gram, 2-gram and 3-gram inputs, are attended to in a hierarchical fashion in the decoder. The context vectors from the encoders, combined with self-attention, amplify the n-gram properties at the character level and help in accurate decoding. We demonstrate our model on a spell correction dataset from Samsung Research, and report significant improvements of 0.11\%, 0.32\% and 0.69\% in character (CER), word (WER) and sentence (SER) error rates over existing state-of-the-art machine-translation architectures. Our architecture also trains ~7.8 times faster, and is only about 1/3 the size of the next most accurate model.
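
The following is a minimal, hypothetical sketch (not the authors' released code) of how one decoder layer could attend hierarchically to three character n-gram encoders as described in the abstract. It assumes PyTorch; the dimensions (`d_model=256`, `nhead=4`), the sequential ordering of the cross-attention blocks, and the layer-norm placement are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a hierarchical multi-encoder decoder layer, assuming PyTorch.
# Encoder memories come from three encoders over character 1-gram, 2-gram and 3-gram
# inputs; tensors are shaped (seq_len, batch, d_model).
import torch
import torch.nn as nn

class HierarchicalDecoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=4, dim_ff=1024, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        # one cross-attention block per encoder (1-gram, 2-gram, 3-gram)
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, nhead, dropout=dropout) for _ in range(3)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(5))
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
        )

    def forward(self, tgt, enc_outs, tgt_mask=None):
        # masked self-attention over the partially decoded character sequence
        x = self.norms[0](tgt + self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0])
        # hierarchical cross-attention: the output of one block becomes the
        # query for the next, over the 1-gram, 2-gram and 3-gram memories in turn
        for attn, norm, mem in zip(self.cross_attn, self.norms[1:4], enc_outs):
            x = norm(x + attn(x, mem, mem)[0])
        return self.norms[4](x + self.ff(x))

# Usage sketch: three encoder memories and a target embedding of matching width.
enc_outs = [torch.randn(20, 2, 256) for _ in range(3)]
layer = HierarchicalDecoderLayer()
out = layer(torch.randn(15, 2, 256), enc_outs)  # -> (15, 2, 256)
```

Chaining the cross-attention blocks in this way lets each decoding step combine evidence from all three n-gram views of the noisy input, which is one plausible reading of the "hierarchical" attention the abstract describes.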
