Paper Title

CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation

Authors

Kambhatla, Nishant, Born, Logan, Sarkar, Anoop

Abstract

We propose a novel data-augmentation technique for neural machine translation based on ROT-$k$ ciphertexts. ROT-$k$ is a simple letter substitution cipher that replaces a letter in the plaintext with the $k$th letter after it in the alphabet. We first generate multiple ROT-$k$ ciphertexts using different values of $k$ for the plaintext which is the source side of the parallel data. We then leverage this enciphered training data along with the original parallel data via multi-source training to improve neural machine translation. Our method, CipherDAug, uses a co-regularization-inspired training procedure, requires no external data sources other than the original training data, and uses a standard Transformer to outperform strong data augmentation techniques on several datasets by a significant margin. This technique combines easily with existing approaches to data augmentation, and yields particularly strong results in low-resource settings.
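The core augmentation step described above is straightforward to illustrate. The sketch below shows a minimal ROT-$k$ encipherment and how multiple ciphertext copies of source sentences might be generated for different values of $k$; the function names are illustrative assumptions, not taken from the CipherDAug codebase.

```python
import string


def rot_k(text: str, k: int) -> str:
    """Replace each letter with the k-th letter after it in the alphabet,
    wrapping around; non-letter characters are left unchanged."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)


def augment_sources(sentences, ks=(1, 2)):
    """Produce one enciphered copy of the source-side sentences per value
    of k, to be paired with the original targets for multi-source training."""
    return {k: [rot_k(s, k) for s in sentences] for k in ks}


# Example: two enciphered views of the same source sentence.
views = augment_sources(["the cat sat"], ks=(1, 13))
print(views[1][0])   # "uif dbu tbu"
print(views[13][0])  # "gur png fng" (ROT-13)
```

Since ROT-$k$ is a bijection on the alphabet, the enciphered copies preserve the token-level structure of the source while presenting the model with superficially distinct inputs, which is what makes them usable as augmented parallel data.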
